PTMOracle: A Cytoscape App for Covisualizing and Coanalyzing Post


Article

PTMOracle: a Cytoscape app for co-visualising and co-analysing post-translational modifications in protein interaction networks
Aidan P Tay, Chi Nam Ignatius Pang, Daniel L Winter, and Marc R. Wilkins
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b01052 • Publication Date (Web): 28 Mar 2017


Journal of Proteome Research

PTMOracle: a Cytoscape app for co-visualising and co-analysing post-translational modifications in protein interaction networks

Aidan P. Tay,‡ Chi Nam Ignatius Pang, Daniel L. Winter, Marc R. Wilkins*

Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia

KEYWORDS: post-translational modification, protein-protein interaction, data integration, networks, visualisation, Cytoscape


ABSTRACT:

Post-translational modifications of proteins (PTMs) act as key regulators of protein activity and of protein-protein interactions (PPIs). To date, it has been difficult to comprehensively explore functional links between PTMs and PPIs. To address this, we developed PTMOracle, a Cytoscape app for co-analysing PTMs within PPI networks. PTMOracle also allows extensive data to be integrated and co-analysed with PPI networks, allowing the role of domains, motifs and disordered regions to be considered. For proteins of interest, or a whole proteome, PTMOracle can generate network visualisations to reveal complex PTM-associated relationships. This is assisted by OraclePainter for colouring proteins by modifications, OracleTools for network analytics, and OracleResults for exploring tabulated findings. To illustrate the use of PTMOracle, we investigate PTM-associated relationships and their role in PPIs in four case studies. In the yeast interactome, with its rich set of PTMs, we construct and explore a histone-associated network and a domain-domain interaction network, and show how integrative approaches can predict kinases involved in phosphodegrons. In the human interactome, we analyse a phosphotyrosine-associated network, which highlights the sparse nature of human PPI networks and the lack of PTM-associated data. PTMOracle is open source and available on the Cytoscape app store: http://apps.cytoscape.org/apps/ptmoracle.


INTRODUCTION

The majority of eukaryotic proteins are modified by at least one post-translational modification (PTM) 1. These modifications are mediated by modifying enzymes, such as kinases, acetyltransferases and methyltransferases. Some modifications are also reversible and are dynamically removed by demodifying enzymes such as phosphatases, deacetylases and demethylases. PTMs can occur on multiple sites of the same protein and, with more than 400 types of PTMs 1, some sites are known to be modified by different types of PTMs. In conjunction with alternative splicing, PTMs serve to diversify the function of proteins by acting as key regulators of protein activity in response to changing conditions 2. For this reason, many proteins potentially exist as different modification forms (modforms), each with specific properties that can affect how they function 3.

Protein-protein interactions (PPIs) can be regulated by PTMs. Single PTMs can do this through allosteric change, or by forming motifs and binding interfaces that are recognized by domains 4,5. PTMs can also act in concert with each other to modulate protein interactions. This is exemplified on histone proteins, whose ‘tails’ are highly modified with different PTM types to regulate the interactions with their binding partners 6. A general hypothesis of ‘protein interaction codes’ 7, where PTMs modulate interaction specificity, has been described for a small number of other proteins 8–12. An emerging challenge is therefore to decipher how PTMs, alone or in combinations, regulate protein-protein interactions 1. Knowledge of how PPIs are regulated will inform our understanding of cellular processes.

Understanding of the regulatory role of PTMs requires the identification of PTMs. Thousands of PTM sites can now be routinely identified in large-scale tandem mass spectrometry (MS/MS) experiments 13–16. For unambiguous identification of PTM sites, detection of modified peptides that are unique to the protein, as well as localisation of the modified residue, is required 17,18.

However, proteome-wide identification of PTM sites by MS remains challenging since PTMs can be labile, thereby affecting peptide ionisation and fragmentation 17,18. Modified peptides can also be present in low abundance and are therefore difficult to detect 17,18. Despite these challenges, PTM sites identified by high-throughput methods are available in various online repositories 19–25. The quality and coverage of PTM sites discovered by high-throughput methods could be improved by consolidating PTM sites identified by different research groups or curated by different repositories 17.

High-confidence identification of PPIs is essential for understanding the role of PPIs in cellular processes. Interactions between proteins are typically detected using two main experimental methods, namely binary and co-complex methods 26. Binary methods such as yeast two-hybrid (Y2H) 27 measure direct interactions between protein pairs, whereas co-complex methods such as affinity purification-mass spectrometry (AP-MS) measure interactions within protein complexes but lack information on direct physical binding 26. However, regardless of which method is used to detect interactions, each approach has different strengths and weaknesses, making it difficult to attain proteome-scale coverage of PPIs 28. For example, Yu et al. 29 showed that different large-scale screens in Saccharomyces cerevisiae, including Y2H and AP-MS, have little overlap due to differences in completeness of the search space and assay sensitivity. In the same way, Rolland et al. 30 highlighted that large zones of the human interactome remain uncharted because of considerable bias in the literature towards proteins associated with disease. The combinatorial complexity of the human interactome, as well as the fact that weak or transient interactions are difficult to detect, also contributes to the challenges of attaining proteome-scale coverage of PPIs 28,30. The above suggests that proteome-wide coverage of PPIs can be improved by combining PPIs detected with different approaches, including those that are curated in various online repositories 31–33.

Systematic investigations that link the functional role of PTMs to PPIs remain challenging 34. They require integrating high-quality data on PTMs and PPIs, and co-analysing them to gain insights into how PTMs might regulate PPIs. Several methods have been developed to directly identify PTMs that modulate PPIs, including the yeast tribrid or conditional two-hybrid system 35,36. More recently, Grossmann et al. 37 reported an approach to identify binary PTM-mediated interactions. They extended the traditional Y2H system to detect interactions between bait proteins containing phosphotyrosine (pY)-recognition domains and phosphorylated prey proteins. By using their approach, Grossmann et al. 37 identified 292 mostly novel pY-dependent interactions in humans with a single experiment. Yet because of the lack of experimental methods to map PTM-mediated PPIs in a high-throughput setup, recent studies have focused on connecting PTM and PPI datasets through network analysis. This is especially useful for providing insights into global PTM-associated relationships. For example, Duan et al. 38 investigated whether different types of PTMs are characterised by different network properties. From their analysis of 12 different PTM types, Duan et al. 38 found that proteins with PTMs engage in more interactions and are involved in regulatory functions such as relaying information compared to proteins without PTMs.

A number of software platforms are available for visualising and analyzing PPI networks 39–41. In the same way, existing tools such as the adaptation of GEOMI by Ho et al. 42, PTMapper 43 and PhosphoPath 44 allow users to co-visualise and co-analyse some PTM data within PPI networks. Despite this, there is a lack of tools for exploring the regulatory role of PTMs and how they might be involved in PPIs. Such tools can improve our understanding of PTM-mediated interactions or PTM-associated relationships 28. In this study, we developed PTMOracle, a new Cytoscape app that facilitates the co-visualisation and co-analysis of PTMs in the context of PPI networks. This allows users to develop systematic searches for proteins of interest and explore network visualisations that address complex PTM-associated relationships. Furthermore, PTMOracle allows users to integrate other protein data into PPI networks, allowing the role of domains, motifs and disordered regions to be considered. This allows users to co-analyse relationships between PTMs, PPIs and other sequence annotations to generate testable hypotheses regarding the functional role of PTMs in PPIs. To illustrate how PTMOracle can be used to explore protein PTMs and their involvement in PPIs, we present case studies using the yeast (S. cerevisiae) and human interactomes.

MATERIALS & METHODS

Construction of PTMOracle

PTMOracle (v1.00) is a Cytoscape app designed to facilitate the co-visualisation of PTMs in the context of PPI networks. Apart from PTMs, PTMOracle also allows users to integrate other types of protein data into PPI networks, such as the protein sequence and sequence annotations including domains, motifs and disordered regions. To do this, all protein data must be formatted into either an XML-based or tab-separated (TSV) file. The format of the XML-based or TSV file required by PTMOracle is described in Supporting Information 1 (Method S1 and Figure S1). Protein data in the required format were parsed and mapped onto protein nodes in an existing Cytoscape network and summarized in new columns of the Cytoscape Table Panel by PTMOracle. In conjunction with the Cytoscape framework, protein data were then visualised, queried and explored with different features of PTMOracle. The features of PTMOracle are described in more detail in the Results section.

PTMOracle requires Cytoscape v3.1 or higher, and is available for download on the Cytoscape app store: http://apps.cytoscape.org/apps/ptmoracle. The source code for PTMOracle is under the GPL v3 license and is available for download via the BitBucket repository https://bitbucket.org/aidantay/ptmoracle/src.

Yeast interaction network data and visualisation

The yeast PPI network described by Pang et al. 45 was loaded into Cytoscape and used to co-visualise protein PTM sites and sequence annotations. The network (referred to as SBI) was constructed using PPIs that were verified by two or more experiments. This network comprised 3,843 protein nodes and 13,292 non-redundant interactions. Protein identifiers and descriptions from Uniprot 25 were also mapped onto the network as node attributes using the Cytoscape framework.

Human interaction network data and visualisation

Two human PPI networks derived from Rolland et al. 30 were used in this study. The first network (referred to as Lit-BM-13) was constructed from literature-curated binary pairs supported by multiple lines of evidence, and contained 5,545 protein nodes and 11,045 interactions. The second network (referred to as HI-II-14) was constructed using pairwise two-hybrid experiments, validated with orthogonal interaction assays to systematically map high-quality binary PPIs. This network comprised 4,303 protein nodes and 13,944 interactions. Using the Cytoscape framework, protein identifiers from Uniprot 25 and gene descriptions from NCBI Entrez Gene 46 were also mapped onto each network as node attributes.

Data sources

PTM data

PTM sites on yeast and human proteins were downloaded from Uniprot 25, dbPTM 23 and ProteomeScout 24. To increase proteome-wide coverage, PTM sites in each repository were consolidated into a single set using Uniprot as a reference. This was done by using custom scripts to map protein IDs and PTM sites from dbPTM and ProteomeScout onto Uniprot accession numbers and protein sequences, respectively. All PTM sites reported in each repository considered position 1 as the initiating methionine. PTM sites from dbPTM or ProteomeScout that were mapped onto the Uniprot protein sequence but reported a different residue from the one in the Uniprot protein sequence were removed from the final dataset.

Following this, scripts were used to tally 8 main PTM types. These comprised 5 commonly occurring intracellular PTMs (acetylation, methylation, phosphorylation, ubiquitination and sumoylation) and 3 PTMs that predominantly occur on membrane-associated or secreted proteins (lipidation, N-linked glycosylation and O-linked glycosylation) 2. We note that several subtypes exist for different PTM types, which may have different functions, properties, modifying enzymes and substrates (e.g. phosphotyrosine and phosphoserine). However, in the case of lipidation, we only considered myristoylation, palmitoylation, farnesylation and geranylgeranylation. We also note that O-linked glycosylation can exist as O-GlcNAc and occur intracellularly to regulate signal transduction, in conjunction with phosphoserines and phosphothreonines 2. Together, this generated a total of 18,198 unique PTM sites on 3,356 yeast proteins, and 321,584 unique PTM sites on 19,342 human proteins. Custom scripts were also used to parse and format all PTM sites into the XML-based format required by PTMOracle. The XML-based files containing PTM sites for yeast and human proteins are provided in Supporting Information 2.
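The consolidation step described above can be illustrated with a short Python sketch. This is not the custom scripts used in this study; it is a minimal illustration that assumes sites have already been mapped to Uniprot accessions and positions. The `PtmSite` record, the `consolidate` function, the accession and the sequence are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class PtmSite(NamedTuple):
    accession: str  # Uniprot accession the site has been mapped to
    position: int   # 1-based; position 1 is the initiating methionine
    residue: str    # residue reported by the source repository
    ptm_type: str   # e.g. "phosphorylation"


def consolidate(sites, sequences):
    """Return unique sites whose residue agrees with the reference sequence."""
    kept = set()
    for site in sites:
        seq = sequences.get(site.accession)
        if seq is None or not 1 <= site.position <= len(seq):
            continue  # site cannot be placed on a reference sequence
        if seq[site.position - 1] != site.residue:
            continue  # reported residue differs from the reference residue
        kept.add(site)  # the set collapses sites reported by several sources
    return sorted(kept)


# Hypothetical accession, sequence and sites, for illustration only.
sequences = {"P00001": "MSTAYK"}
sites = [
    PtmSite("P00001", 2, "S", "phosphorylation"),
    PtmSite("P00001", 2, "S", "phosphorylation"),  # duplicate from a second source
    PtmSite("P00001", 5, "T", "phosphorylation"),  # mismatch: residue 5 is Y
    PtmSite("P00001", 6, "K", "acetylation"),
]
unique_sites = consolidate(sites, sequences)
tally = Counter(site.ptm_type for site in unique_sites)  # sites per PTM type
```

In this toy input, the duplicated site is counted once and the site with a mismatched residue is dropped, which mirrors the two filtering rules stated in the text.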


Domain and domain-domain interaction data

Yeast domain-domain interactions (DDIs) were obtained from Kim et al. 47. Briefly, Kim et al. mapped DDIs from Pfam onto solved crystal structures to identify distinct binding interfaces. To identify the exact positions of interacting domains, we also downloaded domain annotations from Pfam 48 and mapped them to each DDI using custom scripts. DDIs that involved at least 1 domain with an e-value greater than 0.01, or that could not be mapped to domain annotations from Pfam, were removed from the final dataset. Following this, custom scripts were used to parse and format DDIs and domain annotations for use in Cytoscape and PTMOracle, respectively. Overall, a total of 1,445 non-redundant DDIs between corresponding Pfam domains on yeast proteins were mapped onto yeast PPI networks as edge attributes. The XML-based file containing exact domain positions for yeast proteins is provided in Supporting Information 2.
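The two DDI filtering rules (e-value threshold and successful mapping to a Pfam annotation) can be sketched in Python as follows. This is an illustrative sketch only, not the scripts used in this study. The function name `filter_ddis`, the input layout and all protein and domain identifiers are hypothetical, and treating A-B and B-A as the same interaction is an assumption about how non-redundancy was defined.

```python
def filter_ddis(ddis, domain_evalues, max_evalue=0.01):
    """Keep DDIs whose two domains are both annotated with e-value <= max_evalue.

    ddis: iterable of (protein_a, domain_a, protein_b, domain_b)
    domain_evalues: dict mapping (protein, domain) -> e-value of the annotation
    """
    kept = set()
    for prot_a, dom_a, prot_b, dom_b in ddis:
        e_a = domain_evalues.get((prot_a, dom_a))
        e_b = domain_evalues.get((prot_b, dom_b))
        if e_a is None or e_b is None:
            continue  # at least one domain could not be mapped to an annotation
        if e_a > max_evalue or e_b > max_evalue:
            continue  # at least one domain annotation is too weak
        # Assumption: A-B and B-A describe the same interaction.
        kept.add(tuple(sorted([(prot_a, dom_a), (prot_b, dom_b)])))
    return kept


# Hypothetical identifiers and e-values, for illustration only.
domain_evalues = {
    ("PROT1", "DOM_A"): 1e-20,
    ("PROT2", "DOM_B"): 3e-5,
    ("PROT3", "DOM_C"): 0.5,
}
ddis = [
    ("PROT1", "DOM_A", "PROT2", "DOM_B"),
    ("PROT2", "DOM_B", "PROT1", "DOM_A"),  # same interaction, reversed
    ("PROT1", "DOM_A", "PROT3", "DOM_C"),  # e-value above the threshold
    ("PROT1", "DOM_A", "PROT4", "DOM_D"),  # no domain annotation available
]
kept_ddis = filter_ddis(ddis, domain_evalues)
```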

Kinase-substrate interaction data

Yeast kinase-substrate interactions (KSIs) were obtained from the Yeast Kinase Interaction Database (KID, March 14, 2016 release) described by Sharifpoor et al. 49. The KID database contains data from both high- and low-throughput experiments associated with phosphorylation events. Custom scripts were used to remove interactions with a KID score of less than 6.73, the threshold cut-off recommended by Sharifpoor et al. for high-quality interactions, which corresponds to a p-value of less than 0.01 49. Custom scripts were then used to parse and format interactions for use in Cytoscape. A total of 582 non-redundant KSIs were mapped onto yeast PPI networks as edge attributes.
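The score threshold can be expressed as a simple filter. The sketch below is illustrative only and assumes that each KSI is available as a (kinase, substrate, score) record; the function name, the record layout and the identifiers are hypothetical, and keeping the highest score per kinase-substrate pair is one possible way to make the set non-redundant.

```python
KID_SCORE_CUTOFF = 6.73  # threshold stated in the text


def filter_ksis(ksis, cutoff=KID_SCORE_CUTOFF):
    """Return {(kinase, substrate): best score} for interactions at or above cutoff."""
    best = {}
    for kinase, substrate, score in ksis:
        if score < cutoff:
            continue  # remove interactions with a score of less than the cutoff
        key = (kinase, substrate)  # directed: kinase acts on substrate
        if key not in best or score > best[key]:
            best[key] = score
    return best


# Hypothetical records, for illustration only.
ksis = [
    ("KINASE_A", "SUBSTRATE_X", 8.2),
    ("KINASE_A", "SUBSTRATE_X", 7.0),   # redundant record for the same pair
    ("KINASE_A", "SUBSTRATE_Y", 6.73),  # exactly at the cutoff, so kept
    ("KINASE_B", "SUBSTRATE_X", 3.1),   # below the cutoff, so removed
]
kept_ksis = filter_ksis(ksis)
```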


Phosphotyrosine-dependent interaction data

Human phosphotyrosine (pY)-dependent interactions were obtained from two large-scale studies. Grossmann et al. 37 introduced a third plasmid expressing human kinases into the Y2H system to detect pY-dependent interactions involving SH2 or PTB domains. On the other hand, Tinti et al. 50 employed a text-mining approach and manually curated SH2-pY dependent interactions from the literature. This included SH2-pY dependent interactions between non-human proteins, which were subsequently removed with custom scripts. Custom scripts were also used to parse and format interactions for use in Cytoscape. Overall, a total of 292 pY-dependent interactions from Grossmann et al. and 264 SH2-pY dependent interactions from Tinti et al. were mapped onto human PPI networks as edge attributes.

Phosphodegron motif data

To predict kinases that are involved in phosphorylation-dependent ubiquitination and degradation, we searched the yeast PPI network for proteins with phosphodegron motifs. This was done by using the MotifFinder module in OracleTools (see Results) to calculate the location of two known phosphodegron patterns ([LIVMP]X{0,2}[ST]PXXE and [LIVMP]X{0,2}[ST]PXX[ST]) described by Swaney et al. 51. For each sequence pattern, X matches any amino acid, square brackets match any one amino acid listed inside the brackets and curly brackets give the number of times the preceding amino acid can be repeated. XML-based files containing the location of phosphodegron motifs on yeast proteins identified by MotifFinder are provided in Supporting Information 2.
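The pattern syntax above maps almost directly onto regular expressions. The following Python sketch shows one way to search a sequence with the two patterns. It is not the MotifFinder implementation, whose internals are not described here; `to_regex`, `find_motifs` and the example sequence are hypothetical, and reporting overlapping matches is an assumption.

```python
import re

PHOSPHODEGRON_PATTERNS = (
    "[LIVMP]X{0,2}[ST]PXXE",
    "[LIVMP]X{0,2}[ST]PXX[ST]",
)


def to_regex(pattern):
    """Translate the pattern syntax into a regular expression string.

    X (any amino acid) becomes '.', while brackets and braces already carry
    the same meaning in regular expressions. This simple replacement assumes
    X never appears inside square brackets, which holds for both patterns.
    """
    return pattern.replace("X", ".")


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for each start position that matches."""
    # A lookahead lets matches that overlap one another be reported as well.
    lookahead = re.compile("(?=(" + to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical peptide, for illustration only.
hits = find_motifs("GALSPAAEK", PHOSPHODEGRON_PATTERNS[0])
```

Here the first pattern matches the hypothetical peptide at position 3 (LSPAAE), whereas the second pattern does not match because the final residue of the window is E rather than S or T.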

Peak gene expression data

To explore how phosphorylation-dependent ubiquitination and degradation regulate cell cycle progression, we obtained peak gene expression data from Granovskaia et al. 52. Granovskaia et al. measured the expression of RNAs at 5 minute intervals for up to 3 cell division cycles using microarrays. This included protein-coding transcripts as well as non-coding transcripts from antisense or intergenic regions. However, only peak gene expression data for protein-coding transcripts were considered in our analyses. In total, peak gene expression data for 587 protein-coding transcripts were mapped onto the yeast PPI network as node attributes.

RESULTS

Overview and features of PTMOracle

PTMOracle is a Cytoscape app that allows users to integrate different types of protein data into PPI networks. This includes the protein sequence, PTM sites and annotations such as domains, motifs and disordered regions. When running PTMOracle, users have access to several analytical tools that perform PTM-related queries. With these queries, users can develop systematic searches within the Cytoscape environment. These can highlight proteins of interest and explore network visualisations that address complex PTM-associated relationships. PTM sites can also be mapped onto protein sequences and annotations to help generate hypotheses about PTMs and their roles in PPIs.

To facilitate the integration of protein data with networks, a new PTMOracle framework was developed in Cytoscape (shown in Figure 1). Using the Cytoscape framework, users create PPI networks with network data. This includes PPIs as well as additional edge data (e.g. kinase-substrate, domain-domain) and node data (e.g. protein IDs, protein descriptions). To map protein data onto nodes in the existing Cytoscape network, users can format protein data into an XML-based or TSV file that can be processed by the PTMOracle framework (Supporting Information 1; Method S1 and Figure S1).

In addition to the new framework for processing protein data, PTMOracle introduces three new components to the Cytoscape interface, labelled as OraclePainter, OracleTools and OracleResults (Figure 2). By using these components, users can visualise, query and explore protein data that have been mapped onto protein nodes in the network. This is achieved through standard Cytoscape operations, including searching for nodes, finding neighbour nodes, creating subnetworks and applying network layouts. Details of each PTMOracle component are discussed in the following sections.

Figure 1: Overview of Cytoscape and the PTMOracle framework. PPIs and network annotations are processed by the Cytoscape framework to create PPI networks. By using PTMOracle, protein data can also be imported into Cytoscape. This is achieved by formatting PTM sites and sequence annotations into an XML-based or tab-separated (TSV) file format that PTMOracle can parse and map onto protein nodes in the Cytoscape network.


Figure 2: The PTMOracle app running within Cytoscape. PTMOracle introduces three new components into the Cytoscape interface (highlighted in yellow boxes). These components are the OraclePainter, OracleTools and OracleResults. OraclePainter (highlighted in green box) can visualise PTMs on protein nodes as pie charts in the network view (highlighted in purple box). OracleTools (highlighted in red box) contains several analytical modules (Calculator, PairFinder, RegionFinder and MotifFinder) that output results into newly created columns in the Cytoscape Table Panel (highlighted in brown box). These results can be used with standard Cytoscape operations to identify proteins of interest. OracleResults (highlighted in blue box) can be used to examine, compare and co-locate PTM sites and sequence annotations such as domains, motifs and disordered regions on protein nodes.


Visualisation of proteins and their modifications

The OraclePainter, located in the Cytoscape Control Panel (Figure 2), allows protein nodes to be displayed in multiple colours depending on their PTM types. It does this by utilising PTM information to render protein nodes as pie charts (Figure 3A). Each section in the pie chart represents a PTM type and its frequency as a proportion of the total number of PTM sites on the protein node. Each section of the pie chart matches the colours assigned by the user in the OraclePainter colour palette. Protein nodes that do not have PTM types corresponding to the ones in the colour palette are coloured grey. By visualising protein nodes with OraclePainter, users are able to quickly identify and highlight proteins with a specific type of PTM or combination of PTMs.
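The proportions behind each pie chart can be illustrated with a few lines of Python. This sketch is not the OraclePainter code. The function name `pie_sectors` and the palette are hypothetical, and it assumes that PTM types absent from the colour palette are excluded from the chart rather than drawn as a separate sector, a detail the text does not specify.

```python
from collections import Counter


def pie_sectors(ptm_types, palette, default_colour="grey"):
    """Return [(colour, fraction)] for one protein node.

    ptm_types: one entry per PTM site on the node, e.g. "phosphorylation"
    palette: dict mapping PTM type -> user-assigned colour
    """
    # Assumption: only PTM types present in the palette contribute sectors.
    counts = Counter(t for t in ptm_types if t in palette)
    total = sum(counts.values())
    if total == 0:
        return [(default_colour, 1.0)]  # no palette PTM types: plain grey node
    return [(palette[t], counts[t] / total) for t in sorted(counts)]


# Hypothetical palette and node, for illustration only.
palette = {"phosphorylation": "red", "acetylation": "blue"}
sectors = pie_sectors(
    ["phosphorylation", "acetylation", "phosphorylation", "methylation"],
    palette,
)
```

For this hypothetical node, the chart would have a blue sector covering one third and a red sector covering two thirds, while a node carrying only methylation would be drawn grey under this palette.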

Displaying protein PTM sites and sequence annotations

Located in the Cytoscape Results Panel, OracleResults tabulates PTM sites and sequence annotations associated with one or more protein nodes (Figure 2). To view this information, users enter the name (or another identifier) of each protein node into the search box. PTM sites and sequence annotations associated with the protein are then presented in a table (Figure 3B). The sequence for the protein node is also shown in a text area and highlighted with PTM sites and sequence annotations from the table if applicable. With OracleResults, users are able to explore which PTM types or sites might be involved in interactions and where these interactions might take place.

Searching for proteins of interest


OracleTools, located in the “Apps” menu bar under the PTMOracle submenu, contains a number of analytical modules that utilise the PTM sites and sequence annotations mapped onto protein nodes (Figure 2). The output of each module is displayed in newly created columns of the Cytoscape Table Panel, which can then be used for searching or filtering nodes and edges with standard Cytoscape operations. The outputs of the modules can also be combined to build complex or detailed queries based on several criteria. Interacting partners of proteins of interest can also be highlighted and used to construct subnetworks with the Cytoscape framework for more targeted investigations (Figure 3C).

OracleTools contains four analytical modules labelled as Calculator, PairFinder, RegionFinder and MotifFinder. The Calculator module counts the total number of PTM sites, the frequency of specific PTM types or the number of sequence annotations (e.g. domains, motifs and disordered regions) on protein nodes. The PairFinder module helps to identify protein nodes with pairs of PTM sites that are separated by a specified number of amino acids. This is useful for identifying PTM interplay or cross-talk and other logic modules that may function in roles that are common across higher eukaryotes 53. The RegionFinder module identifies protein nodes with PTM sites that are co-located within specified sequence annotations such as domains, motifs and disordered regions. This is useful because PTMs that occur within certain sequences are able to create or block protein-protein or protein-ligand interactions through conformational or physicochemical changes 4,5. Finally, the MotifFinder module identifies protein nodes whose sequence contains a specified motif. This is of use because sequence motifs can reflect interaction specificity, and hence recurring PTM sites within sequence motifs may be involved in systems for the regulation of interactions 54.

To illustrate how PTMOracle can integrate and visualise PTM-associated relationships, we explore four case studies below. An overview of the workflow used for each case study with yeast is provided in Supporting Information 1 (Figure S2). The Cytoscape sessions for all case studies are also provided in Supporting Information 3.
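The kinds of queries performed by the PairFinder and RegionFinder modules can be sketched in Python as follows. These functions are not taken from the PTMOracle source code; the names, the data layout and the example values are hypothetical. In particular, the sketch assumes that PairFinder reports pairs of sites within a maximum distance of each other, whereas the module may instead require an exact spacing.

```python
from itertools import combinations


def pair_finder(sites, max_distance):
    """Return pairs of PTM sites at most max_distance residues apart.

    sites: list of (position, ptm_type) tuples for one protein
    """
    ordered = sorted(sites)
    return [
        (a, b)
        for a, b in combinations(ordered, 2)
        if 0 < b[0] - a[0] <= max_distance
    ]


def region_finder(sites, regions):
    """Return (site, region) for every site lying inside an annotated region.

    regions: list of (start, end, label) tuples, 1-based and inclusive
    """
    return [
        (site, region)
        for site in sorted(sites)
        for region in regions
        if region[0] <= site[0] <= region[1]
    ]


# Hypothetical sites and annotation, for illustration only.
sites = [(14, "acetylation"), (10, "phosphorylation"), (40, "methylation")]
regions = [(8, 20, "disordered region")]
pairs = pair_finder(sites, max_distance=5)
co_located = region_finder(sites, regions)
```

A protein would then be flagged as "of interest" whenever `pairs` or `co_located` is non-empty, which corresponds to the per-node results written to the Cytoscape Table Panel.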

Figure 3: Major features of PTMOracle. A) OraclePainter is used to render protein nodes in the PPI network as pie charts. Each sector in the pie chart represents a specific PTM type and its frequency as a proportion of the total number of PTMs on protein nodes. The colours in each section of the pie chart are user-assigned in the colour palette, located in the Cytoscape Control Panel (highlighted in red box). B) Located in the Cytoscape Results Panel, OracleResults is used to visualise protein data associated with a group of protein nodes. The OracleResults output shows a table containing information on PTM sites and sequence annotations such as domains, motifs and disordered regions (highlighted in blue box), a text area for the protein sequence (highlighted in green box) and a legend for PTM sites and sequence annotations mapped onto the sequence (highlighted in purple box). PTM sites and sequence annotations are also highlighted in the protein sequence if applicable. C) Different modules (e.g. Calculator, PairFinder, RegionFinder and MotifFinder) in OracleTools are used to find proteins of interest (highlighted in yellow). Specialised subnetworks containing proteins of interest and their interacting partners can also be created and explored for more targeted investigations.

Case Study 1: Visualising PTMs on proteins with PTMOracle

Histone proteins are highly modified on their ‘tails’ with various types of PTMs. Combinations of their PTMs regulate the specificity of their PPIs in a code-like manner, leading to the compaction or relaxation of chromatin 6. Following from this, it has been proposed that other highly modified proteins may also feature “interaction codes” 7. However, visualising highly modified proteins without increasing the visual complexity of the network can be difficult with the basic functions of Cytoscape. To address this, we demonstrate how PTMOracle can be used to visualise PTMs on such proteins. An overview of the workflow for this case study is provided in Supporting Information 1 (Figure S2A). For the purpose of this study, we visualised and quantified PTMs on yeast histone proteins. To account for different curation methods and database completeness, we used an integrated set of all PTM sites on yeast proteins from Uniprot 25, dbPTM 23 and ProteomeScout 24. A comparison of the number of PTM sites identified in each database is provided in Supporting Information 1 (Results S1 and Figure S3). Using PTMOracle, we then mapped PTM sites from the integrated data set onto the SBI yeast network described by Pang et al. 45. Interactions in the SBI network have been reported in two or more experiments. With standard Cytoscape operations, we then
searched for histone proteins and created a subnetwork to reduce visual complexity. Interactions with non-histone proteins were not included. Finally, we used OraclePainter to visualise the types of PTMs, and the Calculator module in OracleTools to count the number of PTM sites on each histone protein. The resulting subnetwork contained 10 histone proteins, including 2 copies of each core histone protein (H2A, H2B, H3 and H4), 1 linker protein (H1) and 1 histone variant protein (H2A.Z). Upon visualising the network with OraclePainter, as expected, we found that different histone proteins carry different PTM types and in different proportions (Figure 4). All histone proteins have phosphorylation sites except histone H2A.Z (HTZ1), and all have acetylation sites except H1 (HHO1). Within the histone octamer, we see that histones H2A (HTA1 and HTA2) and H2B (HTB1 and HTB2) have sumoylation sites, H2A and H3 (HHT1 and HHT2) have methylation sites but only H2B has known ubiquitination sites. Surprisingly, some known PTM types on histone proteins were not visualised as they were not present in the underlying databases. A reason for this could be that some PTMs only occur under specific conditions, such as sumoylation on histone H2A.Z which occurs during double-stranded DNA damage 55. Other types of PTMs characterised on eukaryotic orthologs, such as methylation and ubiquitination on histone H1 and H2A.Z 55,56, remain largely uncharacterised in yeast. Although any type of PTM and its subtypes can be visualised (Supporting Information 1; Figure S4), another reason for missing some known PTM types is that only 8 main PTM types were considered in our analysis (see Methods). Because of this, some PTMs were not visualised, including ADP-ribosylation which has been detected on all core histone proteins and the linker histone H1 6. Nevertheless, these results demonstrate the value of using OraclePainter to quickly visualise combinations of PTMs on proteins in Cytoscape.

Visualisation of the network also shows that histone proteins have many interactions (Figure 4). Core histone proteins were interconnected, highlighting their biological context within the nucleosome. In conjunction with OraclePainter, we observed some known PTM types on core histone proteins that are involved in regulating interactions associated with nucleosome assembly/disassembly and stability. Notably, we observed acetylation on histone H4 (HHF1 and HHF2) as well as acetylation and phosphorylation on histone H3. Acetylation of K92 on histone H4, which is located at the interface between H4 and H2B, has been suggested to weaken interactions between the H3-H4 tetramer and the H2A-H2B dimer 57. Because of this, acetylation on histone H4 has been proposed to destabilise the nucleosome structure 57. On the other hand, acetylation of K57 on histone H3 has been shown to indirectly affect the interactions between the H3-H4 tetramer and H2A-H2B dimers through histone chaperone proteins 57. Acetylation on H3 increases the affinity of the H3-H4 tetramer for histone chaperone proteins, which are involved in nucleosome assembly/disassembly 57. In the same way, phosphorylation of S58 on histone H3 has been associated with increasing H2A-H2B dimer exchange of nucleosomes 57. As a result, phosphorylation on histone H3 has been suggested to interfere with the interactions between the H3-H4 tetramer and the H2A-H2B dimers which are necessary for octamer formation 57. We also used the Calculator module in OracleTools to explore the number of known PTM sites for each PTM type on histone proteins in the subnetwork. The number of PTM sites for each PTM type ranged between 2 (for ubiquitination) and 48 (for acetylation), whereas the total number of PTM sites on histone proteins in the subnetwork ranged between 1 for H1 and 17 for H3 (Supporting Information 1; Table S1). The total number of PTM sites on each histone protein in Figure 4 is represented by the node size. Surprisingly, we found that the PTM distribution for isoforms of histone H2B was different, with one phosphorylation site on HTB2 but not HTB1.
However, since only PTM sites from databases were considered in our analysis (see Methods), this reflects which sites are known. Whilst it is likely that both isoforms of H2B can carry this phosphorylation site, PTMOracle does not make predictions as to which PTMs occur on which residues and/or proteins. Because of this, users should be aware that PTM-specific isoforms may not be accurately represented on histones or other proteins with isoforms. We also observed that the range of PTM types and the number of PTM sites on histone proteins showed a bias towards well-studied PTMs. This was expected since databases are more likely to contain PTM sites on proteins that have been studied extensively, such as acetylation on H3. Together, the results from OraclePainter and the Calculator module in OracleTools are complementary, providing qualitative and quantitative results respectively. In doing so, more accurate queries and searches can be performed to highlight potential proteins of interest and features of their PTMs. In the future, expanding the histone network to interacting partners of histone proteins, as in Perner et al. 58, could be useful for predicting proteins that interact with PTMs involved in transcription and its regulation.
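The two summaries used in this case study, the per-type site counts reported by the Calculator module and the pie-chart proportions drawn by OraclePainter, can be illustrated with a minimal Python sketch. This is not PTMOracle's implementation: the record layout, the function names `count_sites` and `pie_fractions`, and the placeholder proteins are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical PTM site records: (protein, residue position, PTM type).
# Protein names and positions are placeholders, not database entries.
ptm_sites = [
    ("PROT_A", 4, "methylation"),
    ("PROT_A", 9, "acetylation"),
    ("PROT_A", 10, "phosphorylation"),
    ("PROT_A", 14, "acetylation"),
    ("PROT_B", 6, "sumoylation"),
    ("PROT_B", 123, "ubiquitination"),
]


def count_sites(sites):
    """Return {protein: Counter({ptm_type: number_of_sites})}."""
    counts = {}
    for protein, _position, ptm_type in sites:
        counts.setdefault(protein, Counter())[ptm_type] += 1
    return counts


def pie_fractions(type_counts):
    """Convert per-type counts into fractions of the protein's total sites."""
    total = sum(type_counts.values())
    return {ptm_type: n / total for ptm_type, n in type_counts.items()}


counts = count_sites(ptm_sites)
for protein, type_counts in counts.items():
    # Total sites (node size in Figure 4) and sector sizes (pie chart).
    print(protein, sum(type_counts.values()), pie_fractions(type_counts))
```

In this sketch the total per protein plays the role of the node size, and the fractions play the role of the pie sectors; only sites present in the input records are counted, mirroring the caveat above that database-derived sites reflect what is known.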


Figure 4: Visualisation of yeast histone proteins, their interactions and PTMs with OraclePainter, using the PPI network from Pang et al. 45. The subnetwork contains 10 histone proteins: 2 copies of each core histone protein (HTA1, HTA2, HTB1, HTB2, HHT1, HHT2, HHF1 and HHF2), 1 linker protein (HHO1) and 1 histone variant protein (HTZ1). PPIs between histone proteins are highlighted in grey and histone proteins are coloured with OraclePainter. The types of PTMs and the proportion of each type are illustrated by the pie charts on each histone protein. The total number of PTM sites on each histone protein is represented by the node size. Across all histone proteins, 5 PTM types are shown including phosphorylation (red), acetylation (blue), methylation (black), ubiquitination (green) and sumoylation (purple).

Case Study 2: Exploring PTM-mediated interactions

Interactions between proteins often occur at distinct binding interfaces 59. For some proteins, this may involve recognising PTM sites with structural domains 5. However, identifying PTM-mediated interactions as well as the PTM sites and recognition domains that facilitate interactions remains difficult 54. To address this, we demonstrate how PTMOracle can be used to explore PTM-mediated interactions, better understand the role of PTMs in this process, and identify regions of interacting proteins that are involved. An overview of the workflow for this case study is provided in Supporting Information 1 (Figure S2B). To identify PTM-mediated interactions between distinct binding interfaces, we visualised a domain-domain interaction (DDI) subnetwork of the yeast interactome. This was done by using the Cytoscape framework to map DDIs from Kim et al. 47 onto the SBI network as edge attributes. DDIs have been mapped onto solved crystal structures by Kim et al. and are therefore likely to represent distinct binding interfaces between protein complex members. Standard Cytoscape
operations were subsequently used to create the DDI subnetwork and remove non-DDIs to reduce visual complexity. Following from this, PTMOracle was used to map exact domain positions from Pfam 48 onto the DDI subnetwork. Identifiers from PDB were also mapped onto the DDI subnetwork as node attributes, and as edge attributes if interacting proteins were part of the same PDB structure (Figure S5). Together with standard Cytoscape operations, we then used the RegionFinder module in OracleTools to identify proteins with domains containing at least 1 PTM site, and DDIs that involve at least 1 of these proteins. Finally, OracleResults was used to tabulate the location of PTM sites and recognition domains on proteins that might be involved in PTM-mediated interactions. Visualisation of the DDI subnetwork revealed several clusters of proteins, with the number of nodes in each cluster ranging between 2 and 101 (Figure S5). Upon examining the network, we found that most protein members of each protein cluster were functionally related. This was illustrated in the 3 largest protein clusters of the DDI subnetwork (Figure 5), whereby each cluster was enriched (p-value < 0.01) for one biological process: cell signaling, transcription, or cytoskeleton organization (Supporting Information 1; Method S2 and Supporting Information 4). This was expected since DDIs often occur between protein members of stable complexes or subcomplexes and therefore are likely to be involved in similar functions 60. With RegionFinder and standard Cytoscape operations, we then queried the DDI subnetwork to find putative PTM-mediated interactions. These DDIs are potentially PTM-mediated since the PTMs on proteins identified by RegionFinder might be recognised by, or facilitate the interaction between, domains on interacting partners. This analysis identified 197 proteins with domains containing at least 1 PTM site and 370 DDIs that involve at least 1 of these proteins. For example, we found that CDC28 contains at least 1 PTM site that might be recognised by
domains on interacting proteins (Figure 5). Because of this, we then used OracleResults to visualise the location of PTM sites and recognition domains that might facilitate these interactions. Upon examining the OracleResults for CDC28, 2 known phosphotyrosine (Y19 and Y168) and 2 known phosphothreonine (T18 and T169) sites were co-located within a Pkinase domain, which is known to be the catalytic domain responsible for the function of CDC28 48. At the same time, examining the OracleResults for the interacting proteins of CDC28 revealed several domains that interact with the Pkinase domain of CDC28. Interestingly, we observed that all B-type cyclins (CLB1-CLB6) interact with the Pkinase domain of CDC28 via Cyclin_C domains. However, previous studies have shown that the interactions between CDC28 and B-type cyclins may also occur in the absence of the phosphorylation of CDC28 and therefore are not necessarily PTM-mediated 61. This suggests that not all DDIs we identified will be mediated by PTMs and that validation of such DDIs is essential. We also note that some domains will recognise PTMs outside of canonical domains on their interaction partners and these are likely to be missed by our analysis. Together, our results demonstrate the value of using RegionFinder and OracleResults for visualising PTMs and recognition domains, allowing the generation of hypotheses to better understand the functional role of PTMs in PPIs. However, the above suggests that some PTM-mediated interactions will be difficult to identify and therefore visualise or analyse with PTMOracle.
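The query used in this case study, namely flagging proteins with at least 1 PTM site inside a domain and then keeping DDIs that involve at least 1 flagged protein, can be sketched in Python as follows. This is an illustrative approximation rather than the RegionFinder code: the dictionaries, the function `sites_in_domains`, and all protein names, positions and domain boundaries are hypothetical.

```python
# Hypothetical domain annotations: protein -> [(domain name, start, end)],
# using 1-based inclusive sequence coordinates.
domains = {
    "KINASE_X": [("Pkinase", 8, 295)],
    "CYCLIN_Y": [("Cyclin_N", 200, 330), ("Cyclin_C", 332, 450)],
}

# Hypothetical PTM sites: protein -> [(position, PTM type)].
ptm_sites = {
    "KINASE_X": [(18, "phosphorylation"), (300, "acetylation")],
    "CYCLIN_Y": [(10, "phosphorylation")],
}

# Hypothetical domain-domain interaction edges between proteins.
ddi_edges = [("KINASE_X", "CYCLIN_Y"), ("CYCLIN_Y", "OTHER_Z")]


def sites_in_domains(protein):
    """List (domain, position, PTM type) for sites that fall inside a domain."""
    hits = []
    for name, start, end in domains.get(protein, []):
        for position, ptm_type in ptm_sites.get(protein, []):
            if start <= position <= end:
                hits.append((name, position, ptm_type))
    return hits


# Proteins with at least 1 PTM site within a domain annotation.
flagged = {protein for protein in domains if sites_in_domains(protein)}

# DDIs that involve at least 1 flagged protein are candidate PTM-mediated DDIs.
candidate_edges = [(a, b) for a, b in ddi_edges if a in flagged or b in flagged]

print(sorted(flagged), candidate_edges)
```

As the text notes, such a filter only proposes candidates: it cannot tell whether the PTM is required for the interaction, and it misses PTMs recognised outside annotated domains.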


Figure 5: A search for PTM-mediated interactions in a domain-domain interaction (DDI) network. A) Visualisation of the 3 largest protein clusters in the yeast domain-domain interaction subnetwork, highlighting DDIs that might involve PTMs. Protein nodes with at least 1 PTM within domain sequence annotations, as identified by RegionFinder, are highlighted in green, whereas all other protein nodes are highlighted in purple. DDIs that might involve PTMs are also represented as orange edges whereas other DDIs are grey. Note however, that our analysis cannot reveal whether PTMs are actually required on one, both or neither protein for an interaction to occur. B) OracleResults tables showing the sequence position of PTM sites and domains mapped onto i) CDC28 and ii) CLB1. The sequence position of PTM sites for CDC28 and recognition domains for CLB1 that might facilitate interactions are both indicated with red arrows. PTM sites and sequence annotations are also co-located onto the protein sequence (highlighted in pink and blue respectively).

Case Study 3: Predicting kinases involved in phosphorylation-dependent ubiquitination and degradation

Phosphodegron motifs are short linear motifs whereby phosphorylation within the motif, typically on serine/threonine residues, promotes ubiquitination and subsequent protein degradation 62. Although it is possible to identify phosphorylation sites within phosphodegron motif sequences, identifying the kinases responsible remains difficult. To demonstrate how PTMOracle can be used to predict kinases involved in phosphorylation-dependent ubiquitination and degradation, we focused on a kinase-substrate interaction (KSI) subnetwork of the yeast interactome. An overview of the workflow for this case study is provided in Supporting Information 1 (Figure S2C). To predict kinases that are involved in phosphorylation-dependent ubiquitination and degradation, we used the Cytoscape framework to map KSIs from Sharifpoor et al. 52 and peak gene expression times from Granovskaia et al. 49 onto the SBI network as edge and node attributes respectively. Standard Cytoscape operations were then used to create a KSI subnetwork of the yeast interactome by removing non-KSIs. Using the MotifFinder module in OracleTools, we then calculated the location of phosphodegron motifs by identifying sequences
that contain known phosphodegron patterns ([LIVMP]X{0,2}[ST]PXXE and [LIVMP]X{0,2}[ST]PXX[ST]) described by Swaney et al. 51. We then used the RegionFinder module in OracleTools to identify proteins with at least 1 phosphorylation site on [ST] residue/s within phosphodegron motifs. Proteins which are phosphorylated on [ST] residue/s of the motif are likely to be associated with phosphorylation-dependent ubiquitination and degradation 62. Subsequently, proteins that do not interact with proteins identified with RegionFinder were removed to reduce visual complexity. Finally, OracleResults was used to visualise the location of serine/threonine phosphorylation sites within phosphodegron motifs. Visualisation of the KSI subnetwork revealed one interconnected cluster and several smaller clusters, with the number of nodes in each cluster ranging between 2 and 120 (Figure 6A). With RegionFinder, we identified 77 proteins with at least 1 serine/threonine phosphorylation site within a theoretical phosphodegron motif. Of the 77 proteins identified with RegionFinder, 28 proteins (36%) were also observed to be periodically expressed. This includes several proteins (ASH1, FAR1, SWI5 and SIC1) which have been reported to undergo phosphorylation-dependent ubiquitination and degradation 51. However, for the purpose of this study, we focused on the interacting partners of SIC1. Upon examining the network, we observed several kinases that target SIC1 including PHO85, HOG1, CDC28 and IME2. Based on this, we predict that at least 1 kinase interacting with SIC1 could be responsible for the serine/threonine phosphorylation of SIC1 within the phosphodegron motif, leading to degradation. Phosphorylation-dependent ubiquitination and degradation of SIC1 has previously been reported to be an important regulator of cell cycle progression 62. SIC1 binds to and inhibits the activity of the Cyclin-B/Cdk1 protein complex, preventing progression from the G1 to the S phase of the cell cycle 62. However, since SIC1 was also observed to be periodically expressed, inhibition of the
Cyclin-B/Cdk1 protein complex by SIC1 is likely to occur right before SIC1 is degraded (‘just-in-time’) 63. Degradation of SIC1 therefore releases the Cyclin-B/Cdk1 protein complex, ensuring that the cell commits to the S phase 62. Together, our results demonstrate the value of using OracleTools for performing targeted investigations on PTM-associated relationships, allowing us to predict kinases that may be important for regulating cell-cycle protein complexes. Having predicted the kinases that could be responsible for the phosphorylation of SIC1 within the phosphodegron motif, we then used OracleResults to identify which serine/threonine phosphorylation sites are likely to promote ubiquitination and degradation. By examining the OracleResults for SIC1, we identified one known phosphothreonine (T5) and one known phosphoserine (S76) site on SIC1 that were located within separate phosphodegron motifs matching the [LIVMP]X{0,2}[ST]PXX[ST] pattern (Figure 6B). Besides this, we also identified several ubiquitination sites (K32, K36 and K84) on SIC1 that are known to specify the degradation of SIC1. Kõivomägi et al. 64 previously demonstrated that CDC28 phosphorylates SIC1 at T5 and S76, both of which act as primers for additional serine/threonine phosphorylation sites (namely at T33, T45, T48, S69 and S80). Interestingly however, with the exception of T33, the remaining serine/threonine phosphorylation sites were not observed despite being located within a phosphodegron motif. This highlights that some PTMs will be missing from repositories because they occur dynamically and therefore will be difficult to detect. Nonetheless, phosphorylation at these sites allows the SCF protein complex to bind and catalyse the ubiquitination of SIC1 (namely at K32, K36, K49 or K84) 65. Ubiquitination of SIC1 at these sites is therefore sufficient for the degradation of SIC1 by the proteasome and hence commitment to the S phase of the cell cycle 65. Overall, our results highlight the value of using OracleResults for visualising serine/threonine phosphorylation sites within phosphodegron
motifs, allowing us to gain insights into phosphorylation-dependent ubiquitination and degradation.
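The motif search and follow-up steps described in this case study can be approximated with a short Python sketch. It assumes that the X in the patterns quoted from Swaney et al. stands for any residue (written here as `.` in a regular expression); the function names, the toy sequences and the kinase/substrate labels are hypothetical, and the code is not the MotifFinder or RegionFinder implementation.

```python
import re

# The two phosphodegron patterns quoted in the text, with X written as ".".
PATTERNS = {
    "[LIVMP]X{0,2}[ST]PXX[ST]": r"[LIVMP].{0,2}[ST]P..[ST]",
    "[LIVMP]X{0,2}[ST]PXXE": r"[LIVMP].{0,2}[ST]P..E",
}


def find_motifs(sequence):
    """Return (pattern, start, end) hits with 1-based inclusive coordinates.

    A lookahead lets overlapping motifs be reported; at most one match
    (the first found by the regex engine) is returned per start position.
    """
    hits = []
    for label, regex in PATTERNS.items():
        for match in re.finditer(f"(?=({regex}))", sequence):
            hits.append((label, match.start(1) + 1, match.end(1)))
    return hits


def phospho_sites_in_motifs(sequence, phospho_positions):
    """Keep known phosphosites that sit on S/T residues inside a motif."""
    found = []
    for label, start, end in find_motifs(sequence):
        for position in phospho_positions:
            if start <= position <= end and sequence[position - 1] in "ST":
                found.append((label, start, end, position))
    return found


def candidate_kinases(ksi_edges, substrate):
    """Kinases with a directed kinase -> substrate edge to the given protein."""
    return sorted({kinase for kinase, target in ksi_edges if target == substrate})


toy_sequence = "AAALGSPAATKK"  # placeholder sequence, not a real protein
print(find_motifs(toy_sequence))
print(phospho_sites_in_motifs(toy_sequence, [6, 12]))
print(candidate_kinases([("KIN_1", "SUB_A"), ("KIN_2", "SUB_A"), ("KIN_1", "SUB_B")], "SUB_A"))
```

The last helper reflects the prediction logic used for SIC1: any kinase with an edge to a substrate carrying a phosphorylated motif is a candidate, which is why the text predicts only that at least 1 of the interacting kinases could be responsible.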

Figure 6: Prediction of kinases involved in the phosphodegron. A) Yeast kinase-substrate interaction subnetwork highlighting protein nodes with phosphorylation sites within phosphodegron motifs. Protein nodes with at least 1 phosphorylation site on [ST] residue/s within phosphodegron motifs, as identified by RegionFinder, are highlighted in green. All other protein nodes are highlighted in blue. Protein nodes that are periodically expressed are shown with a red border. All interactions in the network are shown as directed edges, with kinases as the source and non-kinases as the target. B) OracleResults table showing the sequence position of phosphodegron motifs and PTM sites mapped onto the SIC1 protein (highlighted in the brown box in (A)). Calculated phosphodegron motifs as well as serine/threonine phosphorylation sites that have been previously reported are indicated with red and pink arrows respectively, whereas ubiquitination sites that are known to specify the degradation of SIC1 are indicated with blue arrows 64,65. All phosphodegron motifs and PTM sites on SIC1 are also highlighted in the protein sequence (highlighted in pink and red respectively).

Case Study 4: Visualising phosphotyrosine-mediated interactions in the human interactome

In signal transduction, phosphorylation on tyrosine residues can act as a signal which can be propagated through an elaborate network 66,67. This is done by creating binding sites that are recognised by domains such as the Src-homology 2 (SH2) or phosphotyrosine binding (PTB) domains. The presence or absence of phosphotyrosine (pY) thus switches PPIs on and off. However, with over 10,000 pY sites and hundreds of possible recognition domains in human proteins, visualising and exploring their PTM-mediated interactions remains challenging 66. Here, we explore the potential of PTMOracle for investigating pY-dependent interactions in the human cell. Unlike those of yeast, human interaction networks remain largely incomplete, making it a challenge to accurately interpret and understand PTM-mediated interactions 30. For this reason, we first combined interactions in 4 human networks to increase coverage of the human interactome. To understand the differences between the interaction networks used, we compared the number of interactions across the 4 networks, which include 2 PPI networks (referred to as HI-II-14 and Lit-BM-13) described by Rolland et al. 30, and 2 pY-dependent interaction networks (referred to as Tinti-pY and Grossmann-pY) described by Tinti et al. 50 and Grossmann et al. 37. A comparison of the strategies and techniques used to construct each network is provided in
Supporting Information 1 (Table S2). In general, we found very few interactions were common across all networks. Comparing similar human interaction networks (e.g. PPI/PPI or pY/pY), we found a total of 410 interactions that were present in HI-II-14 and Lit-BM-13, as well as 6 interactions that were present in Tinti-pY and Grossmann-pY (Figure 7A). By contrast, comparing different human interaction networks (e.g. PPI/pY) revealed that no interactions in Tinti-pY or Grossmann-pY were present in HI-II-14, whereas 118 interactions in Tinti-pY or Grossmann-pY were present in Lit-BM-13. The differences are likely due to different strategies and techniques used to construct these networks. The traditional Y2H method used to construct HI-II-14 does not involve the addition of pY to proteins and therefore will detect few pY-dependent interactions. With this in mind, the lack of interactions shared between HI-II-14 and Grossmann-pY suggests that the majority of interactions for human proteins in Y2H are not pY-dependent. Furthermore, our results support previous findings that the majority of experimentally identified interactions have not been reported in the literature, suggesting that there are more PPIs in human that are still to be discovered 30,37. This demonstrates the importance of using different strategies to cover a large proportion of the human interactome. Having established the differences between all 4 networks, we derived a pY-dependent interaction subnetwork from the Lit-BM-13 network. This was done by using standard Cytoscape operations to map pY-dependent interactions from Tinti-pY and Grossmann-pY onto Lit-BM-13 as edge attributes. Interactions from Tinti-pY and Grossmann-pY were reported to involve the recognition of pY by SH2 or PTB domains 37,50. Standard Cytoscape operations were then used to create the pY-dependent subnetwork and remove non-pY-dependent interactions to reduce visual complexity. Using the PTMOracle framework, we then mapped PTM sites on human proteins from an integrated set from Uniprot 25, dbPTM 23 and ProteomeScout 24 onto the
pY-dependent subnetwork. Finally, OraclePainter and OracleResults were used to investigate interactions in the human pY-dependent subnetwork. Visualisation of the network revealed one large interconnected cluster and several small clusters, with the number of nodes in each cluster ranging between 2 and 69 (Figure 8). Although the resulting network was too small and sparse for topological analysis, visualising this network in the context of PTMOracle could provide valuable biological insight. For example, visualising protein nodes with OraclePainter revealed that most proteins in the pY-dependent subnetwork carry phosphorylation sites and also carry other types of PTMs including N-linked glycosylation, ubiquitination and acetylation (Figure 8). Notably, we observed that proteins with N-linked glycosylation are receptor proteins with extracellular domains. Proteins in the pY-dependent subnetwork were enriched (p-value < 0.01) for biological processes such as cell surface receptor signalling, signal transduction, immune response and immune system processes (Supporting Information 1; Method S2 and Supporting Information 4). This was expected since N-linked glycosylation is important for cell surface receptor functions as well as T and B cell functions, whereas tyrosine phosphorylation commonly regulates PPIs involved in signalling processes 68–70. Although acetylation and ubiquitination act as important regulators of cellular metabolism and protein turnover, they can also be involved in modulating signalling processes alongside phosphorylation 2. Parker et al. 15 demonstrated that cross-talk between lysine acetylation and serine phosphorylation likely acts as a mechanism for activating or suppressing signals involved in myocardial ischemia and cardioprotection. Similarly, both lysine acetylation and serine phosphorylation have previously been reported to cross-talk with ubiquitination to control protein stability and therefore dictate the fate of proteins 71. Together, our results demonstrate the
capacity of PTMOracle for visualising PTM-mediated interaction networks, and how it can reveal possible cross-talk between different types of PTMs. Using OracleResults, we also found that the majority (163 / 166) of interacting pY residues identified in Grossmann-pY or Tinti-pY were also present in Uniprot, dbPTM and ProteomeScout (Figure 7B). At the same time, some of the interacting pY residues identified in Grossmann-pY or Tinti-pY were not present in the integrated set of PTMs and vice versa. Notably, we found 3 interacting pY residues in Grossmann-pY or Tinti-pY that were not present in Uniprot, dbPTM and ProteomeScout. Since these sites have a functional role in facilitating pY-dependent interactions, they are likely to be phosphorylated and therefore could be added into online repositories. By contrast, we found 849 interacting pY residues in Uniprot, dbPTM and ProteomeScout that were not in Grossmann-pY or Tinti-pY. These sites could be potential sites to screen or could be involved in other PTM-mediated interactions.
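The comparisons summarised in Figure 7, that is, how many interactions or pY residues are shared between datasets, reduce to set intersections and differences. The following Python sketch illustrates the idea under stated assumptions: interactions are treated as undirected pairs, pY residues are keyed by (protein, position), and every name and value shown is a placeholder rather than data from the networks discussed above.

```python
from itertools import combinations


def as_undirected(edges):
    """Represent each interaction as an unordered pair so A-B equals B-A."""
    return {frozenset(edge) for edge in edges}


def pairwise_overlap(named_sets):
    """Return {(name_1, name_2): size of the intersection} for every pair."""
    return {
        (a, b): len(named_sets[a] & named_sets[b])
        for a, b in combinations(sorted(named_sets), 2)
    }


# Hypothetical toy networks with placeholder protein names.
networks = {
    "net_a": as_undirected([("A", "B"), ("B", "C"), ("C", "D")]),
    "net_b": as_undirected([("B", "A"), ("C", "D"), ("D", "E")]),
    "net_c": as_undirected([("D", "C"), ("E", "F")]),
}
overlap = pairwise_overlap(networks)
print(overlap)

# The same approach applies to pY residues keyed by (protein, position).
interacting_py = {("A", 10), ("B", 42), ("C", 7)}
repository_py = {("A", 10), ("B", 42), ("D", 99)}
shared = interacting_py & repository_py                   # in both sources
missing_from_repository = interacting_py - repository_py  # candidates to add
not_yet_interacting = repository_py - interacting_py      # candidates to screen
print(len(shared), len(missing_from_repository), len(not_yet_interacting))
```

Treating edges as unordered pairs is a design choice for this sketch; a comparison of directed networks, or one that requires consistent protein identifiers across sources, would need an identifier-mapping step first.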


Figure 7: Human protein-protein interactions and their overlap with phosphotyrosine (pY)-dependent interactions. A) Venn diagram showing the number of interactions that were identified in each network dataset. Overall, there is minimal overlap between the networks, suggesting that pY-dependent interactions are sparsely mapped. The numbers in brackets, below each network name, represent the total number of interactions in the network. B) Venn diagram showing the number of interacting pY residues identified in each dataset. The majority of pY sites present in the integrated set of PTMs were not identified in Grossmann-pY or Tinti-pY, suggesting that these could be potential sites to screen. The numbers in brackets, below each dataset name, represent the total number of interacting pY residues identified in the dataset.

Figure 8: Phosphotyrosine (pY)-dependent human PPI subnetwork visualised with OraclePainter, with data from Rolland et al. 30. Interactions in the network have been reported by Tinti et al. 50 or Grossmann et al. 37 to involve the recognition of pY by SH2 or PTB domains.

Overall, visualisation of the network revealed one large interconnected cluster and several small clusters, with the number of nodes per cluster ranging from 2 to 69. Visualising the network with OraclePainter revealed that most proteins carry phosphorylation sites and other types of PTMs, including N-linked glycosylation, ubiquitination and acetylation.
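Cluster sizes of the kind reported above can be obtained by finding the connected components of the interaction network. The sketch below is one way to do this with a breadth-first search over an edge list; it is an illustration only, not the procedure used in Cytoscape or PTMOracle, and the protein names are hypothetical.

```python
from collections import defaultdict, deque

def cluster_sizes(edges):
    """Return connected-component sizes (largest first) for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Hypothetical interactions: one cluster of four proteins and two pairs.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("G", "H")]
sizes = cluster_sizes(edges)
print(sizes)
```

Because the input is an edge list, proteins without any interaction never appear, so the smallest possible cluster contains two nodes.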

DISCUSSION

In this study, we presented PTMOracle, a Cytoscape app that facilitates the co-visualisation and co-analysis of PTMs in the context of PPI networks. PTMOracle also allows users to integrate other types of protein data such as sequences, and their corresponding domains, motifs and disordered regions. PTMOracle is one of the first tools to do this, allowing users to explore relationships between PTMs, sequence annotations and PPIs to better understand the regulatory role of PTMs.

Nevertheless, several other Cytoscape apps also recognise the importance of integrative network analysis of PTMs. Both PTMapper 43 and PhosphoPath 44 allow users to visualise and analyse quantitative PTM proteomic datasets within Cytoscape to understand interaction dynamics. They do this by representing PTM sites as individual nodes and using colours to visualise the quantification data for each PTM site. In contrast to PTMapper, PhosphoPath also allows users to visualise quantitative information across multiple time-points or conditions. However, neither PTMapper nor PhosphoPath co-visualises and co-analyses PTMs in the context of the protein sequence and other sequence annotations such as domains or motifs. On the other hand, web-based tools or online repositories often display PTM information alongside other types of protein data, but lack detailed views of PPI networks. For example, ProteomeScout 24 allows users to visualise the co-location of different PTMs and sequence annotations such as sequence mutations and protein secondary structures. This information is presented in a track-based system allowing users to browse and compare the relationships between multiple sequence annotations on a whole protein or sequence level.

Analyses with PTMOracle

PTMOracle introduces three new components to the Cytoscape user interface, namely OraclePainter, OracleResults and OracleTools. We demonstrated how the components of PTMOracle can be used in conjunction with standard Cytoscape operations to explore PTM-associated relationships, through case studies in S. cerevisiae and human. Firstly, we showed that rendering protein nodes as pie charts with OraclePainter is useful for representing highly modified proteins such as histones. Compared to the textual annotations previously described by Ho et al. 42, our strategy is more effective for visually representing combinations of PTMs on proteins. Secondly, we demonstrated how OracleResults can be used to identify PTM types or specific sites that might be recognised by domains, and visualised regions in the network where these interactions take place. Although this was useful for identifying serine/threonine phosphorylation sites on CDC28 that might be recognised by domains on B-type cyclins, we showed that identifying DDIs that are mediated by PTMs remains challenging. Thirdly, we demonstrated the utility of OracleTools for performing targeted investigations of PTM-associated relationships. We were able to predict kinases that are involved in phosphorylation-dependent degradation by identifying proteins with serine/threonine phosphorylation sites within phosphodegron motifs. Finally, we illustrated the use of PTMOracle for exploring PTM-mediated protein interactions in the human cell. By using PTMOracle, certain types of PTMs could be associated with protein function; specifically, PTMs on proteins were enriched for signal transduction and immune system processes. Besides this, we showed that the majority of known interactions in the human interactome are not pY-dependent.
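The motif-based query summarised above, finding serine/threonine phosphorylation sites that lie within a sequence motif, can be sketched as follows. This is an illustration under stated assumptions rather than the OracleTools implementation: the regular expression is a placeholder and not a curated phosphodegron definition, and the sequence and site positions are hypothetical.

```python
import re

# Placeholder motif (S/T, P, any two residues, S/T). This is an illustrative
# assumption, not a curated phosphodegron pattern. The lookahead allows
# overlapping motif occurrences to be reported.
EXAMPLE_MOTIF = re.compile(r"(?=([ST]P..[ST]))")

def phosphosites_in_motif(sequence, phosphosites, motif=EXAMPLE_MOTIF):
    """Return sorted 1-based phosphosite positions that fall inside a motif match."""
    inside = set()
    for match in motif.finditer(sequence):
        start = match.start(1) + 1  # first residue of the match, 1-based
        end = match.end(1)          # last residue of the match, 1-based
        inside.update(p for p in phosphosites if start <= p <= end)
    return sorted(inside)

# Hypothetical sequence with phosphorylation reported at S7, T14 and S18.
sequence = "MKTPAASQLSPEETAAGSK"
phosphosites = [7, 14, 18]

hits = phosphosites_in_motif(sequence, phosphosites)
print(hits)
```

Here the motif occurs at residues 3 to 7 and 10 to 14, so S7 and T14 are reported while S18 is not.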

Future Applications

Cytoscape allows the construction of interaction networks from different types of interaction data. This helps facilitate the visualisation of context-specific interaction networks. For example, visualisation of enzyme-substrate networks is of use to understand how PTM information in the cell is managed 72. Because of this, we anticipate that one future application of PTMOracle could involve the co-visualisation of multiple enzyme-substrate interactions on single proteins (e.g. kinase-substrate, methyltransferase-substrate, acetyltransferase-substrate) for the analysis of PTM regulation and cross-talk 15. Visualising edges in different colours or line types will be useful in identifying proteins that are modified by two or more enzymes. The PairFinder module in OracleTools could also be useful for identifying PTMs that compete for the same residue (e.g. acetylation and methylation on lysine residues), or PTMs known to influence other PTMs (e.g. phosphorylation and acetylation 73). Both Woodsmith et al. 74 and Beltrao et al. 75 suggested that PTMs within certain domains (“hot spots”) or disordered regions (“PTMi”) might represent functional entities that can modulate interaction specificity. Identifying multiple PTMs on proteins in the context of networks could help highlight potential ‘interaction codes’ and decipher how PTMs, alone or in combinations, regulate PPIs 7. Therefore, by using PTMOracle to identify possible interaction codes, users can make predictions and testable hypotheses as to which molecular states facilitate the interactions between certain proteins.

Proteins often interact with one another to form multi-subunit protein complexes 28. Interactions between different subunits of protein complexes can be regulated by PTMs 34. Because of this, an important application of PTMOracle will be the analysis of protein complexes and their PTMs. This could involve visualising networks with data from methods other than Y2H. For example, co-complex methods such as protein correlation profiling-MS (PCP-MS) identify interactions between members of stable complexes 76. PCP-MS combined with chemical cross-linking can increase the coverage of difficult-to-detect interactions such as weak interactions or those involving membrane proteins 76. Besides this, the analysis of protein complexes will benefit from special network layouts 77. This could involve grouping protein nodes that belong to the same complex while simultaneously separating protein nodes that belong to different protein complexes 42,77,78. Visualising the network in this way will facilitate the identification of proteins or groups of proteins that belong to multiple complexes and therefore are shared 79,80. Proteins that are shared between different complexes are likely to be important regulators or co-regulators of complex formation and thus of function 79. Both OraclePainter and OracleResults could be useful for visualising PTM types on proteins that are shared between protein complexes, and identifying sites that are likely to be responsible for regulating how and when they interact with different protein complexes. Co-analysing PTMs alongside structural information such as DDIs, domain-motif interactions and surface accessibility will be important for identifying interactions that are likely to be important PTM-mediated binding interfaces 7,60,81. In the same way, integrating subcellular localisation data and cellular abundances will also be useful for understanding how PTMs might regulate proteins that are shared between complexes in different subcellular locations 74,82,83.

Alternative splicing of mRNAs in higher eukaryotes, including humans, can result in different protein isoforms 84,85. Consequently, different protein isoforms can have different PTM sites and/or interact with different partners 86. For this reason, we envision that PTMOracle will be useful in the future for co-analysing protein isoforms and their PTMs to better understand how PTMs and alternative splicing might co-regulate PPIs of different protein isoforms. This could involve visualising each node as a distinct protein isoform and each edge as the interaction between two protein isoforms 87. Grouping together protein isoforms from the same gene will also be useful for identifying interactions that occur between some protein isoforms from the same gene, but not others 88. Yang et al. 88 suggested that different interactions shown by isoforms of one protein can underlie functional differences that are likely to be important in specific tissue types and/or diseases. Such differences could be due to the inclusion or exclusion of exons containing PTM sites, interacting motifs and/or domains 88. Therefore, using OraclePainter or OracleResults to compare PTM patterns and PPI networks from healthy and diseased cells should be of value for identifying biomarkers or potential drug targets 86. However, it must be kept in mind that localising PTM sites onto specific protein isoforms remains an ongoing challenge. Almost all isoform-specific sequence information is lost and difficult to recover with bottom-up proteomic approaches. Specifically, some protein isoforms will lack unique peptides from protease digest (e.g. trypsin), making it difficult to identify isoform-specific PTM sites 85. In the same way, top-down approaches face several challenges associated with protein solubility and separation as well as lower sensitivity and throughput, making it difficult to map PTMs on a large scale 89. Nonetheless, the analysis of intact proteins using top-down approaches should ultimately be more useful for identifying PTM sites on specific isoforms 90.
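The problem of isoforms lacking unique peptides can be illustrated with a simple in silico digest. The sketch below assumes the commonly used trypsin rule (cleavage after K or R, but not before P) and a minimum peptide length of six residues; the two sequences are hypothetical, with isoform_b constructed as isoform_a minus one internal segment whose boundaries coincide with cleavage sites.

```python
import re

def tryptic_peptides(sequence, min_length=6):
    """Digest a sequence after K or R (not before P) and keep peptides of useful length."""
    # Splitting on a zero-width match requires Python 3.7 or later.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}

def unique_peptides(isoforms, min_length=6):
    """Map each isoform name to the peptides found in no other isoform."""
    digests = {name: tryptic_peptides(seq, min_length) for name, seq in isoforms.items()}
    unique = {}
    for name, peptides in digests.items():
        others = set().union(*(p for n, p in digests.items() if n != name))
        unique[name] = peptides - others
    return unique

# Hypothetical isoforms: isoform_b lacks the segment "GSPLTPEEK".
isoforms = {
    "isoform_a": "MASEEKTLLDYQAVRGSPLTPEEKAAGHWK",
    "isoform_b": "MASEEKTLLDYQAVRAAGHWK",
}

unique = unique_peptides(isoforms)
print(unique)
```

In this constructed case isoform_a retains one distinguishing peptide, whereas every peptide of isoform_b is shared, so a PTM site observed on a shared peptide could not be assigned to isoform_b alone.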

Considerations and limitations

The advantage of PTMOracle is that it can be used to co-visualise and co-analyse PTM data within PPI networks for any species. For this, however, PTMOracle requires both PTM and interaction data from the same species. Although PTM and interaction data can be obtained from public databases, the quality and coverage of these datasets will vary for different species. Together, this can have a major impact on the completeness and utility of PTM and interaction network co-visualisation with PTMOracle.

High-throughput methods have been used to identify PTMs. Yet because of the limitations of these methods, current PTM data sets are not complete for any single organism. At the same time, the quality and coverage of PTM data is biased towards well-studied species. For example, high-throughput methods have been routinely used to identify PTMs in yeast and human, and PTM data for these species are therefore likely to be of high quality and comprehensive 13,16. Although not to the same extent, high-throughput methods have also been used to identify PTMs in other model organisms including bacteria 91, plants 92,93 and other eukaryotes 94,95. However, for most non-model organisms, high-throughput methods have not been used extensively and therefore PTM data will likely be of poor quality, largely incomplete or non-existent altogether. Besides the limitations on PTM data across species, some PTM sites and types will be difficult to identify with high-throughput methods 17. Alternative proteases will be useful for generating different sets of modified peptides for mass spectrometry and therefore for increasing the coverage and quality of PTM sites 17. In the same way, the use of different approaches and enrichment techniques will be useful for identifying low abundance PTM sites or under-studied PTM types 17.

Current interaction data sets are by no means complete due to experimental limitations and the combinatorial complexity associated with attaining proteome-scale coverage 28. Because of these challenges, the quality and coverage of interaction data will vary for different species. To date, large-scale identification of PPIs has largely focused on model organisms. Both binary and co-complex methods have been used to identify PPIs in yeast 29,79, Escherichia coli 96,97, Drosophila melanogaster 98,99 and human 30,100. By contrast, for most non-model organisms, the progress in identifying PPIs has been non-existent or slow, suggesting that the quality and coverage of interaction data will be poor for some time. Apart from the limitations on interaction data across species, some interactions will be difficult to detect with conventional PPI detection methods, including those that are PTM-mediated 28. Complementary PPI detection strategies will be useful for covering a larger proportion of the interactome.

In conclusion, we have developed PTMOracle, a Cytoscape app that facilitates the co-visualisation and co-analysis of PTMs in the context of PPI networks. Through case studies of the yeast S. cerevisiae and human, we demonstrated how PTMOracle can be used to integrate PTM and PPI data, and develop queries to explore PTM-associated relationships. We also demonstrated how PTMs, domains and motifs can be visualised to provide insights on where interactions take place as well as generate hypotheses on their functional roles in PPIs. This software thus forms an important link between PTM and PPI data, as well as a means to systematically investigate PTM-associated relationships to better understand how PTMs regulate PPIs.

ASSOCIATED CONTENT

Supporting Information Available
Supporting Information 1: Additional methods, results, figures and tables.
Supporting Information 2: Zip compressed file containing XML-based files required for PTMOracle used in each case study example.
Supporting Information 3: Zip compressed file containing Cytoscape sessions for each case study example.
Supporting Information 4: GO enrichment for the 3 largest clusters in the yeast DDI subnetwork and human pY-dependent network.

AUTHOR INFORMATION

Corresponding Author
* Telephone: (+61 2) 9385 3633, Fax: (+61 2) 9385 1483, Email: [email protected]

Author Contribution
A.P.T. developed PTMOracle and generated all case studies with design input from C.N.I.P., D.L.W. and M.R.W. A.P.T. and M.R.W. wrote and edited the manuscript with contributions from C.N.I.P. and D.L.W. All authors have given approval to the final version of the manuscript.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGEMENTS

A.P.T. acknowledges the support of an Australian Postgraduate Award and M.R.W. acknowledges support from the Australian Research Council. We thank Daniela-Lee Smith for testing and providing feedback on PTMOracle.

ABBREVIATIONS

PTM: Post-translational modification; PPI: Protein-protein interaction; Y2H: Yeast two-hybrid; MS: Mass spectrometry; AP-MS: Affinity purification-mass spectrometry; XL-MS: Chemical crosslinking with mass spectrometry; pY: Phosphotyrosine; DDI: Domain-domain interaction; KSI: Kinase-substrate interaction; PCP-MS: Protein correlation profiling with mass spectrometry

REFERENCES

(1)

Lothrop, A. P.; Torres, M. P.; Fuchs, S. M. Deciphering post-translational modification codes. FEBS Lett 2013, 587 (8), 1247–1257.

(2)

Karve, T. M.; Cheema, A. K. Small Changes Huge Impact: The Role of Protein Posttranslational Modifications in Cellular Homeostasis and Disease. J Amino Acids 2011, 2011, Article ID: 207691.

(3)

Prabakaran, S.; Lippens, G.; Steen, H.; Gunawardena, J. Post-translational modification: Nature’s escape from genetic imprisonment and the basis for dynamic information encoding. Wiley Interdiscip Rev Syst Biol Med 2012, 4, 565–583.

(4)

Nussinov, R.; Tsai, C.-J.; Xin, F.; Radivojac, P. Allosteric post-translational modification codes. Trends Biochem Sci 2012, 37 (10), 447–455.

(5)

Seet, B. T.; Dikic, I.; Zhou, M.-M.; Pawson, T. Reading protein modifications with interaction domains. Nat Rev Mol Cell Biol 2006, 7 (7), 473–483.

(6)

Bannister, A. J.; Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res 2011, 21 (3), 381–395.

(7)

Winter, D. L.; Erce, M. A.; Wilkins, M. R. A Web of Possibilities: Network-Based Discovery of Protein Interaction Codes. J Proteome Res 2014, 13 (12), 5333–5338.

(8)

Calnan, D. R.; Brunet, A. The FoxO code. Oncogene 2008, 27 (16), 2276–2288.

(9)

Cloutier, P.; Coulombe, B. Regulation of molecular chaperones through post-translational modifications: decrypting the chaperone code. Biochim Biophys Acta 2013, 1829 (5), 443–454.

(10)

Eick, D.; Geyer, M. The RNA polymerase II carboxy-terminal domain (CTD) code. Chem Rev 2013, 113 (11), 8456–8490.

(11)

Gu, B.; Zhu, W.-G. Surf the Post-translational Modification Network of p53 Regulation. Int J Biol Sci 2012, 8 (5), 672–684.

(12)

Munro, S.; Carr, S. M.; la Thangue, N. B. Diversity within the pRb pathway: is there a code of conduct? Oncogene 2012, 31 (40), 4343–4352.

(13)

Henriksen, P.; Wagner, S. A.; Weinert, B. T.; Sharma, S.; Bačinskaja, G.; Rehman, M.; Juffer, A. H.; Walther, T. C.; Lisby, M.; Choudhary, C. Proteome-wide analysis of lysine acetylation suggests its broad regulatory scope in Saccharomyces cerevisiae. Mol Cell Proteomics 2012, 1510–1522.

(14)

Weinert, B. T.; Iesmantavicius, V.; Moustafa, T.; Schölz, C.; Wagner, S. A.; Magnes, C.; Zechner, R.; Choudhary, C. Acetylation dynamics and stoichiometry in Saccharomyces cerevisiae. Mol Syst Biol 2014, 10, 716.

(15)

Parker, B. L.; Shepherd, N. E.; Trefely, S.; Hoffman, N. J.; White, M. Y.; Engholm-Keller, K.; Hambly, B. D.; Larsen, M. R.; James, D. E.; Cordwell, S. J. Structural Basis for Phosphorylation and Lysine Acetylation Crosstalk in a Kinase Motif Associated with Myocardial Ischemia and Cardioprotection. J Biol Chem 2014, 289 (37), 25890–25906.

(16)

Sharma, K.; D’Souza, R. C. J.; Tyanova, S.; Schaab, C.; Wiśniewski, J. R.; Cox, J.; Mann, M. Ultradeep Human Phosphoproteome Reveals a Distinct Regulatory Nature of Tyr and Ser/Thr-Based Signaling. Cell Rep 2014, 8 (5), 1583–1594.

(17)

Olsen, J. V; Mann, M. Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 2013, 12 (12), 3444–3452.

(18)

Kim, M. S.; Zhong, J.; Pandey, A. Common errors in mass spectrometry-based analysis of post-translational modifications. Proteomics 2016, 16 (5), 700–714.

(19)

Minguez, P.; Letunic, I.; Parca, L.; Garcia-Alonso, L.; Dopazo, J.; Huerta-Cepas, J.; Bork, P. PTMcode v2: a resource for functional associations of post-translational modifications within and between proteins. Nucleic Acids Res 2014, 43 (Database issue), D494–D502.

(20)

Craveur, P.; Rebehmed, J.; de Brevern, A. G. PTM-SD: a database of structurally resolved and annotated posttranslational modifications in proteins. Database (Oxford) 2014, 2014, bau041.

(21)

Gnad, F.; Gunawardena, J.; Mann, M. PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Res 2011, 39 (Database issue), D253–D260.

(22)

Hornbeck, P. V; Kornhauser, J. M.; Tkachev, S.; Zhang, B.; Skrzypek, E.; Murray, B.; Latham, V.; Sullivan, M. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 2012, 40 (Database issue), D261-270.

(23)

Lu, C. T.; Huang, K. Y.; Su, M. G.; Lee, T. Y.; Bretaña, N. A.; Chang, W. C.; Chen, Y. J.; Chen, Y. J.; Huang, H. Da. DbPTM 3.0: An informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res 2013, 41 (Database issue), 295–305.

(24)

Matlock, M. K.; Holehouse, A. S.; Naegle, K. M. ProteomeScout: a repository and analysis resource for post-translational modifications and proteins. Nucleic Acids Res 2014, 43 (Database issue), D521–D530.

(25)

The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res 2014, 43 (Database issue), D204-12.

(26)

Rao, V. S.; Srinivas, K.; Sujini, G. N.; Kumar, G. N. S. Protein-Protein Interaction Detection: Methods and Analysis. Int J Proteomics 2014, 2014 (147648), 1–12.

(27)

Fields, S.; Song, O. A novel genetic system to detect protein-protein interactions. Nature 1989, 340 (6230), 245–246.

(28)

Snider, J.; Kotlyar, M.; Saraon, P.; Yao, Z.; Jurisica, I.; Stagljar, I. Fundamentals of protein interaction network mapping. Mol Syst Biol 2015, 11 (12), 848–848.

(29)

Yu, H.; Braun, P.; Yildirim, M. A.; Lemmens, I.; Venkatesan, K.; Sahalie, J.; Hirozane-Kishikawa, T.; Gebreab, F.; Li, N.; Simonis, N.; et al. High-quality binary protein interaction map of the yeast interactome network. Science 2008, 322 (5898), 104–110.

(30)

Rolland, T.; Tas, M.; Sahni, N.; Yi, S.; Lemmens, I.; Fontanillo, C.; Mosca, R.; Kamburov, A.; Ghiassian, S. D.; Yang, X.; et al. A Proteome-Scale Map of the Human Interactome Network. Cell 2014, 159 (5), 1212–1226.

(31)

Chatr-Aryamontri, A.; Breitkreutz, B. J.; Oughtred, R.; Boucher, L.; Heinicke, S.; Chen, D.; Stark, C.; Breitkreutz, A.; Kolas, N.; O’Donnell, L.; et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res 2015, 43 (Database issue), D470–D478.

(32)

Das, J.; Yu, H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Syst Biol 2012, 6 (1), 92.

(33)

Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K. P.; et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015, 43 (Database issue), D447–D452.

(34)

Woodsmith, J.; Stelzl, U. Studying post-translational modifications with protein interaction networks. Curr Opin Struct Biol 2014, 24 (1), 34–44.

(35)

Osborne, M. A.; Dalton, S.; Kochan, J. P. The yeast tribrid system--genetic detection of trans-phosphorylated ITAM-SH2-interactions. Biotechnology (N Y) 1995, 13 (13), 1474–1478.

(36)

Erce, M. A.; Low, J. K. K.; Hart-Smith, G.; Wilkins, M. R. A conditional two-hybrid (C2H) system for the detection of protein-protein interactions that are mediated by posttranslational modification. Proteomics 2013, 13 (7), 1059–1064.

(37)

Grossmann, A.; Benlasfer, N.; Birth, P.; Hegele, A.; Wachsmuth, F.; Apelt, L.; Stelzl, U. Phospho-tyrosine dependent protein-protein interaction network. Mol Syst Biol 2015, 11 (3), 794.

(38)

Duan, G.; Walther, D. The Roles of Post-translational Modifications in the Context of Protein Interaction Networks. PLoS Comput Biol 2015, 11 (2), 1–23.

(39)

Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13 (11), 2498–2504.

(40)

Ahmed, A.; Dwyer, T.; Forster, M.; Fu, X.; Ho, J.; Hong, S. H.; Koschützki, D.; Murray, C.; Nikolov, N. S.; Taib, R.; et al. GEOMI: GEOmetry for maximum insight. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); 2006; Vol. 3843 LNCS, pp 468–479.

(41)

Hu, Z.; Mellor, J.; Wu, J.; Yamada, T.; Holloway, D.; Delisi, C. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res 2005, 33 (Web Server issue), W352–W357.

(42)

Ho, E.; Webber, R.; Wilkins, M. R. Interactive three-dimensional visualization and contextual analysis of protein interaction networks. J Proteome Res 2008, 7 (1), 104–112.

(43)

Narushima, Y.; Kozuka-Hata, H.; Tsumoto, K.; Inoue, J.; Oyama, M. Quantitative phosphoproteomics-based molecular network description for high-resolution kinase-substrate interactome analysis. Bioinformatics 2016, 32 (14), 2083–2088.

(44)

Raaijmakers, L. M.; Giansanti, P.; Possik, P. A.; Mueller, J.; Peeper, D. S.; Heck, A. J. R.; Altelaar, A. F. M. PhosphoPath: Visualization of Phosphosite-centric Dynamics in Temporal Molecular Networks. J Proteome Res 2015, 14 (10), 4332–4341.

(45)

Pang, C. N. I.; Goel, A.; Li, S. S.; Wilkins, M. R. A Multidimensional Matrix for Systems Biology Research and Its Application to Interaction Networks. J Proteome Res 2012, 11 (11), 5204–5220.

(46)

Maglott, D.; Ostell, J.; Pruitt, K. D.; Tatusova, T. Entrez gene: Gene-centered information at NCBI. Nucleic Acids Res 2011, 39 (Database issue), D52–D57.

(47)

Kim, P. M.; Sboner, A.; Xia, Y.; Gerstein, M. B. The role of disorder in interaction networks: a structural analysis. Mol Syst Biol 2008, 4 (1), 179.

(48)

Finn, R. D.; Coggill, P.; Eberhardt, R. Y.; Eddy, S. R.; Mistry, J.; Mitchell, A. L.; Potter, S. C.; Punta, M.; Qureshi, M.; Sangrador-Vegas, A.; et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 2015, 44 (Database issue), D279–D285.

(49)

Sharifpoor, S.; Nguyen Ba, A. N.; Young, J.-Y.; van Dyk, D.; Friesen, H.; Douglas, A. C.; Kurat, C. F.; Chong, Y. T.; Founk, K.; Moses, A. M.; et al. A quantitative literature-curated gold standard for kinase-substrate pairs. Genome Biol 2011, 12 (4), R39.

(50)

Tinti, M.; Kiemer, L.; Costa, S.; Miller, M. L.; Sacco, F.; Olsen, J. V; Carducci, M.; Paoluzi, S.; Langone, F.; Workman, C. T.; et al. The SH2 domain interaction landscape. Cell Rep 2013, 3 (4), 1293–1305.

(51)

Swaney, D. L.; Beltrao, P.; Starita, L.; Guo, A.; Rush, J.; Fields, S.; Krogan, N. J.; Villén, J. Global analysis of phosphorylation and ubiquitylation cross-talk in protein degradation. Nat Methods 2013, 10 (7), 676–682.

(52)

Granovskaia, M. V; Jensen, L. J.; Ritchie, M. E.; Toedling, J.; Ning, Y.; Bork, P.; Huber, W.; Steinmetz, L. M. High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Genome Biol 2010, 11 (3), R24.

(53)

Beltrao, P.; Bork, P.; Krogan, N. J.; Van Noort, V. Evolution and functional cross-talk of protein post-translational modifications. Mol Syst Biol 2013, 9 (1), 714.

(54)

Akiva, E.; Friedlander, G.; Itzhaki, Z.; Margalit, H. A Dynamic View of Domain-Motif Interactions. PLoS Comput Biol 2012, 8 (1), e1002341.

(55)

Sevilla, A.; Binda, O. Post-translational modifications of the histone variant h2az. Stem Cell Res 2014, 12 (1), 289–295.

(56)

Harshman, S. W.; Young, N. L.; Parthun, M. R.; Freitas, M. A. H1 histones: Current perspectives and challenges. Nucleic Acids Res 2013, 41 (21), 9593–9609.

(57)

Bowman, G. D.; Poirier, M. G. Post-Translational Modifications of Histones That Influence Nucleosome Dynamics. Chem Rev 2015, 115, 2274–2295.

(58)

Perner, J.; Lasserre, J.; Kinkley, S.; Vingron, M.; Chung, H. R. Inference of interactions between Chromatin modifiers and histone modifications: From ChIP-Seq data to chromatin-signaling. Nucleic Acids Res 2014, 42 (22), 13689–13695.

(59)

Kim, P. M.; Lu, L. J.; Xia, Y.; Gerstein, M. B. Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights. Science 2006, 314 (5807), 1938–1941.

(60)

Pang, C. N. I.; Krycer, J. R.; Lek, A.; Wilkins, M. R. Are protein complexes made of cores, modules and attachments? Proteomics 2008, 8 (3), 425–434.

(61)

Mendenhall, M. D.; Hodge, A. E. Regulation of Cdc28 cyclin-dependent protein kinase activity during the cell cycle of the yeast Saccharomyces cerevisiae. Microbiol Mol Biol Rev 1998, 62 (4), 1191–1243.

(62)

Holt, L. J. Regulatory modules: Coupling protein stability to phosphoregulation during cell division. FEBS Lett 2012, 586 (17), 2773–2777.

(63)

De Lichtenberg, U.; Jensen, T. S.; Brunak, S.; Bork, P.; Jensen, L. J. Evolution of cell cycle control: Same molecular machines, different regulation. Cell Cycle 2007, 6 (15), 1819–1825.

(64)

Kõivomägi, M.; Valk, E.; Venta, R.; Iofik, A.; Lepiku, M.; Balog, E. R. M.; Rubin, S. M.; Morgan, D. O.; Loog, M. Cascades of multisite phosphorylation control Sic1 destruction at the onset of S phase. Nature 2011, 480 (7375), 128–131.

(65)

Petroski, M. D.; Deshaies, R. J. Context of multiubiquitin chain attachment influences the rate of Sic1 degradation. Mol Cell 2003, 11 (6), 1435–1444.

(66)

Kaneko, T.; Joshi, R.; Feller, S. M.; Li, S. S. Phosphotyrosine recognition domains: the typical, the atypical and the versatile. Cell Commun Signal 2012, 10 (1), 32.

(67)

Yaffe, M. B. Phosphotyrosine-Binding Domains in Signal Transduction. Nat Rev Mol Cell Biol 2002, 3 (3), 177–186.

(68)

Zhang, X. Roles of glycans and glycopeptides in immune system and immune-related diseases. Curr Med Chem 2006, 13 (10), 1141–1147.

(69)

Arey, B. J. The Role of Glycosylation in Receptor Signaling. Glycosylation 2012, 273–286.

(70)

Sawyer, T. K.; Shakespeare, W. C.; Wang, Y.; Sundaramoorthi, R.; Huang, W. S.; Metcalf 3rd, C. A.; Thomas, M.; Lawrence, B. M.; Rozamus, L.; Noehre, J.; et al. Protein phosphorylation and signal transduction modulation: chemistry perspectives for small-molecule drug discovery. Med Chem 2005, 1 (3), 293–319.

(71)

Herhaus, L.; Dikic, I. Expanding the ubiquitin code through post-translational modification. EMBO Rep 2015, 16 (9), 1071–1083.

(72)

Mok, J.; Kim, P. M.; Lam, H. Y. K.; Piccirillo, S.; Zhou, X.; Jeschke, G. R.; Sheridan, D. L.; Parker, S. A.; Desai, V.; Jwa, M.; et al. Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Sci Signal 2010, 3 (109), ra12.

(73)

Venne, A. S.; Kollipara, L.; Zahedi, R. P. The next level of complexity: Crosstalk of posttranslational modifications. Proteomics 2014, 14 (4–5), 513–524.

(74)

Woodsmith, J.; Kamburov, A.; Stelzl, U. Dual Coordination of Post Translational Modifications in Human Protein Networks. PLoS Comput Biol 2013, 9 (3), 1–15.

(75)

Beltrao, P.; Albanèse, V.; Kenner, L. R.; Swaney, D. L.; Burlingame, A.; Villén, J.; Lim, W. A.; Fraser, J. S.; Frydman, J.; Krogan, N. J. Systematic functional prioritization of protein posttranslational modifications. Cell 2012, 150 (2), 413–425.

(76)

Larance, M.; Kirkwood, K. J.; Tinti, M.; Murillo, A. B.; Ferguson, M. A. J.; Lamond, A. I. Global Membrane Protein Interactome Analysis using In vivo Crosslinking and MS-based Protein Correlation Profiling. Mol Cell Proteomics 2016.

(77)

Li, S. S.; Xu, K.; Wilkins, M. R. Visualization and analysis of the complexome network of Saccharomyces cerevisiae. J Proteome Res 2011, 10 (10), 4744–4756.

(78)

Baryshnikova, A. Systematic Functional Annotation and Visualization of Biological Networks. Cell Syst 2016, 2 (6), 412–421.

(79)

Gavin, A.-C.; Aloy, P.; Grandi, P.; Krause, R.; Boesche, M.; Marzioch, M.; Rau, C.; Jensen, L. J.; Bastuck, S.; Dümpelfeld, B.; et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440 (7084), 631–636.

(80)

Kristensen, A. R.; Gsponer, J.; Foster, L. J. A high-throughput approach for measuring temporal changes in the interactome. Nat Methods 2012, 9 (9), 907–909.

(81)

Pang, C. N. I.; Hayen, A.; Wilkins, M. R. Surface accessibility of protein posttranslational modifications. J Proteome Res 2007, 6 (5), 1833–1845.

(82)

Chavez, J. D.; Schweppe, D. K.; Eng, J. K.; Zheng, C.; Taipale, A.; Zhang, Y.; Takara, K.; Bruce, J. E. Quantitative interactome analysis reveals a chemoresistant edgotype. Nat Commun 2015, 6, 7928.

(83)

Hein, M. Y.; Hubner, N. C.; Poser, I.; Cox, J.; Nagaraj, N.; Toyoda, Y.; Gak, I. A.; Weisswange, I.; Mansfeld, J.; Buchholz, F.; et al. A Human Interactome in Three Quantitative Dimensions Organized by Stoichiometries and Abundances. Cell 2015, 163 (3), 712–723.

(84)

Wang, E. T.; Sandberg, R.; Luo, S.; Khrebtukova, I.; Zhang, L.; Mayr, C.; Kingsmore, S. F.; Schroth, G. P.; Burge, C. B. Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456 (7221), 470–476.

(85)

Tay, A. P.; Pang, C. N. I.; Twine, N. A.; Hart-Smith, G.; Harkness, L.; Kassem, M.; Wilkins, M. R. Proteomic Validation of Transcript Isoforms, Including Those Assembled from RNA-Seq Data. J Proteome Res 2015, 14 (9), 3541–3554.

(86)

Huang, Q.; Chang, J.; Cheung, M. K.; Nong, W.; Li, L.; Lee, M. T.; Kwan, H. S. Human proteins with target sites of multiple post-translational modification types are more prone to be involved in disease. J Proteome Res 2014, 13 (6), 2735–2748.

(87)

Davis, M. J.; Shin, C. J.; Jing, N.; Ragan, M. A. Rewiring the dynamic interactome. Mol Biosyst 2012, 8 (8), 2054.

(88)

Yang, X.; Coulombe-Huntington, J.; Kang, S.; Sheynkman, G. M.; Hao, T.; Richardson, A.; Sun, S.; Yang, F.; Shen, Y. A.; Murray, R. R.; et al. Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing. Cell 2016, 164 (4), 805–817.

(89)

Stastna, M.; Van Eyk, J. E. Analysis of protein isoforms: Can we do it better? Proteomics 2012, 12 (19–20), 2937–2948.

(90)

Tran, J. C.; Zamdborg, L.; Ahlf, D. R.; Lee, J. E.; Catherman, A. D.; Durbin, K. R.; Tipton, J. D.; Vellaichamy, A.; Kellie, J. F.; Li, M.; et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 2011, 480 (7376), 254–258.

(91)

Cain, J. A.; Solis, N.; Cordwell, S. J. Beyond gene expression: The impact of protein posttranslational modifications in bacteria. J Proteomics 2014, 97, 265–286.

(92)

Wu, X.; Oh, M.-H.; Schwarz, E. M.; Larue, C. T.; Sivaguru, M.; Imai, B. S.; Yau, P. M.; Ort, D. R.; Huber, S. C. Lysine acetylation is a widespread protein modification for diverse proteins in Arabidopsis. Plant Physiol 2011, 155 (4), 1769–1778.

(93)

Ye, J.; Zhang, Z.; You, C.; Zhang, X.; Lu, J.; Ma, H. Abundant protein phosphorylation potentially regulates Arabidopsis anther development. J Exp Bot 2016, 67 (17), 4993–5008.

(94)

Mawuenyega, K. G.; Kaji, H.; Yamauchi, Y.; Shinkawa, T.; Saito, H.; Taoka, M.; Takahashi, N.; Isobe, T. Large-scale identification of Caenorhabditis elegans proteins by multidimensional liquid chromatography-tandem mass spectrometry. J Proteome Res 2003, 2 (1), 23–35.

(95)

Zhai, B.; Villén, J.; Beausoleil, S. A.; Mintseris, J.; Gygi, S. P. Phosphoproteome Analysis of Drosophila melanogaster Embryos. J Proteome Res 2008, 7, 1675–1682.

(96)

Rajagopala, S. V.; Sikorski, P.; Kumar, A.; Mosca, R.; Vlasblom, J.; Arnold, R.; Franca-Koh, J.; Pakala, S. B.; Phanse, S.; Ceol, A.; et al. The binary protein-protein interaction landscape of Escherichia coli. Nat Biotechnol 2014, 32 (3), 285–290.

(97)

Hu, P.; Janga, S. C.; Babu, M.; Díaz-Mejía, J. J.; Butland, G.; Yang, W.; Pogoutse, O.; Guo, X.; Phanse, S.; Wong, P.; et al. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol 2009, 7 (4), 0929–0947.

(98)

Giot, L.; Bader, J. S.; Brouwer, C.; Chaudhuri, A.; Kuang, B.; Li, Y.; Hao, Y. L.; Ooi, C. E.; Godwin, B.; Vitols, E.; et al. A Protein Interaction Map of Drosophila melanogaster. Science 2003, 302 (December), 1727–1737.

(99)

Guruharsha, K. G.; Rual, J.-F.; Zhai, B.; Mintseris, J.; Vaidya, P.; Vaidya, N.; Beekman, C.; Wong, C.; Rhee, D. Y.; Cenaj, O.; et al. A protein complex network of Drosophila melanogaster. Cell 2011, 147 (3), 690–703.

(100) Havugimana, P. C.; Hart, G. T.; Nepusz, T.; Yang, H.; Turinsky, A. L.; Li, Z.; Wang, P. I.; Boutz, D. R.; Fong, V.; Phanse, S.; et al. A census of human soluble protein complexes. Cell 2012, 150 (5), 1068–1081.

For TOC only:

Figure 1: Overview of Cytoscape and the PTMOracle framework. PPIs and network annotations are processed by the Cytoscape framework to create PPI networks. By using PTMOracle, protein data can also be imported into Cytoscape. This is achieved by formatting PTM sites and sequence annotations into an XML-based or tab-separated (TSV) file format that PTMOracle can parse and map onto protein nodes in the Cytoscape network.

Figure 2: The PTMOracle app running within Cytoscape. PTMOracle introduces three new components into the Cytoscape interface (highlighted in yellow boxes): OraclePainter, OracleTools and OracleResults. OraclePainter (highlighted in green box) can visualise PTMs on protein nodes as pie charts in the network view (highlighted in purple box). OracleTools (highlighted in red box) contains several analytical modules (Calculator, PairFinder, RegionFinder and MotifFinder) that output results into newly created columns in the Cytoscape Table Panel (highlighted in brown box). These results can be used with standard Cytoscape operations to identify proteins of interest. OracleResults (highlighted in blue box) can be used to examine, compare and co-locate PTM sites and sequence annotations such as domains, motifs and disordered regions on protein nodes.

Figure 3: Major features of PTMOracle. A) OraclePainter is used to render protein nodes in the PPI network as pie charts. Each sector in the pie chart represents a specific PTM type and its frequency as a proportion of the total number of PTMs on the protein node. The colour of each sector is user-assigned in the colour palette, located in the Cytoscape Control Panel (highlighted in red box). B) Located in the Cytoscape Results Panel, OracleResults is used to visualise protein data associated with a group of protein nodes. The OracleResults output shows a table containing information on PTM sites and sequence annotations such as domains, motifs and disordered regions (highlighted in blue box), a text area for the protein sequence (highlighted in green box) and a legend for PTM sites and sequence annotations mapped onto the sequence (highlighted in purple box). PTM sites and sequence annotations are also highlighted in the protein sequence if applicable. C) Different modules (e.g. Calculator, PairFinder, RegionFinder and MotifFinder) in OracleTools are used to find proteins of interest (highlighted in yellow). Specialised subnetworks containing proteins of interest and their interacting partners can also be created and explored for more targeted investigations.
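
The pie-chart encoding described in panel A can be illustrated with a minimal sketch. This is not PTMOracle's implementation: the function name `ptm_sector_fractions` and the input format (a list of PTM type labels, one per PTM site on a single protein) are assumptions made for illustration. The sketch computes, for one protein node, the fraction of its PTM sites contributed by each PTM type, which is the quantity each sector would represent.

```python
from collections import Counter


def ptm_sector_fractions(ptm_types):
    """Return {PTM type: fraction of all PTM sites on the node}.

    ptm_types: list of PTM type labels, one entry per PTM site
    (hypothetical input format, not PTMOracle's file format).
    """
    counts = Counter(ptm_types)
    total = sum(counts.values())
    if total == 0:
        # A node without PTM sites has no sectors to draw.
        return {}
    return {ptm: n / total for ptm, n in counts.items()}


# Made-up node with four annotated PTM sites.
example_sites = ["phosphorylation", "phosphorylation", "acetylation", "methylation"]
fractions = ptm_sector_fractions(example_sites)
```

In this toy case phosphorylation would occupy half of the pie and the other two types a quarter each; the total number of sites could separately drive node size, as in Figure 4.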

Figure 4: Visualisation of yeast histone proteins, their interactions and PTMs with OraclePainter, using the PPI network from Pang et al. The subnetwork contains 10 histone proteins: the 8 core histone proteins (HTA1, HTA2, HTB1, HTB2, HHT1, HHT2, HHF1 and HHF2), 1 linker protein (HHO1) and 1 histone variant protein (HTZ1). PPIs between histone proteins are highlighted in grey and histone proteins are coloured with OraclePainter. The types of PTMs and the proportion of each type are illustrated by the pie charts on each histone protein. The total number of PTM sites on each histone protein is represented by the node size. Across all histone proteins, 5 PTM types are shown: phosphorylation (red), acetylation (blue), methylation (black), ubiquitination (green) and sumoylation (purple).

Figure 5: A search for PTM-mediated interactions in a domain-domain interaction (DDI) network. A) Visualisation of the 3 largest protein clusters in the yeast domain-domain interaction subnetwork, highlighting DDIs that might involve PTMs. Protein nodes with at least 1 PTM within domain sequence annotations, as identified by RegionFinder, are highlighted in green, whereas all other protein nodes are highlighted in purple. DDIs that might involve PTMs are represented as orange edges, whereas other DDIs are grey. Note, however, that our analysis cannot reveal whether PTMs are actually required on one, both or neither protein for an interaction to occur. B) OracleResults tables showing the sequence position of PTM sites and domains mapped onto i) CDC28 and ii) CLB1. The sequence positions of PTM sites for CDC28 and recognition domains for CLB1 that might facilitate interactions are both indicated with red arrows. PTM sites and sequence annotations are also co-located onto the protein sequence (highlighted in pink and blue respectively).
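
The selection criterion used in panel A (a protein node with at least 1 PTM site inside a domain annotation) can be sketched as a simple interval check. The code below is an illustrative assumption, not RegionFinder's actual code: the function names, the tuple layouts and the example coordinates are all hypothetical.

```python
def ptms_in_regions(ptm_sites, regions):
    """Return (position, ptm_type, region_name) for PTM sites inside a region.

    ptm_sites: iterable of (position, ptm_type), 1-based residue positions.
    regions:   iterable of (start, end, name), inclusive 1-based bounds.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    hits = []
    for pos, ptm_type in ptm_sites:
        for start, end, name in regions:
            if start <= pos <= end:
                hits.append((pos, ptm_type, name))
    return hits


def has_ptm_in_region(ptm_sites, regions):
    """True if at least one PTM site falls within any annotated region."""
    return bool(ptms_in_regions(ptm_sites, regions))


# Made-up protein: two PTM sites, one annotated domain spanning residues 10-50.
toy_sites = [(19, "phosphorylation"), (120, "acetylation")]
toy_domains = [(10, 50, "example_domain")]
toy_hits = ptms_in_regions(toy_sites, toy_domains)
```

A node for which `has_ptm_in_region` returns true would be coloured green in a view like panel A. As the caption notes, such a co-location only suggests, and does not demonstrate, that the PTM is involved in the interaction.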

Figure 6: Prediction of kinases involved in the phosphodegron. A) Yeast kinase-substrate interaction subnetwork highlighting protein nodes with phosphorylation sites within phosphodegron motifs. Protein nodes with at least 1 phosphorylation site on [ST] residue/s within phosphodegron motifs, as identified by RegionFinder, are highlighted in green. All other protein nodes are highlighted in blue. Protein nodes that are periodically expressed are shown with a red border. All interactions in the network are shown as directed edges, with kinases as the source and non-kinases as the target. B) OracleResults table showing the sequence position of phosphodegron motifs and PTM sites mapped onto the SIC1 protein (highlighted in the brown box in (A)). Calculated phosphodegron motifs as well as serine/threonine phosphorylation sites that have been previously reported are indicated with red and pink arrows respectively, whereas ubiquitination sites that are known to specify the degradation of SIC1 are indicated with blue arrows. All phosphodegron motifs and PTM sites on SIC1 are also highlighted in the protein sequence (highlighted in pink and red respectively).
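
The idea of locating sequence motifs and then asking which [ST] phosphorylation sites fall inside them can be sketched with a regular-expression scan. This is a hedged illustration only: the function name is hypothetical, the toy sequence is invented, and the pattern `[ST]P` is a placeholder, not the phosphodegron definition used in this work.

```python
import re


def phosphosites_in_motifs(sequence, pattern, phosphosites):
    """Return (motif_start, motif_end, site) for S/T phosphosites inside a motif.

    sequence:     protein sequence as a string of one-letter codes.
    pattern:      regular expression describing the motif (placeholder here).
    phosphosites: iterable of 1-based residue positions reported as phosphorylated.
    Motif coordinates in the result are 1-based and inclusive.
    """
    hits = []
    # The lookahead lets overlapping motif occurrences all be reported.
    for match in re.finditer(f"(?=({pattern}))", sequence):
        start = match.start(1) + 1  # convert 0-based index to 1-based position
        end = match.end(1)          # exclusive 0-based end equals inclusive 1-based end
        for site in phosphosites:
            if start <= site <= end and sequence[site - 1] in "ST":
                hits.append((start, end, site))
    return hits


# Invented 9-residue sequence; positions 3 (S) and 6 (T) lie in "[ST]P" matches.
toy_sequence = "MASPKTPLL"
toy_hits = phosphosites_in_motifs(toy_sequence, "[ST]P", [3, 6, 8])
```

A protein with a non-empty result would correspond to a green node in panel A; substituting a real motif definition for the placeholder pattern is left to the reader.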

Figure 7: Human protein-protein interactions and their overlap with phosphotyrosine (pY)-dependent interactions. A) Venn diagram showing the number of interactions that were identified in each network dataset. Overall, there is minimal overlap between the networks, suggesting that pY-dependent interactions are sparsely mapped. The numbers in brackets, below each network name, represent the total number of interactions in the network. B) Venn diagram showing the number of interacting pY residues identified in each dataset. The majority of pY sites present in the integrated set of PTMs were not identified in Grossmann-pY or Tinti-pY, suggesting that these could be potential sites to screen. The numbers in brackets, below each dataset name, represent the total number of interacting pY residues identified in the dataset.
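
The overlap counts behind a Venn diagram such as panel A can be obtained with set operations. The sketch below handles two datasets only and treats interactions as undirected pairs; both simplifications, the function names and the protein labels are assumptions for illustration, and no values from the actual datasets are reproduced.

```python
def normalise(interactions):
    """Treat interactions as undirected: (A, B) and (B, A) are the same edge."""
    return {frozenset(pair) for pair in interactions}


def venn_counts(network_a, network_b):
    """Return the sizes of the three regions of a two-set Venn diagram."""
    a, b = normalise(network_a), normalise(network_b)
    return {
        "only_a": len(a - b),
        "both": len(a & b),
        "only_b": len(b - a),
    }


# Made-up interaction lists with hypothetical protein labels.
toy_a = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
toy_b = [("P2", "P1"), ("P4", "P5")]
toy_counts = venn_counts(toy_a, toy_b)
```

Here the pair P1/P2 is shared despite being listed in opposite orders, so one interaction is counted in the intersection. The same approach applies to panel B if each element is a (protein, pY position) pair instead of an interaction.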

Figure 8: Phosphotyrosine (pY)-dependent human PPI subnetwork visualised with OraclePainter, with data from Rolland et al. Interactions in the network have been reported by Tinti et al. or Grossmann et al. to involve the recognition of pY by SH2 or PTB domains. Overall, visualisation of the network revealed one large interconnected cluster and several small clusters, with the number of nodes in each cluster ranging from 2 to 69. Visualising the network with OraclePainter revealed that most proteins carry phosphorylation sites and other types of PTMs, including N-linked glycosylation, ubiquitination and acetylation.
