Article pubs.acs.org/jpr

PTMOracle: A Cytoscape App for Covisualizing and Coanalyzing Post-Translational Modifications in Protein Interaction Networks Aidan P. Tay, Chi Nam Ignatius Pang, Daniel L. Winter, and Marc R. Wilkins* Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia


Supporting Information

ABSTRACT: Post-translational modifications of proteins (PTMs) act as key regulators of protein activity and of protein−protein interactions (PPIs). To date, it has been difficult to comprehensively explore functional links between PTMs and PPIs. To address this, we developed PTMOracle, a Cytoscape app for coanalyzing PTMs within PPI networks. PTMOracle also allows extensive data to be integrated and coanalyzed with PPI networks, allowing the role of domains, motifs, and disordered regions to be considered. For proteins of interest, or a whole proteome, PTMOracle can generate network visualizations to reveal complex PTM-associated relationships. This is assisted by OraclePainter for coloring proteins by modifications, OracleTools for network analytics, and OracleResults for exploring tabulated findings. To illustrate the use of PTMOracle, we investigate PTM-associated relationships and their role in PPIs in four case studies. In the yeast interactome and its rich set of PTMs, we construct and explore histone-associated and domain−domain interaction networks and show how integrative approaches can predict kinases involved in phosphodegrons. In the human interactome, a phosphotyrosine-associated network is analyzed but highlights the sparse nature of human PPI networks and lack of PTM-associated data. PTMOracle is open source and available at the Cytoscape app store: http://apps.cytoscape.org/apps/ptmoracle.

KEYWORDS: post-translational modification, protein−protein interaction, data integration, networks, visualization, Cytoscape



Received: December 19, 2016. Published: March 28, 2017. © 2017 American Chemical Society
DOI: 10.1021/acs.jproteome.6b01052 J. Proteome Res. 2017, 16, 1988−2003

INTRODUCTION

The majority of eukaryotic proteins are modified by at least one post-translational modification (PTM).1 These modifications are mediated by the action of modifying enzymes, such as kinases, acetyltransferases, and methyltransferases. Some modifications are also reversible and dynamically removed by demodifying enzymes such as phosphatases, deacetylases, and demethylases. PTMs can occur on multiple sites of the same protein. Moreover, with more than 400 types of PTMs,1 some sites are known to be modified by different types of PTMs. In conjunction with alternative splicing, PTMs serve to diversify the function of proteins by acting as key regulators of protein activity in response to changing conditions.2 For this reason, many proteins potentially exist as different modification forms (modforms), each with specific properties that can affect how they function.3

Protein−protein interactions (PPIs) can be regulated by PTMs. Single PTMs can do this through allosteric change or by forming motifs and binding interfaces that are recognized by domains.4,5 PTMs can also act in concert with each other to modulate protein interactions. This is exemplified on histone proteins whose “tails” are highly modified with different PTM types to regulate the interactions with their binding partners.6 A general hypothesis of “protein interaction codes”,7 where PTMs modulate interaction specificity, has been described for a small number of other proteins.8−12 An emerging challenge is therefore to decipher how PTMs, alone or in combination, regulate protein−protein interactions.1 Knowledge of how PPIs are regulated will inform our understanding of cellular processes.

Understanding of the regulatory role of PTMs requires the identification of PTMs. Thousands of PTM sites can now be routinely identified in large-scale tandem mass spectrometry (MS/MS) experiments.13−16 For unambiguous identification of PTM sites, detection of modified peptides that are unique to the protein as well as localization of the modified residue is required.17,18 However, proteome-wide identification of PTM sites by MS remains challenging because PTMs can be labile, thereby affecting peptide ionization and fragmentation.17,18 Modified peptides can also be present in low abundance and therefore difficult to detect.17,18 Despite these challenges, PTM sites identified by high-throughput methods are available in various online repositories.19−25 The quality and coverage of PTM sites discovered by high-throughput methods could be improved by consolidating PTM sites identified by different research groups or curated by different repositories.17

High-confidence identification of PPIs is essential for understanding the role of PPIs in cellular processes. Interactions between proteins are typically detected using two main experimental methods, namely, binary and cocomplex.26 Binary methods such as yeast two-hybrid (Y2H)27 measure direct interactions between protein pairs, whereas cocomplex methods such as affinity purification-mass spectrometry (AP-MS) measure interactions within protein complexes but lack information on direct physical binding.26 However, regardless of which method is used to detect interactions, each approach has different strengths and weaknesses, making it difficult to attain proteome-scale coverage of PPIs.28 For example, Yu et al.29 showed that different large-scale screens in Saccharomyces cerevisiae, including Y2H and AP-MS, have little overlap due to differences in completeness of the search space and assay sensitivity. In the same way, Rolland et al.30 highlighted that large zones of the human interactome remain uncharted because of considerable bias in the literature toward proteins associated with disease. The combinatorial complexity of the human interactome as well as the fact that weak or transient interactions are difficult to detect also contribute to the challenges of attaining proteome-scale coverage of PPIs.28,30 The above suggests that proteome-wide coverage of PPIs can be improved by combining PPIs detected with different approaches, including those that are curated in various online repositories.31−33

Systematic investigations that link the functional role of PTMs to PPIs remain challenging.34 This requires integrating high-quality data on PTMs and PPIs and coanalyzing them to gain insights into how PTMs might regulate PPIs. Several methods have been developed to directly identify PTMs that modulate PPIs, including the yeast tribrid or conditional two-hybrid system.35,36 More recently, Grossmann et al.37 reported an approach to identify binary PTM-mediated interactions. They extended the traditional Y2H system to detect interactions between bait proteins containing phosphotyrosine (pY)-recognition domains and phosphorylated prey proteins. By using their approach, Grossmann et al.37 identified 292 mostly novel pY-dependent interactions in humans with a single experiment. However, because of the lack of experimental methods to map PTM-mediated PPIs in a high-throughput setup, recent studies have focused on connecting PTM and PPI data sets through network analysis. This is especially useful for providing insights into global PTM-associated relationships. For example, Duan et al.38 investigated whether different types of PTMs are characterized by different network properties. From their analysis of 12 different PTM types, Duan et al.38 found that proteins with PTMs engage in more interactions and are involved in regulatory functions such as relaying information compared to proteins without PTMs.

A number of software platforms are available for visualizing and analyzing PPI networks.39−41 In the same way, existing tools such as the adaptation of GEOMI by Ho et al.,42 PTMapper,43 and PhosphoPath44 allow users to covisualize and coanalyze some PTM data within PPI networks. Despite this, there is a lack of tools for exploring the regulatory role of PTMs and how they might be involved in PPIs. Such tools can improve our understanding of PTM-mediated interactions or PTM-associated relationships.28

In this study, we developed PTMOracle, a new Cytoscape app that facilitates the covisualization and coanalysis of PTMs in the context of PPI networks. This allows users to develop systematic searches for proteins of interest and explore network visualizations that address complex PTM-associated relationships. Furthermore, PTMOracle allows users to integrate other protein data into PPI networks, allowing the role of domains, motifs, and disordered regions to be considered. This allows users to coanalyze relationships between PTMs, PPIs, and other sequence annotations to generate testable hypotheses regarding the functional role of PTMs in PPIs. To illustrate how PTMOracle can be used to explore protein PTMs and their involvement in PPIs, we present case studies using the interactomes of yeast S. cerevisiae and human.



MATERIALS AND METHODS

Construction of PTMOracle

PTMOracle (v1.00) is a Cytoscape app designed to facilitate the covisualization of PTMs in the context of PPI networks. Apart from PTMs, PTMOracle also allows users to integrate other types of protein data into PPI networks, such as the protein sequence and sequence annotations including domains, motifs, and disordered regions. To do this, all protein data must be formatted into either an XML-based or tab-separated (TSV) file. The format of the XML-based or TSV file required by PTMOracle is available in Supporting Information 1 (Method S1 and Figure S1). Protein data in the required format were parsed and mapped onto protein nodes in an existing Cytoscape network and summarized in new columns of the Cytoscape Table Panel by PTMOracle. In conjunction with the Cytoscape framework, protein data were then visualized, queried, and explored with different features of PTMOracle. The features of PTMOracle are described in more detail in the Results. PTMOracle requires Cytoscape v3.1 or higher and is available for download at the Cytoscape app store: http://apps.cytoscape.org/apps/ptmoracle. The source code for PTMOracle is under the GPL v3 license and is available for download via the BitBucket repository https://bitbucket.org/aidantay/ptmoracle/src.

Yeast Interaction Network Data and Visualization

The yeast PPI network described by Pang et al.45 was loaded into Cytoscape and used to covisualize protein PTM sites and sequence annotations. The network (referred to as SBI) was constructed using PPIs that were verified by two or more experiments. This network comprised 3,843 protein nodes and 13,292 nonredundant interactions. Protein identifiers and descriptions from Uniprot25 were also mapped onto the network as node attributes using the Cytoscape framework.

Human Interaction Network Data and Visualization

Two human PPI networks derived from Rolland et al.30 were used in this study. The first network (referred to as Lit-BM-13) was constructed from literature-curated binary pairs supported by multiple lines of evidence and contained 5,545 protein nodes and 11,045 interactions. The second network (referred to as HI-II-14) was constructed using pairwise two-hybrid experiments and validated with orthogonal interaction assays to systematically map high-quality binary PPIs. This network comprised 4,303 protein nodes and 13,944 interactions. Using the Cytoscape framework, protein identifiers from Uniprot25 and gene descriptions from NCBI Entrez Gene46 were also mapped onto each network as node attributes.

Data Sources

PTM Data. PTM sites on yeast and human proteins were downloaded from Uniprot,25 dbPTM,23 and ProteomeScout.24 To increase proteome-wide coverage, PTM sites in each repository were consolidated into a single set using Uniprot as a reference. This was done by using custom scripts to map protein IDs and PTM sites from dbPTM and ProteomeScout onto Uniprot accession numbers and protein sequences, respectively. All PTM sites reported in each repository considered position 1 as the initiating methionine. PTM sites from dbPTM or ProteomeScout that were mapped onto the Uniprot protein sequence but reported a different residue from that in the Uniprot protein sequence were removed from the final data set. Following this, scripts were used to tally eight main PTM types. These comprised five commonly occurring intracellular PTMs (acetylation, methylation, phosphorylation, ubiquitination, and sumoylation) and three PTMs that predominantly occur on membrane-associated or secreted proteins (lipidation and N- and O-linked glycosylation).2 We note that several subtypes exist for different PTM types, which may have different functions, properties, modifying enzymes, and substrates (e.g., phosphotyrosine and phosphoserine). However, in the case of lipidation, we only considered myristoylation, palmitoylation, farnesylation, and geranylgeranylation. We also note that O-linked glycosylation can exist as O-GlcNAc and occur intracellularly to regulate signal transduction in conjunction with phosphoserines and phosphothreonines.2 Together, this generated a total of 18,198 unique PTM sites on 3,356 yeast proteins and 321,584 unique PTM sites on 19,342 human proteins. Custom scripts were also used to parse and format all PTM sites into the XML-based format required by PTMOracle. The XML-based files containing PTM sites for yeast and human proteins are provided in Supporting Information 2.

Domain and Domain−Domain Interaction Data. Yeast domain−domain interactions (DDIs) were obtained from Kim et al.47 Briefly, Kim et al. mapped DDIs from Pfam onto solved crystal structures to identify distinct binding interfaces. To identify the exact domain positions of interacting domains, we also downloaded domain annotations from Pfam48 and mapped them to each DDI using custom scripts. DDIs that involved at least one domain with an e-value greater than 0.01 or that could not be mapped to domain annotations from Pfam were removed from the final data set. Following this, custom scripts were used to parse and format DDIs and domain annotations for use in Cytoscape and PTMOracle, respectively. Overall, a total of 1,445 nonredundant DDIs between corresponding Pfam domains on yeast proteins were mapped onto yeast PPI networks as edge attributes. The XML-based file containing exact domain positions for yeast proteins is provided in Supporting Information 2.

Kinase−Substrate Interaction Data. Yeast kinase−substrate interactions (KSIs) were obtained from the Yeast Kinase Interaction Database (KID, March 14, 2016 release) described by Sharifpoor et al.49 The KID database contains data from both high- and low-throughput experiments associated with phosphorylation events. Custom scripts were used to remove interactions with a KID score of less than 6.73, the threshold cutoff that Sharifpoor et al. recommended for high-quality interactions and that corresponds to a p-value of less than 0.01.49 Custom scripts were then used to parse and format interactions for use in Cytoscape. A total of 582 nonredundant KSIs were mapped onto yeast PPI networks as edge attributes.

Phosphotyrosine-Dependent Interaction Data. Human phosphotyrosine (pY)-dependent interactions were obtained from two large-scale studies. Grossmann et al.37 introduced a third plasmid expressing human kinases into the Y2H system to detect pY-dependent interactions involving SH2 or PTB domains. In contrast, Tinti et al.50 employed a text-mining approach and manually curated SH2-pY-dependent interactions from the literature. This included SH2-pY-dependent interactions between nonhuman proteins, which were subsequently removed with custom scripts. Custom scripts were also used to parse and format interactions for use in Cytoscape. Overall, a total of 292 pY-dependent interactions from Grossmann et al. and 264 SH2-pY-dependent interactions from Tinti et al. were mapped onto human PPI networks as edge attributes.

Phosphodegron Motif Data. To predict kinases that are involved in phosphorylation-dependent ubiquitination and degradation, we searched the yeast PPI network for proteins with phosphodegron motifs. This was done using the MotifFinder module in OracleTools (see Results) to calculate the location of two known phosphodegron patterns ([LIVMP]X{0,2}[ST]PXXE and [LIVMP]X{0,2}[ST]PXX[ST]) described by Swaney et al.51 For each sequence pattern, X matches any amino acid, square brackets match any amino acid inside the brackets, and curly brackets describe the number of times the preceding amino acid can be repeated. XML-based files containing the location of phosphodegron motifs on yeast proteins identified by MotifFinder are provided in Supporting Information 2.

Peak Gene Expression Data. To explore how phosphorylation-dependent ubiquitination and degradation regulate cell cycle progression, we obtained peak gene expression data from Granovskaia et al.52 Granovskaia et al. measured the expression of RNAs at 5 min intervals for up to 3 cell division cycles using microarrays. This included protein-coding transcripts as well as noncoding transcripts from antisense or intergenic regions. However, only peak gene expression data for protein-coding transcripts were considered in our analyses. In total, peak gene expression data for 587 protein-coding transcripts were mapped onto the yeast PPI network as node attributes.
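The pattern notation described under Phosphodegron Motif Data maps closely onto ordinary regular expressions, so the search can be illustrated with a minimal Python sketch. This is not the MotifFinder implementation: the function names, the use of a lookahead to report overlapping matches, and the test sequences are all assumptions made for illustration. Positions are reported 1-based, following the convention that position 1 is the initiating methionine.

```python
import re

# The two phosphodegron patterns quoted in the text (Swaney et al.).
PHOSPHODEGRON_PATTERNS = [
    "[LIVMP]X{0,2}[ST]PXXE",
    "[LIVMP]X{0,2}[ST]PXX[ST]",
]


def pattern_to_regex(pattern: str) -> str:
    """Translate the motif notation into a Python regex.

    Only 'X' (any amino acid) needs rewriting; square and curly brackets
    already have the same meaning in regular expressions.
    """
    return pattern.replace("X", ".")


def find_motifs(sequence: str, pattern: str) -> list:
    """Return (start, end, matched_text) tuples, 1-based and inclusive.

    A lookahead is used so that matches starting at different residues are
    all reported, even when they overlap (an assumption of this sketch).
    """
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    hits = []
    for match in regex.finditer(sequence):
        text = match.group(1)
        start = match.start(1) + 1  # convert 0-based index to 1-based position
        hits.append((start, start + len(text) - 1, text))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the paper's data.
    for pattern in PHOSPHODEGRON_PATTERNS:
        print(pattern, find_motifs("GALSPAAEK", pattern))
```

Each hit could then be written out as a motif annotation with start and end positions, which is the kind of information the XML-based motif files are described as containing.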



RESULTS

Overview and Features of PTMOracle

PTMOracle is a Cytoscape app that allows users to integrate different types of protein data into PPI networks. This includes the protein sequence, PTM sites, and annotations such as domains, motifs, and disordered regions. When running PTMOracle, users have access to several analytical tools that perform PTM-related queries. With these queries, users can develop systematic searches within the Cytoscape environment. These can highlight proteins of interest and explore network visualizations that address complex PTM-associated relationships. PTM sites can also be mapped onto protein sequences and annotations to help generate hypotheses about PTMs and their roles in PPIs.

To facilitate the integration of protein data with networks, a new PTMOracle framework was developed in Cytoscape (shown in Figure 1). Using the Cytoscape framework, users create PPI networks with network data. This includes PPIs as well as additional edge data (e.g., kinase−substrate, domain−domain) and node data (e.g., protein IDs, protein descriptions). To map protein data onto nodes in the existing Cytoscape network, users can format protein data into an XML-based or TSV file that can be processed by the PTMOracle framework (Supporting Information 1; Method S1 and Figure S1).

In addition to the new framework for processing protein data, PTMOracle introduces three new components to the Cytoscape interface, labeled as OraclePainter, OracleTools, and OracleResults (Figure 2). By using these components, users can visualize, query, and explore protein data that has been mapped onto protein nodes in the network. This is achieved through standard Cytoscape operations, including searching for nodes, finding neighbor nodes, creating subnetworks, and applying network layouts. Details of each PTMOracle component are discussed in the following sections.

Figure 1. Overview of Cytoscape and the PTMOracle framework. PPIs and network annotations are processed by the Cytoscape framework to create PPI networks. By using PTMOracle, protein data can also be imported into Cytoscape. This is achieved by formatting PTM sites and sequence annotations into an XML-based or tab-separated (TSV) file format that PTMOracle can parse and map onto protein nodes in the Cytoscape network.

Visualization of Proteins and Their Modifications. OraclePainter, located in the Cytoscape Control Panel (Figure 2), allows protein nodes to be displayed in multiple colors depending on their PTM types. It does this by utilizing PTM information to render protein nodes as pie charts (Figure 3A). Each section in the pie chart represents a PTM type and its frequency as a proportion of the total number of PTM sites on the protein node. Each section of the pie chart matches the colors assigned by the user in the OraclePainter color palette. Protein nodes that do not have PTM types corresponding to the ones in the color palette are colored gray. By visualizing protein nodes with OraclePainter, users are able to quickly identify and highlight proteins with a specific type of PTM or combination of PTMs.

Displaying Protein PTM Sites and Sequence Annotations. Located in the Cytoscape Results Panel, OracleResults tabulates PTM sites and sequence annotations associated with any given protein node(s) (Figure 2). To view this information, users input the name(s) (or another identifier) of the protein node into the search box. PTM sites and sequence annotations associated with the protein are then presented in a table (Figure 3B). The sequence for the protein node is also shown in a text area and highlighted with PTM sites and sequence annotations from the table if applicable. With OracleResults, users are able to explore which PTM type or sites might be involved in interactions and where these interactions might take place.

Searching for Proteins of Interest. OracleTools, located in the “Apps” menu bar under the PTMOracle submenu, contains a number of analytical modules that utilize the PTM sites and sequence annotations mapped onto protein nodes (Figure 2). The output of each module is displayed in newly created column(s) of the Cytoscape Table Panel, which can then be used for searching or filtering nodes and edges with standard Cytoscape operations. The outputs of the modules can also be combined to build complex or detailed queries based on several criteria. Interacting partners of proteins of interest can also be highlighted and used to construct subnetworks with the Cytoscape framework for more targeted investigations (Figure 3C). OracleTools contains four analytical modules labeled as Calculator, PairFinder, RegionFinder, and MotifFinder. The Calculator module counts the total number of PTM sites, the frequency of specific PTM types, or the number of sequence annotations (e.g., domains, motifs, and disordered regions) on protein nodes. The PairFinder module helps to identify protein nodes with pairs of PTM sites that are separated by a specified number of amino acids. This is useful for identifying PTM

Figure 2. PTMOracle app running within Cytoscape. PTMOracle introduces three new components into the Cytoscape interface (highlighted in yellow boxes). These components are the OraclePainter, OracleTools, and OracleResults. OraclePainter (highlighted in green box) can visualize PTMs on protein nodes as pie charts in the network view (highlighted in purple box). OracleTools (highlighted in red box) contains several analytical modules (Calculator, PairFinder, RegionFinder, and MotifFinder) that output results into newly created columns in the Cytoscape Table Panel (highlighted in brown box). These results can be used with standard Cytoscape operations to identify proteins of interest. OracleResults (highlighted in blue box) can be used to examine, compare, and colocate PTM sites and sequence annotations such as domains, motifs, and disordered regions on protein nodes. 1991
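The kind of query that the PairFinder module is described as performing (finding pairs of PTM sites separated by a specified number of amino acids) can be sketched in a few lines of Python. This is an illustrative sketch only, not PTMOracle's code: the function name, the data layout (a list of position and PTM-type tuples), and the interpretation of the separation as "at most `max_distance` residues apart" are assumptions, and the example sites are hypothetical.

```python
from itertools import combinations


def find_ptm_pairs(sites, max_distance):
    """Return pairs of PTM sites whose positions differ by at most max_distance.

    `sites` is a list of (position, ptm_type) tuples for one protein, with
    1-based residue positions. Interpreting the separation as an upper bound
    is an assumption of this sketch.
    """
    ordered = sorted(sites)
    pairs = []
    for (pos_a, type_a), (pos_b, type_b) in combinations(ordered, 2):
        if 0 < pos_b - pos_a <= max_distance:
            pairs.append(((pos_a, type_a), (pos_b, type_b)))
    return pairs


if __name__ == "__main__":
    # Hypothetical sites on a single protein, for illustration only.
    example_sites = [
        (10, "phosphorylation"),
        (14, "acetylation"),
        (40, "methylation"),
    ]
    print(find_ptm_pairs(example_sites, max_distance=5))
```

A per-protein count of such pairs is the sort of value that could be written to a new node table column and then used in ordinary Cytoscape filters.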


Figure 3. Major features of PTMOracle. (A) OraclePainter is used to render protein nodes in the PPI network as pie charts. Each sector in the pie chart represents a specific PTM type and its frequency as a proportion of the total number of PTMs on protein nodes. The colors in each section of the pie chart are user-assigned in the color palette, located in the Cytoscape Control Panel (highlighted in red box). (B) Located in the Cytoscape Results Panel, OracleResults is used to visualize protein data associated with a group of protein nodes. The OracleResults output shows a table containing information on PTM sites and sequence annotations such as domains, motifs, and disordered regions (highlighted in blue box), a text area for the protein sequence (highlighted in green box), and a legend for PTM sites and sequence annotations mapped onto the sequence (highlighted in purple box). PTM sites and sequence annotations are also highlighted in the protein sequence if applicable. (C) Different modules (e.g., Calculator, PairFinder, RegionFinder, and MotifFinder) in OracleTools are used to find proteins of interest (highlighted in yellow). Specialized subnetworks containing proteins of interest and their interacting partners can also be created and explored for more targeted investigations.

interplay or cross-talk and other logic modules that may function in roles that are common across higher eukaryotes.53 The RegionFinder module identifies protein nodes with PTM sites that are colocated within specified sequence annotations such as domains, motifs, and disordered regions. This is useful because PTMs that occur within certain sequences are able to create or block protein−protein or protein−ligand interactions through conformational or physicochemical changes.4,5 Finally, the MotifFinder module identifies protein nodes whose sequence contains a specified motif. This is of use because sequence motifs can reflect interaction specificity and, hence, recurring PTM sites within sequence motifs may be involved in systems for the regulation of interactions.54

To illustrate how PTMOracle can integrate and visualize PTM-associated relationships, below we explore four case studies. An overview of the workflow used for each case study with yeast is provided in Supporting Information 1 (Figure S2). The Cytoscape sessions for all case studies are also provided in Supporting Information 3.

Case Study 1: Visualizing PTMs on Proteins with PTMOracle

Histone proteins are highly modified on their “tails” with various types of PTMs. Combinations of their PTMs regulate the specificity of their PPIs in a code-like manner, leading to the compaction or relaxation of chromatin.6 Following this, it has been proposed that other highly modified proteins may also feature “interaction codes”.7 However, visualizing highly modified proteins without increasing the visual complexity of the network can be difficult with the basic functions of Cytoscape. Here, we demonstrate how PTMOracle can perform or assist this kind of visualization. An overview of the workflow for this case study is provided in Supporting Information 1 (Figure S2A).

For the purpose of this study, we visualized and quantified PTMs on yeast histone proteins. To account for different curation methods and database completeness, we used an integrated set of all PTM sites on yeast proteins from Uniprot,25 dbPTM,23 and ProteomeScout.24 A comparison of the number of PTM sites identified in each database is provided in Supporting Information 1 (Results S1 and Figure S3). Using PTMOracle, we then mapped PTM sites from the integrated data set onto the SBI yeast network described by Pang et al.45 Interactions in the SBI network have been reported in two or more experiments. With standard Cytoscape operations, we then searched for histone proteins and created a subnetwork to reduce visual complexity. Interactions with nonhistone proteins were not included. Finally, we used OraclePainter to visualize the types of PTMs and the Calculator module in OracleTools to count the number of PTM sites on each histone protein.

The resulting subnetwork contained 10 histone proteins, including two copies of each core histone protein (H2A, H2B, H3, and H4), one linker protein (H1), and one histone variant protein (H2A.Z). Upon visualizing the network with OraclePainter, as expected, we found that different histone proteins carry different PTM types and in different proportions (Figure 4). All histone proteins have phosphorylation and acetylation sites except histone H2A.Z (HTZ1) and H1 (HHO1), respectively. Within the histone octamer, we see


Figure 4. Visualization of yeast histone proteins, their interactions, and PTMs with OraclePainter using the PPI network from Pang et al.45 The subnetwork contains 10 histone proteins: each core histone protein (HTA1, HTA2, HTB1, HTB2, HHT1, HHT2, HHF1, and HHF2), one linker protein (HHO1), and one histone variant protein (HTZ1). PPI interactions between histone proteins are highlighted in gray, and histone proteins are colored with OraclePainter. The types of PTMs and the proportion of each type are illustrated by the pie charts on each histone protein. The total number of PTM sites on each histone protein is represented by the node size. Across all histone proteins, five PTM types are shown including phosphorylation (red), acetylation (blue), methylation (black), ubiquitination (green), and sumoylation (purple).
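The two quantities encoded in this kind of view, the total number of PTM sites on a protein (node size) and the share of each PTM type (pie chart sectors), reduce to simple counting. The following Python sketch shows that arithmetic under stated assumptions: the function name and the (position, PTM type) input layout are hypothetical, and the example sites are invented for illustration rather than taken from the histone data.

```python
from collections import Counter


def summarize_ptm_sites(sites):
    """Return (total_sites, proportions) for one protein.

    `sites` is a list of (position, ptm_type) tuples. `proportions` maps each
    PTM type to its share of the protein's PTM sites, which is the quantity
    a pie chart sector would represent.
    """
    counts = Counter(ptm_type for _position, ptm_type in sites)
    total = sum(counts.values())
    if total == 0:
        # No known sites: nothing to draw, analogous to a gray node.
        return 0, {}
    proportions = {ptm_type: n / total for ptm_type, n in counts.items()}
    return total, proportions


if __name__ == "__main__":
    # Hypothetical sites, for illustration only.
    example_sites = [
        (5, "acetylation"),
        (9, "acetylation"),
        (12, "phosphorylation"),
        (20, "methylation"),
    ]
    total, proportions = summarize_ptm_sites(example_sites)
    print(total, proportions)
```

Because the counts come only from the sites supplied, a protein with no database entries yields an empty summary, which mirrors the point made in the text that the visualization reflects what is known rather than what is predicted.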

dimers through histone chaperone proteins.57 Acetylation on H3 increases the affinity of the H3-H4 tetramer to histone chaperone proteins, which are involved in nucleosome assembly/disassembly.57 In the same way, phosphorylation of S58 on histone H3 has been associated with increasing H2AH2B dimer exchange of nucleosomes. 57 As a result, phosphorylation on histone H3 has been suggested to interfere with the interactions between the H3-H4 tetramer and the H2A-H2B dimers, which are necessary for octamer formation.57 We also used the Calculator module in OracleTools to explore the number of known PTM sites for each PTM type on histone proteins in the subnetwork. The number of PTM sites for each PTM type ranged between 48 (for acetylation) and 2 (for ubiquitination), whereas the total number of PTM sites on histone proteins in the subnetwork ranged between 17 for H3 and 1 for H1 (Supporting Information 1; Table S1). The total number of PTM sites on each histone protein in Figure 4 is represented by the node size. Surprisingly, we found that the PTM distribution for isoforms of histone H2B was different with one phosphorylation site on HTB2 but not HTB1. However, because only PTM sites from databases were considered in our analysis (see Materials and Methods), this reflects what sites are known. Although it is likely that both isoforms of H2B can carry this phosphorylation site, PTMOracle does not make predictions as to which PTMs occur on which residues and/or proteins. Because of this, users should be aware that PTM-specific isoforms may not be accurately represented on histones or other proteins with isoforms. We also observed that the range of PTM type and number of PTM sites on histone proteins showed a bias toward well-studied PTMs. This was expected because databases are more likely to contain PTM sites on proteins that have been studied extensively, such as acetylation on H3. 
Together, the results from OraclePainter and the Calculator module in OracleTools are complementary, providing both qualitative and quantitative results, respectively. In doing so, more accurate queries and searches can be performed to highlight potential

that histones H2A (HTA1 and HTA2) and H2B (HTB1 and HTB2) have sumoylation sites; H2A and H3 (HHT1 and HHT2) have methylation sites, but only H2B has known ubiquitination sites. Surprisingly, some known PTM types on histone proteins were not visualized as they were not present in the underlying databases. A reason for this could be that some PTMs only occur under specific conditions, such as sumoylation on histone H2A.Z, which occurs during double-stranded DNA damage.55 Other types of PTMs characterized on eukaryotic orthologs, such as methylation and ubiquitination on histone H1 and H2A.Z,55,56 remain largely uncharacterized in yeast. Although any type of PTM and its subtypes can be visualized (Supporting Information 1; Figure S4), another reason for missing some known PTM types is that only eight main PTM types were considered in our analysis (see Materials and Methods). Because of this, some PTMs were not visualized, including ADP-ribosylation, which has been detected on all core histone proteins and the linker histone H1.6 Nevertheless, these results demonstrate the value of using OraclePainter to quickly visualize combinations of PTMs on proteins in Cytoscape. Visualization of the network also shows that histone proteins have many interactions (Figure 4). Core histone proteins were interconnected, highlighting their biological context within the nucleosome. In conjunction with OraclePainter, we observed some known PTM types on core histone proteins that are involved in regulating interactions associated with nucleosome assembly/disassembly and stability. Notably, we observed acetylation on histone H4 (HHF1 and HHF2) as well as acetylation and phosphorylation on histone H3.
Located at the interface between H4 and H2B, acetylation of K92 on histone H4 has been suggested to weaken interactions between the H3-H4 tetramer and the H2A-H2B dimer.57 Because of this, acetylation on histone H4 has been proposed to destabilize the nucleosome structure.57 On the other hand, acetylation of K57 on histone H3 has been shown to indirectly affect the interactions between the H3-H4 tetramer and H2A-H2B

DOI: 10.1021/acs.jproteome.6b01052 J. Proteome Res. 2017, 16, 1988−2003


Figure 5. A search for PTM-mediated interactions in a domain−domain interaction (DDI) network. (A) Visualization of the three largest protein clusters in the yeast domain−domain interaction subnetwork, highlighting DDIs that might involve PTMs. Protein nodes with at least one PTM within domain sequence annotations, as identified by RegionFinder, are highlighted in green, whereas all other protein nodes are highlighted in purple. DDIs that might involve PTMs are also represented as orange edges, whereas other DDIs are gray. Note, however, that our analysis cannot reveal whether PTMs are actually required on one, both, or neither protein for an interaction to occur. (B) OracleResults tables showing the sequence position of PTM sites and domains mapped onto (i) CDC28 and (ii) CLB1. The sequence position of PTM sites for CDC28 and recognition domains for CLB1 that might facilitate interactions are both indicated with red arrows. PTM sites and sequence annotations are also colocated onto the protein sequence (highlighted in pink and blue, respectively).

(DDI) subnetwork of the yeast interactome. This was done by using the Cytoscape framework to map DDIs from Kim et al.47 onto the SBI network as edge attributes. These DDIs were mapped onto solved crystal structures by Kim et al. and are therefore likely to represent distinct binding interfaces between protein complex members. Standard Cytoscape operations were subsequently used to create the DDI subnetwork and remove non-DDIs to reduce visual complexity. Following this, PTMOracle was used to map exact domain positions from Pfam48 onto the DDI subnetwork. Identifiers from PDB were also mapped onto the DDI subnetwork as node attributes, and as edge attributes if interacting proteins were part of the same PDB structure (Supporting Information 1; Figure S5). Together with standard Cytoscape operations, we then used the RegionFinder module in OracleTools to identify proteins with domains containing at least one PTM site and DDIs that involve at least one of these proteins. Finally, OracleResults was used to tabulate the location of PTM sites and recognition domains on proteins that might be involved in PTM-mediated interactions.
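The selection logic described above (find proteins whose domains contain at least one PTM site, then flag DDIs that involve at least one such protein) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce RegionFinder; the function names, the inclusive 1-based coordinate convention, and the example data are assumptions.

```python
def proteins_with_ptm_in_domain(domains, ptm_sites):
    """Return proteins that have at least one PTM site inside a domain.

    domains: dict mapping protein -> list of (start, end) domain
        boundaries (assumed 1-based and inclusive).
    ptm_sites: dict mapping protein -> iterable of PTM site positions.
    """
    hits = set()
    for protein, regions in domains.items():
        positions = ptm_sites.get(protein, ())
        if any(start <= pos <= end
               for start, end in regions
               for pos in positions):
            hits.add(protein)
    return hits


def flag_ddi_edges(edges, hits):
    """Keep DDI edges in which at least one partner is in `hits`."""
    return [(a, b) for a, b in edges if a in hits or b in hits]


# Hypothetical inputs for illustration only; not data from the study.
example_domains = {
    "proteinA": [(10, 50)],
    "proteinB": [(1, 20)],
    "proteinC": [(5, 30)],
}
example_ptm_sites = {
    "proteinA": [12, 80],  # position 12 falls inside the 10-50 domain
    "proteinB": [25],      # outside the 1-20 domain
    "proteinC": [],        # no known sites
}
example_edges = [("proteinA", "proteinB"), ("proteinB", "proteinC")]

hits = proteins_with_ptm_in_domain(example_domains, example_ptm_sites)
flagged = flag_ddi_edges(example_edges, hits)
print(sorted(hits))
print(flagged)
```

As with the analysis itself, a flagged edge only indicates that a PTM site lies within a domain of one partner; it does not show that the PTM is required for the interaction.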

proteins of interest and features of their PTMs. In the future, expanding the histone network to interacting partners of histone proteins as in Perner et al.58 could be useful for predicting proteins that interact with PTMs involved in transcription and its regulation.

Case Study 2: Exploring PTM-Mediated Interactions

Interactions between proteins often occur at distinct binding interfaces.59 For some proteins, this may involve recognizing PTM sites with structural domains.5 However, identifying PTM-mediated interactions as well as the PTM sites and recognition domains that facilitate interactions remains difficult.54 To address this, we demonstrate how PTMOracle can be used to explore PTM-mediated interactions, better understand the role of PTMs in this process, and identify regions of interacting proteins that are involved. An overview of the workflow for this case study is provided in Supporting Information 1 (Figure S2B). To identify PTM-mediated interactions between distinct binding interfaces, we visualized a domain−domain interaction


Figure 6. Prediction of kinases involved in the phosphodegron. (A) Yeast kinase-substrate interaction subnetwork highlighting protein nodes with phosphorylation sites within phosphodegron motifs. Protein nodes with at least one phosphorylation site on [ST] residue/s within phosphodegron motifs, as identified by RegionFinder, are highlighted in green. All other protein nodes are highlighted in blue. Protein nodes that are periodically expressed are shown with a red border. All interactions in the network are shown as directed edges with kinases as the source and nonkinases as the target. (B) OracleResults table showing the sequence position of phosphodegron motifs and PTM sites mapped onto the SIC1 protein (highlighted in the brown box in (A)). Calculated phosphodegron motifs as well as serine/threonine phosphorylation sites that have been previously reported are indicated with red and pink arrows, respectively, whereas ubiquitination sites that are known to specify the degradation of SIC1 are indicated with blue arrows.64,65 All phosphodegron motifs and PTM sites on SIC1 are also highlighted in the protein sequence (highlighted in pink and red, respectively).

Visualization of the DDI subnetwork revealed several clusters of proteins with the number of nodes in each cluster ranging between 2 and 101 (Supporting Information 1; Figure S5). Upon examining the network, we found that most protein members of each protein cluster were functionally related. This was illustrated in the three largest protein clusters of the DDI subnetwork (Figure 5), whereby each cluster was enriched (p-value