Interactive Three-Dimensional Visualization and Contextual Analysis of Protein Interaction Networks

Edwin Ho,† Richard Webber,‡ and Marc R. Wilkins*,†

School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia, and National ICT Australia, Locked Bag 9013, Alexandria NSW 1435, Australia

Received May 11, 2007

To understand the biology of the interactome, the covisualization of protein interactions and other protein-related data is required. In this study, we have adapted a 3-D network visualization platform, GEOMI, to allow the coanalysis of protein–protein interaction networks with proteomic parameters such as protein localization, abundance, physicochemical parameters, post-translational modifications, and gene ontology classification. Working with Saccharomyces cerevisiae data, we show that rich and interactive visualizations, constructed from multidimensional orthogonal data, provide insights into the complexity of the interactome and its role in biological processes and the architecture of the cell. We present the first organelle-specific interaction networks, which provide subinteractomes of high biological interest. We further present some of the first views of the interactome built from a new combination of yeast two-hybrid data and stable protein complexes, which are likely to approximate the true workings of stable and transient aspects of the interactome. The GEOMI tool and all interactome data are freely available by contacting the authors.

Keywords: interactome • visualization • complexome • S. cerevisiae

* Corresponding author: Prof. Marc Wilkins, Department of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia. E-mail: [email protected]. Tel.: +61-2-9385-3633. Fax: +61-2-9385-1483.
† University of New South Wales.
‡ National ICT Australia.

Journal of Proteome Research 2008, 7, 104–112. Published on Web 11/20/2007. 10.1021/pr070274m CCC: $40.75 © 2008 American Chemical Society

Introduction

Global surveys of protein interactions have led to elucidation of the main features of the interactome.1,2 In these surveys, yeast two-hybrid assays3–6 and immunoaffinity chromatographic techniques such as TAP-tag7–9 and FLAG-tag affinity purification10 have been the dominant methodologies employed to determine pairwise protein interactions and protein complexes, respectively. It is now accepted that some proteins interact in a pairwise fashion while others participate in stable or transient multiprotein complexes to deliver their functions inside the cell. Concurrent with the large-scale protein interaction surveys, the proliferation of high-throughput technology has increased our knowledge of protein characteristics such as localization,11 abundance,12 half-life,13 and post-translational modifications.14 While databases such as Swiss-Prot15 bring together much of this information through manual curation and cross-links with other databases, the analysis of protein characterization data in the context of the interactome has been of lesser focus.

Coanalysis of orthogonal protein interaction and characterization data has advantages in reducing noise and enabling discovery of nonobvious trends.16 In terms of understanding the biology of the interactome, it has two key advantages. To reduce noise, pairwise interaction and protein complex data can be intersected to minimize spurious false positives to reflect the true nature of the interactome. To enable the discovery of nonobvious trends, protein characterization information can be analyzed in the context of the interaction network to increase insight into the biology of the interactome. Such contextual analysis captures the spirit of systems biology. Some examples include work by Reguly et al.17 to coanalyze literature-curated protein and genetic networks and by Han et al.1 to highlight protein hub hierarchy by combining interaction data and gene and protein expression data.

Contextual analyses have involved vast amounts of protein interaction and other protein data, primarily represented as graphs composed of nodes (proteins and associated characterization) and edges (interactions). The majority of current graph drawing applications, such as Cytoscape18 and VisANT,19 produce complicated 2-D representations of biomolecular networks, such as metabolic,20,21 genetic, and protein interactions.22 These are not easily interpretable due to the large number of intersecting and overlapping edges. The representation of biomolecular networks in 3-D, particularly if such visualizations are interactive, should help the minimization of edge intersections that complicate interpretation. However, applications that provide user interfaces for the interactive exploration of complicated networks in 3-D are yet to be built. An important step toward improving our understanding of the interactome is to integrate the mass of associated proteomic parameters into visualizations, resulting in an enriched knowledgebase that is represented graphically to enhance biological interpretation.

research articles

Analysis of Protein Interaction Networks

Here we report the adaptation of a Java-based visualization platform, called GEOMI,23 for the interactive 3-D visualization of protein interaction networks. We describe the covisualization of interaction data with protein characteristics, such as protein localization, abundance, physicochemical parameters, post-translational modifications, and gene ontology functional classification to improve interpretation of proteomic information. In addition, we build a model of the interactome that considers both pairwise interactions and sets of multiprotein complexes. We illustrate that GEOMI can provide a user-centered visualization platform to promote the generation of novel proteomic hypotheses.

Materials and Methods

Interaction Data. Names and descriptions of 1379 Saccharomyces cerevisiae proteins and 2493 interactions were retrieved from the filtered yeast interactome (FYI) data set described by Han et al.1 This is a high-quality data set resulting from the intersection of interaction data from yeast two-hybrid experiments, affinity purification studies, and the literature. The designation of proteins as date hub, party hub, and nonhub was from the same publication. Data describing S. cerevisiae protein complexes were from the large-scale tandem affinity purification study by Krogan et al.,24 which defined 547 nonoverlapping protein complexes involving 2703 proteins.

Associated Proteomic Data. Gene ontology25 (GO) classification for all 1379 proteins was retrieved using the SGD Gene Ontology Slim Mapper tool on the SGD Web site.26 Separating proteins into a manageable number of broad categories is possible using GO SLIM annotation because proteins are mapped to their more general parent GO term. This approach simplifies the identification of trends in protein function throughout the interactome and individual complexes. Cellular localization data were taken from the GFP-tagging study of Huh et al.11 Protein copy number and abundance data were sourced from Ghaemmaghami et al.12 Post-translational modification data were obtained from the UniProtKB/Swiss-Prot database (release 50.4)27 by using Swissknife,28 a Perl package for parsing Swiss-Prot, to retrieve the FT (feature) lines of each protein entry with the MOD_RES (post-translational modification) or CARBOHYD (glycosylation site) key names. Protein pI was predicted using the Compute pI tool29,30 on ExPASy, and Grand Average Hydropathy (GRAVY) scores were calculated using the algorithm of Kyte and Doolittle31 as described in Ho et al.32

Data Collation and Integration. Customized Perl scripts were used to combine interaction data sets and associated proteomic data into an XML-based format suitable for use in GEOMI. The document type definition (DTD) file that defines the syntax of this XML-based format is available in the Supporting Information. The resulting .xwg files are read by GEOMI’s file parser and rendered into the Java3D display. Each interaction was parsed as an edge element, each protein as a node element, and the proteomic data as property elements associated with each node according to the DTD. Each property element is a key-value pair: the key specifies the type of proteomic data, and the value stores its value(s) for the associated protein.

Adaptation of GEOMI for Visualization of Protein Interaction Networks. GEOMI is a Java-based cross-platform application for visualizing interaction networks.23 To use GEOMI for this study, we developed a suite of modules for the visualization of protein–protein interaction networks and associated proteomic parameters. Development was undertaken in the Eclipse IDE on an Apple Macintosh iBook. A user manual, which details the use of the program as well as how it can be used to view new protein-associated data, is given as Supporting Information. GEOMI and all modules are freely available by contacting the authors.

Figure 1. GEOMI visualization of the entire Filtered Yeast Interactome data set. A 3-D spherical projection of the entire FYI data set (using the Force Directed Layout with parameters: repulsion = 12, origin = 80, spring = 50). Note that the apparent differences in node size in this visualization arise because the 3-D layout renders nodes projected toward the viewer larger than those projected away from the viewer; they are not indicative of any differences in proteomic attributes unless specified.

Fundamentals of GEOMI Visualization. In GEOMI, the visualization of protein nodes and their interconnections by edges is governed by four fundamental parameters. These parameters can be changed in real time to generate the most informative visualizations. The spring parameter is the attractive force between proteins, where a higher spring value will shorten edge length. The spring parameter will also affect the degree to which groups of proteins that are highly interconnected have shorter edge lengths than those with single interactions. The repulsion parameter is the repulsive force between proteins, which works in opposition to the spring parameter. The origin parameter is the degree to which, in 3-D spherical visualizations, a force is applied to draw all nodes into a single central point. It controls the size of the sphere in which all proteins and interactions are projected. The planar parameter controls the dimensionality of the graph. It is most useful for controlling whether visualizations are projected into a 3-D space or are flattened down to a 2-D representation. For all figures, parameters used for their generation are given. Further information on their use is given in the user manual as Supporting Information.
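The interplay of the four parameters can be illustrated with a minimal Python sketch of a single spring-embedder update. This is not GEOMI's Java implementation: the force laws (inverse-square repulsion, linear springs, a linear pull toward the origin and toward the z = 0 plane), the function name layout_step, the step size, and the numeric scales are all assumptions chosen for illustration, and they do not correspond to the GEOMI values quoted in the figure captions.

```python
import math

def layout_step(pos, edges, spring=0.05, repulsion=0.5,
                origin=0.01, planar=0.0, step=0.1):
    """Apply one force-directed update to 3-D node positions.

    pos   : dict mapping node name -> [x, y, z]
    edges : list of (node_a, node_b) pairs
    All force laws and default strengths are illustrative assumptions.
    """
    nodes = list(pos)
    force = {n: [0.0, 0.0, 0.0] for n in nodes}

    # Repulsion: every pair of nodes pushes apart (inverse-square law).
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            delta = [pos[a][k] - pos[b][k] for k in range(3)]
            dist = math.sqrt(sum(c * c for c in delta)) or 1e-9
            magnitude = repulsion / (dist * dist)
            for k in range(3):
                f = magnitude * delta[k] / dist
                force[a][k] += f
                force[b][k] -= f

    # Spring: each edge pulls its two endpoints together (linear spring).
    for u, v in edges:
        for k in range(3):
            f = spring * (pos[v][k] - pos[u][k])
            force[u][k] += f
            force[v][k] -= f

    for n in nodes:
        # Origin: pull every node toward the central point (0, 0, 0).
        for k in range(3):
            force[n][k] -= origin * pos[n][k]
        # Planar: extra pull toward the z = 0 plane, flattening the layout.
        force[n][2] -= planar * pos[n][2]

    return {n: [pos[n][k] + step * force[n][k] for k in range(3)]
            for n in nodes}

# Two connected proteins, initially far apart.
start = {"a": [-5.0, 0.0, 1.0], "b": [5.0, 0.0, -1.0]}
after_3d = layout_step(start, [("a", "b")])
after_flat = layout_step(start, [("a", "b")], planar=1.0)
```

Repeating such a step until the positions stop changing gives a layout; raising the hypothetical planar strength drives the z coordinates toward zero, which mirrors the flattening of a 3-D view into 2-D described above.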

Results

Visualization of the Yeast Interactome. To visualize protein–protein interactions, we adapted GEOMI by constructing a series of layout modules. Proteins are represented as a shape (node), and each interaction between two proteins is represented as a line (edge). All nodes and edges are projected into a navigable 3-D sphere or, if desired, can be flattened onto a 2-D circle. Figure 1 shows the visualization of the 2493 pairwise interactions between 1379 proteins in the filtered yeast interactome (FYI) data set. The spring embedder algorithm underlies all layouts, including the Force Directed Layout used for this visualization.33 This mimics a mechanical system where a repulsion force is applied to all nodes, but the edges that connect nodes behave as springs that apply an attractive force. These forces are applied over a series of iterations until the layout settles into a minimum energy state, which should correspond to a good visualization. From Figure 1, we see that the filtered yeast interactome appears to have some regions of high connectivity and some regions that are sparsely connected. As we are visualizing approximately only one-quarter of all yeast proteins, a large number of nonconnecting components are to be expected. From a visual approximation, the interactome resembles a scale-free network,2 with the majority of proteins having few partners and a small number of proteins, called hubs, having many partners.

Figure 2. Integrated data associated with each protein. It is accessible by a right mouse button click on the GEOMI interface. The arrow points at the protein for which information is displayed.

Proteomic Data Have Been Integrated for Each Protein. To increase biological insight into the interactome, we enriched the visualization by integrating auxiliary data from a number of different sources. For each protein, information included gene ontology molecular function and biological process, cellular localization, copies per cell (which equates to abundance), known post-translational modifications, and physicochemical parameters. All information was integrated into the underlying XML file. In total, the 1379 S. cerevisiae proteins of the FYI data set mapped to 32 broad categories of cellular process and 22 categories of molecular function. Subcellular localization could be assigned to 1169 proteins, or 85% of proteins present, and a total of 294 proteins had known post-translational modifications. The physicochemical parameters of 1039 proteins were integrated from Ho et al.32 The FYI data set, while generated from the intersection of data from a number of different sources, is poor at defining the multiprotein complexes inside the cell. This is because it is centered on pairwise protein–protein interactions from yeast two-hybrid assays.
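The integration of per-protein data into the underlying XML file, with each protein as a node element, each interaction as an edge element, and each proteomic parameter as a key-value property element, can be sketched in Python as follows. The actual DTD is given only in the Supporting Information, so the root element name (graph), the attribute names (id, source, target, key, value), the function name build_network_xml, and the placeholder proteins and values are all assumptions made for illustration rather than the real .xwg syntax.

```python
import xml.etree.ElementTree as ET

def build_network_xml(proteins, interactions):
    """Build an XML tree of node, property, and edge elements.

    proteins     : dict mapping protein name -> dict of property key -> value
    interactions : iterable of (protein_a, protein_b) pairs
    Element and attribute names are hypothetical, not the GEOMI DTD.
    """
    root = ET.Element("graph")
    for name, properties in proteins.items():
        node = ET.SubElement(root, "node", id=name)
        # One property element per proteomic parameter of this protein.
        for key, value in properties.items():
            ET.SubElement(node, "property", key=key, value=str(value))
    for a, b in interactions:
        ET.SubElement(root, "edge", source=a, target=b)
    return root

# Placeholder proteins and values, for illustration only.
proteins = {
    "PROT_A": {"localization": "nucleus", "copies_per_cell": 1200},
    "PROT_B": {"localization": "unknown"},
}
root = build_network_xml(proteins, [("PROT_A", "PROT_B")])
xml_text = ET.tostring(root, encoding="unicode")
```

Storing every parameter as a generic key-value property, rather than as a dedicated attribute, means a new type of protein-associated data can be added without changing the structure of the file.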
To provide information on the membership

of any protein in a multiprotein complex, we integrated information from a large-scale tandem affinity purification study24 which defined 547 nonoverlapping protein complexes containing 2703 proteins. From this, 940 proteins from the FYI data set could be assigned to multiprotein complexes, representing 68% of proteins present. As a first step in visualizing integrated proteomic data with the interactome, we implemented a “retrieve information” feature. For this, the selection of a protein with the right mouse button shows all relevant data. Figure 2 shows how rich information about a protein is displayed as text in a pop-up box when requested by the user. This provides a direct link between proteins and their associated proteomic data.

Covisualization of Integrated Data with the Interactome: Single Parameter. While the availability of associated information with each protein provided a means to better understand proteins of interest, we believed that the greatest insight into the interactome could be achieved when protein-associated data are covisualized with protein interactions. However, covisualization of the interactome and all associated proteomic data in its entirety would be overwhelming for the user. Therefore, a series of extension modules for GEOMI were built to allow the user to adjust the complexity of the visualization. These modules were designed to visually highlight the trends in the integrated data, with each individual module responsible for exhibiting one or more proteomic parameters. A key consideration was to make it possible for users to adjust visual complexity in real time to produce visualizations that are easily interpretable. The mapping of protein parameters to node color is one means by which the GEOMI modules corepresent protein-associated data with the interactome. For example, GO biological process terms can be mapped to different node colors, allowing the user to easily identify groups of proteins with the same annotation (Figure 3A). GO molecular function and protein localization (Figure 3B) were also implemented to be viewed in this manner as were continuous variables such as protein pI and average hydropathy (data not shown). Where node color was not suitable for representation of associated data, such as in the case of gene name or a qualitative and quantitative description of post-translational modifications, GEOMI modules were built to provide textual labeling of the proteins with associated data. Figure 4 illustrates this with protein name and with type and number of post-translational modifications. The visualization of post-translational modifications with the interactome is likely to be particularly powerful in revealing signal transduction and other information pathways within the cell. Figure S1 (Supporting Information) presents one detailed example of this.

Figure 3. Covisualization of protein interactions and associated data in the biggest connected component of the yeast nuclear interactome. Visualizations have been flattened into two dimensions for representation in print. (A) Proteins colored by the GO SLIM biological process. Yellow = RNA metabolism; orange = organelle organization and biogenesis; light blue = protein biosynthesis; dark blue = cell cycle; green = transcription; red = process unknown. (B) Proteins colored by localization. Blue = nucleus; light blue = nucleolus; green = mitochondrion; gold = cytoplasm; red = unknown. For all visualizations, parameters used were: repulsion = 20, origin = 80, spring = 50, planar = 50.

Figure 4. Textual annotation in context of the interactome, showing a portion of the nuclear network. (A) Nodes labeled by gene name. (B) The same nodes labeled by post-translational modification where P is phosphorylation and A is acetylation. The number associated with the modification type indicates the number of known sites per protein.

Covisualization of Integrated Data with the Interactome: Multiple Parameters. Node color was useful where a protein was a member of one biological category of interest. However, if a protein is a member of multiple categories, for example, where a protein is found in multiple cellular locations or participates in multiple biological processes, the role of the protein inside the cell cannot be fully represented by a single node color. To address this, we developed a multisphere node view. This increases the information content of a visualization by representing each protein node as multiple spheres of different colors. Textual information such as protein name can also be covisualized with the multisphere node view if desired. Two examples can illustrate how multiparametric biological information can easily be understood by this approach.


Figure 5. Visualization of multiple parameters. Visualization of pairwise protein interactions (nodes and edges), membership of the stable protein complex (numbers), and one or more GO Slim biological processes (node color and shape). This shows that interacting proteins are homogeneous in biological process. (A) All members of the LSm complex, involved in RNA processing, have a yellow sphere representing the GO term RNA metabolism. (B) All six members of a complex involved in the control of Actin polymerization share the same two terms (organelle and cytoskeleton organization/biogenesis). (C and D) General transcription factor TFIID/histone acetyltransferase (HAT) complex and the nuclear pore complex, both with a large number of biological process annotations shared between all their members.

In a first example (Figure 5), we have covisualized pairwise protein interactions, membership of a stable complex, and the biological processes in which proteins are involved. Strikingly, although not unexpectedly, proteins that interact with one another show a homogeneity of biological process. Further, where one member of a group of interacting proteins is documented to be part of two or more biological processes, other members of the group are likely to be involved in the same multiplicity of processes. This type of visualization can also highlight proteins whose functions are either not comprehensively understood or are poorly documented in databases. Protein SPT8, the protein shown as a single sphere in Figure 5C, complexes with other proteins in the visualization to form the general transcription factor TFIID/histone acetyltransferase (HAT) complex. However, its gene ontology annotation does not reflect the multiplicity of functions that are shown for other proteins in the same complex.

A second example of multiparameter visualization is shown in Figure 6. In this instance, we have covisualized pairwise protein interactions, membership of the stable complex, and the protein cellular localization. The examples given in Figure 6 indicate how interacting proteins are localized to the same compartment of the cell. Where proteins have been localized to two cellular compartments, as in Figure 6B, other members of the same complex can easily be seen as also localizing to the same two parts of the cell. Importantly, the multiparameter visualization has highlighted interacting proteins of unknown localization (shown in red) that are members of complexes (as shown by the numerical annotation). In these cases, the covisualization of parameters presents strong evidence for the likely localization of these proteins.

Figure 6. Understanding protein localizations by interactome visualization. Unannotated proteins are placed in context of their interaction partners and complexes, allowing putative localization to be inferred. (A) Complex 18 (left) is the cleavage and polyadenylation factor (CPF) complex, which works together with complex 233 (right), the cleavage factor IA (CFIA) complex, to bind and process pre-mRNA in the nucleus. FIP1 is the unannotated (red) protein in the CPF complex, whereas all of its partners in the complex are nuclear (navy). The function of FIP1, and the localization of its parent complex, suggests that it is localized in the nucleus. Localization to the nucleus can also be inferred for CLP1 and RNA15, the two unannotated proteins in the CFIA complex. (B) Complex 121 is involved in bud formation and is localized to the bud neck (green) and cell periphery (blue). The unannotated member (red), CDC3, can be inferred to colocalize with the rest of this complex. (C) Complex 61 is implicated in the control of Actin polymerization, and four of the six members localize with Actin (purple). The party hub ARP2 has a localization description of punctate composite (yellow) and is central to this stable complex. The unannotated member, ARP3, can be inferred to also localize with Actin. The following force strengths were used: protein complex force = 100, planar = 100, origin = 1, repulsion = 1.

Organelle-Specific Interactomes. Using protein localization data from a high-throughput study in yeast,11 we extracted subnetworks of interacting proteins from the FYI that represented organelle-specific interactomes for the nucleus, mitochondrion, Golgi, and endoplasmic reticulum. Each organelle-specific interactome is a network composed of interactions between proteins localized to the organelle and their direct interaction partners, some of which may be external to that organelle. The purpose of organelle-specific visualizations is 2-fold. First, they improve our understanding of the interactions and functional molecular biology of organelles and subcellular compartments. It is important to study organelles to elucidate their roles as semi-independent machines that provide unique functions for the cell. Second, partition of an entire interactome into organelle-specific subnetworks is a biologically meaningful way to reduce network complexity. The reduction in network complexity and concomitant increase in visual clarity can be observed when comparing a visualization of an entire interactome (Figure 1) and those from a single organelle (Figure 3).

Realistic View of the Interactome. Typically, the interactome is represented as a series of pairwise interactions, reflecting


the widespread use of the yeast two-hybrid technique. Although the yeast two-hybrid technique has a capacity to detect interactions including those that are weak and transient,34 it can only define multiprotein complexes by inference. This is a major weakness as the cell predominantly consists of collections of multiprotein molecular machines. Data sets from affinity purification assays, such as from large-scale TAP-tag studies,7,24 describe groups of proteins that associate as stable complexes. These approaches thus provide a more realistic view of the architecture of the cell. Interestingly, the pairwise interaction and protein complex views of the interactome have predominantly been studied independently. We believe that by merging these data and allowing their covisualization a realistic view of the interactome can be produced. To generate a realistic view of the interactome, we combined the largest set of known protein complexes24 with high-quality pairwise protein interactions.1 Covisualization was achieved by labeling proteins with numbers that describe their membership in stable complexes onto a network built from pairwise protein interactions (Figures 5 and 6). This showed that groups of proteins that are highly interconnected form stable complexes. It was also seen that some proteins were not part of stable complexes but appeared to form connections between two or more stable complexes (Figure 7A). To better visualize this phenomenon, we developed a further layout. The Protein Complex Force layout groups together all proteins that are members of a single complex into a common region in the 3-D space. This is achieved by increasing the attractive force between the members of each stable complex. This allows a clear covisualization of pairwise interactions and stable protein complexes (Figure 7B). The strength of the attraction force between known members of complexes is user-controlled and can be changed dynamically.
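The observation that some proteins lie outside stable complexes yet connect two or more of them can be expressed as a simple query over the merged data. The Python sketch below assumes that pairwise interactions are available as a list of pairs and that complex membership is a mapping from protein to a single complex number (consistent with the nonoverlapping complexes used here); the function name bridging_proteins and the toy protein names and complex numbers are hypothetical.

```python
from collections import defaultdict

def bridging_proteins(edges, complex_of):
    """Find proteins outside any complex that link two or more complexes.

    edges      : iterable of (protein_a, protein_b) pairwise interactions
    complex_of : dict mapping protein -> complex number (members only)
    Returns a dict mapping each bridging protein -> set of complexes it touches.
    """
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)

    bridges = {}
    for protein, neighbours in partners.items():
        if protein in complex_of:
            continue  # only proteins that are not complex members qualify
        touched = {complex_of[p] for p in neighbours if p in complex_of}
        if len(touched) >= 2:
            bridges[protein] = touched
    return bridges

# Toy network: X links complexes 1 and 2; Y touches only complex 1.
edges = [("A1", "A2"), ("B1", "B2"), ("A1", "X"), ("X", "B1"), ("Y", "A2")]
complex_of = {"A1": 1, "A2": 1, "B1": 2, "B2": 2}
bridges = bridging_proteins(edges, complex_of)
```

In a layout, the same membership mapping could drive an additional attractive force between members of one complex, which is the idea behind the Protein Complex Force layout described above; that force is not implemented in this sketch.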

Figure 7. Realistic view of the interactome. (A) Part of the nuclear interactome displayed, where nodes have been labeled according to their membership of complexes24 (force strengths: repulsion = 12, origin = 80, spring = 50). Protein complex member nodes repel each other but are held together by edges. (B) The nodes that represent members of each complex now attract the other members so that they appear in one region in the layout (repulsion = 0, origin = 80, spring = 0, protein complex force = 100). Intercomplex interactions can be clearly identified. If intracomplex interactions and intercomplex interactions are to be covisualized, a slight increase in the repulsion value (repulsion = 2, origin = 80, spring = 0, protein complex force = 100) will generate a visualization halfway between that in panels A and B.

Discussion

Data visualization is required if we are to understand protein interaction networks in the context of high-throughput proteomic analyses. Holistic and detailed views of large data sets can be obtained more easily with visualization than from either textual or numeric representations. GEOMI, as presented in this manuscript, is a platform that generates rich visualizations of the interactome and associated proteomic data. It can help increase our understanding of the biology of the interactome. Just as researchers mapped genetic elements onto physical entities (chromosomes) to improve their understanding of the molecular biology of genes, the development of accurate and rich protein interaction maps improves our understanding of the molecular biology of proteins. In particular, a mapping of proteomic information and interactions onto physical entities (complexes) to capture spatial and dynamic aspects of interactions should generate novel insight into the molecular organization of the cell from small protein multimers up to large multifunctional complexes.

Dimensionality of Visualizations and the Focus/Context Balance. In many popular visualization tools, such as Cytoscape,18 nodes and edges representing the interactome are arranged in 2-D, calculated using the spring embedder algorithm.33 In GEOMI, the spring embedder algorithm has been extended so that the interactome is represented as navigable 3-D layouts. The parameters of the layouts such as edge length and node repulsion/attraction can be adjusted in real time to attain the most appropriate visualization. The 3-D layout is important in minimizing the number of edge crossings in the displays of large and complex networks. Interviewer335 and NAVIGaTOR36 can also represent networks as 3-D graphs. The amount and density of the data presented to the user at any one time must be effectively managed to allow interpretation.37 To maximize biological insight, a balance needs to be struck between a high-resolution focus (the amount of detail presented for individual proteins or subnetworks) and low-resolution context (the level of abstraction of relevant data). A number of visualization platforms allow the focus/context balance to be controlled by the user. The GenePro38 plug-in for Cytoscape addresses this by summarizing each cluster of

highly interconnected nodes as a pie graph that reflects the distribution for a particular proteomic characteristic of its members (e.g., localization or protein function). In this way, pairwise protein interactions can be hidden and complexes summarized when a low-resolution, overall representation of the interactome is needed. VisANT19 also offers network visualization with multiple levels of detail. This is achieved through VisANT’s “meta-graph” strategy, where clusters of connected nodes can be collapsed into “meta-nodes” for a low-resolution overview. Interviewer335 and NAVIGaTOR36 work in a similar way, grouping nodes together according to their degree of connectivity. GEOMI, by contrast, generates interaction networks in which the user can navigate a continuum between an overview and detailed views of the interactome. Uniquely, GEOMI groups together nodes according to pairwise interactions and/or membership of proteins in a stable complex. To our knowledge, GEOMI is the first to provide such a mechanism. GOlorize,39 a Cytoscape plug-in, can group nodes together according to the shared attribute of GO annotation, rather than node connectivity. These latter approaches demonstrate the value of enhancing visualizations of protein interaction networks by clustering of proteins according to proteomic parameters other than just node connectivity. Organellar subinteractomes, as built for GEOMI, are a biologically relevant means of reducing network complexity. Instead of abstracting or visually grouping together certain proteins in an entire interactome, a subset of the interactome is selected depending on subcellular localization or membership of an organelle. Each organelle delivers a series of complex functions in the eukaryotic cell and must have an interactome of specific topology to allow it to undertake these tasks.
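The selection of an organellar subinteractome can be sketched in a few lines of Python. Following the definition given in the Results, an interaction is kept when at least one of its two proteins is localized to the organelle, so that direct partners from outside the organelle are retained. The function name organelle_subnetwork, the representation of localization as a set of compartments per protein, and the toy protein names are assumptions for illustration only.

```python
def organelle_subnetwork(edges, localization, organelle):
    """Select interactions involving at least one protein in the organelle.

    edges        : list of (protein_a, protein_b) pairwise interactions
    localization : dict mapping protein -> set of compartment names
    organelle    : compartment name to select, e.g. "nucleus"
    """
    def in_organelle(protein):
        # Proteins without localization data are treated as not localized.
        return organelle in localization.get(protein, set())

    return [(a, b) for a, b in edges if in_organelle(a) or in_organelle(b)]

# Toy data: N1 and N2 are nuclear, C1 is cytoplasmic, C2 is unannotated.
edges = [("N1", "N2"), ("N1", "C1"), ("C1", "C2")]
localization = {
    "N1": {"nucleus"},
    "N2": {"nucleus", "nucleolus"},
    "C1": {"cytoplasm"},
}
nuclear = organelle_subnetwork(edges, localization, "nucleus")
```

Using a set of compartments per protein allows a protein found in several locations to appear in more than one organelle-specific network.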
Interestingly, the relative simplicity of organellar interactomes means that their interactomes might be fully elucidated in the near term, particularly in S. cerevisiae. Questions concerning the complexity, connectedness, and topology of different organellar interactomes can then be explored. The technical difficulties in determining the interactions of and between membrane proteins,8 however, may challenge our capacity to understand the interface of any organellar interactome to the global cellular network. Merging of Orthogonal Data Illustrating the Complexity of the Interactome. Abstraction in visualization is useful to gain a global perspective; however, this is done at the risk of oversimplification. A challenge is how to corepresent a multiplicity of parameters in a comprehensible way. The multisphere node view capability in GEOMI allows clear visualization of single proteins that have multiple subcellular localizations, functions, or biological processes through the use of unique shapes and/or color. This is one of the first attempts to fully visualize this multiplicity and complexity of protein characteristics in the interactome. The visualizations presented here (Figures 5 and 6) show that multiple biologically relevant phenomena can be represented effectively and in a manner that facilitates interpretation. The combination of pairwise and complex interaction data achieved in this report is of biological importance. The major high-throughput sources of pairwise interaction and protein complex data are yeast two-hybrid and affinity chromatography purification studies, respectively. The data from these different approaches are complementary because each method has some unique capabilities compared to the other.40 While weak and transient pairwise interactions can be detected using the yeast two-hybrid technique, the TAP-tag approach is better at defining

Journal of Proteome Research • Vol. 7, No. 01, 2008

Ho et al. stable protein complexes. In the past, interaction data sets generated from these two complementary methodologies have largely been studied independently of one another. However, both methodologies give insight into the true nature of the interactome. Via their visualization in GEOMI, a more realistic model of the interactome should be obtained. The unified view studies the interactome as a network of interacting protein complexes to reveal hierarchy and coordination between various cellular processes. Toward a Dynamic Model of the Interactome. Protein characteristics pertinent to interactome dynamics, such as protein half-life and abundance, have recently been elucidated on a proteome-wide scale.12,13 Some protein post-translational modifications have also been studied globally.41 Future proteome-wide studies will expand and refine the range of protein characteristics allowing dynamic models of the interactome to be built. This should shed light on the requirement for certain protein–protein interactions in association with temporal, environmental, or genetic changes. This would be an advance on current representations of the interactome as a static biological network akin to a photograph.16 Our labeling of posttranslational modifications onto the interactome, including those that are reversible, should help to understand the broader role of modifications in network topology. The contextualization of these data with other proteomic information will further help in the deciphering of the molecular requirement of certain modifications. It should be possible to distinguish posttranslational modifications associated with signal transduction (phosphorylation to activate a kinase) versus those involved with controlling the interaction of proteins (phosphorylation of a motif to allow interaction with an interaction partner containing an SH2 domain). 
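The unified view of pairwise and complex data discussed in this section, in which the interactome is read as a network of interacting complexes, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used in GEOMI: the function name `complex_level_network`, the placeholder proteins `P1` to `P5`, and the complex labels `CX1` and `CX2` are all hypothetical. Each protein is mapped to the complexes it belongs to, a protein outside any complex stands for itself, and a pairwise interaction between members of two different units becomes a link between those units.

```python
from collections import defaultdict

# Hypothetical toy data: stable complexes (e.g. from affinity purification)
# and pairwise interactions (e.g. from two-hybrid screens). All names are placeholders.
complexes = {"CX1": {"P1", "P2"}, "CX2": {"P3", "P4"}}
pairwise = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]


def complex_level_network(complexes, pairwise):
    """Collapse proteins into complexes and link units joined by a pairwise interaction."""
    # Map every protein to the set of complexes that contain it.
    units_of = defaultdict(set)
    for name, members in complexes.items():
        for protein in members:
            units_of[protein].add(name)

    links = set()
    for a, b in pairwise:
        # A protein that belongs to no complex is treated as its own unit.
        units_a = units_of.get(a) or {a}
        units_b = units_of.get(b) or {b}
        for ua in units_a:
            for ub in units_b:
                if ua != ub:  # interactions inside one complex add no link
                    links.add(tuple(sorted((ua, ub))))
    return links


print(sorted(complex_level_network(complexes, pairwise)))
```

In this toy case the interaction between `P1` and `P2` is absorbed into `CX1`, while the interaction between `P2` and `P3` surfaces as a link between the two complexes. That is the kind of higher-level connection the combined data are meant to expose.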
The framework of GEOMI will allow for any interaction network and associated proteomic data in the .xwg format to be visualized and analyzed in a variety of contexts.

Concluding Remarks. Here, we have transformed tabulated, textual data into rich and navigable visualizations. We have combined proteomic data with protein–protein interaction networks to build a platform for understanding the biology of the interactome. We believe that this type of approach will be central to understanding and guiding future high-throughput experiments in the fields of interactome studies and proteomics.

Abbreviations: 2-D, two-dimensional; 3-D, three-dimensional; FYI, filtered yeast interactome; GO, gene ontology.

Acknowledgment. We acknowledge helpful discussions with Seokhee Hong, Rohan Williams, and Ignatius Pang. M.R.W. acknowledges support from the UNSW Faculty Research Grants Program.

Supporting Information Available: A user manual is available which gives information on the use of GEOMI including the format and use of the .xwg file. Figure S1 is a detailed example on how post-translational modifications may be useful in elucidating pathways. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Han, J. D.; Bertin, N.; Hao, T.; Goldberg, D. S.; Berriz, G. F.; Zhang, L. V.; Dupuy, D.; Walhout, A. J.; Cusick, M. E.; Roth, F. P.; Vidal, M. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 2004, 430 (6995), 88–93.

(2) Barabasi, A. L.; Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5 (2), 101–113.

(3) Fromont-Racine, M.; Mayes, A. E.; Brunet-Simon, A.; Rain, J. C.; Colley, A.; Dix, I.; Decourty, L.; Joly, N.; Ricard, F.; Beggs, J. D.; Legrain, P. Genome-wide protein interaction screens reveal functional networks involving Sm-like proteins. Yeast 2000, 17 (2), 95–110.

(4) Fromont-Racine, M.; Rain, J. C.; Legrain, P. Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nat. Genet. 1997, 16 (3), 277–282.

(5) Ito, T.; Chiba, T.; Ozawa, R.; Yoshida, M.; Hattori, M.; Sakaki, Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 2001, 98 (8), 4569–4574.

(6) Uetz, P.; Giot, L.; Cagney, G.; Mansfield, T. A.; Judson, R. S.; Knight, J. R.; Lockshon, D.; Narayan, V.; Srinivasan, M.; Pochart, P.; Qureshi-Emili, A.; Li, Y.; Godwin, B.; Conover, D.; Kalbfleisch, T.; Vijayadamodar, G.; Yang, M.; Johnston, M.; Fields, S.; Rothberg, J. M. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403 (6770), 623–627.

(7) Gavin, A. C.; Aloy, P.; Grandi, P.; Krause, R.; Boesche, M.; Marzioch, M.; Rau, C.; Jensen, L. J.; Bastuck, S.; Dumpelfeld, B.; Edelmann, A.; Heurtier, M. A.; Hoffman, V.; Hoefert, C.; Klein, K.; Hudak, M.; Michon, A. M.; Schelder, M.; Schirle, M.; Remor, M.; Rudi, T.; Hooper, S.; Bauer, A.; Bouwmeester, T.; Casari, G.; Drewes, G.; Neubauer, G.; Rick, J. M.; Kuster, B.; Bork, P.; Russell, R. B.; Superti-Furga, G. Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440 (7084), 631–636.

(8) Gavin, A. C.; Bosche, M.; Krause, R.; Grandi, P.; Marzioch, M.; Bauer, A.; Schultz, J.; Rick, J. M.; Michon, A. M.; Cruciat, C. M.; Remor, M.; Hofert, C.; Schelder, M.; Brajenovic, M.; Ruffner, H.; Merino, A.; Klein, K.; Hudak, M.; Dickson, D.; Rudi, T.; Gnau, V.; Bauch, A.; Bastuck, S.; Huhse, B.; Leutwein, C.; Heurtier, M. A.; Copley, R. R.; Edelmann, A.; Querfurth, E.; Rybin, V.; Drewes, G.; Raida, M.; Bouwmeester, T.; Bork, P.; Seraphin, B.; Kuster, B.; Neubauer, G.; Superti-Furga, G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415 (6868), 141–147.

(9) Ho, Y.; Gruhler, A.; Heilbut, A.; Bader, G. D.; Moore, L.; Adams, S. L.; Millar, A.; Taylor, P.; Bennett, K.; Boutilier, K.; Yang, L.; Wolting, C.; Donaldson, I.; Schandorff, S.; Shewnarane, J.; Vo, M.; Taggart, J.; Goudreault, M.; Muskat, B.; Alfarano, C.; Dewar, D.; Lin, Z.; Michalickova, K.; Willems, A. R.; Sassi, H.; Nielsen, P. A.; Rasmussen, K. J.; Andersen, J. R.; Johansen, L. E.; Hansen, L. H.; Jespersen, H.; Podtelejnikov, A.; Nielsen, E.; Crawford, J.; Poulsen, V.; Sorensen, B. D.; Matthiesen, J.; Hendrickson, R. C.; Gleeson, F.; Pawson, T.; Moran, M. F.; Durocher, D.; Mann, M.; Hogue, C. W.; Figeys, D.; Tyers, M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415 (6868), 180–183.

(10) Einhauer, A.; Jungbauer, A. The FLAG peptide, a versatile fusion tag for the purification of recombinant proteins. J. Biochem. Biophys. Methods 2001, 49 (1–3), 455–465.

(11) Huh, W. K.; Falvo, J. V.; Gerke, L. C.; Carroll, A. S.; Howson, R. W.; Weissman, J. S.; O’Shea, E. K. Global analysis of protein localization in budding yeast. Nature 2003, 425 (6959), 686–691.

(12) Ghaemmaghami, S.; Huh, W. K.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O’Shea, E. K.; Weissman, J. S. Global analysis of protein expression in yeast. Nature 2003, 425 (6959), 737–741.

(13) Belle, A.; Tanay, A.; Bitincka, L.; Shamir, R.; O’Shea, E. K. Quantification of protein half-lives in the budding yeast proteome. Proc. Natl. Acad. Sci. USA 2006, 103 (35), 13004–13009.

(14) Gruhler, A.; Olsen, J. V.; Mohammed, S.; Mortensen, P.; Faergeman, N. J.; Mann, M.; Jensen, O. N. Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol. Cell. Proteomics 2005, 4 (3), 310–327.

(15) Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O’Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31 (1), 365–370.

(16) Gerstein, M.; Lan, N.; Jansen, R. Proteomics. Integrating interactomes. Science 2002, 295 (5553), 284–287.

(17) Reguly, T.; Breitkreutz, A.; Boucher, L.; Breitkreutz, B. J.; Hon, G. C.; Myers, C. L.; Parsons, A.; Friesen, H.; Oughtred, R.; Tong, A.; Stark, C.; Ho, Y.; Botstein, D.; Andrews, B.; Boone, C.; Troyanskaya, O. G.; Ideker, T.; Dolinski, K.; Batada, N. N.; Tyers, M. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J. Biol. 2006, 5 (4), 11.

(18) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11), 2498–2504.

(19) Hu, Z.; Mellor, J.; Wu, J.; Yamada, T.; Holloway, D.; Delisi, C. VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res. 2005, 33 (Web Server issue), W352–7.

(20) Borisjuk, L.; Hajirezaei, M.-R.; Klukas, C.; Rolletschek, H.; Schreiber, F. Integrating data from biological experiments into metabolic networks with the DBE information system. In Silico Biol. 2004, 5, 0011.

(21) Schreiber, F. High quality visualization of biochemical pathways in BioPath. In Silico Biol. 2002, 2 (2), 59–73.

(22) Drabkin, H. J.; Hollenbeck, C.; Hill, D. P.; Blake, J. A. Ontological visualization of protein-protein interactions. BMC Bioinf. 2005, 6, 29.

(23) Ahmed, A.; Dwyer, T.; Forster, M.; Fu, X.; Ho, J.; Hong, S.-H.; Koschutzki, D.; Murray, C.; Nikolov, N. S.; Taib, R.; Tarassov, A.; Xu, K. In GEOMI: GEOmetry for Maximum Insight, Proceeding of 13th International Symposium on Graph Drawing, Limerick, Ireland, September 2005, 2005; Limerick, Ireland, 2005; pp 468–479.

(24) Krogan, N. J.; Cagney, G.; Yu, H.; Zhong, G.; Guo, X.; Ignatchenko, A.; Li, J.; Pu, S.; Datta, N.; Tikuisis, A. P.; Punna, T.; Peregrin-Alvarez, J. M.; Shales, M.; Zhang, X.; Davey, M.; Robinson, M. D.; Paccanaro, A.; Bray, J. E.; Sheung, A.; Beattie, B.; Richards, D. P.; Canadien, V.; Lalev, A.; Mena, F.; Wong, P.; Starostine, A.; Canete, M. M.; Vlasblom, J.; Wu, S.; Orsi, C.; Collins, S. R.; Chandran, S.; Haw, R.; Rilstone, J. J.; Gandi, K.; Thompson, N. J.; Musso, G.; St Onge, P.; Ghanny, S.; Lam, M. H.; Butland, G.; Altaf-Ul, A. M.; Kanaya, S.; Shilatifard, A.; O’Shea, E.; Weissman, J. S.; Ingles, C. J.; Hughes, T. R.; Parkinson, J.; Gerstein, M.; Wodak, S. J.; Emili, A.; Greenblatt, J. F. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006, 440 (7084), 637–643.

(25) Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25 (1), 25–29.

(26) SGD Gene Ontology Slim Mapper. http://db.yeastgenome.org/cgi-bin/GO/goTermMapper.pl (Accessed 05/09/06).

(27) Apweiler, R.; Bairoch, A.; Wu, C. H.; Barker, W. C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M. J.; Natale, D. A.; O’Donovan, C.; Redaschi, N.; Yeh, L. S. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32 (Database issue), D115–9.

(28) Hermjakob, H.; Fleischmann, W.; Apweiler, R. Swissknife - ‘lazy parsing’ of SWISS-PROT entries. Bioinformatics 1999, 15 (9), 771–772.

(29) Bjellqvist, B.; Hughes, G. J.; Pasquali, C.; Paquet, N.; Ravier, F.; Sanchez, J. C.; Frutiger, S.; Hochstrasser, D. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 1993, 14 (10), 1023–1031.

(30) Bjellqvist, B.; Basse, B.; Olsen, E.; Celis, J. E. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 1994, 15 (3–4), 529–539.

(31) Kyte, J.; Doolittle, R. F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982, 157 (1), 105–132.

(32) Ho, E.; Hayen, A.; Wilkins, M. R. Characterisation Of Organellar Proteomes: A Guide To Subcellular Proteomic Fractionation And Analysis. Proteomics 2006, 6 (21), 5746–5757.

(33) Eades, P. A heuristic for graph drawing. Congressus Numerantium 1984, 42, 149–160.

(34) Cusick, M. E.; Klitgord, N.; Vidal, M.; Hill, D. E. Interactome: gateway into systems biology. Hum. Mol. Genet. 2005, 14 (Spec No. 2), R171–81.

(35) Han, K.; Ju, B. H. A fast layout algorithm for protein interaction networks. Bioinformatics 2003, 19, 1882–1888.

(36) Navigator: http://ophid.utoronto.ca/navigator/

(37) Webber, R. J. Finding the best viewpoint for three-dimensional graph drawing. Ph.D. Thesis. University of Newcastle, Australia, 1998.

(38) Vlasblom, J.; Wu, S.; Pu, S.; Superina, M.; Liu, G.; Orsi, C.; Wodak, S. J. GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics 2006, 22 (17), 2178–2179.

(39) Garcia, O.; Saveanu, C.; Cline, M.; Fromont-Racine, M.; Jacquier, A.; Schwikowski, B.; Aittokallio, T. GOlorize: a Cytoscape plug-in for network visualization with Gene Ontology-based layout and coloring. Bioinformatics 2007, 23, 394–396.

(40) Dziembowski, A.; Seraphin, B. Recent developments in the analysis of protein complexes. FEBS Lett. 2004, 556 (1–3), 1–6.

(41) Denison, C.; Rudner, A. D.; Gerber, S. A.; Bakalarski, C. E.; Moazed, D.; Gygi, S. P. A proteomic strategy for gaining insights into protein sumoylation in yeast. Mol. Cell. Proteomics 2005, 4 (3), 246–54.

PR070274M