A Draft of Protein Interactions in the Malaria Parasite P. falciparum

Stefan Wuchty*,† and Jonathan J. Ipsaro‡

Northwestern Institute on Complexity, Northwestern University, Evanston, Illinois 60208, and Department of Biochemistry, Northwestern University, Evanston, Illinois 60208

Received October 31, 2006

Recent advances have provided a working interactome map for the human malaria parasite Plasmodium falciparum. This map, generated from genome-scale analyses, has provided a basis for proteomic studies of the parasite; however, such large-scale approaches commonly suffer from undersampling and lack of coverage. The current map is no exception, containing only one-quarter of the organism’s proteins. Motivated by the limitations of the current map and the wealth of bioinformatics data, we assembled a map of 19 979 interactions among 2321 proteins in P. falciparum. The resultant map was generated by computationally inferring protein-protein interactions from evolutionarily conserved protein interactions, underlying domain interactions, and experimental observations. To compile this information into a repository of meaningful data, we assessed interaction quality by applying a logistic regression method, which correlated the presence of an interaction with relevant cellular parameters. Interestingly, we found that sub-networks from different sources are quite dissimilar in their topologies and overlap to a very small extent. Applying Markov clustering, we observe a typical cluster composition, featuring common cellular functions that were previously reported absent, making this map a valuable resource for understanding the biology of this organism.

Keywords: human malaria parasite • P. falciparum • interactome

* To whom correspondence should be addressed. Voice, +1 574 386 4456; Fax, +1 847 467 1280; E-Mail, [email protected].
† Northwestern Institute on Complexity.
‡ Department of Biochemistry.
10.1021/pr0605769 CCC: $37.00 © 2007 American Chemical Society
Journal of Proteome Research 2007, 6, 1461-1470. Published on Web 02/15/2007

Introduction

Recent sequencing efforts have yielded extensive annotations of the genomes of malaria parasites, including that of Plasmodium falciparum, the most virulent human malaria parasite.1-5 Despite this abundance of primary genomic and proteomic information, comparatively little is known regarding the network of protein interactions that governs the underlying molecular biology of malaria parasites. Recently, an experimentally based preliminary interactome map for P. falciparum has been released. Although clearly impressive from an experimental perspective, the presented data cover only roughly one-quarter of the P. falciparum proteome.6 In working toward a complete interaction map, the question arises whether the wealth of information accruing from the study of various model organisms can be applied to this pervasive and highly intractable organism.

Studies of protein network topologies have established that evolutionary interaction information is conserved at higher orders of genome organization.7,8 Further, comparison of interaction webs in various organisms’ networks suggests that a small number of organizing principles govern the emergence of complex protein network features.8 The most dramatic of these organizing features is the scale-free nature of these networks, a remarkable inhomogeneity that highlights a small number of highly interacting proteins that secure the network’s integrity. The special role such proteins play for the stability of interaction networks is further indicated by their significant propensity to be simultaneously essential as well as evolutionarily conserved.10 In light of these properties, a similar topology can be expected for the malaria parasite as well.

Reflecting their inherent cohesive nature, complex networks are characterized by the accumulation of discernible modules. Such clusters of densely interconnected nodes combine in an overlapping manner, share well-defined functions, and are linked by hubs that act as the modules’ connectors.8,11,12 Similarly, cohesively bound motifs of protein networks are frequently conserved as a whole, suggesting their role as evolutionarily relevant building blocks.7 This blueprint is then typically reinforced as genes in such modules tend to be coexpressed.9 Indeed, this comparative concept has already been successfully applied to the computational determination of the human interactome,13-17 suggesting that known protein interaction maps can be considered a rich resource to computationally infer potential protein interactions in other organisms, such as P. falciparum.

In addition to known protein interaction networks and coexpression data, domain interactions can also be used to provide interaction information. Of particular interest are PFAM domain interactions, which have been assessed by an expectation score reflecting the confidence that the domain interaction in question gives rise to the observed protein interaction.18 Expectation scores of domain interaction are randomly distributed, indicating that each protein interaction is indeed governed by a single domain interaction.19 This observation suggests that the expectation score can be used as a parameter to screen potential protein pairs as interaction candidates using high-scoring interacting domain pairs.20

Pooling experimental results and applying computational methods, we aim to combine the abundant genomic and proteomic information about protein interaction webs from well-studied model organisms to provide an improved draft of the P. falciparum interactome. The presented work combines interaction information from experimental observations6 and infers potential interactions from evolutionarily conserved protein links in organisms as disparate as Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, and Escherichia coli. Further, the abundance of domain information in the proteome of P. falciparum allows the addition of potential interactions.20 Using this composite strategy, we obtained a network of 19 979 interactions among 2321 proteins in P. falciparum. Considering webs of interactions from these sources separately reveals considerable topological differences, prompting questions about the quality of the underlying interactions.

Although a network of physical pairwise protein interactions can certainly have fundamental implications for our understanding of the parasite’s biology, the severe error-proneness of methods for the determination of protein interactions casts doubt on the integrity of such data sets. For instance, an estimate of the accuracy of protein interactions in S. cerevisiae uncovered startlingly high false negative and false positive rates of 90 and 50%, respectively.21 Despite this noise, information remains in the topology of a protein interaction network to assess an individual link’s quality.
Using a link-based clustering coefficient that reflects the degree of clustering of an interaction’s immediate network neighborhood, Goldberg and Roth identified pronounced correlations between local clustering and the actual presence of a confirmed protein interaction.22 Because we previously found a strong evolutionary signal in cohesive areas (i.e., modules),23 it can be assumed that even in the presence of extreme noise there may exist a pronounced correlation between the reliability of an interaction and a protein’s propensity to be conserved. Another parameter that provides an assessment of an interaction’s quality is the coexpression correlation, because genes with similar expression patterns are likely to indicate potential protein interactions. The combination of these criteria provides a rational basis to use information from many sources for the inference and assessment of protein interactions in P. falciparum.

Another special, yet general, feature of biological networks is their tendency to form densely connected and well-pronounced modules that largely share similar functions. An evolutionary corollary to modularity comes from the observation that tightly connected modules not only show a high degree of functional homogeneity but also are largely evolutionarily conserved as orthologs in other organisms.7,10,24 In particular, a network comparison of the currently available interactome of P. falciparum revealed that functional modules that are largely conserved in the comparative set of other eukaryotic organisms are present (to a rudimentary extent) in the parasite’s experimentally determined interactome.25 Identifying this cohesion by applying a Markov clustering procedure highlights fundamental, conserved modular units, allowing us to conclude that conserved interactions in particular compensate for the lack of important cellular functions in experimental observations.


Given a comprehensive set of protein interactions, a framework that incorporates network architecture represents a tool and repository supporting the elucidation of relevant protein interactions. It also helps to focus experimental studies on specific interactions unique to this pathogen for which, at present, limited or no protein interaction data exist.

Experimental Procedures

Orthologous Protein Data. The InParanoid database26 provides putative orthologous sequence information for the complete proteomes of pairs of organisms, here E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens, each paired with P. falciparum. The algorithm for detecting orthologous relationships is based on pairwise similarity scores, which are by default calculated with BLASTP. InParanoid detects mutual best hits between sequences from two different species; these are the two main orthologs that seed an orthologous group. Other sequences, known as “in-paralogs”, are added to this group if they are closely related to one of the main orthologs. A confidence value for each in-paralog is provided by a standard bootstrap procedure and shows its relatedness to the main ortholog. In our study, we only selected the main sequence pairs of each orthologous group, allowing us to obtain 355 orthologous protein pairs in E. coli, 2319 in S. cerevisiae, 1333 in C. elegans, 1351 in D. melanogaster, and 1525 in H. sapiens with putative orthologs in P. falciparum.

Protein Interactions. As a basis for the inference of potential protein interactions in P. falciparum, we utilized a large-scale compilation of yeast protein interactions, combining 47 783 experimentally obtained protein interactions in S. cerevisiae,27 which have been obtained from sources such as mRNA expression studies and yeast two-hybrid screens. Each interaction was characterized by a confidence score by applying a logistic regression model. Analogously, the quality of experimentally determined protein interactions in D. melanogaster was assessed, providing 6222 proteins and 16 914 links.28 As a reliable source of interactions among proteins of C. elegans, we utilized data sets from the DIP database,29 allowing us to obtain 3926 interactions among 2718 proteins. Protein interactions in E. coli from the DIP database29 were used as well, totaling 5911 interactions among 1526 proteins.
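The transfer of interactions across species that these ortholog and interaction data sets feed into can be illustrated with a minimal sketch. It assumes a one-to-one map from source-organism proteins to their main P. falciparum orthologs; the function name, the protein identifiers, and the choice to drop self-interactions are hypothetical and are not taken from the original pipeline.

```python
def infer_interologs(source_interactions, ortholog_map):
    """Transfer interactions from a source organism to a target organism.

    source_interactions: iterable of (protein_a, protein_b) pairs observed
        in the source organism.
    ortholog_map: dict mapping a source protein ID to the ID of its main
        ortholog in the target organism (assumed one-to-one).
    Returns a set of frozensets, each an undirected predicted interaction.
    """
    predicted = set()
    for a, b in source_interactions:
        # An interaction is transferred only if both partners have orthologs.
        if a in ortholog_map and b in ortholog_map:
            ta, tb = ortholog_map[a], ortholog_map[b]
            if ta != tb:  # assumption: self-interactions are not kept
                predicted.add(frozenset((ta, tb)))
    return predicted


# Hypothetical toy data: three yeast interactions, three mapped orthologs.
yeast_interactions = [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y4")]
yeast_to_pf = {"Y1": "PF_A", "Y2": "PF_B", "Y3": "PF_C"}
interologs = infer_interologs(yeast_interactions, yeast_to_pf)
```

In this toy case the interaction Y3-Y4 is not transferred because Y4 has no ortholog in the map. Pooling several source organisms would amount to taking the union of the returned sets.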
Additionally, a large-scale compilation of human interactions provided 89 572 interactions among 9018 proteins.13,14 As for direct experimental observations of protein interactions in P. falciparum, a set of 2811 interactions among 1308 proteins that have been obtained by the application of the yeast two-hybrid method was used.6 A third source of protein interactions in P. falciparum is domain interactions. In particular, we utilized a data set that annotates each protein with its PFAM domains.30 An interaction between a pair of proteins is assumed if the corresponding domain interaction scores above a threshold of a likelihood score reflecting the probability that a certain domain interaction indeed can explain the presence of a protein interaction.18 This approach yielded 1428 interactions among 386 proteins.20

Hypergeometric Clustering Coefficient. Recently, a network topology based approach uncovered a remarkable correlation between enhanced quality of protein interactions and the degree of clustering of their immediate network neighborhood.22 Considering a protein-protein interaction network with N nodes, the hypergeometric clustering coefficient is the negative logarithm of a hypergeometric distribution defined as

$$C_{vw} = -\log \sum_{i=|N(v)\cap N(w)|}^{\min(|N(v)|,|N(w)|)} \frac{\binom{|N(v)|}{i}\binom{N-|N(v)|}{|N(w)|-i}}{\binom{N}{|N(w)|}} \tag{1}$$

where N(x) represents the neighborhood of a vertex x. Given fixed neighborhood sizes |N(v)| and |N(w)| of proteins v and w, the hypergeometric clustering coefficient increases with elevated overlap between the proteins’ neighborhoods. As such, the summation can be interpreted as a P value, reflecting the probability that interacting proteins share a certain number of neighbors in the network at or above the observed number by chance. High values of this edge-based clustering coefficient point to an elevated level of reliability, allowing a quality assessment of the underlying interaction.22

GO Annotation Data and Functional Homogeneity. Similarly to the hypergeometric clustering coefficient (see above), we define the functional homogeneity of a protein pair v, w as the negative logarithm of the hypergeometric distribution

$$fh_{vw} = -\log \sum_{i=|GO(v)\cap GO(w)|}^{\min(|GO(v)|,|GO(w)|)} \frac{\binom{|GO(v)|}{i}\binom{T-|GO(v)|}{|GO(w)|-i}}{\binom{T}{|GO(w)|}} \tag{2}$$

where GO(v) is the set of GO terms of protein v and T is the total number of different GO terms.31 In analogy to the hypergeometric clustering coefficient, the summation can be interpreted as a P value, reflecting the probability that a protein pair shares a certain number of GO terms at or above the observed number by chance. As such, a high value corresponds to an elevated level of functional similarity, a characteristic that correlates well with an interaction’s reliability. In order to ensure highest specificity, we only utilized GO terms of the deepest level.

Topological Measures. The mean connectivity of a node is an indication of the network’s density (or sparsity), reflecting the mean number of interaction partners,

$$\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N} k_i \tag{3}$$

where ki is the number of neighbors of node i and N is the total number of nodes in the network. To investigate the local cohesiveness of network areas, we used the unweighted representation of the clustering coefficient Ci, which measures the degree of cohesiveness around a particular protein i.32 The clustering coefficient is defined as

$$C_i = \frac{2E}{N_i(N_i-1)} \tag{4}$$

where E is the number of actual links between the Ni neighbors of protein i. Another measure that provides insight into the correlation of the nodes’ degrees is the average-weighted nearest-neighbors degree:33

$$k_{nn,i} = \frac{1}{k_i}\sum_{j\in\Gamma_i} k_j \tag{5}$$

where ki refers to the number of interaction partners of protein i and Γi is the set of interaction partners of protein i. This term effectively assigns a quantity to the immediate neighborhood of a particular node that measures the level of degree-degree correlations of nodes around a central node. A network is said to show assortative mixing if nodes tend to be connected to other nodes of similar degree. We used Newman’s assortativity measure34 defined as

$$r = \frac{M^{-1}\sum_i j_i k_i - \left[M^{-1}\sum_i \tfrac{1}{2}(j_i+k_i)\right]^2}{M^{-1}\sum_i \tfrac{1}{2}(j_i^2+k_i^2) - \left[M^{-1}\sum_i \tfrac{1}{2}(j_i+k_i)\right]^2} \tag{6}$$

where ji, ki refer to the degrees of the nodes at both ends of the ith link, with M being the total number of links. Thus, the assortativity coefficient r ranges between -1 ≤ r ≤ 1 and can be interpreted as a Pearson correlation coefficient for the degrees of either node of a given interaction.

Coexpression Data. Clearly, in order to interact, two proteins must both be present at the same time and place. Accordingly, expression profiles of interacting proteins are likely to be similar.9 For P. falciparum, we utilized gene expression data over all different cell stages of the parasite, compiling 5156 genes from Winzeler et al.5,35,41 As a gene similarity metric, we calculated Pearson’s correlation coefficient for every protein interaction over m time points, defined as

$$r_P = \frac{\frac{1}{m}\sum_{i=1}^{m} x_i y_i - \langle x\rangle\langle y\rangle}{\sigma_x\sigma_y} \tag{7}$$

where 〈x〉 and 〈y〉 are the sample means of the expression values xi and yi of the two genes, and the σ terms in the denominator are the corresponding standard deviations.

Logistic Regression. In order to provide an estimate of an interaction’s reliability, we applied a logistic regression model where the probability of a true interaction Tvw is governed by two input variables. Utilizing the hypergeometric clustering coefficient x1 = Cvw and the coexpression correlation coefficient x2 = rP, with X = (x1, x2), we obtain

$$\Pr(T_{vw}\mid X) = \frac{\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)}{1+\exp(\beta_0+\beta_1 x_1+\beta_2 x_2)} \tag{8}$$

where βn are the parameters of the distribution. Given training data, we optimized the distribution parameters by maximizing the likelihood of the data. Here, we applied the corresponding routines of the Biopython package.36 As a training set for true positives, we chose 213 high-scoring protein interactions in yeast27 that are fully conserved in Plasmodium. In the same way, we selected 173 low-scoring interactions as a true negative training set. For all interactions in the training sets, we calculated hypergeometric clustering and coexpression coefficients, therefore allowing an unbiased training of the regression model. To determine the prediction accuracy, we applied a leave-one-out analysis in which the model is recalculated from the training data after removing the interaction to be predicted; the model determined the correct result in more than 95% of cases.

Markov-Cluster-Algorithm (MCL). In order to uncover the community structure of the inferred core protein interaction network of Plasmodium, we utilized a Markov-Cluster-Algorithm (MCL),37 which has been designed specifically for computational graph clustering. Topologically, nodes that are embedded in well-interconnected parts of a network share most of their links with nodes of the same cluster, whereas a small fraction of links connect remote clusters. Hence, it is expected that a random walk will predominantly travel within a cluster and jump to other ones sporadically. Mathematically, an undirected network consisting of k nodes can be represented as a k × k matrix M, where Mij = wij and wij is the weight of the interaction between i and j. In our case, we applied the function wij = cvij, where cvij is the confidence value of interaction ij. To ensure convergence of the algorithm, we introduce self-loops on each node, i.e., Mii = 1. M turns into a column stochastic matrix T by normalizing each column sum to unity through the diagonal matrix d, whose entries are dkk = ∑i Mik, giving T = Md⁻¹. Thus, the entry Tij represents the probability for a random walker to directly jump from node j to node i. The stochastic matrix T is alternately (i) expanded by matrix multiplication (i.e., matrix squaring) and (ii) renormalized by an inflation procedure resulting again in a stochastic matrix. Formally, the inflation operator Γr is defined as

$$(\Gamma_r T)_{pq} = \frac{(T_{pq})^r}{\sum_{i=1}^{k}(T_{iq})^r} \tag{9}$$
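The alternation of expansion and inflation can be made concrete with a small pure-Python sketch. This is an illustrative reimplementation under stated assumptions, not the MCL program cited in the text: the convergence tolerance, the iteration cap, the 1e-6 cutoff used to read clusters off the final matrix, and the toy weight matrix are all hypothetical choices.

```python
def normalize_columns(m):
    """Rescale each column so that it sums to one (column stochastic)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, renormalize columns."""
    n = len(m)
    return normalize_columns([[m[i][j] ** r for j in range(n)]
                              for i in range(n)])


def mcl(weights, r=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric, non-negative weight matrix; returns node sets."""
    n = len(weights)
    m = [[float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as described in the text
    t = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(t), r)
        change = max(abs(nxt[i][j] - t[i][j])
                     for i in range(n) for j in range(n))
        t = nxt
        if change < tol:  # matrix no longer changes: stop iterating
            break
    # Each non-empty row belongs to an attractor; the columns with
    # non-negligible entries in that row are the members of its cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if t[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy network: two triangles joined by one weak link (weight 0.1).
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(weights, r=2.0)
```

On this toy input the weak link carries little flow, so the two triangles are recovered as separate clusters. Larger values of r generally produce a finer partition, which is why the inflation parameter is a natural quantity to scan.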

This process of alternating inflation and expansion is repeated until the resulting stochastic matrix T takes the form of a doubly idempotent matrix, i.e., it no longer changes with further inflation/expansion cycles. The final matrix is composed of several connected components that take star-like shapes, which are interpreted as the sought-after clusters.

Modularity of the Network. In order to elucidate meaningful partitions of the network, we applied the MCL algorithm, choosing the inflation parameter r = 1, ..., 5 in steps of 0.25. Evaluating the partitions thus obtained, we defined protein modules by optimizing the functional coherence and size of the clusters.38 In particular, the functional coherence of cluster i, fci, is calculated as the fraction of annotated gene pairs that share at least one functional annotation; with fpi denoting the number of such pairs,

$$fc_i = \frac{fp_i}{p_i} \tag{10}$$

given the ith cluster with a total of pi annotated protein pairs. This measure tends to be high for small clusters but diminishes if more proteins are included. As a source of reliable annotation information and to maximize functional coherence, we utilized Gene Ontology terms of the most specific, deepest level.39 In turn, we balance that trend by maximizing the size of the given clusters, defining the modulation efficiency EM as

$$E_M = N^{-1}\sum_{i=1}^{n} fc_i \times N_i \tag{11}$$

where n is the number of clusters, N is the total number of proteins whereas Ni is the number of proteins in the ith cluster. Thus, the partition with the highest modulation efficiency reflects the best compromise between efficiency of clustering and degree of functional association between proteins in a cluster.
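As a worked illustration of eqs 10 and 11, the following sketch computes the functional coherence of each cluster and the resulting modulation efficiency of a partition. The function names, the toy annotation labels, and the convention that a cluster without any annotated pair contributes a coherence of zero are assumptions made for this example only.

```python
from itertools import combinations


def functional_coherence(cluster, annotations):
    """Fraction of annotated protein pairs sharing at least one term (eq 10).

    cluster: iterable of protein IDs.
    annotations: dict mapping a protein ID to a set of annotation terms.
    """
    annotated = [p for p in cluster if annotations.get(p)]
    pairs = list(combinations(annotated, 2))
    if not pairs:
        return 0.0  # assumption: no annotated pair means zero coherence
    sharing = sum(1 for a, b in pairs if annotations[a] & annotations[b])
    return sharing / len(pairs)


def modulation_efficiency(clusters, annotations, n_proteins):
    """Size-weighted mean functional coherence of a partition (eq 11)."""
    total = sum(functional_coherence(c, annotations) * len(c)
                for c in clusters)
    return total / n_proteins


# Hypothetical toy partition of five proteins into two clusters.
annotations = {
    "A": {"term1"},
    "B": {"term1", "term2"},
    "C": {"term3"},
    "D": {"term4"},
    "E": {"term4"},
}
clusters = [["A", "B", "C"], ["D", "E"]]
em = modulation_efficiency(clusters, annotations, n_proteins=5)
```

Here the first cluster has one sharing pair out of three (coherence 1/3) and the second has one out of one (coherence 1), giving EM = (1/3 × 3 + 1 × 2)/5 = 0.6. Scanning the inflation parameter would then amount to evaluating this quantity for each partition and keeping the maximum.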

Results and Discussion

Table 1. Summary of Data Sources for the Inference of Evolutionarily Conserved Protein Interactions in P. falciparum^a

| source | original proteins | original interactions | conserved proteins | conserved interactions | ref |
|---|---|---|---|---|---|
| H. sapiens | 8968 | 89 286 | 944 | 11 193 | 13,14 |
| D. melanogaster | 6222 | 16 914 | 397 | 365 | 28 |
| C. elegans | 2718 | 3926 | 238 | 327 | 29 |
| S. cerevisiae | 4175 | 42 231 | 736 | 7900 | 27 |
| E. coli | 1526 | 5911 | 217 | 648 | 29 |

^a For each organism we show the number of proteins and interactions as well as the respective numbers of evolutionarily conserved entities.

The first set of experimentally derived protein interactions available for P. falciparum6 was relatively small and has
minimal overlap with yeast, necessitating an alternative approach to fully elucidate the protein interaction network of the malaria parasite.25 As sources of potential protein interaction data in P. falciparum, we chose to utilize so-called interologs, protein interactions that are considered evolutionarily conserved if the involved proteins have orthologs in a different organism.9 Despite phylogenetic differences between P. falciparum and organisms for which interaction data are available, there exists a considerable number of identifiable evolutionarily conserved protein interactions that can be used to derive potential protein interactions in P. falciparum. Utilizing orthologous protein information from the InParanoid database26 and sets of protein interactions in S. cerevisiae,27 we found 7900 interactions between 736 proteins in yeast that have orthologs in Plasmodium. Similarly, we utilized protein interaction sets of H. sapiens,13,14 D. melanogaster,28 C. elegans, and E. coli,29 which establish 11 193 interactions among 944 orthologous human proteins, 365 interactions among 397 orthologous fly proteins, 327 interactions among 238 orthologous worm proteins, and 648 interactions among 217 orthologous E. coli proteins. These data are summarized in Table 1. Combining the organism-specific data sets, we acquired 16 026 unique interactions among 1304 proteins of P. falciparum.

A second source of potential interactions between proteins in P. falciparum accounts for interactions among PFAM domains.20,30 In particular, this approach18 considers protein interactions as governed by a single domain interaction19,20 with the highest expectation value, E. Annotating each protein of P. falciparum with its corresponding PFAM domains, an interaction between proteins was assumed if (i) a high-scoring interacting domain pair was found and (ii) both proteins appear in the same cellular compartment as annotated in the GO Slim database.31 This approach yielded 1428 interactions between 386 proteins.20 As a third source, we used the experimentally determined set of protein interactions in the malaria parasite P. falciparum, comprising 2811 interactions among 1308 proteins.6

Pooling all sources of protein interactions, we obtain 19 979 interactions among 2321 proteins (interaction data are available in the Supporting Information). The Venn diagram in Figure 1a shows that the overlap between the different data sets is minimal. In particular, we only find a considerable intersection between the evolutionarily derived interactions and predictions obtained from domain interactions, whereas there exists almost no overlap with experimental observations.

Figure 1. (a) Candidates of potential protein interactions in P. falciparum were obtained from interactions of orthologous proteins (interologs) in H. sapiens, S. cerevisiae, D. melanogaster, C. elegans, and E. coli, as well as from experimental observations and domain interactions. The Venn diagram shows the sizes of the different data sets and their overlaps, suggesting that data from different sources only overlap to a small but significant extent (P < 10^-4, as of a hypergeometric distribution). Out of the 19 interactions that overlap in the experimental and evolutionarily conserved data sets, we find 9 interactions of yeast, whereas only 1 of worm and fly, 4 of E. coli, and 14 human interactions are experimentally confirmed. The table provides a topological analysis of the different sources of protein interactions. In particular, we observe that the web of experimentally determined interactions is not only sparse, with a low mean number of edges per node, 〈k〉, but also weakly clustered, as indicated by a small mean clustering coefficient 〈C〉. In contrast, the webs of interologs and of protein interactions derived from domain interactions are strongly clustered, whereas the assortativity r of interologs and domains indicates a stronger trend to be connected to proteins of similar connectivity. In addition, we observe elevated levels of the mean degree of a protein’s interaction partners (〈knn〉) in the web of interologs, a result that is widely absent in the other webs of interactions. (b) Compiling 19 979 interactions among 2321 proteins in Plasmodium, we color each interaction according to its origin (interologs, green; domain interactions, red; experimental observations, orange).

Each of the separate data sets is characterized according to topological measures, summarized in the table of Figure 1. Significantly, we observe that the interologs and domain interaction data sets show an elevated level of clustering, whereas the experimentally obtained interaction set is minimally clustered. In comparison, the interologs set has a high mean number of interaction partners, whereas the other sets are roughly of the same lower mean degree. The differences between interologs and the other sets are even more striking when comparing the values of their average nearest-neighbors degree, a topological measure that
reflects the mean degree of neighbors of a given protein. Determining the assortativity of proteins in the different sets, we observe that interactions we obtained from domain interactions are especially strongly correlated, indicating that proteins are predominantly linked to neighbors that have a similar degree. The topological differences do not come as a surprise given the fact that the underlying data sets only overlap to a small extent. Strikingly, the differences appear in the graphical depiction of the total network in Figure 1b. Coloring each interaction edge by its origin, we observe that the different data sets are almost spatially separated from each other when applying a standard graph layout algorithm provided by the program Cytoscape.40 As indicated in Figure 1b and the table therein, this analysis supports the conclusion that the topological characteristics of the combined network are strongly dominated by the interologs and domain interactions sets, whereas experimental observations do not significantly influence the topological characteristics.

To evaluate each of these potential protein interactions, we characterize each interaction by interaction-independent measures that allow a reliable classification. As mentioned previously, genes with similar expression profiles have an increased likelihood to interact. For P. falciparum, we utilized gene expression data over 48 time points from microarray analysis5,35,41 in each stage of the parasite’s development. Searching for preferential coexpression of these protein pairs, we determined the Pearson correlation coefficient, rP, for each interaction from this comprehensive set of Plasmodium-specific coexpression data.
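The coexpression measure used here can be sketched in a few lines. The function below computes Pearson's correlation coefficient between two expression profiles using population (1/m) standard deviations, which is how we read the definition in the Experimental Procedures; the function name, the toy profiles, and the convention of returning zero for a constant profile are assumptions of this example.

```python
from math import sqrt


def pearson_coexpression(x, y):
    """Pearson correlation of two expression profiles of equal length.

    x, y: sequences of expression values measured at the same time points.
    """
    m = len(x)
    if m == 0 or m != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    # Centered sums are algebraically equal to (1/m) * sum(x*y) - <x><y>
    # but numerically more stable.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / m
    var_x = sum((a - mean_x) ** 2 for a in x) / m
    var_y = sum((b - mean_y) ** 2 for b in y) / m
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # assumption: a constant profile carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy profiles over four time points.
r_same = pearson_coexpression([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_opposite = pearson_coexpression([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Two profiles that rise and fall together give a coefficient close to 1, and mirror-image profiles give a coefficient close to -1. In the setting of the text, each profile would hold the 48 time points of one gene.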
In addition, for each interaction, the hypergeometric clustering coefficient Cvw measures the local connectivity of the neighborhood around an interaction, strongly correlating with the quality of the interaction.22 In combination, a logistic regression method trained by corresponding measures of carefully selected sets of true positive and negative interactions is used, allowing us to assess the quality of each interaction. Confidence values obtained for the predicted protein interactions in P. falciparum show that about half of the interactions score in an upper confidence range (Figure 2a). Comparing interactions according to their source, we observe that interactions derived from interologs and domain interactions largely show a high confidence, whereas experimentally determined links score in the lower range of confidence values (Figure 2b). This differing behavior is explained by the fact that the sub-network of experimentally determined interactions shows a low degree of clustering. As such, we only find a comparatively small number of interactions that are placed in a highly clustered environment, limiting the number of links with a high hypergeometric clustering coefficient. Focusing on interologs, we find that the majority of interactions occur in only one organism (Figure 2c), whereas only a small minority of interactions occur in more than two organisms. Characterizing these sets of links, we also find that interactions that occur in many organisms as interologs show increased levels of confidence (Figure 2d).

To test if the confidence values of predicted and observed interactions carry a biological signal, we grouped (binned) each potential protein interaction in P. falciparum by confidence score. In each bin, we calculated the hypergeometric distribution between the most specific GO terms of the interacting proteins therein, a measure that reflects the probability that the GO terms of interacting proteins were placed randomly.
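The hypergeometric tail probability and its negative logarithm can be illustrated as follows. The same computation applies to shared network neighbors (the hypergeometric clustering coefficient) and to shared GO terms (functional homogeneity). The function names and toy sets are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated in the text.

```python
from math import comb, log


def hypergeometric_tail(total, size_a, size_b, overlap):
    """Probability of an overlap at or above the observed one by chance.

    total: size of the universe (all nodes, or all distinct GO terms).
    size_a, size_b: sizes of the two sets being compared.
    overlap: observed number of shared elements.
    """
    upper = min(size_a, size_b)
    favorable = sum(comb(size_a, i) * comb(total - size_a, size_b - i)
                    for i in range(overlap, upper + 1))
    return favorable / comb(total, size_b)


def neg_log_hypergeometric(total, set_a, set_b):
    """Negative (natural) logarithm of the tail probability for two sets."""
    p = hypergeometric_tail(total, len(set_a), len(set_b),
                            len(set_a & set_b))
    return -log(p)


# Hypothetical toy case: 10 terms overall, two proteins with 3 terms each,
# 2 of which are shared.
p_value = hypergeometric_tail(10, 3, 3, 2)
score = neg_log_hypergeometric(10, {"t1", "t2", "t3"}, {"t2", "t3", "t4"})
```

For the toy case the tail probability is (3 × 7 + 1 × 1)/120 = 22/120, so a larger shared fraction yields a smaller probability and hence a larger score. With no shared elements the tail probability is 1 and the score is 0.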
Journal of Proteome Research • Vol. 6, No. 4, 2007

Figure 2. In order to evaluate predictions, we utilized a logistic regression method that we trained with carefully selected sets of true positive and negative interactions. (a) Confidence scores of predicted protein interactions in P. falciparum and of the positive (good) and negative (bad) training sets, where interactions have been binned according to their confidence values. (b) Distinguishing the different origins of the derived protein interactions in Plasmodium, we observe that interactions derived from interologs and domain interactions largely show a high confidence, whereas experimentally determined links score in the lower range of confidence values. (c) Determining the number of occurrences of interologs, we observe that the majority of interactions appear in only one or two organisms. In particular, we find that 12 interactions that occur in only one organism are experimentally confirmed, whereas there are 5 and 2 experimentally determined links that appear in two and in more than two organisms, respectively. (d) In analogy to (a,b), we evaluate these sets and find that interactions that appear in many organisms are comparatively more reliable.

Figure 3. (a) As a measure of quality, we determined the functional homogeneity of interacting proteins. In particular, we calculated the negative logarithm of the hypergeometric distribution of GO terms, a P-value that reflects the likelihood that the GO terms of proteins were distributed randomly. Across bins, we observe that high confidence accompanies increasing levels of functional homogeneity. (b) Similarly, we choose likelihood scores for the presence of functional links between protein pairs. In particular, we find correlations toward higher likelihood with increasing confidence values in the web of interologs and domains, whereas we do not observe such an ascending trend for experimentally obtained interaction data. In both plots, error bars indicate the 95% confidence intervals of the arithmetic mean in each bin. (c) As a third indication of the quality of our interactions, we expect the resulting network to display scale-free characteristics. Pooling all interactions from the three sources, we observe that the distribution of the proteins' number of interactions follows a truncated power law ($P(k) = e^{-1.54 - 0.01k}\,k^{-1.03}$), an observation that still prevails if we focus on interactions with elevated confidence.

As such, a high value of the negative logarithm of this measure indicates high functional homogeneity, another indicator that proteins interact, because it is well-known that interactions occur between proteins of similar function.42,13 Indeed, we find significant correlations of increasing functional homogeneity with elevated levels of confidence (Figure 3a). In comparison, we observe that all three separate data sets differ considerably, with interactions obtained from domain interactions appearing functionally most homogeneous. Although interologs share a high degree of functional homogeneity as well, experimentally determined interactions tend to appear between proteins that have the lowest functional similarity. Although quantitatively different, all sets show a considerable upturn toward similar GO profiles of proteins if they are placed in highly reliable interactions.

As a different measure of quality, we utilize log likelihood scores of functional links between interacting proteins. In particular, we utilized results from a recent study in which potential functional links between proteins in P. falciparum were determined by purely genomic means, yielding 270 457 functional links between 3373 proteins in P. falciparum.43 As a hypothesis, we assume that high-scoring functional links have an elevated probability to appear as highly reliable interactions. Determining the percentage of interactions that actually have functional links, we obtain relatively low values: 12.1 and 8.1% of interologs and interactions obtained from domains, respectively, have a functional link, whereas only 2.9% of the experimental interactions are supported. In Figure 3b, we observe that especially interologs and domain-derived interactions show a reasonable correlation between their confidence and the likelihood of a functional link, whereas there is no indication that such a correlation exists for experimental observations.

As a third indication of the quality of our interactions, we expect that the resulting network displays scale-free characteristics.44 Considering all interactions, we observe a truncated power-law dependence in the distribution of the proteins' number of interactions (Figure 3c). This behavior is still recovered if we focus on interactions with elevated confidence. Initial analysis of the experimentally obtained web of interactions in P. falciparum revealed a considerable lack of local network clustering. Indeed, the degree of clustering is comparatively low in the experimental interactions, yet we already found that especially interologs tend to be placed in a highly clustered way (Table in Figure 1).
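To make the degree-distribution argument concrete, the following minimal sketch computes the empirical distribution P(k) from an edge list and evaluates a truncated power law for comparison. It assumes that the fit in the Figure 3c caption reads P(k) = exp(-1.54 - 0.01k) k^(-1.03); that reading, the function names, and the toy edge list are assumptions made for illustration.

```python
from collections import Counter
from math import exp


def degree_distribution(edges):
    """Return {k: fraction of proteins with exactly k distinct partners}."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not add a partner
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    counts = Counter(len(partners) for partners in neighbours.values())
    total = len(neighbours)
    return {k: count / total for k, count in sorted(counts.items())}


def truncated_power_law(k, log_amplitude=-1.54, cutoff_rate=0.01, exponent=1.03):
    """Power law k^(-exponent) with an exponential cutoff exp(-cutoff_rate * k).
    Default parameters follow the assumed reading of the Figure 3c fit."""
    return exp(log_amplitude - cutoff_rate * k) * k ** (-exponent)


# Hypothetical toy network: one hub protein with three partners.
toy_edges = [("hub", "p1"), ("hub", "p2"), ("hub", "p3")]
empirical = degree_distribution(toy_edges)

for k in (1, 10, 100):
    print(k, truncated_power_law(k))
```

The exponential factor only becomes noticeable for proteins with on the order of a hundred partners, which is why such a curve resembles a pure power law at low degree and bends downward for hubs.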
In evaluating the modular composition, a topological alignment of the parasite's interactions with those of other eukaryotes uncovered a considerable lack of vital cellular functions in the interactome of P. falciparum. In particular, vital cellular machinery such as the proteasome and ribosome was conserved only to a very rudimentary extent.25 To establish these functional associations further, we focused on identifying densely connected sub-networks in the combined set of predicted interactions. For this purpose, we utilized the Markov Clustering algorithm (MCL),37 which makes use of the topological fact that nodes embedded in well inter-connected parts of a network share most of their links with other nodes of the same cluster, whereas only a small fraction of links connects remote clusters. In such a system, a random walk predominantly traces a path within a cluster and jumps to other clusters only sporadically. Mathematically, we calculated the probability that a random walk will travel to a neighboring node, resulting in a stochastic matrix that takes the form of a doubly idempotent matrix composed of several connected components.

To obtain an assessment of the clustering quality, we applied a recently introduced functional modulation measure.38 Essentially, each cluster we obtain from the Markov clustering procedure provides a count of the occurrences of protein pairs that share the same GO annotation,39 a fraction that is balanced by the size of each cluster. We find 705 clusters (Supporting Information), many of them disconnected or loosely associated components of the underlying network. As expected, most clusters are rather small, whereas a minority of clusters are larger in size. Although our functional clustering routine aims to elucidate functionally homogeneous parts, we do not expect it to determine actual protein complexes. Further, our clustering routine does not account for overlapping clusters, a feature of real protein complexes that is supported by the well-established fact that proteins can have multiple affiliations to different protein complexes.

Despite this caveat, we do find clusters that are composed of functionally similar proteins, allowing us to observe a predominance of ubiquitination functions, components of the proteasome, ribosome, ABC transporters, serine/threonine kinases, and parts of the U6 spliceosome and RNA polymerases as coherent clusters. Figure 4 illustrates some representative examples. Although the original assessment of the experimental interactome's modular composition did not indicate that important cellular functions have been largely conserved, the integrated interactome provides strong evidence to the contrary. In particular, we show large clusters that predominantly harbor functions of the 26S proteasome and the 40S and 60S parts of the ribosome, embedded among a vast majority of proteins with orthologs in H. sapiens. Both functions are responsible for the production and degradation of proteins and are therefore indispensable for a eukaryotic cell. Both represent large multimolecular assemblies of proteins that are tightly bound to each other, a result that is well reflected by the high clustering of interactions among the corresponding proteins. Note, however, that the vast majority of interactions have been contributed by interologs, whereas we only sporadically find contributions of experimentally obtained interactions. Similarly, we find other vital cellular functions, such as small nuclear ribonucleoproteins (snRNPs) as part of the U6 spliceosome apparatus and SM-like proteins, interactions necessary for the tailoring of proteins.
Supported by interologs and interactions of domains carrying an LSM domain, the latter observation points to the presence of protein interactions that are largely governed by self-interactions of LSM, allowing for the construction of large molecular assemblies as observed in the proteasome, ribosome, and spliceosome.45-47 As a final example, we present a cluster that predominantly consists of RNA polymerase protein subunits. The cluster is partly populated by experimentally obtained interactions, whereas the majority of well-clustered interactions are interologs. Although experimental observations do not reflect this function properly, the gaps are filled by interologs, once again indicating the impact of evolutionarily conserved interactions in our network of interactions in P. falciparum. All remaining clusters can be found in the Supporting Information. Despite the approximation we introduced with the clustering procedure, the composition of clusters allows a first glimpse of the putative function of hypothetical and unknown proteins, as indicated by their proximity to functionally annotated proteins. In many cases, we find similar and homogeneous functions throughout the clusters, although in a considerable number of clusters, functional annotations are lacking. This observation is largely a consequence of the incompleteness of functional annotations, because many functions of proteins are simply not characterized. The placement of such uncharacterized proteins in different clusters provides suggestions regarding their putative role(s), allowing for directed experimental determination of protein function.
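The random-walk idea behind the Markov clustering used in this section can be sketched in a few lines. The code below is a simplified, dense-matrix illustration of the usual expansion and inflation steps, not the MCL program used for the analysis; the inflation value, the added self-loops, the convergence tolerance, and all names are assumptions.

```python
def normalize_columns(matrix):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / sums[c] for c in range(n)] for r in range(n)]


def multiply(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(edges, n_nodes, inflation=2.0, max_iterations=100, tol=1e-9):
    """Cluster nodes 0..n_nodes-1 by alternating expansion and inflation."""
    n = n_nodes
    # Adjacency matrix with self-loops, a common choice to stabilise the walk.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[a][b] = m[b][a] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iterations):
        expanded = multiply(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strengthen strong flows, weaken weak ones
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row of the limit matrix lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy input: two separate triangles of interacting proteins.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(markov_cluster(two_triangles, 6))
```

Adding a single bridging edge such as (2, 3) to the toy input is a simple way to explore how the inflation parameter controls whether weakly connected groups are split or merged. Note that this procedure assigns each protein to clusters purely by flow, which is why overlapping complex membership is not modeled.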

Figure 4. Here, we present selected sub-networks as suggested by the Markov clustering procedure. In particular, we find large and well-clustered sub-networks featuring proteasomal, ribosomal, and RNA polymerase functions as well as small ribonucleoproteins (snRNPs). In the network representations, each edge is colored according to its origin, whereas squares are proteins of P. falciparum that have a human ortholog. We observe large numbers of interologs that organize proteins in the proteasome and ribosome. Although we also find experimentally obtained interactions, the predominant presence of interologs as well as interactions derived from domain interactions suggests that contemporary experimental procedures miss many potential protein interactions, especially between evolutionarily conserved proteins of the parasite.

Conclusions

Here we have presented a combined draft network of protein interactions in the malaria parasite P. falciparum that was generated from three independent data sources. In particular, we utilized experimentally determined protein interactions, interologs from an array of interactomes of well-characterized organisms, and domain interactions. Interactions derived from evolutionarily conserved interactions in other organisms provided the largest amount of information. Notably, the interolog-specific data overlap only to a relatively small extent with interactions we inferred from domain interaction data and with experimentally determined protein interactions. In order to obtain an assessment of individual interactions, we applied a logistic regression method, which we trained on independent genomic and topological parameters. As a proof of concept, high confidence values of individual interactions are accompanied by functional similarity, an increasing likelihood of being supported by a functional link, and the preservation of the scale-free topology of the underlying networks. In comparison, however, experimentally determined interactions fall short in confidence scores. Comparing the three different data sets, we observe that computationally obtained interactions are largely similar in their topological features, whereas experimentally obtained interactions show low clustering and assortativity values. In particular, the considerably low degree of local clustering in the experimental data set seems to be the predominant cause of the low confidence values of experimentally determined interactions, because we used hypergeometric clustering as a parameter for evaluating the quality of an individual interaction.

Clustering the combined network of protein interactions in P. falciparum yields a typical composition of clusters. As for functional annotations, a considerable number of proteins in Plasmodium are still hypothetical or unknown. Despite this lack of functional annotation, the placement of such uncharacterized proteins in clusters of largely annotated proteins can potentially give a hint of their putative function. In this functional respect, we find clusters that predominantly recover proteasomal, ribosomal, and spliceosomal activities. This observation is particularly remarkable because it has been reported that the experimental interactome of the parasite lacks a pronounced degree of modular composition in comparison to other eukaryotes such as S. cerevisiae, D. melanogaster, and C. elegans, only recovering fragments of the previously mentioned functions.25 This remarkable absence of clustering in the experimentally determined protein interaction network of P. falciparum, and the lack of shared functions with other eukaryotes despite the abundance of putatively conserved interactions, raises interesting questions.
In particular, we wonder if this absence of clustered areas in the experimental data set is rooted in the limitations of the applied two-hybrid methods, which are currently not yet able to capture protein interactions in Plasmodium on a larger scale. Such a conclusion appears plausible, because comparisons of protein interaction sets in yeast, which largely have been obtained by yeast two-hybrid (Y2H) methods, suggested that 90% of interactions are indeed false negatives.21 Although such an assessment is not available for Plasmodium, we assume that this data set is error prone as well, especially because interactions have been determined by an adapted Y2H approach. The observation that many interactions have been missed although we find evolutionarily conserved interactions might be the consequence of technical difficulties in expressing Plasmodium proteins in yeast and of sampling issues that arise from choosing random fragments to screen. The resulting random sample of interactions might be a reason for obtaining such a remarkable topology of experimental interactions, because random sampling methods tend to inaccurately reflect the underlying topology of an interaction network.48,49

An argument in favor of interologs emphasizes the massive presence of orthologs. However, as the interactions are inferred from interologs in other organisms as long as there are orthologs, we inevitably will find evolutionarily conserved clusters, which does not necessarily mean that the interactions and corresponding clusters in question indeed exist in Plasmodium. Nevertheless, in light of the massive presence of well-established orthologs, we do not have any reason to assume that interactions do not occur in P. falciparum between orthologous proteins that interact in other organisms, allowing for a relatively dense and clustered network. Given the fact that there is almost no overlap between these data sets, the experimentally obtained set of interactions may indeed suffer from a high rate of false negatives, because there currently is no reason that explains the considerable topological differences in the web of experimental and evolutionarily conserved interactions. On the other hand, the apparent separation between experiments and interologs might indeed constitute a line between functions that have been evolutionarily conserved and functions that are parasite specific, because a considerable number of proteins that appear in experimentally determined interactions mediate functions that are necessary for the maintenance of a parasitic lifestyle.6

Nevertheless, methods to capture evolutionarily conserved proteins also have their limits. In particular, protein sequences of Plasmodium pose challenges for contemporary sequence analysis algorithms, because taxa of Plasmodium are only remotely related to the eukaryotic organisms considered.1-3 As such, the divergence in sequence composition can pose difficulties for contemporary sequence alignment algorithms in properly finding homology. Despite the pronounced evolutionary distance, proteins share significant similarities; however, sequence disruptions caused by repeats and other inserts can hamper the proper detection of evolutionary relationships of Plasmodium genes and proteins with other organisms.50 Yet, the pronounced placement of orthologous proteins in protein interaction networks mitigates the ramifications of such noise, allowing the reliable inference of protein interactions on evolutionary grounds.23 The abundance of evolutionary interaction information as well as the variety of organisms that provide this information suggests that the sparsity of experimental interactions may simply be a consequence of current technological limitations rather than of constraints in the computational detection of conserved proteins from sequence data.
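The sampling argument made earlier in this discussion, namely that an incompletely sampled network can show little local clustering even when the underlying network is densely clustered, can be illustrated with a small simulation. The sketch below is a deliberate simplification: it keeps each interaction independently with a fixed probability, which is only a stand-in for fragment-based two-hybrid screening, and the clique size, keep probability, seed, and function names are all hypothetical.

```python
import random
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient over all nodes that have an edge."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    values = []
    for partners in neighbours.values():
        k = len(partners)
        if k < 2:
            values.append(0.0)  # no neighbour pair can be connected
            continue
        linked = sum(1 for u, v in combinations(partners, 2) if v in neighbours[u])
        values.append(2.0 * linked / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0


def sample_edges(edges, keep_probability, seed=0):
    """Keep each edge independently with the given probability."""
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() < keep_probability]


# Hypothetical 'true' complex: 12 proteins that all interact with each other.
full_complex = list(combinations(range(12), 2))
observed = sample_edges(full_complex, keep_probability=0.3)

print("true clustering:    ", average_clustering(full_complex))
print("observed clustering:", average_clustering(observed))
```

Under this simplified model the fully connected complex has a clustering coefficient of 1, and the sampled network can only score the same or lower, so low clustering in a sparse screen does not by itself show that the underlying interactome lacks modules.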
The massive generation of protein interactions from different sources provides a framework that not only represents a tool and repository for information about the interactome of the parasite but also will help to focus experimental studies on specific interactions unique to this pathogen for which limited or no experimental evidence currently exists.

Acknowledgment. We thank L. Hiller and K. Haldar for fruitful discussions. S.W. gratefully acknowledges L.A.N. Amaral for providing computer equipment. The Northwestern Institute on Complexity (NICO) supported this study.

Supporting Information Available: Interaction data. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Gardner, M. J.; Shallom, S. J.; Carlton, J. M.; Salzberg, S. L.; Nene, V.; Shoaibi, A.; Ciecko, A.; Lynn, J.; Rizzo, M.; Weaver, B.; Jarrahi, B.; Brenner, M.; Parvizi, B.; Tallon, L.; Moazzez, A.; Granger, D.; Fujii, C.; Hansen, C.; Pederson, J.; Feldblyum, T.; Peterson, J.; Suh, B.; Angiuoli, S.; Pertea, M.; Allen, J.; Selengut, J.; White, O.; Cummings, L. M.; Smith, H. O.; Adams, M. D.; Venter, J. C.; Carucci, D. J.; Hoffman, S. L.; Fraser, C. M. Nature 2002, 419, 531-534.
(2) Hall, N.; Pain, A.; Berriman, M.; Churcher, C.; Harris, B.; Harris, D.; Mungall, K.; Bowman, S.; Atkin, R.; Baker, S.; Barron, A.; Brooks, K.; Buckee, C. O.; Burrows, C.; Cherevach, I.; Chillingworth, C.; Chillingworth, T.; Christodoulou, Z.; Clark, L.; Clark, R.; Corton, C.; Cronin, A.; Davies, R.; Davis, P.; Dear, P.; Dearden, F.; Doggett, J.; Feltwell, T.; Goble, A.; Goodhead, I.; Gwilliam, R.; Hamlin, N.; Hance, Z.; Harper, D.; Hauser, H.; Hornsby, T.; Holroyd, S.; Horrocks, P.; Humphray, S.; Jagels, K.; James, K.; Johnson, D. D.; Kerhornou, A.; Knights, A.; Konfortov, B.; Kyes, S.; Larke, N.; Lawson, D.; Lennard, N.; Line, A.; Maddison, M.; McLean, J.; Mooney, P.; Moule, S.; Murphy, L.; Oliver, K.; Ormond, D.; Price, C.; Quail, M. A.; Rabbinowitsch, E.; Rajandream, M.-A.; Rutter, S.; Rutherford, K. M.; Sanders, M.; Simmonds, M.; Seeger, K.; Sharp, S.; Smith, R.; Squares, R.; Squares, S.; Stevens, K.; Taylor, K.; Tivey, A.; Unwin, L.; Whitehead, S.; Woodward, J.; Sulston, J. E.; Craig, A.; Newbold, C.; Barrell, B. G. Nature 2002, 419, 527-531.
(3) Hyman, R. W.; Fung, E.; Conway, A.; Kurdi, O.; Mao, J.; Miranda, M.; Nakao, B.; Rowley, D.; Tamaki, T.; Wang, F.; Davis, R. W. Nature 2002, 419, 534-537.
(4) Bozdech, Z.; Zhu, J.; Joachimiak, M. P.; Cohen, F. E.; Pulliam, B.; DeRisi, J. L. Genome Biol. 2003, 4, R9.
(5) Le Roch, K. G.; Zhou, Y.; Blair, P. L.; Grainger, M.; Moch, J. K.; Haynes, J. D.; De la Vega, P.; Holder, A. A.; Batalov, S.; Carucci, D. J.; Winzeler, E. A. Science 2003, 301, 1503-1508.
(6) LaCount, D. J.; Vignali, M.; Chettier, R.; Phansalkar, A.; Bell, R.; Hesselberth, J. R.; Schoenfeld, L. W.; Sahasrabudhe, S.; Ota, I.; Kurschner, C.; Fields, S.; Hughes, R. E. Nature 2005, 438, 103-107.
(7) Wuchty, S.; Barabási, A. L.; Oltvai, Z. N. Nat. Genet. 2003, 35, 176-179.
(8) Barabási, A.-L.; Oltvai, Z. N. Nat. Rev. Genet. 2004, 5, 101-113.
(9) Ge, H.; Liu, Z.; Church, G. M.; Vidal, M. Nat. Genet. 2001, 29, 482-486.
(10) Wuchty, S. Genome Res. 2004, 14, 1310-1314.
(11) Han, J. J.; Bertin, N.; Hao, T.; Goldberg, D. S.; Berriz, G. F.; Zhang, L. V.; Dupuy, D.; Walhout, A. J. M.; Cusick, M. E.; Roth, F. P.; Vidal, M. Nature 2004, 430, 88-93.
(12) Guimera, R.; Amaral, L. A. N. Nature 2005, 433, 895-900.
(13) Lehner, B.; Fraser, A. G. Genome Biol. 2004, 5 (9), R63.
(14) Ramani, A. K.; Bunescu, R. C.; Mooney, R. J.; Marcotte, E. M. Genome Biol. 2005, 6 (5), R40.
(15) Gandhi, T. K. B.; Zhong, J.; Mathivanan, S.; Karthick, L.; Chandrika, K. N.; Mohan, S. S.; Sharma, S.; Pinkert, S.; Nagaraju, S.; Periaswamy, B.; Mishra, G.; Nandakumar, K.; Shen, B.; Deshpande, N.; Nayak, R.; Sarker, M.; Boeke, J. D.; Parmigiani, G.; Schultz, J.; Bader, J. S.; Pandey, A. Nat. Genet. 2006, 38 (3), 285-293.
(16) Rhodes, D. R.; Tomlins, S. A.; Varambally, S.; Mahavisno, V.; Barrette, T.; Kalyana-Sundaram, S.; Ghosh, D.; Pandey, A.; Chinnaiyan, A. M. Nat. Biotechnol. 2005, 23, 951-959.
(17) Stelzl, U.; Worm, U.; Lalowski, M.; Haenig, C.; Brembeck, F. H.; Goehler, H.; Stroedicke, M.; Zenkner, M.; Schoenherr, A.; Koeppen, S.; Timm, J.; Mintzlaff, S.; Abraham, C.; Bock, N.; Kietzmann, S.; Goedde, A.; Toksöz, E.; Droege, A.; Krobitsch, S.; Korn, B.; Birchmeier, W.; Lehrach, H.; Wanker, E. E. Cell 2005, 122, 957-968.
(18) Riley, R.; Sabatti, C.; Lee, C.; Eisenberg, D. Genome Biol. 2005, 6 (10), R89.
(19) Aloy, P.; Böttcher, B.; Ceulemans, H.; Leutwein, C.; Mellwig, C.; Fischer, S.; Gavin, A.-C.; Bork, P.; Superti-Furga, G.; Serrano, L.; Russell, R. B. Science 2004, 303, 2026-2029.
(20) Wuchty, S. BMC Genomics 2006, 7, 122.
(21) von Mering, C.; Krause, R.; Snel, B.; Cornell, M.; Oliver, S. G.; Fields, S.; Bork, P. Nature 2002, 31, 399-403.
(22) Goldberg, D.; Roth, F. P. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4372-4376.
(23) Wuchty, S.; Barabási, A. L.; Ferdig, M. T. BMC Evol. Biol. 2006, 6, 8.
(24) Fraser, H. B.; Hirsh, A. E.; Steinmetz, L. M.; Scharfe, C.; Feldman, M. W. Science 2002, 296, 750-752.
(25) Suthram, S.; Sittler, T.; Ideker, T. Nature 2006, 438, 108-112.
(26) Remm, M.; Storm, C. E. V.; Sonnhammer, E. L. J. Mol. Biol. 2001, 314, 1041-1052.
(27) Rothberg, J. M.; Bader, J. S.; Chaudhuri, D.; Chant, J. Nat. Biotechnol. 2004, 22, 78-85.
(28) Giot, L.; Bader, J. S.; Brouwer, C.; Chaudhuri, A.; Kuang, B.; Li, Y.; Hao, Y. L.; Ooi, C. E.; Godwin, B.; Vitols, E.; Vijayadamodar, G.; Pochart, P.; Machineni, H.; Welsh, M.; Kong, Y.; Zerhusen, B.; Malcolm, R.; Varrone, Z.; Collis, A.; Minto, M.; Burgess, S.; McDaniel, L.; Stimpson, E.; Spriggs, F.; Williams, J.; Neurath, K.; Ioime, N.; Agee, M.; Voss, E.; Furtak, K.; Renzulli, R.; Aanensen, N.; Carrolla, S.; Bickelhaupt, E.; Lazovatsky, Y.; DaSilva, A.; Zhong, J.; Stanyon, C. A.; Finley, R. L., Jr.; White, K.; Braverman, P. M.; Jarvie, T.; Gold, S.; Leach, M.; Knight, J.; Shimkets, R. A.; McKenna, M. P.; Chant, J.; Rothberg, J. M. Science 2004, 302, 1727-1736.
(29) Xenarios, I.; Salwinski, L.; Duan, X. J.; Higney, P.; Kim, S.-M.; Eisenberg, D. Nucleic Acids Res. 2002, 30, 303-305.
(30) Bateman, A.; Coin, L.; Durbin, R.; Finn, R. D.; Hollich, V.; Griffiths-Jones, S.; Khanna, A.; Marshall, M.; Moxon, S.; Sonnhammer, E. L. L.; Studholme, D. J.; Yeats, C.; Eddy, S. R. Nucleic Acids Res. 2004, 32, D138-D141.
(31) GO Consortium. Nucleic Acids Res. 2004, 32, D258-D261.
(32) Watts, D. J.; Strogatz, S. H. Nature 1998, 393, 440-442.
(33) Barrat, A.; Barthélemy, M.; Pastor-Satorras, R.; Vespignani, A. Proc. Natl. Acad. Sci. U.S.A. 2004, 101 (11), 3747-3752.
(34) Newman, M. E. J. Phys. Rev. Lett. 2002, 89, 208701.
(35) Winzeler, E. A. Nat. Rev. Microbiol. 2006, 4, 145-151.
(36) The biopython package; http://www.biopython.org.
(37) Enright, A. J.; Van Dongen, S.; Ouzounis, C. A. Nucleic Acids Res. 2002, 30 (7).
(38) Lee, I.; Date, S. V.; Adai, A. T.; Marcotte, E. M. Science 2004, 306, 1555-1558.
(39) Gene Ontology Consortium. Nucleic Acids Res. 2004, 32, D258-D261.
(40) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Genome Res. 2003, 13, 2498-2504.
(41) Le Roch, K. G.; Johnson, J. R.; Florens, L.; Zhou, Y.; Santrosyan, A.; Grainger, M.; Yan, S. F.; Williamson, K. C.; Holder, A. A.; Carucci, D. J.; Yates, J. R., III; Winzeler, E. A. Genome Res. 2004, 14, 2308-2318.
(42) Vazquez, A.; Flammini, A.; Maritan, A.; Vespignani, A. ComPlexUs 2003, 1 (38), 38-44.
(43) Date, S. V.; Stoeckert, C. J. Genome Res. 2006, 16 (4), 542-549.
(44) Jeong, H.; Mason, S. P.; Barabási, A. L.; Oltvai, Z. N. Nature 2001, 411, 41-42.
(45) Bochtler, M.; Ditzel, L.; Groll, M.; Hartmann, C.; Huber, R. Annu. Rev. Biomol. Struct. 1999, 28, 295-317.
(46) Matadeen, R.; Patwardhan, A.; Gowen, B.; Orlova, E. V.; Pape, T.; Cuf, M.; Mueller, F.; Brimacombe, R.; van Heel, M. Structure Fold. Des. 1999, 7, 1575-1583.
(47) Mura, C.; Cascio, D.; Sawaya, M. R.; Eisenberg, D. S. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 5532-5537.
(48) Han, J. J.; Bertin, N.; Hao, T.; Goldberg, D. S.; Berriz, G. F.; Zhang, L. V.; Dupuy, D.; Walhout, A. J. M.; Cusick, M. E.; Roth, F. P.; Vidal, M. Nat. Biotechnol. 2005, 23, 839-844.
(49) Stumpf, M.; Wiuf, C.; May, R. Proc. Natl. Acad. Sci. U.S.A. 2005, 101, 4221-4224.
(50) Aravind, L.; Iyer, L. M.; Wellems, T. E.; Miller, L. H. Cell 2003, 115, 771-785.
(51) Li, S.; Armstrong, C. M.; Bertin, N.; Ge, H.; Milstein, S.; Boxem, M.; Vidalain, P. O.; Han, J. D.; Chesneau, A.; Hao, T.; Goldberg, D. S.; Li, N.; Martinez, M.; Rual, J. F.; Lamesch, P.; Xu, L.; Tewari, M.; Wong, S. L.; Zhang, L. V.; Berriz, G. F.; Jacotot, L.; Vaglio, P.; Reboul, J.; Hirozane-Kishikawa, T.; Li, Q.; Gabel, H. W.; Elewa, A.; Baumgartner, B.; Rose, D. J.; Yu, H.; Bosak, S.; Sequerra, R.; Fraser, A.; Mango, S. E.; Saxton, W. M.; Strome, S.; Van Den Heuvel, S.; Piano, F.; Vandenhaute, J.; Sardet, C.; Gerstein, M.; Doucette-Stamm, L.; Gunsalus, K. C.; Harper, J. W.; Cusick, M. E.; Roth, F. P.; Hill, D. E.; Vidal, M. Science 2004, 303, 540-543.

PR0605769