
Article

Integrating multifaceted information to predict Mycobacterium tuberculosis-human protein-protein interactions

Jun Sun, Ling-Li Yang, Xi Chen, De-Xin Kong, and Rong Liu

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00497 • Publication Date (Web): 01 Oct 2018

Downloaded from http://pubs.acs.org on October 2, 2018

Journal of Proteome Research is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Journal of Proteome Research

Integrating multifaceted information to predict Mycobacterium tuberculosis-human protein-protein interactions

Jun Sun¹, Ling-Li Yang¹, Xi Chen²,³, De-Xin Kong¹,², and Rong Liu¹*

¹ Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

² State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan 430070, China

³ College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, China

* Corresponding author

Email: [email protected]
Tel: +86-27-87280877
Fax: +86-27-87280877

ACS Paragon Plus Environment

ABSTRACT

Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), is one of the deadliest infectious diseases. Studying the protein-protein interactions (PPIs) between MTB and human can deepen our understanding of the pathogenesis of TB and offer new clues for treating MTB infection, but experimentally validated interactions are especially scarce. Herein we propose an integrated framework that combines template-, domain-domain interaction-, and machine learning-based methods to predict MTB-human PPIs. As a result, we established a network composed of 13,758 PPIs involving 451 MTB proteins and 3,167 human proteins (http://liulab.hzau.edu.cn/MTB/). Compared with the known human targets of various pathogens, our predicted human targets show a similar tendency in terms of network topological properties and enrichment in functionally important genes. Additionally, these human targets generally have longer sequences, more protein domains, more disordered residues, lower evolutionary rates, and older protein ages. Functional analysis demonstrates that these proteins show strong preferences for phosphorylation, kinase activity, and signal transduction processes and for disease- and immune-related pathways. Dissecting the cross-talk among top-ranked pathways suggests that the cancer pathway may serve as a bridge in MTB infection. Triplet analysis illustrates that paired targets interacting with the same partner are adjacent to each other in the intra-species network and tend to share similar expression patterns. Finally, we identified 36 potential anti-MTB human targets by integrating known drug target information and the molecular properties of proteins.

Keywords: tuberculosis; protein-protein interactions; functional analysis; drug target

INTRODUCTION

Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), which usually infects the lungs, is one of the deadliest infectious diseases of humankind. According to the World Health Organization (WHO) report, approximately 10.4 million new cases of TB are identified worldwide each year, together with 1.7 million deaths [1]. Nevertheless, the molecular mechanism by which MTB interferes with its host cell is still unclear. Worse, although the four first-line drugs (isoniazid, rifampicin, ethambutol, and pyrazinamide) are commonly used to treat TB, these drugs were discovered more than 50 years ago [1,2], and multidrug-resistant and extensively drug-resistant TB is increasingly prevalent. Studying the protein-protein interactions (PPIs) between MTB and human could provide novel insights into the pathogenesis of TB and the human response to MTB infection, but known pathogen-host interactions (PHIs), especially MTB-human PPIs, are relatively scarce. The primary reason might be that the experimental determination of PHIs is time-consuming and labor-intensive. Computational methods are therefore needed to guide or aid experimental techniques for identifying MTB-human PPIs.

During the past decade, a large number of computational algorithms have been established to predict PHIs [3-13]. These methods mainly depend on sequence similarity, structural similarity, domain-domain interaction (DDI), and machine learning, and have been summarized in recent reviews [14,15]. For MTB-human PPI prediction in particular, Huo et al. [16] developed an algorithm based on sequence similarity and DDI-based refinement. Zhou et al. [17] devised a stringent DDI-based method by utilizing the known structural templates of domain interactions. Zhou et al. [18] also created a stringent sequence homology-based approach in which bacteria-human interactions were used as template PPIs. Compared with sequence information, protein structures tend to be more evolutionarily conserved and can better reflect the binding modes of PPIs [19]. Cui et al. [20] thus designed a computational pipeline based on pairwise structure similarity, but this method depends heavily on the limited number of known complex structures, which restricts its prediction coverage. By contrast, integrating the abundant binary PPI information with protein structural similarity offers more comprehensive predictions. For instance, the hypothesis that two proteins sharing a highly similar structure might interact with the same partners has been applied to HIV-1-human, dengue virus-human, and influenza A-human PPI predictions [4,5,21]. To the best of our knowledge, however, this semi-structured strategy has not been adopted to predict MTB-human PPIs. Similarly, machine learning-based algorithms have been commonly developed for the prediction of PHIs [6,9,10,22-25] but have yet to be used for MTB-human PPIs. Furthermore, the aforementioned works each performed the prediction from a single viewpoint, which easily results in a large number of false positives. Exploring the complementarity among these strategies might therefore provide more reliable MTB-human PPIs.

With these limitations in mind, we proposed an integrated algorithm combining template- (semi-structured), DDI-, and machine learning-based methods to predict PPIs between MTB and human. We first applied the three component methods independently and retained the overlapping results to construct the predicted MTB-human PPI network (Fig. 1).

The prediction accuracy of our algorithm was evaluated by applying it to the prediction of HIV-1-human PPIs. To further examine the quality of our prediction results, we used the known human targets of pathogens as the reference and assessed the topological features of the predicted human proteins targeted by MTB and their enrichment in functionally important genes. We systematically investigated the differences in sequence and structural characteristics between target and non-target proteins. Gene ontology (GO) and pathway enrichment analyses were employed to reveal the functional roles of the predicted target proteins. In addition, triplet analysis was conducted to investigate the relationship between paired target proteins interacting with the same partners. Finally, potential anti-MTB druggable targets and associated drugs were identified by integrating known resources with the molecular properties of proteins, which might provide important clues for the prevention and treatment of MTB infection.

MATERIALS AND METHODS

Overview of our prediction system

As shown in Fig. 1, our prediction pipeline comprises three components: template-, DDI-, and machine learning-based methods. For the template-based method, we collected abundant intra- and inter-species PPIs as templates and used the structural similarity between proteins to infer putative MTB-human PPIs. The DDI-based method was built on domain-domain interaction information, on the premise that physical interactions between proteins are mediated by their structural domains. To complement these two methods, a machine learning-based method was developed using random forest with the sequence features of paired pathogen and human proteins. Because each individual method might produce many false positives, the consensus results supported by at least two component methods were retained as high-quality predictions for constructing the MTB-human PPI network.
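The consensus rule can be illustrated with a short Python sketch. This is not the authors' code: the function name `consensus_ppis`, the `min_support` parameter, and the toy protein identifiers are hypothetical, and each method's output is assumed to be a set of (MTB protein, human protein) pairs.

```python
from collections import Counter


def consensus_ppis(method_predictions, min_support=2):
    """Keep protein pairs predicted by at least `min_support` component methods."""
    support = Counter()
    for pairs in method_predictions.values():
        support.update(set(pairs))  # each method votes at most once per pair
    return {pair for pair, votes in support.items() if votes >= min_support}


# Hypothetical toy predictions; the identifiers are placeholders, not real proteins.
predictions = {
    "template": {("mtbA", "hs1"), ("mtbA", "hs2"), ("mtbB", "hs3")},
    "ddi": {("mtbA", "hs1"), ("mtbC", "hs4")},
    "machine_learning": {("mtbA", "hs1"), ("mtbB", "hs3")},
}
high_confidence = consensus_ppis(predictions)
```

Raising `min_support` to 3 would keep only the pairs on which all three methods agree, a stricter setting that trades coverage for confidence.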

Template-based method for predicting MTB-human interactions

To establish the template-based method (Fig. 1), we first constructed a comprehensive template library composed of intra- and inter-species PPIs. The intra-species templates contain human-human and MTB-MTB PPIs, while the inter-species templates contain bacteria-human, viruses-human, and fungi-human PPIs. The human-human PPIs were extracted from the HIPPIE database [26] and the MTB-MTB PPIs were obtained from the study by Wang et al. [27]. The inter-species PPIs were derived from several well-established pathogen-host interaction databases, including PATRIC [28], HPIDB [29], VirHostNet [30], and PHISTO [31]. More details are given in Supplementary Table S-1. We then retrieved 20,199, 3,972, and 5,055 UniProt accessions for human proteins, MTB proteins, and the pathogen proteins involved in inter-species PPIs, respectively. Because the PDB structures of a protein often partially or completely overlap with each other, the greedy algorithm proposed by Kamburov et al. [32] was used to remove the redundancy. Briefly, the SIFTS mapping [33] was used to calculate the coverage score between each PDB sequence and the reference sequence. The PDB structures of each protein were sorted in descending order of coverage score. We then built a set of representative structures for each protein by adding the sorted structures one at a time, such that any pair of structures in the set overlapped by no more than 10% of the shorter structure. Finally, we obtained 477, 6,527, 210, 373, and 261 representative structures corresponding to 466, 5,333, 191, 251, and 221 proteins for MTB, human, bacteria, viruses, and fungi, respectively.

We further inferred putative MTB-human PPIs based on the hypothesis that two proteins with similar structures have a high probability of interacting with the same protein partners. Given an MTB protein with its representative structures, we identified its structural homologs in human and pathogens (e.g., bacteria, viruses, and fungi). Structural similarity was evaluated using TM-align, one of the best structure alignment algorithms [34]. The human proteins in our template library that interact with the retrieved structural homologs were considered interacting partners of the query MTB protein. Conversely, given a human protein, we followed a similar procedure to find its structural homologs involved in the known MTB-MTB PPIs and substituted the human protein for them, yielding further putative MTB-human PPIs.
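The two steps described above, greedy selection of representative structures and transfer of interactions through structural homologs, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of Kamburov et al. or of the authors: each structure is reduced to a residue range on the reference sequence, the range length stands in for the SIFTS-based coverage score, and the homolog sets are assumed to have been computed beforehand (for example with TM-align at a chosen TM-score cutoff). All function names and identifiers are hypothetical.

```python
def overlap_fraction(a, b):
    """Overlap of two inclusive residue ranges, relative to the shorter range."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shorter = min(a[1] - a[0], b[1] - b[0]) + 1
    return max(shared, 0) / shorter


def pick_representatives(structures, max_overlap=0.10):
    """Greedily keep structures, longest first, that overlap every kept one by <= max_overlap.

    structures: list of (structure_id, (start, end)) mapped onto the reference sequence.
    Assumption: range length is used as a stand-in for the coverage score.
    """
    ranked = sorted(structures, key=lambda s: s[1][1] - s[1][0], reverse=True)
    chosen = []
    for structure_id, span in ranked:
        if all(overlap_fraction(span, kept) <= max_overlap for _, kept in chosen):
            chosen.append((structure_id, span))
    return chosen


def transfer_interactions(mtb_homologs, template_ppis):
    """Assign to each MTB protein the human partners of its structural homologs.

    mtb_homologs: MTB protein -> set of structurally similar template proteins.
    template_ppis: template protein -> set of human interaction partners.
    """
    predicted = set()
    for mtb_protein, homologs in mtb_homologs.items():
        for homolog in homologs:
            for human_partner in template_ppis.get(homolog, ()):
                predicted.add((mtb_protein, human_partner))
    return predicted


# Hypothetical toy data.
representatives = pick_representatives(
    [("s1", (1, 100)), ("s2", (20, 80)), ("s3", (95, 200)), ("s4", (150, 210))]
)
putative = transfer_interactions(
    {"mtbA": {"h1", "v1"}},
    {"h1": {"hsX"}, "v1": {"hsX", "hsY"}},
)
```

In the toy example, `s2` and `s4` are discarded because they overlap an already selected structure by more than 10% of the shorter range.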

DDI-based method for predicting MTB-human interactions

We collected known DDIs by integrating the information from 3did [35], DOMINE [36], and iPfam [37]. The DDIs with low and medium confidence in DOMINE were excluded. In total, 12,275 DDIs were obtained. We extracted domain annotations for all MTB and human UniProt accessions by searching the reference sequences against the Pfam-A database with PfamScan [38], resulting in 1,831 and 5,491 domains for 3,357 MTB proteins and 18,335 human proteins, respectively. We then predicted MTB-human PPIs under the assumption that two proteins physically interact if there is a known DDI between a domain of the MTB protein and a domain of the human protein.
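The DDI rule amounts to a lookup over domain pairs, as the following Python sketch shows. It is an illustration only: the function name `predict_by_ddi` and the domain and protein identifiers are hypothetical placeholders, and DDIs are treated as unordered pairs, which is an assumption about how the integrated DDI set is stored.

```python
def predict_by_ddi(mtb_domains, human_domains, known_ddis):
    """Predict a PPI when any MTB domain and any human domain form a known DDI.

    mtb_domains / human_domains: protein -> set of domain identifiers.
    known_ddis: iterable of (domain, domain) pairs, treated as unordered.
    """
    ddi = {frozenset(pair) for pair in known_ddis}
    predicted = set()
    for mtb_protein, m_doms in mtb_domains.items():
        for human_protein, h_doms in human_domains.items():
            if any(frozenset((m, h)) in ddi for m in m_doms for h in h_doms):
                predicted.add((mtb_protein, human_protein))
    return predicted


# Hypothetical toy annotations; D1..D4 are placeholder domain identifiers.
mtb_domains = {"mtbA": {"D1"}, "mtbB": {"D2"}}
human_domains = {"hs1": {"D3"}, "hs2": {"D4", "D1"}}
known_ddis = [("D3", "D1"), ("D2", "D2")]
ddi_predictions = predict_by_ddi(mtb_domains, human_domains, known_ddis)
```

Because a single shared DDI is enough to call an interaction, this rule is permissive, which is consistent with using it alongside the other two methods rather than on its own.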

Machine learning-based method for predicting MTB-human interactions

For the machine learning-based approach, we first collected bacteria-human PPIs as positive samples from PHISTO [31], HPIDB [29], and PATRIC [28], obtaining 8,989 interactions between 2,712 bacterial proteins and 3,702 human proteins. These interactions were filtered by restricting the sequence similarity between any two bacterial or human proteins to below 30%, which resulted in 5,413 positive samples. The negative samples were generated by randomly pairing bacterial and human proteins. To characterize each protein, the position-specific scoring matrix (PSSM) shown below was obtained by running three iterations of PSI-BLAST against the NCBI non-redundant database.

$$
\mathrm{PSSM} =
\begin{bmatrix}
P_{1,1} & P_{1,2} & \cdots & P_{1,20} \\
P_{2,1} & P_{2,2} & \cdots & P_{2,20} \\
\vdots & \vdots & \ddots & \vdots \\
P_{L,1} & P_{L,2} & \cdots & P_{L,20}
\end{bmatrix}
$$

where $P_{i,j}$ represents the raw element in the PSSM and $L$ denotes the length of a given sequence. For each row of the matrix, we performed a Z-score transformation as follows:

$$
P'_{i,j} = \frac{P_{i,j} - \frac{1}{N}\sum_{k=1}^{N} P_{i,k}}{\sqrt{\frac{1}{N-1}\sum_{k=1}^{N}\left(P_{i,k} - \frac{1}{N}\sum_{m=1}^{N} P_{i,m}\right)^{2}}}
$$

where $P'_{i,j}$ means the normalized value and $N$ denotes the number of residue types, which is equal to 20 in this work. For each column, we then computed the average score as follows:

$$
\bar{P}_{j} = \frac{1}{L}\sum_{i=1}^{L} P'_{i,j}
$$

Through this procedure, each protein was represented by a 20-dimensional vector, and the two feature vectors were concatenated for each protein pair. We implemented our prediction model using the random forest algorithm in R with 500 trees, and 5-fold cross-validation was used to evaluate the performance. As shown in Supplementary Table S-2, the optimal ratio between positive and negative examples was 1:5. Using the best prediction model, we checked whether the MTB-human PPIs inferred by the template- or DDI-based method were classified as positive by the machine learning method.
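The feature construction (row-wise Z-score, then column averages, then concatenation for a protein pair) can be sketched in Python with the standard library. This is a minimal sketch rather than the authors' pipeline, which used R: the function names are hypothetical, the PSSM is assumed to be already parsed into a list of rows, and the standard deviation is assumed to use the N − 1 denominator. Rows with zero variance are mapped to zeros, a choice made here only to avoid division by zero.

```python
from statistics import mean, stdev


def pssm_features(pssm):
    """Return the column averages of a row-wise Z-scored PSSM.

    pssm: list of L rows, one per residue, each holding N raw scores (N = 20 in the paper).
    """
    normalized = []
    for row in pssm:
        mu, sigma = mean(row), stdev(row)  # stdev uses the N - 1 denominator
        normalized.append([(x - mu) / sigma if sigma else 0.0 for x in row])
    length = len(normalized)
    return [sum(column) / length for column in zip(*normalized)]


def pair_features(pathogen_pssm, human_pssm):
    """Concatenate the two per-protein vectors into one feature vector for the pair."""
    return pssm_features(pathogen_pssm) + pssm_features(human_pssm)


# Hypothetical toy matrices with N = 3 columns to keep the example readable.
toy_a = [[1, 2, 3], [1, 2, 3]]
toy_b = [[1, 2, 3], [3, 2, 1]]
vector = pair_features(toy_a, toy_b)
```

With real 20-column PSSMs the pair vector has 40 entries, which is what a random forest classifier would receive for each bacteria-human protein pair.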

Feature extraction for proteins involved in MTB-human PPIs

We calculated nine sequence and structural features for the proteins involved in MTB-human PPIs, including protein length, number of domains, percentage of disordered residues, percentage of buried residues, percentages of residues in the three secondary structure states (helix/sheet/coil), dN/dS ratio, and protein age. The disordered regions in each protein were identified by DISOPRED3 [39]. The secondary structure states of residues and the buried residues, defined as those with relative accessible surface area below 20%, were predicted by SPIDER2 [40]. The dN/dS ratio was calculated based on human-mouse orthologous genes from Ensembl BioMart [41]. Protein age, which can be inferred by phylogenetic analysis, was extracted from ProteinHistorian [42], in which we selected ‘PPODv4_Jaccard_families’ as the protein family database and ‘Wagner parsimony’ as the ancestral reconstruction algorithm. To annotate the fold type of each human protein, the reference sequence was aligned against the SUPERFAMILY database [43] and the fold information was derived from SCOPe [44].

Functional enrichment analysis of proteins involved in MTB-human PPIs

To explore the functional roles of proteins involved in MTB-human PPIs, we identified the associated GO terms using DAVID [45], with ‘Homo sapiens’ and ‘Mycobacterium tuberculosis H37Rv’ selected as the background for human and MTB proteins, respectively. The over-represented GO biological processes, cellular components, molecular functions, and KEGG pathways were retained only if the Benjamini P-value was less than 0.1.
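For readers unfamiliar with this kind of multiple-testing correction, the sketch below implements the standard Benjamini-Hochberg adjustment and applies the 0.1 cutoff. It assumes that the Benjamini P-value reported by DAVID corresponds to this standard procedure; DAVID's exact computation is not described in this paper and may differ. The function name and the raw P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment P-values for four terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
retained = [i for i, p in enumerate(adjusted) if p < 0.1]
```

In this toy case the first three terms pass the 0.1 cutoff after adjustment and the fourth does not.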

Triplet proximity and co-expression correlation

The host-pathogen-host (H-P-H) and pathogen-host-pathogen (P-H-P) interaction modes are referred to as triplets [6]. In this work, we asked whether two human proteins interacting with the same MTB protein (or two MTB proteins interacting with the same human protein) are close to each other in the human PPI network (or the MTB PPI network). For each triplet in our prediction results, the shortest path between the paired proteins in the intra-species PPI network was calculated using NetworkX. We evaluated gene co-expression for paired human proteins in the H-P-H triplets using microarray gene expression data from 126 normal tissues [46], while gene co-expression for paired MTB proteins in the P-H-P triplets was assessed using all 76 samples in GPL16972 from the GEO database. The probe expression values were log2-transformed and quantile-normalized using the lumi package in R. The probes were then mapped to their corresponding genes, and the expression values of multiple probes for a given gene were averaged before computing the Pearson correlation coefficient (PCC).
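The triplet analysis can be sketched in Python as follows. The authors used NetworkX for shortest paths; to keep the example free of dependencies, the sketch substitutes a plain breadth-first search, which gives the same path length on an unweighted network. All function names, protein identifiers, and expression values are hypothetical.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def hph_triplets(ppis):
    """Yield (human_a, mtb, human_b) for every pair of human proteins sharing an MTB partner."""
    partners = {}
    for mtb, human in ppis:
        partners.setdefault(mtb, set()).add(human)
    for mtb, humans in partners.items():
        for a, b in combinations(sorted(humans), 2):
            yield (a, mtb, b)


def shortest_path_length(adjacency, source, target):
    """Breadth-first search on an unweighted network; returns None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: predicted PPIs, a human PPI network, and expression profiles.
predicted = [("m1", "h1"), ("m1", "h3"), ("m2", "h2")]
human_network = {"h1": {"h2"}, "h2": {"h1", "h3"}, "h3": {"h2"}}
expression = {"h1": [1.0, 2.0, 3.0], "h3": [2.0, 4.0, 6.0]}

triplets = list(hph_triplets(predicted))
distances = {t: shortest_path_length(human_network, t[0], t[2]) for t in triplets}
correlations = {t: pearson(expression[t[0]], expression[t[2]]) for t in triplets}
```

The same functions apply to P-H-P triplets after swapping the roles of the two species and supplying the MTB PPI network.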

Derivation of anti-MTB druggable targets and compounds

The known drug targets and related drugs were extracted from DrugBank [47] and TTD [48]. The successful human drug targets were collected from TTD and the study by Rask-Andersen et al. [49]. The successful MTB drug targets were derived from an existing database (http://www.bioinformatics.org/tbdtdb/druglist.php). The predicted MTB drug targets were extracted from three published studies [50-52]. Based on the molecular properties of proteins investigated in this work, we identified high-confidence anti-MTB human targets as those whose attribute values were consistently higher than the averages over all human proteins in the predicted MTB-human network.

RESULTS AND DISCUSSION

Evaluation of our algorithm using known HIV-1-human PPIs

In this study, we fused protein structural similarity, DDI information, and machine learning to develop an integrated algorithm for predicting PPIs between pathogens and human. The effectiveness of our pipeline was first assessed on the known HIV-1-human PPIs because of their relative abundance. The 6,723 experimentally validated PPIs were derived from the HIV-1-human interaction database [53]. For the template-based method, the structural similarity threshold is an important parameter, so we tested different TM-score cutoffs for retrieving structural homologs. As shown in Table 1, as the structural similarity threshold increases, the template-based method returns fewer predicted PPIs, which directly reduces the number of final predictions generated by the integrated algorithm, while the prediction precision improves steadily. Additionally, we evaluated the DDI- and machine learning-based methods individually (Supplementary Figure S-1). Although the template-based method yields a remarkably higher recall than the other two component methods and the consensus method, integrating the different signals effectively removes false positives and increases the precision, which is critical for constructing a reliable PPI network. By balancing precision and recall, we set the optimal TM-score cutoff to 0.6. As a result, 122 out of 995 PPIs predicted by our integrated algorithm are found among the known HIV-1-human PPIs (Table 1). In terms of both precision and recall, our approach (0.123 and 0.018) outperforms high-throughput screening (44/416 = 0.106 and 44/6723 = 0.007) and Cui et al.’s method based on pairwise structural similarity (14/187 = 0.075 and 14/6723 = 0.002) [20,54].

We then explored the potential reasons for the relatively poor prediction performance. As described in our methodology, the vast majority of predicted interactions are derived from the template- and DDI-based methods. The 6,723 known HIV-1-human PPIs involve a total of 10 HIV-1 genes. Owing to the scarcity of structural homologs and domain annotations for some HIV-1 proteins, the template- and DDI-based methods miss 6 and 7 HIV-1 genes, respectively, corresponding to 3,205 and 3,749 known PPIs, which largely explains the poor recall of our pipeline. The low precision, on the other hand, is probably determined by two factors. The first is the lack of suitable templates, which might prevent the identified HIV-1 proteins from being associated with the correct human targets; the second is the incompleteness of the current HIV-1-human interactome, which might lead to an underestimate of our precision. However, when we used the human proteins targeted by HIV-1 to assess our method, the precision reaches 0.524 (154/294, Supplementary Figure S-1). Collectively, these results suggest that our pipeline can be used to predict pathogen-host PPIs.
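The precision and recall values quoted above follow directly from the reported counts, as the short Python check below shows. Precision is taken as the number of correct predictions divided by the number of predictions, and recall as the number of correct predictions divided by the 6,723 known HIV-1-human PPIs; the function and variable names are illustrative.

```python
def precision_recall(true_positives, predicted, known):
    """Precision = TP / predicted pairs; recall = TP / known pairs."""
    return true_positives / predicted, true_positives / known


KNOWN_HIV1_HUMAN_PPIS = 6723

# (correct predictions, total predictions) as reported in the text.
reported = {
    "integrated algorithm": (122, 995),
    "high-throughput screening": (44, 416),
    "pairwise structural similarity": (14, 187),
}

results = {
    name: precision_recall(tp, predicted, KNOWN_HIV1_HUMAN_PPIS)
    for name, (tp, predicted) in reported.items()
}
for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.3f}, recall={recall:.3f}")

# Protein-level precision for human targets of HIV-1, as reported (154 of 294).
target_level_precision = 154 / 294
```

Rounded to three decimals, the recomputed values match those given in the text.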

Construction of MTB-human interactome map

The proposed algorithm was then used to infer MTB-human PPIs (Fig. 1). Using the template-based method, we obtained 373,612 MTB-human PPIs. Supplementary Figure S-2 shows that the vast majority of predicted PPIs are inferred from intra-species templates, while the inter-species templates also provide useful clues. Using the DDI-based method, we found 226,738 MTB-human protein pairs linked by known DDIs. We further applied the machine learning-based method to filter the template- and DDI-based predictions, resulting in 1,110 and 191 overlapping interactions, respectively. Finally, the PPIs generated by at least two component methods were considered high-confidence predictions. The resulting PPI network between MTB and human contains 13,758 PPIs, involving 451 MTB proteins and 3,167 human proteins (Fig. 2A).

Generally speaking, secreted and membrane proteins in bacteria have a higher probability of interacting with host proteins. We therefore extracted 105 secreted MTB proteins from Penn et al.’s recent work [55], among which 19 appear in our predictions (4.2% of the predicted MTB proteins) and are involved in 227 (1.6%) interactions. According to GO annotations, 242 (53.7%) of the predicted MTB proteins are membrane proteins, corresponding to 12,270 (89.2%) interactions. Considering the coverage of the above PPIs and the observation that the inner proteins of bacteria can also interact with host proteins (Supplementary Figure S-3), we did not place additional constraints on the 13,758 MTB-human PPIs. Supplementary Figure S-4 shows that the number of our predicted PPIs is at least 10-fold larger than those of existing methods and that our predictions overlap little with existing studies, suggesting that our algorithm could provide additional information for studying the cross-talk between MTB and human.

Based on the assumption that interacting proteins tend to share similar functions, we preliminarily evaluated the reliability of the predicted PPIs. The BMA score calculated by GOSemSim was used to assess the functional consistency between proteins [56]; a greater BMA score means a closer functional relationship. As a control, we randomly selected the same number of MTB-human protein pairs. The GO semantic similarities between paired proteins for the predicted PPIs are significantly higher than those for the random pairs (Fig. 2C), suggesting that the predicted MTB-human PPI network is reliable overall.

In this study, a total of 39 PPIs involving 15 MTB proteins and 11 human proteins are consistently identified by all three component methods (Fig. 2B). In this local PPI network, the NF-κB protein is a highly connected node interacting with 12 MTB proteins, among which 5 are serine/threonine protein kinases: PknA, PknB, PknD, PknE, and PknG. The PknB-NF-κB pair was also predicted as a putative PPI with high confidence by Cui et al.’s approach [20]. Previous studies suggested that the phosphorylated NF-κB protein can translocate into the nucleus to regulate the production of type II interferon (IFN-γ), which plays an important role in innate and adaptive immunity [57]. Our prediction results reveal that these serine/threonine protein kinases can also interact with other immune-related human proteins, including large proline-rich protein (BAG6), two transcription factors in the STAT family (STAT3 and STAT6), syntenin-1 (SDCB1), and ankyrin repeat domain-containing protein 13A (AN13A). In IFN-γ-stimulated macrophages, the BAG6 protein can down-regulate the release of nitric oxide and pro-inflammatory cytokines [58]. Both type I and type II interferons could facilitate the recruitment and phosphorylation of STAT3 [59,60]. The phosphorylated STAT3 can inhibit apoptosis and promote proliferation with the help of anti-apoptotic genes from the B cell lymphoma (BCL) family [61]. Type I interferon (IFN-α) was found to directly induce activation of STAT6 [62]. These functional clues imply that the above serine/threonine protein kinases from MTB have a higher probability of targeting immune-related proteins in the host signaling pathways regulated by type I and type II interferons.

Validation of human targets involved in MTB-human PPIs To provide more evidence for our prediction results, we collected the known host factors associated with MTB infection. Kumar et al. identified a total of 275 host factors by RNAi experiments63, among which 40 are in agreement with the human target proteins (P-value = 0.002). Berry et al. detected 393 differentially expressed genes during the infection stage of MTB64. In this study, 53 predicted target proteins are consistent with their results (P-value = 0.018). Additionally, Sambarey et al. identified 380 core proteins highly involved in MTB infection65, among which 153 are identical to our predictions (P-value = 3.4e-19). All these overlapped human proteins are relevant to MTB infection and involved in 1,157 MTB-human PPIs, reaching a coverage rate of 8.4% in the network. In order to further evaluate whether our algorithm can discover reasonable human target proteins, we considered the known human targets of pathogens as the reference. The viruses-human, bacteria-human, and fungi-human PPIs were collected from several existing databases (Supplementary Table S-1). We obtained 5,414, 3,171 and 556 human targets for viruses, bacteria, and fungi, respectively. In each case, all the remaining human proteins were regarded as non-target proteins. The centrality measures can reflect the local and global importance of proteins in the host PPI network. As shown in Table 2, the known targets of

15

ACS Paragon Plus Environment

Journal of Proteome Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

viruses, fungi, and bacteria have higher degree and betweenness values than non-target proteins, which is consistent with observations from previous studies (66-68). As expected, the predicted human targets in this study show a similar tendency (Table 2 and Supplementary Figure S-5). In particular, 1,554, 957, and 168 of these human proteins overlap with the known targets of viruses, bacteria, and fungi, respectively (P-values = 3.7e-75, 2.0e-49, and 3.6e-08, respectively; Supplementary Figures S-6A and S-6B). Meanwhile, we found 1,262 unique human targets interacting with MTB, which tend to have lower degree and betweenness values than the overlapped targets (Supplementary Figures S-6B and S-6C). Furthermore, we checked whether these target proteins are enriched in essential genes, housekeeping genes, and inflammatory genes. The essential gene list, comprising 1,216 genes, is the consensus of the results generated by genome-wide single-guide RNA screening (69) and haploid gene-trap screening (70). The housekeeping gene list consists of 8,874 genes expressed in all tissues (71). Table 2 shows that all three kinds of known targets and the predicted targets overlap substantially with essential genes and housekeeping genes, indicating that pathogens might facilitate their infection and survival by perturbing the functions of core proteins in the host cell. Further, we obtained 2,285 genes by searching NCBI with the keyword ‘inflammatory’. The predicted human targets, together with the known targets of viruses and bacteria, are significantly enriched in inflammatory genes, indicating that these proteins are prone to be involved in the host immune response against invading pathogens. More interestingly, the measures of the five properties of human proteins targeted by MTB are very close to those of the known target proteins of bacteria, confirming the
reliability of the predicted MTB-human PPI network from another perspective.
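
The overlap P-values reported above can be illustrated with a one-sided hypergeometric test. The sketch below is a minimal example under stated assumptions: this section does not name the statistical test that was used, and the background size and set sizes (`universe`, `n_known`, `n_predicted`, `n_overlap`) are toy values rather than figures from the study.

```python
from math import comb


def hypergeom_overlap_pvalue(universe, n_known, n_predicted, n_overlap):
    """Return P(X >= n_overlap) when n_predicted proteins are drawn at random
    from a background of `universe` proteins, n_known of which are known targets."""
    upper = min(n_known, n_predicted)
    total = comb(universe, n_predicted)
    tail = sum(
        comb(n_known, k) * comb(universe - n_known, n_predicted - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical, for illustration only).
p = hypergeom_overlap_pvalue(universe=100, n_known=20, n_predicted=10, n_overlap=6)
print(f"P(overlap >= 6) = {p:.3e}")
```

Exact integer arithmetic keeps the tail sum free of rounding error until the final division; for proteome-scale sets a statistics library would be the more practical choice.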

Sequence, structural, and evolutionary features of proteins involved in MTB-human PPIs

In this section, we first analyzed the properties of proteins involved in MTB-human PPIs at the sequence and structural levels. Regarding sequence features, both MTB and human target proteins have longer sequences and more domains than non-targets (Fig. 3 and Supplementary Figure S-7). Regarding structural properties, the human target proteins include more disordered residues (P-value = 4.1e-09, Fig. 3), whereas the opposite trend is observed for MTB target proteins (P-value = 1.1e-03, Supplementary Figure S-7). These results are in line with previous results on virus-human PPIs (72), suggesting that MTB proteins and their human targets might complement each other through structural order and disorder. Moreover, both human and MTB target proteins have a higher percentage of buried residues (P-values = 1.7e-09 and 0.036, respectively). The secondary structure analysis shows that human target proteins tend to have a smaller proportion of helix residues but greater fractions of coil and sheet residues (P-values = 2.1e-20, 4.2e-13, and 1.0e-43, respectively). By contrast, MTB target proteins show a significant difference only in sheet residues. These results provide useful insights into the inherent characteristics of MTB and human proteins involved in MTB infection. We then investigated the evolutionary conservation of the predicted human targets. A lower dN/dS ratio indicates stronger evolutionary conservation. As shown in Fig. 3, the human target proteins have lower dN/dS ratios than non-target proteins (P-value = 6.6e-66),
indicating that target proteins are more conserved. As expected, the analysis of their evolutionary origins in Fig. 3 shows that the predicted target proteins tend to be ancient proteins (1652.4±22.0 Ma versus 1149.0±9.7 Ma, P-value = 3.0e-141). Further, we explored whether these ancient proteins have a strong preference for specific fold types. Supplementary Table S-3 shows that protein kinase-like (PK-like) (d.144), P-loop containing NTP hydrolases (c.37), and 7-bladed beta-propeller (b.69) are the most significantly enriched fold types. The protein ages of human targets in the top 10 fold types are given in Supplementary Figure S-8, in which the beta-hairpin-alpha-hairpin repeat (d.211) is the oldest fold, followed by the most abundant folds d.144 and c.37. To a certain extent, these analyses deepen our understanding of the evolution of human proteins related to MTB infection.

GO enrichment analysis of proteins involved in MTB-human PPIs

GO enrichment analysis was used to reveal the possible functional relevance between human and MTB proteins in the predicted PPIs (Fig. 4A and Supplementary Figure S-9). The cellular component results indicate that about one third of human target proteins are annotated with ‘membrane’ or membrane-related GO terms, while nearly half of MTB proteins are annotated with ‘cell wall’ or ‘plasma membrane’ (Supplementary Figure S-9). Thus, proteins located at the cell surface are prone to participate in MTB-human interactions. For instance, the mycobacterial surface proteins ESXA and ESXB can bind to a series of surface antigen proteins (CD4+, CD8+, CD14+, and CD19+) (73) that contribute to the escape of bacteria into the macrophage cytoplasm (74). In this study, ESXA and ESXB are
predicted to interact with both HLA class II histocompatibility antigen gamma chain (CD74) and HLA class I histocompatibility antigen B-42 alpha chain (HLA-B). Li et al. identified four epitopes in ESXB (also named CFP10) that are recognized by CD8+ T cells through HLA-B molecules to interfere with antigen presentation (75). Our prediction results also show that these two secreted proteins (ESXA and ESXB) can interact with the same host proteins (CD74, HLA-B, NFKB1, STAB1, and SRRM2), forming the P-H-P triplets as defined in this work. A previous study reported that the interaction between ESXA and TLR2 could result in the inhibition of NF-κB activation (76). Additionally, ESXA and ESXB can cooperatively inhibit lipopolysaccharide-induced NF-κB-dependent gene expression through downregulation of reactive oxidative species production (77). Therefore, the secreted and membrane proteins in MTB play critical roles in subverting host immunity. In addition, the biological process and molecular function analyses show that the top-ranked GO terms of human proteins are related to phosphorylation, kinase activity, and signaling processes (Fig. 4A). For MTB proteins, the kinase-associated GO terms are also the most significantly enriched (Supplementary Figure S-9). Extensive experimental results have shown that the kinase PknG in MTB can promote bacilli survival in infected macrophages by preventing the fusion of phagosomes with lysosomes, that PknA and PknB play critical roles in modulating cell shape and division, and that PknE can sense nitric oxide stress and prevent apoptosis by interfering with host signaling pathways (78-82). It has also been reported that bacterial pathogens preferentially use their kinases as effector proteins to regulate host kinase signaling cascades and thereby facilitate infection. For instance, the kinases OspG in Shigella spp. and LegK1 in L. pneumophila can directly influence the NF-κB signaling
pathway (83,84). Consequently, our results suggest that the kinases in human and those in MTB might have close functional associations during pathogen infection.

Pathway enrichment analysis of proteins involved in MTB-human PPIs

Pathway enrichment analysis is an intuitive way to understand the influence of MTB on the human host. Among the top 10 most significantly enriched pathways, we find that ‘Epstein-Barr virus infection’ and ‘Hepatitis B’ are closely related to MTB infection. Three other annotations, ‘MAPK signaling pathway’, ‘T cell receptor signaling pathway’, and ‘B cell receptor signaling pathway’, are essential for the host immune response (Fig. 4A). Moreover, the ‘phagocytosis’ and ‘Tuberculosis’ pathways are significantly enriched (P-values = 7.6e-10 and 7.5e-06, respectively). It is known that MTB can escape from phagocytosis-mediated antimicrobial activity (85,86). We therefore extracted the human target proteins involved in phagocytosis and their interacting MTB proteins (Supplementary Table S-4). This local network includes 302 interactions between 31 MTB proteins and 47 human proteins. We observe that five GTPase-related proteins (PAK1, CDC42, WASP, RAC1, and RAF1) in human and five serine/threonine protein kinases (PknA, PknB, PknD, PknE, and PknG) in MTB are almost fully connected (Supplementary Figure S-10). In fact, the host kinase PAK1 interacts with three other GTPases in our template PPI library. Based on the fact that 4 out of these 5 MTB kinases share structural similarities with PAK1 (Supplementary Figure S-10) and on the following DDI-based validation, we propose that these MTB kinases could mimic the functions of host kinases to interact with the human GTPases. Previous studies have reported a similar phenomenon. The kinase YpkA in Yersinia
can directly interact with the small GTPases RhoA and RAC1 (87), which is essential to mollify phagocytes and disrupt the eukaryotic cytoskeleton in mouse (88). The kinase PknB in Staphylococcus can phosphorylate paxillin (89), a cytoskeleton-associated protein that can bind to the GTPase activator (90). During infection, MTB can recruit host cell cytoskeletal factors such as WASP to escape from phagosomes (91). Collectively, these MTB kinases might have the capacity to interact with the host GTPase-related proteins, and the predicted PPIs involved in phagocytosis could provide possible explanations for the evasion of the host immune system by MTB.

Analyzing the relationships among the pathways that contain predicted human targets is useful for identifying the key pathways involved in MTB infection. We computed the number of common proteins and PPIs between the 20 top-ranked pathways based on KEGG annotations. In Figure 4B, ‘pathways in cancer’ (hsa05200), ‘MAPK signaling pathway’ (hsa04010), and ‘Ras signaling pathway’ (hsa04014) show high overlaps with other pathways at the protein level. For instance, the cancer pathway not only shares the greatest number (63) of proteins with ‘Ras signaling pathway’ but also overlaps highly with two other essential pathways involved in MTB infection, namely ‘Rap1 signaling pathway’ and ‘ErbB signaling pathway’ (92-95). We further extracted 4,050 cancer genes using Cheng et al.’s approach (96) and found that 1,398 of them are among our predicted human targets, suggesting that the pathogenesis of cancer might share a number of similarities with that of MTB, such as the evasion of the host immune response and the interruption of signaling pathways. In Figure 4C, the numbers of common PPIs between different pathways are clearly greater than those of common proteins. For instance, there are 246 overlapped PPIs between ‘Hepatitis B’
(hsa05161) and ‘pathways in cancer’. Again, the cancer pathway serves as a bridge across the different pathways related to MTB infection.
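
The pathway overlap counts described above can be sketched as simple set operations. This is a minimal illustration, not the authors' procedure: the pathway memberships and PPIs are toy data with hypothetical identifiers, and it assumes that a predicted PPI is common to two pathways when its human partner belongs to both, a definition this section does not spell out.

```python
from itertools import combinations

# Toy pathway memberships (hypothetical pathway and protein identifiers).
pathways = {
    "pw_A": {"H1", "H2", "H3", "H4"},
    "pw_B": {"H1", "H2", "H5", "H4"},
    "pw_C": {"H1", "H2", "H5", "H3"},
}

# Toy predicted MTB-human PPIs as (MTB protein, human protein) pairs.
ppis = {
    ("M1", "H1"), ("M2", "H1"), ("M3", "H2"),
    ("M2", "H5"), ("M4", "H4"), ("M5", "H3"),
}


def pathway_overlaps(pathways, ppis):
    """Return {(pathway_a, pathway_b): (n_common_proteins, n_common_ppis)}."""
    result = {}
    for a, b in combinations(sorted(pathways), 2):
        shared = pathways[a] & pathways[b]
        # Assumption: a PPI is shared when its human partner is in both pathways.
        shared_ppis = {pair for pair in ppis if pair[1] in shared}
        result[(a, b)] = (len(shared), len(shared_ppis))
    return result


overlaps = pathway_overlaps(pathways, ppis)
for pair, (n_proteins, n_ppis) in overlaps.items():
    print(pair, n_proteins, n_ppis)
```

Under this assumed definition, the PPI count can exceed the protein count whenever a shared human protein has more than one MTB partner, which matches the direction of the difference between Figures 4B and 4C.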

Proximity and expression correlation of paired targets in triplets

In the predicted MTB-human PPI network, an H-P-H triplet denotes two human proteins interacting with the same MTB protein, and a P-H-P triplet represents two MTB proteins interacting with the same human protein. In total, there are 2,796,207 human protein pairs and 28,102 MTB protein pairs involved in H-P-Hs and P-H-Ps, respectively. As a control, we randomly sampled the same numbers of protein pairs by coupling proteins in the intra-species networks of human and MTB, respectively. Fig. 5A shows that about 40% of paired human proteins in H-P-Hs have a distance of at most two, while only 14% of random pairs are found in the same range. On the other hand, though the distribution of P-H-Ps is similar to that of random pairs, there is still a difference at the distance of two. These results indicate that the predicted human and MTB targets tend to be adjacent in the host and pathogen PPI networks, respectively. It is well known that adjacent proteins in an intra-species network usually have similar expression patterns. We computed the PCC between the gene expression profiles of paired proteins in triplets. As shown in Fig. 5B, both H-P-Hs and P-H-Ps show different distributions compared with random pairs. Specifically, the proportions of protein pairs with high correlation (|PCC|≥0.5) are 10% and 8% for human protein pairs and random pairs, respectively (P-value = 0), while the values are 24% and 20% for MTB protein pairs and random pairs, respectively (P-value = 3.3e-25). Collectively, it can be inferred that the paired proteins targeted by the
same MTB or human proteins tend to share a similar expression pattern. Further, co-expressed proteins might be more functionally related. The higher semantic similarities of paired proteins in H-P-Hs and P-H-Ps, compared with random pairs, imply stronger functional associations (Fig. 5C and Fig. 5D).
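
The triplet definitions and the |PCC|≥0.5 proportion used above can be sketched as follows. This is a minimal illustration with toy data: the protein identifiers and expression values are hypothetical, each protein pair is assumed to be counted once even if it shares several partners, and pairs without expression data are assumed to be skipped.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def triplet_pairs(ppis):
    """Derive H-P-H and P-H-P protein pairs from (mtb, human) PPIs."""
    humans_by_mtb = defaultdict(set)
    mtbs_by_human = defaultdict(set)
    for mtb, human in ppis:
        humans_by_mtb[mtb].add(human)
        mtbs_by_human[human].add(mtb)
    # Two human proteins sharing an MTB partner form an H-P-H pair, and vice versa.
    hph = {p for hs in humans_by_mtb.values() for p in combinations(sorted(hs), 2)}
    php = {p for ms in mtbs_by_human.values() for p in combinations(sorted(ms), 2)}
    return hph, php


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def fraction_high_pcc(pairs, expression, cutoff=0.5):
    """Fraction of pairs with |PCC| >= cutoff among pairs with expression data."""
    scored = [abs(pearson(expression[a], expression[b]))
              for a, b in pairs if a in expression and b in expression]
    return sum(s >= cutoff for s in scored) / len(scored) if scored else 0.0


# Toy data (hypothetical identifiers and values).
ppis = [("M1", "H1"), ("M1", "H2"), ("M1", "H3"), ("M2", "H1")]
expression = {"H1": [1.0, 2.0, 3.0], "H2": [2.0, 4.0, 6.5], "H3": [3.0, 1.0, 2.5]}

hph, php = triplet_pairs(ppis)
print(sorted(hph), sorted(php))
print(fraction_high_pcc(hph, expression))
```

The same `fraction_high_pcc` call applied to randomly sampled pairs would give the control proportion against which the triplet pairs are compared.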

Potential anti-MTB druggable targets and compounds

To estimate whether the proteins involved in MTB-human PPIs are potential drug targets, we extracted 3,722 human targets and 21 MTB targets from DrugBank and TTD, respectively. As shown in Table 3, 1,053 human proteins and 13 MTB proteins in our results overlap with these drug targets (P-values = 9.1e-108 and 3.6e-08, respectively). We further collected 620 successful human drug targets based on TTD and Rask-Andersen et al.’s study (49) and 10 successful MTB drug targets from existing databases. Of these, 176 human and 2 MTB targets are consistent with our predictions (P-values = 1.8e-82 and 1.4e-04, respectively). Additionally, previous studies have developed a number of methods to identify potential drug targets in MTB. Ramakrishnan et al. recognized 78 potential targets by repurposing 130 FDA-approved drugs (50). Melak et al. used network centrality measures to identify 807 potential drug targets in MTB (52). Raman et al. developed a pipeline called targetTB, which returned 451 targets by integrating different resources (51). Our predictions share 14, 126, and 83 MTB proteins with the results of Ramakrishnan et al., Melak et al., and Raman et al., respectively (P-values = 0.03, 9.1e-06, and 8.5e-07). Accordingly, our predicted MTB-human PPIs might provide useful clues for finding new drug targets against MTB.

Targeting the host rather than the pathogen offers one possible way to cure or alleviate the symptoms caused by MTB infection. From the 1,053 overlapped human targets mentioned above, we selected 36 targets whose sequence, structural, and evolutionary attribute values, except secondary structure states, are consistently greater than the average values of all the predicted human proteins (Supplementary Table S-5). Among these proteins, the ABL tyrosine kinases have been reported to regulate the host cytoskeleton or the pathogenesis of a large number of pathogens (97-102). Imatinib, an inhibitor of tyrosine-protein kinase ABL1 and tyrosine-protein kinase ABL2, can reduce the number of granulomatous lesions and the bacterial load in mice infected with MTB (103,104). This evidence suggests that the tyrosine kinases in the host cell could be considered as putative drug targets against MTB infection. Additionally, we obtained tyrosine kinase non-receptor protein 2 (ACK1) and the receptor tyrosine-protein kinases ErbB-1 and ErbB-4 (EGFR and ERBB4), together with related drugs (Supplementary Table S-6). EGFR and ERBB4, two members of the epidermal growth factor receptor subfamily, serve as cell surface receptors for EGF family members. ACK1 co-localizes with EGFR and is involved in the early stage of EGFR desensitization (105). Gefitinib can interact with EGFR to inhibit the p38 MAPK signaling pathway, resulting in activation of autophagy and restriction of MTB growth (106). Additionally, EGFR has been reported to be a host factor of hepatitis C virus and a possible target for antiviral therapy (107). Based on existing knowledge, we thus suggest that these proteins are possible host-directed drug targets for the specific treatment of MTB infection.
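
The selection rule described above (attribute values consistently greater than the average over all predicted human proteins) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: the attribute names (`length`, `n_domains`, `protein_age`) are a hypothetical subset of the properties analyzed in this work, and the protein identifiers and values are toy data.

```python
from statistics import mean


def above_average_targets(candidates, all_predicted, attributes):
    """Keep candidates whose value exceeds the mean over all predicted
    proteins for every listed attribute."""
    averages = {a: mean(p[a] for p in all_predicted.values()) for a in attributes}
    return sorted(
        name for name in candidates
        if all(all_predicted[name][a] > averages[a] for a in attributes)
    )


# Toy data (hypothetical identifiers, attributes, and values).
all_predicted = {
    "H1": {"length": 900, "n_domains": 4, "protein_age": 2000},
    "H2": {"length": 300, "n_domains": 1, "protein_age": 500},
    "H3": {"length": 700, "n_domains": 3, "protein_age": 900},
    "H4": {"length": 500, "n_domains": 2, "protein_age": 1800},
}
druggable = {"H1", "H3", "H4"}  # stands in for the overlapped drug targets

selected = above_average_targets(druggable, all_predicted,
                                 ["length", "n_domains", "protein_age"])
print(selected)
```

Requiring every attribute to exceed its average is a strict filter, which is consistent with only a small fraction of the overlapped targets being retained.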

CONCLUSIONS

In this study, an inter-species PPI network between MTB and human composed of 13,758 interactions was established by combining template-, DDI-, and machine learning-based predictions. We demonstrate that the integrated strategy can effectively reduce false positive predictions. Extensive validations based on different resources indicate that the predicted MTB-human PPI network is largely reliable. Based on the topological features and the enrichment in important functional genes, we show that the predicted human target proteins are highly similar to the known targets of various pathogens, especially those of bacteria, indicating that our algorithm can discover reasonable human target proteins. By investigating the molecular properties of our predicted human targets, we found that these proteins tend to have longer sequences, more domains, more disordered residues, lower evolutionary rates, and older protein ages. Additionally, the functional analysis demonstrates that the human target proteins are mainly related to phosphorylation, kinase activity, and signal transduction, while the pathway analysis demonstrates that these proteins are highly involved in disease- and immune-related pathways. The interplay between top-ranked pathways suggests that the cancer pathway might act as a bridge in MTB infection. The triplet proximity and co-expression analyses reveal that the paired target proteins in both H-P-Hs and P-H-Ps are closer in the intra-species network and tend to share similar expression patterns, implying stronger functional associations. We finally identified 36 potential druggable targets and related drugs by integrating known resources and the molecular properties of predicted human targets. Taken together, the established MTB-human PPI network inevitably contains a certain number of false predictions and is still far from complete, but it should provide useful insights into the
prevention and treatment of MTB infection.

SUPPORTING INFORMATION

The supporting information is available free of charge at the ACS website http://pubs.acs.org.

List of Figures
Figure S-1. Evaluation of our algorithm using HIV-1-human PPIs.
Figure S-2. Number of MTB-human PPIs depending on different types of PPI templates.
Figure S-3. Protein subcellular localization distribution of different pathogens.
Figure S-4. Overlapped MTB-human PPIs between our work and previous studies.
Figure S-5. Distribution of degree and betweenness of human targets and non-targets.
Figure S-6. Comparison of human proteins targeted by MTB and other pathogens.
Figure S-7. Comparison of the features between MTB target and non-target proteins.
Figure S-8. Relationship between fold type and protein age for human target proteins.
Figure S-9. GO and KEGG enrichment analyses of MTB target proteins.
Figure S-10. Subnetwork composed of human GTPase related proteins and MTB kinases.

List of Tables
Table S-1. Composition of different types of PPIs in our template library.
Table S-2. Performance of machine learning method with different positive and negative ratios.
Table S-3. Significantly enriched fold types for human target proteins.
Table S-4. Predicted MTB-human PPIs involved in phagocytosis.
Table S-5. Potential anti-MTB druggable targets in the host.
Table S-6. Potential druggable targets of tyrosine kinases and related drugs.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (31301091) and the Fundamental Research Funds for the Central Universities (2662018JC031).

REFERENCES

(1) Global tuberculosis report, Geneva: World Health Organization. 2017, Licence: CC BY-NC-SA 3.0 IGO.
(2) Feltcher, M. E.; Sullivan, J. T.; Braunstein, M., Protein export systems of Mycobacterium tuberculosis: novel targets for drug development? Future Microbiol. 2010, 5, 1581-1597.
(3) Liu, X.; Huang, Y.; Liang, J.; Zhang, S.; Li, Y.; Wang, J.; Shen, Y.; Xu, Z.; Zhao, Y., Computational prediction of protein interactions related to the invasion of erythrocytes by malarial parasites. BMC Bioinf. 2014, 15, 393.
(4) Doolittle, J. M.; Gomez, S. M., Mapping protein interactions between Dengue virus and its human and insect hosts. PLoS Negl. Trop. Dis. 2011, 5, e954.
(5) de Chassey, B.; Meyniel Schicklin, L.; Aublin Gex, A.; Navratil, V.; Chantier, T.; André, P.; Lotteau, V., Structure homology and interaction redundancy for discovering virus-host protein interactions. EMBO Rep. 2013, 14, 938-944.
(6) Dyer, M. D.; Murali, T. M.; Sobral, B. W., Computational prediction of host-pathogen protein-protein interactions. Bioinformatics 2007, 23, 66.
(7) Petrenko, P.; Doxey, A. C., mimicMe: a web server for prediction and analysis of host-like proteins in microbial pathogens. Bioinformatics 2015, 31, 590-592.
(8) Qi, Y.; Tastan, O.; Carbonell, J. G.; Klein Seetharaman, J.; Weston, J., Semi-supervised multi-task learning for predicting interactions between HIV-1 and human proteins. Bioinformatics 2010, 26, i645-i652.
(9) Tastan, O.; Qi, Y.; Carbonell, J. G.; Klein Seetharaman, J., Prediction of interactions between HIV-1 and human proteins by information integration. Pac. Symp. Biocomput. 2009, 516-527.
(10) Eid, F. E.; ElHefnawi, M.; Heath, L. S., DeNovo: virus-host sequence-based protein-protein interaction prediction. Bioinformatics 2016, 32, 1144-1150.
(11) Rapanoel, H. A.; Mazandu, G. K.; Mulder, N. J., Predicting and analyzing interactions between Mycobacterium tuberculosis and its human host. PLoS One 2013, 8, e67472.
(12) Garamszegi, S.; Franzosa, E. A.; Xia, Y., Signatures of pleiotropy, economy and convergent evolution in a domain-resolved map of human-virus protein-protein interaction networks. PLoS Pathog. 2013, 9, e1003778.
(13) Franzosa, E. A.; Xia, Y., Structural principles within the human-virus protein-protein interaction network. Proc. Natl. Acad. Sci. U.S.A. 2011, 108, 10538-10543.
(14) Nourani, E.; Khunjush, F.; Durmuş, S., Computational approaches for prediction of pathogen-host protein-protein interactions. Front Microbiol. 2015, 6, 94.
(15) Arnold, R.; Boonen, K.; Sun, M. G.; Kim, P. M., Computational analysis of interactomes: current and future perspectives for bioinformatics approaches to model the host-pathogen interaction space. Methods 2012, 57, 508-518.
(16) Huo, T.; Liu, W.; Guo, Y.; Yang, C.; Lin, J.; Rao, Z., Prediction of host-pathogen protein interactions between Mycobacterium tuberculosis and Homo sapiens using sequence motifs. BMC Bioinf. 2015, 16, 100.
(17) Zhou, H.; Rezaei, J.; Hugo, W.; Gao, S.; Jin, J.; Fan, M.; Yong, C. H.; Wozniak, M.; Wong, L., Stringent DDI-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. BMC Syst. Biol. 2013, 7 Suppl 6.
(18) Zhou, H.; Gao, S.; Nguyen, N. N.; Fan, M.; Jin, J.; Liu, B.; Zhao, L.; Xiong, G.; Tan, M.; Li, S.; Wong, L., Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions. Biol. Direct. 2014, 9, 5.
(19) Lukatsky, D. B.; Shakhnovich, B. E.; Mintseris, J.; Shakhnovich, E. I., Structural similarity enhances interaction propensity of proteins. J. Mol. Biol. 2007, 365, 1596-1606.
(20) Cui, T.; Li, W.; Liu, L.; Huang, Q.; He, Z. G., Uncovering new pathogen-host protein-protein interactions by pairwise structure similarity. PLoS One 2016, 11, e0147612.
(21) Doolittle, J. M.; Gomez, S. M., Structural similarity-based predictions of protein interactions between HIV-1 and Homo sapiens. Virol. J. 2010, 7, 82.
(22) Cui, G.; Fang, C.; Han, K., Prediction of protein-protein interactions between viruses and human by an SVM model. BMC Bioinf. 2012, 13, 1-10.
(23) Mei, S.; Zhu, H., A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks. Sci. Rep. 2015, 5, 8034.
(24) Dyer, M. D.; Murali, T. M.; Sobral, B. W., Supervised learning and prediction of physical interactions between human and HIV proteins. Infect. Genet. Evol. 2011, 11, 917-923.
(25) Mei, S.; Zhang, K., Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways. Sci. Rep. 2016, 6, 30612.
(26) Alanis Lobato, G.; Andrade Navarro, M. A.; Schaefer, M. H., HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 2017, 45, D408-D414.
(27) Wang, Y.; Cui, T.; Zhang, C.; Yang, M.; Huang, Y.; Li, W.; Zhang, L.; Gao, C.; He, Y.; Li, Y.; Huang, F.; Zeng, J.; Huang, C.; Yang, Q.; Tian, Y.; Zhao, C.; Chen, H.; Zhang, H.; He, Z.-G., Global protein-protein interaction network in the human pathogen Mycobacterium tuberculosis H37Rv. J. Proteome Res. 2010, 9, 6665-6677.
(28) Wattam, A. R.; Abraham, D.; Dalay, O.; Disz, T. L.; Driscoll, T.; Gabbard, J. L.; Gillespie, J. J.; Gough, R.; Hix, D.; Kenyon, R.; Machi, D.; Mao, C.; Nordberg, E. K.; Olson, R.; Overbeek, R.; Pusch, G. D.; Shukla, M.; Schulman, J.; Stevens, R. L.; Sullivan, D. E.; Vonstein, V.; Warren, A.; Will, R.; Wilson, M. J.; Yoo, H. S.; Zhang, C.; Zhang, Y.; Sobral, B. W., PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Res. 2014, 42, 91.
(29) Ammari, M. G.; Gresham, C. R.; McCarthy, F. M.; Nanduri, B., HPIDB 2.0: a curated database for host-pathogen interactions. Database 2016, baw103.
(30) Guirimand, T.; Delmotte, S.; Navratil, V., VirHostNet 2.0: surfing on the web of virus/host molecular interactions data. Nucleic Acids Res. 2015, 43, D583-D587.
(31) Durmuş Tekir, S.; Çakır, T.; Ardiç, E.; Sayılırbaş, A. S.; Konuk, G.; Konuk, M.; Sarıyer, H.; Uğurlu, A.; Karadeniz, İ.; Özgür, A.; Sevilgen, F. E.; Ülgen, K. Ö., PHISTO: pathogen-host interaction search tool. Bioinformatics 2013, 29, 1357-1358.
(32) Kamburov, A.; Lawrence, M. S.; Polak, P.; Leshchiner, I.; Lage, K.; Golub, T. R.; Lander, E. S.; Getz, G., Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 95.
(33) Velankar, S.; Dana, J. M. M.; Jacobsen, J.; van Ginkel, G.; Gane, P. J.; Luo, J.; Oldfield, T. J.; O'Donovan, C.; Martin, M.-J. J.; Kleywegt, G. J., SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res. 2013, 41, 9.
(34) Zhang, Y.; Skolnick, J., TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33, 2302-2309.
(35) Mosca, R.; Céol, A.; Stein, A.; Olivella, R.; Aloy, P., 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2014, 42, 9.
(36) Yellaboina, S.; Tasneem, A.; Zaykin, D. V.; Raghavachari, B.; Jothi, R., DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res. 2011, 39, 5.
(37) Finn, R. D.; Miller, B. L.; Clements, J.; Bateman, A., iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res. 2014, 42, 73.
(38) Mistry, J.; Bateman, A.; Finn, R. D., Predicting active site residue annotations in the Pfam database. BMC Bioinf. 2007, 8, 298.
(39) Jones, D. T.; Cozzetto, D., DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 2015, 31, 857-863.
(40) Yang, Y.; Heffernan, R.; Paliwal, K.; Lyons, J.; Dehzangi, A.; Sharma, A.; Wang, J.; Sattar, A.; Zhou, Y., SPIDER2: a package to predict secondary structure, accessible surface area, and main-chain torsional angles by deep neural networks. Methods Mol. Biol. 2017, 1484, 55-63.
(41) Kinsella, R. J.; Kähäri, A.; Haider, S.; Zamora, J.; Proctor, G.; Spudich, G.; Almeida King, J.; Staines, D.; Derwent, P.; Kerhornou, A.; Kersey, P.; Flicek, P., Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database 2011, bar030.
(42) Capra, J. A.; Williams, A. G.; Pollard, K. S., ProteinHistorian: tools for the comparative analysis of eukaryote protein origin. PLoS Comput. Biol. 2012, 8, e1002567.
(43) Gough, J.; Karplus, K.; Hughey, R.; Chothia, C., Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 2001, 313, 903-919.
(44) Fox, N. K.; Brenner, S. E.; Chandonia, J. M., SCOPe: structural classification of proteins-extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 2014, 42, D304-D309.
(45) Huang, D. W.; Sherman, B. T.; Lempicki, R. A., Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37, 1-13.
(46) Benita, Y.; Cao, Z.; Giallourakis, C.; Li, C.; Gardet, A.; Xavier, R. J., Gene enrichment profiles reveal T-cell development, differentiation, and lineage-specific transcription factors including ZBTB25 as a novel NF-AT repressor. Blood 2010, 115, 5376-5384.
(47) Wishart, D. S.; Knox, C.; Guo, A.; Cheng, D.; Shrivastava, S.; Tzur, D.; Gautam, B.; Hassanali, M., DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36, D901-D906.
(48) Zhu, F.; Han, B.; Kumar, P.; Liu, X.; Ma, X.; Wei, X.; Huang, L.; Guo, Y.; Han, L.; Zheng, C.; Chen, Y., Update of TTD: therapeutic target database. Nucleic Acids Res. 2010, 38, D787-D791.
(49) Rask-Andersen, M.; Almén, M.; Schiöth, H. B., Trends in the exploitation of novel drug targets. Nat. Rev. Drug Discov. 2011, 10, 579-590.
(50) Ramakrishnan, G.; Chandra, N. R.; Srinivasan, N., Recognizing drug targets using evolutionary information: implications for repurposing FDA-approved drugs against Mycobacterium tuberculosis H37Rv. Mol. Biosyst. 2015, 11, 3316-3331.
(51) Raman, K.; Yeturu, K.; Chandra, N., targetTB: A target identification pipeline for Mycobacterium tuberculosis through an interactome, reactome and genome-scale structural analysis. BMC Syst. Biol. 2008, 2, 1-21.
(52) Melak, T.; Gakkhar, S., Potential non homologous protein targets of mycobacterium tuberculosis H37Rv
identified from protein-protein interaction network. J. Theor. Biol. 2014, 361, 152-158. (53) Fu, W.; Sanders Beer, B. E.; Katz, K. S.; Maglott, D. R.; Pruitt, K. D.; Ptak, R. G., Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res. 2009, 37, 22. (54) Jäger, S.; Cimermancic, P.; Gulbahce, N.; Johnson, J. R.; McGovern, K. E.; Clarke, S. C.; Shales, M.; Mercenne, G.; Pache, L.; Li, K.; Hernandez, H.; Jang, G. M.; Roth, S. L.; Akiva, E.; Marlett, J.; Stephens, M.; D'Orso, I.; Fernandes, J.; Fahey, M.; Mahon, C.; O'Donoghue, A. J.; Todorovic, A.; Morris, J. H.; Maltby, D. A.; Alber, T.; Cagney, G.; Bushman, F. D.; Young, J. A.; Chanda, S. K.; Sundquist, W. I.; Kortemme, T.; Hernandez, R. D.; Craik, C. S.; Burlingame, A.; Sali, A.; Frankel, A. D.; Krogan, N. J., Global landscape of HIV-human protein complexes. Nature 2011, 481, 365-370. (55) Penn, B. H.; Netter, Z.; Johnson, J. R.; Dollen, J.; Jang, G. M.; Johnson, T.; Ohol, Y. M.; Maher, C.; Bell, S. L.; Geiger, K.; Golovkine, G.; Du, X.; Choi, A.; Parry, T.; Mohapatra, B. C.; Storck, M. D.; Band, H.; Chen, C.; Jäger, S.; Shales, M.; Portnoy, D. A.; Hernandez, R.; Coscoy, L.; Cox, J. S.; Krogan, N. J., An MTB-human protein-protein interaction map identifies a switch between host antiviral and antibacterial responses. Mol. Cell 2018, 71, 637-648. (56) Yu, G.; Li, F.; Qin, Y.; Bo, X.; Wu, Y.; Wang, S., GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 2010, 26, 976-978. (57) Yang, J. P.; Hori, M.; Sanda, T.; Okamoto, T., Identification of a novel inhibitor of nuclear factor-kappaB, RelA-associated inhibitor. J. Biol. Chem. 1999, 274, 15662-15670. (58) Grover, A.; Izzo, A. A., BAT3 regulates Mycobacterium tuberculosis protein ESAT-6-mediated apoptosis of macrophages. PloS one 2012, 7, e40836. (59) Platanias, L. C., Mechanisms of type-I- and type-II-interferon-mediated signalling. Nat. Rev. Immunol. 2005, 5, 375-386. 
(60) Sato, T.; Selleri, C.; Young, N. S.; Maciejewski, J. P., Inhibition of interferon regulatory factor-1 expression results in predominance of cell growth stimulatory effects of interferon-gamma due to phosphorylation of Stat1 and Stat3. Blood 1997, 90, 4749-4758.
(61) Dechow, T. N.; Pedranzini, L.; Leitch, A.; Leslie, K.; Gerald, W. L.; Linkov, I.; Bromberg, J. F., Requirement of matrix metalloproteinase-9 for the transformation of human mammary epithelial cells by Stat3-C. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10602-10607.
(62) Fasler-Kan, E.; Pansky, A.; Wiederkehr, M.; Battegay, M.; Heim, M. H., Interferon-alpha activates signal transducers and activators of transcription 5 and 6 in Daudi cells. Eur. J. Biochem. 1998, 254, 514-519.
(63) Kumar, D.; Nath, L.; Kamal, M. A.; Varshney, A.; Jain, A.; Singh, S.; Rao, K. V., Genome-wide analysis of the host intracellular network that regulates survival of Mycobacterium tuberculosis. Cell 2010, 140, 731-743.
(64) Berry, M. P.; Graham, C. M.; McNab, F. W.; Xu, Z.; Bloch, S. A. A.; Oni, T.; Wilkinson, K. A.; Banchereau, R.; Skinner, J.; Wilkinson, R. J.; Quinn, C.; Blankenship, D.; Dhawan, R.; Cush, J. J.; Mejias, A.; Ramilo, O.; Kon, O. M.; Pascual, V.; Banchereau, J.; Chaussabel, D.; O'Garra, A., An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature 2010, 466, 973-977.
(65) Sambarey, A.; Devaprasad, A.; Baloni, P.; Mishra, M.; Mohan, A.; Tyagi, P.; Singh, A.; Akshata, J. S.; Sultana, R.; Buggi, S.; Chandra, N., Meta-analysis of host response networks identifies a common core in tuberculosis. NPJ Syst. Biol. Appl. 2017, 3, 4.
(66) Calderwood, M. A.; Venkatesan, K.; Xing, L.; Chase, M. R.; Vazquez, A.; Holthaus, A. M.; Ewence, A. E.; Li, N.; Hirozane Kishikawa, T.; Hill, D. E.; Vidal, M.; Kieff, E.; Johannsen, E., Epstein-Barr virus and virus human protein interaction maps. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 7606-7611.
(67) Dyer, M. D.; Murali, T. M.; Sobral, B. W., The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathog. 2008, 4, e32.

(68) Navratil, V.; de Chassey, B.; Combe, C.; Lotteau, V., When the human viral infectome and diseasome networks collide: towards a systems biology platform for the aetiology of human diseases. BMC Syst. Biol. 2011, 5, 1-15.
(69) Wang, T.; Birsoy, K.; Hughes, N. W.; Krupczak, K. M.; Post, Y.; Wei, J. J.; Lander, E. S.; Sabatini, D. M., Identification and characterization of essential genes in the human genome. Science 2015, 350, 1096-1101.
(70) Blomen, V. A.; Májek, P.; Jae, L. T.; Bigenzahn, J. W.; Nieuwenhuis, J.; Staring, J.; Sacco, R.; van Diemen, F. R.; Olk, N.; Stukalov, A.; Marceau, C.; Janssen, H.; Carette, J. E.; Bennett, K. L.; Colinge, J.; Superti-Furga, G.; Brummelkamp, T. R., Gene essentiality and synthetic lethality in haploid human cells. Science 2015, 350, 1092-1096.
(71) Uhlén, M.; Fagerberg, L.; Hallström, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, Å.; Kampf, C.; Sjöstedt, E.; Asplund, A.; Olsson, I.; Edlund, K.; Lundberg, E.; Navani, S.; Szigyarto, C.; Odeberg, J.; Djureinovic, D.; Takanen, J.; Hober, S.; Alm, T.; Edqvist, P.-H.; Berling, H.; Tegel, H.; Mulder, J.; Rockberg, J.; Nilsson, P.; Schwenk, J. M.; Hamsten, M.; von Feilitzen, K.; Forsberg, M.; Persson, L.; Johansson, F.; Zwahlen, M.; von Heijne, G.; Nielsen, J.; Pontén, F., Tissue-based map of the human proteome. Science 2015, 347, 1260419.
(72) Halehalli, R.; Nagarajaram, H., Molecular principles of human virus protein-protein interactions. Bioinformatics 2015, 31, 1025-1033.
(73) Wang, X.; Barnes, P. F.; Dobos-Elder, K. M.; Townsend, J. C.; Chung, Y.-t. T.; Shams, H.; Weis, S. E.; Samten, B., ESAT-6 inhibits production of IFN-gamma by Mycobacterium tuberculosis-responsive human T cells. J. Immunol. 2009, 182, 3668-3677.
(74) Ma, Y.; Keil, V.; Sun, J., Characterization of Mycobacterium tuberculosis EsxA membrane insertion: roles of N- and C-terminal flexible arms and central helix-turn-helix motif. J. Biol. Chem. 2015, 290, 7314-7322.
(75) Li, L.; Yang, B.; Yu, S.; Zhang, X.; Lao, S.; Wu, C., Human CD8+ T cells from TB pleurisy respond to four immunodominant epitopes in MTB CFP10 restricted by HLA-B alleles. PLoS One 2013, 8, e82196.
(76) Pathak, S.; Basu, S.; Basu, K.; Banerjee, A.; Pathak, S.; Bhattacharyya, A.; Kaisho, T.; Kundu, M.; Basu, J., Direct extracellular interaction between the early secreted antigen ESAT-6 of Mycobacterium tuberculosis and TLR2 inhibits TLR signaling in macrophages. Nat. Immunol. 2007, 8, 610-618.
(77) Ganguly, N.; Giang, P. H.; Gupta, C.; Basu, S. K.; Siddiqui, I.; Salunke, D. M.; Sharma, P., Mycobacterium tuberculosis secretory proteins CFP-10, ESAT-6 and the CFP10:ESAT6 complex inhibit lipopolysaccharide-induced NF-kappaB transactivation by downregulation of reactive oxidative species (ROS) production. Immunol. Cell Biol. 2007, 86, 98-106.
(78) Walburger, A.; Koul, A.; Ferrari, G.; Nguyen, L.; Prescianotto-Baschong, C.; Huygen, K.; Klebl, B.; Thompson, C.; Bacher, G.; Pieters, J., Protein Kinase G from Pathogenic Mycobacteria Promotes Survival Within Macrophages. Science 2004, 304, 1800-1804.
(79) Gopalaswamy, R.; Narayanan, S.; Chen, B.; Jacobs, W. R.; Av-Gay, Y., The serine/threonine protein kinase PknI controls the growth of Mycobacterium tuberculosis upon infection. FEMS Microbiol. Lett. 2009, 295, 23-29.
(80) Malhotra, V.; Arteaga Cortés, L. T.; Clay, G.; Clark Curtiss, J. E., Mycobacterium tuberculosis protein kinase K confers survival advantage during early infection in mice and regulates growth in culture and during persistent infection: implications for immune modulation. Microbiology 2010, 156, 2829-2841.
(81) Jayakumar, D.; Jacobs, W. R.; Narayanan, S., Protein kinase E of Mycobacterium tuberculosis has a role in the nitric oxide stress response and apoptosis in a human macrophage model of infection. Cell. Microbiol. 2008, 10, 365-374.
(82) Papavinasasundaram, K. G.; Chan, B.; Chung, J.-H.; Colston, J. M.; Davis, E. O.; Av-Gay, Y., Deletion of the Mycobacterium tuberculosis pknH gene confers a higher bacillary load during the chronic phase of infection in BALB/c mice. J. Bacteriol. 2005, 187, 5751-5760.
(83) Kim, D.; Lenzen, G.; Page, A.-L.; Legrain, P.; Sansonetti, P. J.; Parsot, C., The Shigella flexneri effector OspG interferes with innate immune responses by targeting ubiquitin-conjugating enzymes. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 14046-14051.
(84) Ge, J.; Xu, H.; Li, T.; Zhou, Y.; Zhang, Z.; Li, S.; Liu, L.; Shao, F., A Legionella type IV effector activates the NF-κB pathway by phosphorylating the IκB family of inhibitors. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 13725-13730.
(85) Koul, A.; Herget, T.; Klebl, B.; Ullrich, A., Interplay between mycobacteria and host signalling pathways. Nat. Rev. Microbiol. 2004, 2, 189-202.
(86) Deretic, V.; Singh, S.; Master, S.; Harris, J.; Roberts, E.; Kyei, G.; Davis, A.; Haro, S.; Naylor, J.; Lee, H. H.; Vergne, I., Mycobacterium tuberculosis inhibition of phagolysosome biogenesis and autophagy as a host defence mechanism. Cell. Microbiol. 2006, 8, 719-727.
(87) Barz, C.; Abahji, T. N.; Trülzsch, K.; Heesemann, J., The Yersinia Ser/Thr protein kinase YpkA/YopO directly interacts with the small GTPases RhoA and Rac-1. FEBS Lett. 2000, 482, 139-143.
(88) Wiley, D. J.; Nordfeldth, R.; Rosenzweig, J.; DaFonseca, C. J.; Gustin, R.; Wolf-Watz, H.; Schesser, K., The Ser/Thr kinase activity of the Yersinia protein kinase A (YpkA) is necessary for full virulence in the mouse, mollifying phagocytes, and disrupting the eukaryotic cytoskeleton. Microb. Pathog. 2006, 40, 234-243.
(89) Miller, M.; Donat, S.; Rakette, S.; Stehle, T.; Kouwen, T. R.; Diks, S. H.; Dreisbach, A.; Reilman, E.; Gronau, K.; Becher, D.; Peppelenbosch, M. P.; van Dijl, J. M.; Ohlsen, K., Staphylococcal PknB as the first prokaryotic representative of the proline-directed kinases. PLoS One 2010, 5, e9057.
(90) Kondo, A.; Hashimoto, S.; Yano, H.; Nagayama, K.; Mazaki, Y.; Sabe, H., A new paxillin-binding protein, PAG3/Papalpha/KIAA0400, bearing an ADP-ribosylation factor GTPase-activating protein activity, is involved in paxillin recruitment to focal adhesions and cell migration. Mol. Biol. Cell 2000, 11, 1315-1327.
(91) Stamm, L. M.; Morisaki, J. H.; Gao, L.-Y. Y.; Jeng, R. L.; McDonald, K. L.; Roth, R.; Takeshita, S.; Heuser, J.; Welch, M. D.; Brown, E. J., Mycobacterium marinum escapes from phagosomes and is propelled by actin-based motility. J. Exp. Med. 2003, 198, 1361-1368.
(92) Meena, L. S.; Rajni, Survival mechanisms of pathogenic Mycobacterium tuberculosis H37Rv. FEBS J. 2010, 277, 2416-2427.
(93) Ho, J.; Moyes, D. L.; Tavassoli, M.; Naglik, J. R., The role of ErbB receptors in infection. Trends Microbiol. 2017, 25, 942-952.
(94) He, Y.; Li, W.; Liao, G.; Xie, J., Mycobacterium tuberculosis-specific phagosome proteome and underlying signaling pathways. J. Proteome Res. 2012, 11, 2635-2643.
(95) Brumell, J. H.; Scidmore, M. A., Manipulation of rab GTPase function by intracellular bacterial pathogens. Microbiol. Mol. Biol. Rev. 2007, 71, 636-652.
(96) Cheng, F.; Jia, P.; Wang, Q.; Lin, C. C. C.; Li, W. H. H.; Zhao, Z., Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 2014, 31, 2156-2169.
(97) Lebeis, S. L.; Kalman, D., Aligning antimicrobial drug discovery with complex and redundant host-pathogen interactions. Cell Host Microbe 2009, 5, 114-122.
(98) Swimm, A.; Bommarius, B.; Li, Y.; Cheng, D.; Reeves, P.; Sherman, M.; Veach, D.; Bornmann, W.; Kalman, D., Enteropathogenic Escherichia coli use redundant tyrosine kinases to form actin pedestals. Mol. Biol. Cell 2004, 15, 3520-3529.
(99) Ly, K.; Casanova, J. E., Abelson tyrosine kinase facilitates Salmonella enterica serovar Typhimurium entry into epithelial cells. Infect. Immun. 2009, 77, 60-69.

(100) Burton, E. A.; Plattner, R.; Pendergast, A., Abl tyrosine kinases are required for infection by Shigella flexneri. EMBO J. 2003, 22, 5471-5479.
(101) Elwell, C. A.; Ceesay, A.; Kim, J.; Kalman, D.; Engel, J. N., RNA interference screen identifies Abl kinase and PDGFR signaling in Chlamydia trachomatis entry. PLoS Pathog. 2008, 4, e1000021.
(102) Pielage, J. F.; Powell, K. R.; Kalman, D.; Engel, J. N., RNAi screen reveals an Abl kinase-dependent host cell pathway involved in Pseudomonas aeruginosa internalization. PLoS Pathog. 2008, 4, e1000031.
(103) Bruns, H.; Stegelmann, F.; Fabri, M.; Döhner, K.; van Zandbergen, G.; Wagner, M.; Skinner, M.; Modlin, R. L.; Stenger, S., Abelson tyrosine kinase controls phagosomal acidification required for killing of Mycobacterium tuberculosis in human macrophages. J. Immunol. 2012, 189, 4069-4078.
(104) Napier, R. J.; Rafi, W.; Cheruvu, M.; Powell, K. R.; Zaunbrecher, M. A.; Bornmann, W.; Salgame, P.; Shinnick, T. M.; Kalman, D., Imatinib-sensitive tyrosine kinases regulate mycobacterial pathogenesis and represent therapeutic targets against tuberculosis. Cell Host Microbe 2011, 10, 475-485.
(105) Grøvdal, L.; Johannessen, L. E.; Rødland, M.; Madshus, I.; Stang, E., Dysregulation of Ack1 inhibits down-regulation of the EGF receptor. Exp. Cell Res. 2008, 314, 1292-1300.
(106) Stanley, S. A.; Barczak, A. K.; Silvis, M. R.; Luo, S. S.; Sogi, K.; Vokes, M.; Bray, M.-A. A.; Carpenter, A. E.; Moore, C. B.; Siddiqi, N.; Rubin, E. J.; Hung, D. T., Identification of host-targeted small molecules that restrict intracellular Mycobacterium tuberculosis growth. PLoS Pathog. 2014, 10, e1003946.
(107) Lupberger, J.; Zeisel, M. B.; Xiao, F.; Thumann, C.; Fofana, I.; Zona, L.; Davis, C.; Mee, C. J.; Turek, M.; Gorke, S.; Royer, C.; Fischer, B.; Zahid, M. N.; Lavillette, D.; Fresquet, J.; Cosset, F.-L.; Rothenberg, M. S.; Pietschmann, T.; Patel, A. H.; Pessaux, P.; Doffoël, M.; Raffelsberger, W.; Poch, O.; McKeating, J. A.; Brino, L.; Baumert, T. F., EGFR and EphA2 are host factors for hepatitis C virus entry and possible targets for antiviral therapy. Nat. Med. 2011, 17, 589-595.

Table 1. Evaluation of integrated algorithm with experimental HIV-human PPIs

TM-score  Overlap  Prediction  Benchmark  Precision  Recall
0.5       166      2617        6723       0.063      0.025
0.6       122      995         6723       0.123      0.018
0.7       31       429         6723       0.072      0.005
0.8       27       166         6723       0.163      0.004
0.9       19       106         6723       0.179      0.003

Precision = Overlap/Prediction and Recall = Overlap/Benchmark.
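
As a quick check of the two definitions in the Table 1 footnote, the short sketch below recomputes precision and recall from the Overlap, Prediction, and Benchmark counts listed in the table. The class and variable names (`ThresholdResult`, `TABLE_1`) are illustrative assumptions and do not come from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdResult:
    """One row of Table 1 (hypothetical container, not from the paper's code)."""
    tm_score: float   # TM-score cutoff
    overlap: int      # predicted PPIs also present in the benchmark
    prediction: int   # total predicted PPIs at this cutoff
    benchmark: int    # total experimental HIV-human PPIs

    @property
    def precision(self) -> float:
        # Precision = Overlap / Prediction
        return self.overlap / self.prediction

    @property
    def recall(self) -> float:
        # Recall = Overlap / Benchmark
        return self.overlap / self.benchmark


# Counts copied from Table 1.
TABLE_1 = [
    ThresholdResult(0.5, 166, 2617, 6723),
    ThresholdResult(0.6, 122, 995, 6723),
    ThresholdResult(0.7, 31, 429, 6723),
    ThresholdResult(0.8, 27, 166, 6723),
    ThresholdResult(0.9, 19, 106, 6723),
]

if __name__ == "__main__":
    for row in TABLE_1:
        print(f"{row.tm_score:.1f}  precision={row.precision:.3f}  recall={row.recall:.3f}")
```

Rounded to three decimals, the recomputed values agree with the Precision and Recall columns of the table.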

Table 2. Validation of predicted target proteins using known targets as benchmark

Columns: Pathogen; No. of proteins; Degree; Betweenness; Housekeeping gene; Essential gene; Inflammatory gene.
Row groups: Viruses; Fungi; Bacteria; MTB.
Cell values in extraction order (row and column assignment not recoverable): 5414; 61.81; 10064; 16.58; 556; 75.02; 3.00E-04 ***; *** 4.52E-05; 30.82; 3171; 57.89; 12307; 25.84; 3146; 64.08; 12332; 24.32; 1947; 2445; 582 ***; ***; 5623; 336; 8.65E-05; NS; 7669 ***; ***; 60 ***; 800; 3.20E-04 ***; ** 1200; 399; 359 ***; 8.47E-05; ***; * 1097; 3.26E-04; 807; 4082; 62; 1.26E-04 ***; ***; ***; 3986; 367; 3.60E-04 ***; 14922; 792; 1425; 1897 ***; 823; 565 ***; ***; 6170; 1442.

For each group of pathogens, the upper row gives the statistics of targets and the lower row gives the statistics of non-targets. ***, **, and * denote p < 2e-10, p < 1e-03, and p < 0.1, respectively; NS denotes not significant.

Table 3. Overlap between proteins involved in MTB-human PPIs and existing resources

Organism  Known target  Successful target  Drug repurposing method (Ramakrishnan et al., 2015)  Network centrality method (Melak et al., 2014)  TargetTB (Raman et al., 2008)
Human     1053(3722)    176(620)           -                                                    -                                               -
P-value   9.1e-108      1.8e-82            -                                                    -                                               -
MTB       13(21)        2(10)              14(78)                                               126(807)                                        83(451)
P-value   3.6e-08       1.4e-04            0.03                                                 9.1e-06                                         8.5e-07

The numbers of entries in the known resources are given in parentheses.

Figure Legends

Figure 1. Framework and outputs of our integrated algorithm. (A) Flowchart of our algorithm. Our pipeline comprises three modules: template-, DDI-, and machine learning-based methods. (B) Number of PPIs predicted by the component and integrated methods.

Figure 2. Visualization and evaluation of the predicted MTB-human PPI network. (A) Predicted MTB-human PPI network. (B) High-quality PPIs predicted by all three component methods. (C) Distribution of functional similarity between paired proteins for predicted PPIs.

Figure 3. Comparison of the distributions of sequence, structural, and evolutionary features of human target proteins and non-target proteins.

Figure 4. Functional enrichment analyses and cross-talk among the host pathways targeted by MTB. (A) GO and KEGG enrichment analyses of human target proteins. (B) Number of shared human targets across different pathways. (C) Number of shared PPIs between human targets across different pathways.

Figure 5. Comparison of the distributions of characteristics between paired proteins in triplets and random pairs. (A) Shortest path distance. (B) Co-expression measure. (C) GO semantic similarity between paired human targets in H-P-Hs. (D) GO semantic similarity between paired MTB targets in P-H-Ps.
