Bioinformatics
SPOT-peptide: Template-based prediction of peptide-binding proteins and peptide-binding sites
Thomas Litfin, Yuedong Yang, and Yaoqi Zhou
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00777 • Publication Date (Web): 30 Jan 2019
Journal of Chemical Information and Modeling
SPOT-peptide: Template-based Prediction of Peptide-binding Proteins and Peptide-binding Sites
Thomas Litfin,† Yuedong Yang,‡ and Yaoqi Zhou∗,†,¶
†School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
‡School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China
¶Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
E-mail: [email protected]

Abstract
Peptide-binding domains have been successfully targeted in therapeutic applications. However, many peptide-binding proteins (PBPs) remain uncharacterized. Computational prediction of peptide-domain interfaces is challenging due to the short length, lack of well-defined structure, and limited conservation of peptide motifs. Here we present SPOT-peptide, a template-based protocol for the simultaneous prediction of peptide-binding domains and peptide-binding sites independent of specific peptide composition. SPOT-peptide leverages the well-established relationship between protein structure and function to predict peptide-binding characteristics for an unknown target based on remote structural homologs. In a leave-homolog-out benchmark evaluation, PBPs are discriminated with a Matthews correlation coefficient (MCC) of 0.420 and the correct binding sites are identified in 80% of the predicted PBPs. Furthermore, replacing
the holo target structures with equivalent structures in the apo conformation only marginally diminished PBP recovery. The method is available as a web server at http://sparks-lab.org/tom/SPOT-peptide.
Introduction
The comprehensive UniProt database contains more than 100 million protein sequences (UniProt100), the majority of which do not have adequate functional annotations even after homology-based inference 1. For those proteins with functional annotations, it is not clear whether they also possess other moonlighting functions yet to be discovered 2. As sequencing a genome becomes increasingly inexpensive, more and more proteins will be discovered without functional annotations. The huge gap between annotated and un-annotated proteins makes it essential to predict protein function computationally, prior to any experimental characterization.

In this study, we focus on those proteins that perform their functions through binding to peptides (peptide-binding proteins). Interactions between a protein domain and a small peptide motif have been estimated to regulate 15-40% of all protein-protein interactions 3 and are implicated in many diseases including cancer 4 and viral infections 5. Furthermore, peptide-mediated protein-protein interactions (PPIs) are enriched with successfully inhibited drug targets 6. This reinforces the need to identify PBPs at a genomic scale, independent of specific peptide interactions, to identify novel drug targets.

For computational function prediction, there are three levels of resolution. Low-resolution analysis involves a simple two-state prediction of whether or not a protein is peptide-binding. Medium-resolution analysis is used to predict functional sites in proteins. Finally, a high-resolution prediction requires modelling of protein-peptide complex structures. To our knowledge, there are no computational methods dedicated to predicting peptide-binding proteins other than homology-based techniques, while a number of methods have been developed for binding site prediction and prediction of
complex structures.

Peptide binding sites may be identified from both protein structure and protein sequence information. PepSite 7 identifies binding sites by comparing regions on a protein surface to the preferred binding environments of specific peptide residues. In contrast, PeptiMap 8 and ACCLUSTER 9 generate predictions by probing the receptor surface with molecular fragments and amino acids, respectively. We have also developed sequence-based 10 and structure-based 11 methods called SPRINT that predict binding sites using machine learning techniques (support vector machines and random forest, respectively). These methods, however, assume that the query protein has already been annotated as peptide-binding.

For high-resolution function prediction, protein-peptide complexes may be predicted by computational docking. For example, local docking methods such as Rosetta FlexPepDock ab-initio 12 and HADDOCK 13 predict peptide-protein complexes from an extended peptide conformation and a pre-specified binding site. Blind docking, whereby the peptide binding site is unknown, is more challenging for existing techniques and is typically intractable at the genomic scale. Furthermore, docking predictions are sensitive to the unbound conformation of the receptor 14. pepATTRACT 15 and MDockPep 16 attempt to minimize the computational burden by combining coarse-grained sampling with sophisticated refinement techniques, but cannot match the success of local docking in the absence of a known binding site. As a result, predicted binding sites are often employed in conjunction with local docking protocols to boost discrimination performance.

This work was inspired by the previous success of template-based functional annotation. The underlying principle is that function is conserved in structural homologs. Based on this idea, known functional complex structures can be used as templates to infer the function of unknown query proteins. Because structures are more conserved than sequences, a structural comparison offers higher sensitivity than commonly used sequence-homology-based approaches in detecting proteins with the same function. More importantly, a template-based approach facilitates function prediction at
all three resolution levels (function classification, binding site prediction and modelling of complex structures). For example, DBD-Threader 17 and DBD-Hunter 18 were developed to identify DNA-binding proteins. Additionally, small molecule binding sites may be identified by FINDSITE 19–21, while COFACTOR 22,23 and COACH 24 provide a range of functional annotations including Gene Ontology terms, Enzyme Commission numbers, and active sites. In our own studies, we have developed template-based techniques for the prediction of binding to DNA 25, RNA 26, glycans 27 and small molecule ligands 14,28. In addition, GalaxyPepDock 29 is a template-based approach to predicting peptide binding sites and complex structures; however, it cannot be used to identify peptide-binding proteins.

In this work we extend the template-based approach to simultaneously identify peptide-binding proteins (PBPs), their corresponding binding sites and the complex structures between a query protein and template peptide. A major advantage of our approach is the ability to identify peptide-binding domains without knowledge of the specific peptide-binding partner. Instead, peptide-binding sites are inferred from a complex structure between the query protein and a template peptide based on the optimal superposition of a template protein with the query. We found that SPOT-peptide significantly outperforms naïve baselines in low-resolution function annotation and also outperforms state-of-the-art binding site prediction tools.
Methodology
Prediction protocol
This method is a structure-based technique. As shown in Figure 1, the first step for inferring peptide binding is to evaluate structural similarity between a query protein and functional proteins in a template library. Templates are defined as those proteins with experimentally determined peptide-bound complex structures. We score structural similarity using the structural alignment program SP-align 30 because its size-independent scoring allows it to identify partial structural similarity. This measure of local similarity is particularly useful since function can be conserved in a small region of a protein. In order to filter out trivial self-recognition and to evaluate the potential for prediction in the absence of close sequence homologs, templates with >30% sequence identity to the query were ignored. Significant structural matches were then used to generate a model complex structure between the query protein and the template peptide by superimposing the target protein onto the template. The binding affinity of the modelled complex structure was then evaluated using an all-atom statistical energy function based on a Distance-scaled Finite Ideal-gas Reference state 31–33 (DFIRE). Query proteins that formed modelled complexes with high predicted binding affinity were predicted to be peptide-binding. In addition, queries with a high evolutionary score in the aligned binding residues were also predicted to be peptide-binding. Targets that met neither of these criteria were considered non-binding. The final PBP ranking score is the SP-score of the nearest neighbor in the template library, after removing those false-positive templates with poor predicted affinity (DFIRE) or poor evolutionary similarity (EVO). More details for each step are described below.
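The decision logic described in the prediction protocol can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the authors' implementation: the `TemplateHit` record, the function name `rank_query` and all three score cutoffs are hypothetical, and the sketch assumes that a lower DFIRE value means a more favourable predicted affinity and that a template is retained when it passes either the DFIRE or the EVO criterion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateHit:
    """One query-template alignment (hypothetical record layout)."""
    template_id: str
    seq_identity: float  # percent sequence identity to the query
    sp_score: float      # structural similarity reported by the aligner
    dfire: float         # predicted binding energy; lower assumed better
    evo: float           # evolutionary score; higher assumed better


def rank_query(hits: List[TemplateHit],
               sp_cutoff: float,
               dfire_cutoff: float,
               evo_cutoff: float,
               max_seq_id: float = 30.0) -> Optional[TemplateHit]:
    """Return the nearest retained template, or None if predicted non-binding.

    All cutoffs are placeholders; only the 30% identity limit is taken
    from the text.
    """
    kept = [
        h for h in hits
        if h.seq_identity <= max_seq_id          # leave-homolog-out filter
        and h.sp_score >= sp_cutoff              # significant structural match
        and (h.dfire <= dfire_cutoff or h.evo >= evo_cutoff)  # either criterion
    ]
    # The ranking score is the SP-score of the best remaining template.
    return max(kept, key=lambda h: h.sp_score, default=None)


# Made-up hits for illustration only.
example_hits = [
    TemplateHit("close_homolog", 45.0, 1.50, -3.0, 3.0),  # dropped: >30% identity
    TemplateHit("remote_good", 25.0, 1.10, -0.2, 2.5),    # kept: passes EVO
    TemplateHit("remote_poor", 20.0, 1.20, -0.1, 0.0),    # dropped: fails both
]
best = rank_query(example_hits, sp_cutoff=1.0, dfire_cutoff=-1.0, evo_cutoff=2.0)
```

With these invented numbers, only `remote_good` survives the filters, so its SP-score would serve as the ranking score for the query.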
Template complex library of PBPs (T21346)
A peptide-binding template library was constructed based on the BioLiP 34 database (accessed 20/09/2017). BioLiP contains PDB structures of 18,910 peptide-protein complexes. Protein domains were annotated by DDOMAIN 35, and peptide interactions were remapped to binding domains in cases where all of the binding residues could be found within a single domain. Peptide-binding residues are those annotated in BioLiP. Full chains of multi-domain proteins were combined with single-domain complexes for a total of 21,346 protein-peptide interacting complexes (T21346).
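The domain remapping rule above (assign a peptide interaction to a domain only when every binding residue lies inside that one domain) can be illustrated with a short sketch. The function name, the dictionary layout and the residue numbers are hypothetical; DDOMAIN and BioLiP output formats are not modelled here.

```python
from typing import Dict, Iterable, Optional, Set


def remap_to_domain(binding_residues: Iterable[int],
                    domains: Dict[str, Set[int]]) -> Optional[str]:
    """Return the id of the single domain holding all binding residues.

    Returns None when the binding residues span more than one domain
    (or none are given), in which case only the full chain would be used.
    """
    residues = set(binding_residues)
    if not residues:
        return None
    for domain_id, members in domains.items():
        if residues <= members:  # every binding residue is in this domain
            return domain_id
    return None


# Hypothetical two-domain chain: residues 1-100 and 101-200.
example_domains = {"d1": set(range(1, 101)), "d2": set(range(101, 201))}

within_one = remap_to_domain([12, 15, 40], example_domains)   # all in d1
across_two = remap_to_domain([95, 105], example_domains)      # spans d1 and d2
```

In this toy case the first interaction maps to `d1`, while the second spans both domains and is left unmapped.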
Structure alignment by SP-align
Protein structures may be superimposed to optimize the SP-score between the two sets of Cα atoms 30.

$$\text{SP-score} = \frac{1}{3L^{0.7}} \sum_{d_{ij} < \,\ldots} \ldots$$

[…] > 0.3, indicating strong overlap with the binding sites in the respective crystal structures. A cutoff of 0.3 was selected to reflect the bimodal distribution of target-level binding-site MCCs (Supplementary Figure S1). The binding sites of an additional 12 PBPs (86%) can be identified from at least one of the templates that meet the selection criteria.

Figure 3 demonstrates a successful example of a binding site predicted for Histone-lysine N-methyltransferase 2A (KMT2A) from the remotely homologous, peptide-bound structure of Pygopus homolog 1 (PYGO1). KMT2A peptide-binding residues were discriminated from non-binding residues with an MCC of 0.806, despite sharing only 21.3% sequence identity with the PYGO1 binding template. The backbone-RMSD (bb-RMSD) between the native and modelled peptide is 2.5 Å, which is dominated by
the distance between the C-terminal threonine residues.
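The residue-level binding-site MCC used in this evaluation follows the standard Matthews correlation coefficient definition. A minimal sketch is given below; the function name, the set-based representation and the residue numbers are assumptions made for illustration, and both residue sets are assumed to lie within a protein of `n_residues` residues.

```python
import math
from typing import Set


def binding_site_mcc(predicted: Set[int], actual: Set[int], n_residues: int) -> float:
    """Matthews correlation coefficient between predicted and actual binding residues."""
    tp = len(predicted & actual)      # correctly predicted binding residues
    fp = len(predicted - actual)      # predicted binding, actually non-binding
    fn = len(actual - predicted)      # missed binding residues
    tn = n_residues - tp - fp - fn    # correctly predicted non-binding residues
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the MCC is undefined (empty row or column).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: a 100-residue protein with a 10-residue binding site,
# predicted with a two-residue shift.
actual_site = set(range(10, 20))
predicted_site = set(range(12, 22))
site_mcc = binding_site_mcc(predicted_site, actual_site, n_residues=100)
is_correct = site_mcc > 0.3  # the cutoff quoted in the text
```

In the toy example eight of the ten binding residues are recovered, which gives an MCC of 700/900 and therefore counts as a correct site under the 0.3 cutoff.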
Application to structural genomics targets
To demonstrate the usefulness of the SPOT-peptide protocol, we downloaded 4,444 non-redundant structural genomics domains for functional annotation. Among them, 545 domains are predicted as peptide-binding. In Table 3, we highlight the top 20 predicted PBPs from the structural genomics targets. Of these top hits, 6/20 (30%) have been associated with peptide-binding characteristics. The limited coverage of confirmed PBPs is likely due to their sparse annotation in the literature. The annotated PBPs are dominated by peptidases (4hvt, 5cxw, 6azi, 5dyf2, 4ppr) but also include a peptidyl-prolyl cis-trans isomerase (3s6m), which has a number of reported synthetic peptide inhibitors 45. Additionally, an uncharacterized Vibrio parahaemolyticus protein (1zbp) has also been predicted to have peptide-binding function by SPOT-peptide. Whilst the remaining proteins are characterized by a range of molecular functions, they may also be involved in un-annotated peptide-mediated PPIs. In this way, computational prediction via structural homology may be a useful tool to enrich protein functional annotations at a proteomic scale.
Discussion
This work describes a template-based approach to simultaneously predict PBPs and peptide-binding residues. The major novelty of this work is the ability to accurately identify PBPs among all proteins with known structure within a genome, in contrast to existing techniques that focus on predicting the binding modes of specific protein-peptide interactions. A leave-homolog-out benchmark evaluation comparing 485 high-resolution, peptide-binding targets with 1,000 background SCOP domains recovered 47% of the PBPs with a high precision (70%). Of the correctly identified PBPs, 80% included a predicted binding site in good agreement with the site determined from the crystal structure. Furthermore, 86% of the correctly identified PBPs identified at
least one template leading to an accurate prediction of the binding site. Moreover, the method is computationally efficient. It takes only 19 hours on a 16-core CPU (Intel Xeon E5-2670 @2.60GHz) to generate results for all 4,444 structural genomics domains.

To address potential bias associated with utilizing protein structures in the holo conformation, an additional benchmark was constructed to include proteins in both apo and holo conformations. Predictions made using the apo conformation marginally decreased the recall from 46% to 44% compared with predictions based on the holo conformation. This is consistent with previous work demonstrating that domain-level structural similarity is a robust indicator of functional similarity 14.

It should be emphasized that introducing the DFIRE energy function and evolution-based scoring yields a significant improvement in performance compared with relying on structural similarity alone. This approach reduces false positives and increases true positives, as demonstrated by the improvement in both precision and recall.

The major limitation of this approach is the availability of peptide-bound templates. An increasing number of solved protein-peptide complex structures is likely to improve the functional characterization of PBPs using the SPOT-peptide protocol. Thus, as more peptide-binding complexes become available in the Protein Data Bank, both the precision and coverage of SPOT-peptide will further improve.

A web-based server is available at http://sparks-lab.org/tom/SPOT-peptide. The website contains an information page describing the algorithm, examples of single-use and batch submission and the results for 4,444 structural genomics domains. The benchmark dataset is also available.
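As a consistency check on the benchmark figures quoted in the Discussion, the reported precision and recall can be converted back into an approximate confusion matrix and an MCC. The counts below are not taken from the paper; they are rounded values back-calculated from the reported percentages (47% recall, 70% precision, 485 binding and 1,000 non-binding targets), so the result is only an approximation.

```python
import math

# Benchmark sizes and rates quoted in the text.
n_binding, n_nonbinding = 485, 1000
recall, precision = 0.47, 0.70

# Back-calculated, rounded counts (assumptions, not reported values).
tp = round(recall * n_binding)                 # recovered PBPs
fn = n_binding - tp                            # missed PBPs
fp = round(tp * (1 - precision) / precision)   # non-binders predicted as PBPs
tn = n_nonbinding - fp                         # correctly rejected non-binders

# Standard Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(tp, fp, fn, tn, round(mcc, 3))
```

Under these rounding assumptions the MCC comes out at approximately 0.42, in line with the value of 0.420 reported for SPOT-peptide.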
Supporting Information
Distribution of binding-site MCCs
Acknowledgement
This work was supported by the Australian Research Council (DP180102060 to Y.Z. and K. P.), the National Natural Science Foundation of China (U1611261, 61772566) and the program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211) to Y.Y. and in part by the National Health and Medical Research Council of Australia (1121629 to Y.Z.). We also acknowledge the use of the High Performance Computing Cluster 'Gowonda' to complete this research. This research has also been undertaken with the aid of the research cloud resources provided by the Queensland Cyber Infrastructure Foundation (QCIF).
Figure 1: Flow diagram of SPOT-peptide protocol
Figure 2: Precision/Recall receiver operating characteristic curve for discrimination between the peptide-binding set BD485 and the non-binding set NB1000 by sequence-similarity search (PSI-BLAST), structural-similarity search (SP-align) and structural similarity plus the DFIRE statistical energy and the EVO evolution-based scoring (SPOT-peptide)
Figure 3: (A) Actual binding residues of KMT2A PBP. (B) Predicted residues of KMT2A PBP from remote structural homolog PYGO1 with binding site MCC of 0.806. (C) Complex structure with native (green) and modelled (red) histone H3 peptide.

Table 1: Leave-homolog-out benchmark evaluation of discrimination between the peptide-binding set BD485 and the non-binding set NB1000 by Matthews correlation coefficient (MCC), precision (PR) and recall (RE).

Method         PR(%)   RE(%)   MCC     10CV MCC
PSI-BLAST      60      43      0.323   0.301 (0.08)
SP-align       65      43      0.361   0.362 (0.07)
SPOT-peptide   70      47      0.420   0.413 (0.04)

Table 2: Evaluation of binding site prediction performance for the binding set BD57

Method           MCC a           N(MCC>0.3)
SPOT-peptide     0.482 (0.305)   44
GalaxyPepDock    0.389 (0.334)   38
SPRINT-peptide   0.331 (0.283)   32
COACH            0.354 (0.344)   33
COACH-peptide    0.403 (0.321)   38

a MCC averaged over BD57 proteins (standard deviation)
Table 3: Top scoring structural genomics targets predicted as PBPs

Target   Template   SP-score   DFIRE   EVO     Seq ID   Function
4hvt     1uoqA0     1.35       -2.88    3.53   27.8     PBP a
4lg8     3pslB0     1.19       -1.11   -0.54   20.5     Other b
4yvd     2ce9C0     1.14       -1.15    0.58   19.4     Other
5cxw     2v2fF0     1.14       -1.28    0.00   27.0     PBP
3i2n     4j8bA      1.10       -1.37   -1.79   20.8     Other
1zbp     4uqzA0     1.10       -1.10    0.20   28.3     UNK c
1fcy     5iawA0     1.09       -1.57    3.64   28.9     Other
5tnx1    4gkvB1     1.07       -0.42    2.50   29.0     Other
4asc     5x54B      1.06       -3.43   -0.40   26.7     Other
4n6h     4bwbA0     1.05       -2.46   -1.14   24.9     Other
1x541    4ycwB1     1.05       -1.10    1.18   23.5     Other
6azi     3itbC0     1.05       -1.01    1.00   27.5     PBP
4rz74    2bzkB0     1.05       -1.29    2.06   27.2     Other
4fce2    4ihgD1     1.05       -0.12    2.00   26.6     Other
4wed     3driA      1.04       -1.01   -1.67   27.6     Other
5dyf2    5ab2A      1.04       -1.73    2.83   20.1     PBP
4rn7     4m6gA      1.04       -1.22    0.80   25.4     PBP
3s6m     3bo7A0     1.03       -0.84    3.08   29.4     PBP
4ppr     3itbC      1.03       -1.19    2.29   28.0     PBP
2iqt     2ot0D0     1.02       -1.90    2.88   26.7     Other

a Annotated with putative function related to peptide binding.
b Annotated with other function.
c Unknown function.
PDB codes 4hvt, 1uoq, 4lg8, 3psl, 4yvd, 2ce9, 5cxw, 2v2f, 3i2n, 4j8b, 1zbp, 4uqz, 1fcy, 5iaw, 5tnx, 4gkv, 4asc, 5x54, 4n6h, 4bwb, 1x54, 4ycw, 6azi, 3itb, 4rz7, 2bzk, 4fce, 4ihg, 4wed, 3dri, 5dyf, 5ab2, 4rn7, 4m6g, 3s6m, 3bo7, 4ppr, 3itb, 2iqt, 2ot0
References
(1) Consortium, T. U. UniProt: a Hub for Protein Information. Nucleic Acids Research 2015, 43, D204–D212.
(2) Huberts, D. H. E. W.; van der Klei, I. J. Moonlighting Proteins: An Intriguing Mode of Multitasking. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 2010, 1803, 520–525.
(3) Neduva, V.; Linding, R.; Su-Angrand, I.; Stark, A.; de Masi, F.; Gibson, T. J.; Lewis, J.; Serrano, L.; Russell, R. B. Systematic Discovery of New Recognition Peptides Mediating Protein Interaction Networks. PLOS Biology 2005, 3, e405.
(4) Uyar, B.; Weatheritt, R. J.; Dinkel, H.; Davey, N. E.; Gibson, T. J. Proteome-wide Analysis of Human Disease Mutations in Short Linear Motifs: Neglected Players in Cancer? Molecular Biosystems 2014, 10, 2626–2642.
(5) Kadaveru, K.; Vyas, J.; Schiller, M. R. Viral Infection and Human Disease Insights From Minimotifs. Frontiers in Bioscience 2008, 13, 6455–6471.
(6) London, N.; Raveh, B.; Schueler-Furman, O. Druggable Protein–protein Interactions – From Hot Spots to Hot Segments. Current Opinion in Chemical Biology 2013, 17, 952–959.
(7) Trabuco, L. G.; Lise, S.; Petsalaki, E.; Russell, R. B. PepSite: Prediction of Peptide-binding Sites From Protein Surfaces. Nucleic Acids Research 2012, 40, W423–W427.
(8) Lavi, A.; Ngan, C. H.; Movshovitz-Attias, D.; Bohnuud, T.; Yueh, C.; Beglov, D.; Schueler-Furman, O.; Kozakov, D. Detection of Peptide-binding Sites on Protein Surfaces: The First Step Towards the Modeling and Targeting of Peptide-mediated Interactions. Proteins 2013, 81, 2096–2105.
(9) Yan, C.; Zou, X. Predicting Peptide Binding Sites on Protein Surfaces by Clustering Chemical Interactions. Journal of Computational Chemistry 2015, 36, 49–61.
(10) Taherzadeh, G.; Yang, Y.; Zhang, T.; Liew, A. W.-C.; Zhou, Y. Sequence-based Prediction of Protein–peptide Binding Sites Using Support Vector Machine. Journal of Computational Chemistry 2016, 37, 1223–1229.
(11) Taherzadeh, G.; Zhou, Y.; Liew, A. W.-C.; Yang, Y. Structure-based Prediction of Protein–peptide Binding Regions Using Random Forest. Bioinformatics 2018, 34, 477–484.
(12) Raveh, B.; London, N.; Zimmerman, L.; Schueler-Furman, O. Rosetta FlexPepDock ab-initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors. PLOS ONE 2011, 6, e18934.
(13) Trellet, M.; Melquiond, A. S. J.; Bonvin, A. M. J. J. A Unified Conformational Selection and Induced Fit Approach to Protein-Peptide Docking. PLOS ONE 2013, 8, e58769.
(14) Yang, Y.; Zhan, J.; Zhou, Y. SPOT-Ligand: Fast and Effective Structure-based Virtual Screening by Binding Homology Search According to Ligand and Receptor Similarity. Journal of Computational Chemistry 2016, 37, 1734–1739.
(15) Schindler, C. E.; de Vries, S. J.; Zacharias, M. Fully Blind Peptide-Protein Docking with pepATTRACT. Structure 2015, 23, 1507–1515.
(16) Yan, C.; Xu, X.; Zou, X. Fully Blind Docking at the Atomic Level for Protein-Peptide Complex Structure Prediction. Structure 2016, 24, 1842–1853.
(17) Gao, M.; Skolnick, J. A Threading-Based Method for the Prediction of DNA-Binding Proteins with Application to the Human Genome. PLoS Computational Biology 2009, 5, e1000567.
(18) Gao, M.; Skolnick, J. DBD-Hunter: a Knowledge-based Method for the Prediction of DNA–protein Interactions. Nucleic Acids Research 2008, 36, 3978–3992.
(19) Brylinski, M.; Skolnick, J. A Threading-based Method (FINDSITE) for Ligand-binding Site Prediction and Functional Annotation. Proceedings of the National Academy of Sciences 2008, 105, 129–134.
(20) Zhou, H.; Skolnick, J. FINDSITEX: A Structure-based, Small Molecule Virtual Screening Approach with Application to all Identified Human GPCRs. Molecular Pharmaceutics 2012, 9, 1775–1784.
(21) Zhou, H.; Skolnick, J. FINDSITEcomb: A Threading/Structure-Based, Proteomic-Scale Virtual Ligand Screening Approach. Journal of Chemical Information and Modeling 2013, 53, 230–240.
(22) Roy, A.; Yang, J.; Zhang, Y. COFACTOR: an Accurate Comparative Algorithm For Structure-based Protein Function Annotation. Nucleic Acids Research 2012, 40, W471–W477.
(23) Zhang, C.; Freddolino, P. L.; Zhang, Y. COFACTOR: Improved Protein Function Prediction by Combining Structure, Sequence and Protein–protein Interaction Information. Nucleic Acids Research 2017, 45, W291–W299.
(24) Yang, J.; Roy, A.; Zhang, Y. Protein–ligand Binding Site Recognition Using Complementary Binding-specific Substructure Comparison and Sequence Profile Alignment. Bioinformatics 2013, 29, 2588–2595.
(25) Zhao, H.; Yang, Y.; Zhou, Y. Structure-based Prediction of DNA-binding Proteins by Structural Alignment and a Volume-fraction Corrected DFIRE-based Energy Function. Bioinformatics 2010, 26, 1857–1863.
(26) Zhao, H.; Yang, Y.; Zhou, Y. Structure-based Prediction of RNA-binding Domains and RNA-binding Sites and Application to Structural Genomics Targets. Nucleic Acids Research 2011, 39, 3017–3025.
(27) Zhao, H.; Yang, Y.; von Itzstein, M.; Zhou, Y. Carbohydrate-binding Protein Identification by Coupling Structural Similarity Searching with Binding Affinity Prediction. Journal of Computational Chemistry 2014, 35, 2177–2183.
(28) Litfin, T.; Zhou, Y.; Yang, Y. SPOT-Ligand 2: Improving Structure-based Virtual Screening by Binding-homology Search on an Expanded Structural Template Library. Bioinformatics 2017, btw829.
(29) Lee, H.; Heo, L.; Lee, M. S.; Seok, C. GalaxyPepDock: a Protein-peptide Docking Tool based on Interaction Similarity and Energy Optimization. Nucleic Acids Research 2015, 43, W431–W435.
(30) Yang, Y.; Zhan, J.; Zhao, H.; Zhou, Y. A New Size-independent Score for Pairwise Protein Structure Alignment and its Application to Structure Classification and Nucleic-acid Binding Prediction. Proteins: Structure, Function, and Bioinformatics 2012, 80, 2080–2088.
(31) Zhou, H.; Zhou, Y. Distance-scaled, Finite Ideal-gas Reference State Improves Structure-derived Potentials of Mean Force for Structure Selection and Stability Prediction. Protein Science 2002, 11, 2714–2726.
(32) Yang, Y.; Zhou, Y. Specific Interactions for Ab Initio Folding of Protein Terminal Regions with Secondary Structures. Proteins: Structure, Function, and Bioinformatics 2008, 72, 793–803.
(33) Yang, Y.; Zhou, Y. Ab Initio Folding of Terminal Segments with Secondary Structures Reveals the Fine Difference Between Two Closely Related All-atom Statistical Energy Functions. Protein Science 2008, 17, 1212–1219.
(34) Yang, J.; Roy, A.; Zhang, Y. BioLiP: a Semi-manually Curated Database for Biologically Relevant Ligand–protein Interactions. Nucleic Acids Research 2013, 41, D1096–D1103.
(35) Zhou, H.; Xue, B.; Zhou, Y. DDOMAIN: Dividing Structures into Domains Using a Normalized Domain–domain Interaction Profile. Protein Science 2007, 16, 947–955.
(36) Nagano, N.; Orengo, C. A.; Thornton, J. M. One Fold with Many Functions: The Evolutionary Relationships between TIM Barrel Families Based on their Sequences, Structures and Functions. Journal of Molecular Biology 2002, 321, 741–765.
(37) Liu, S.; Zhang, C.; Zhou, H.; Zhou, Y. A Physical Reference State Unifies the Structure-derived Potential of Mean Force for Protein Folding and Binding. Proteins: Structure, Function, and Bioinformatics 2004, 56, 93–101.
(38) Vanhee, P.; Stricher, F.; Baeten, L.; Verschueren, E.; Lenaerts, T.; Serrano, L.; Rousseau, F.; Schymkowitz, J. Protein-Peptide Interactions Adopt the Same Structural Motifs as Monomeric Protein Folds. Structure 2009, 17, 1128–1136.
(39) Henikoff, S.; Henikoff, J. G. Amino Acid Substitution Matrices from Protein Blocks. Proceedings of the National Academy of Sciences of the United States of America 1992, 89, 10915–10919.
(40) Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs. Nucleic Acids Research 1997, 25, 3389–3402.
(41) Andreeva, A.; Howorth, D.; Chandonia, J.-M.; Brenner, S. E.; Hubbard, T. J. P.; Chothia, C.; Murzin, A. G. Data Growth and its Impact on the SCOP Database: New Developments. Nucleic Acids Research 2008, 36, D419–D425.
(42) Velankar, S.; Dana, J. M.; Jacobsen, J.; van Ginkel, G.; Gane, P. J.; Luo, J.; Oldfield, T. J.; O'Donovan, C.; Martin, M.-J.; Kleywegt, G. J. SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Research 2013, 41, D483–D489.
(43) London, N.; Movshovitz-Attias, D.; Schueler-Furman, O. The Structural Basis of Peptide-Protein Binding Strategies. Structure 2010, 18, 188–199.
(44) Das, A. A.; Sharma, O. P.; Kumar, M. S.; Krishna, R.; Mathur, P. P. PepBind: A Comprehensive Database and Computational Tool for Analysis of Protein–peptide Interactions. Genomics, Proteomics & Bioinformatics 2013, 11, 241–246.
(45) Wildemann, D.; Erdmann, F.; Alvarez, B. H.; Stoller, G.; Zhou, X. Z.; Fanghänel, J.; Schutkowski, M.; Lu, K. P.; Fischer, G. Nanomolar Inhibitors of the Peptidyl Prolyl Cis/Trans Isomerase Pin1 from Combinatorial Peptide Libraries. Journal of Medicinal Chemistry 2006, 49, 2147–2150.