Hierarchical Flexible Peptide Docking by Conformer Generation and


Bioinformatics

Hierarchical Flexible Peptide Docking by Conformer Generation and Ensemble Docking of Peptides
Pei Zhou, Botong Li, Yumeng Yan, Bowen Jin, Libang Wang, and Sheng-You Huang
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00142 • Publication Date (Web): 08 May 2018
Downloaded from http://pubs.acs.org on May 9, 2018



Journal of Chemical Information and Modeling

Hierarchical Flexible Peptide Docking by Conformer Generation and Ensemble Docking of Peptides Pei Zhou, Botong Li, Yumeng Yan, Bowen Jin, Libang Wang, and Sheng-You Huang∗

School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China



Email: [email protected]; Phone: +86-27-87543881; Fax: +86-027-87556576


ACS Paragon Plus Environment


Abstract

Given the importance of peptide-mediated protein interactions in cellular processes, protein-peptide docking has received increasing attention. Here, we have developed a Hierarchical flexible Peptide Docking approach through fast generation and ensemble docking of peptide conformations, which is referred to as HPepDock. Tested on the LEADS-PEP benchmark data set of 53 diverse complexes with peptides of 3 to 12 residues, HPepDock performed significantly better than the 11 docking protocols of five small-molecule docking programs (DOCK, AutoDock, AutoDock Vina, Surflex, and GOLD) in predicting near-native binding conformations. HPepDock was also evaluated on the 19 bound/unbound and 10 unbound/unbound protein-peptide complexes of the Glide SP-PEP benchmark and showed an overall better performance than Glide SP-PEP+MM-GBSA and FlexPepDock in both bound and unbound docking. HPepDock is computationally efficient: the average running time for docking a peptide is ∼15 minutes, ranging from about 1 minute for short peptides to around 40 minutes for long peptides.


1 INTRODUCTION

Given their important roles in modulating protein-protein interactions, peptides have received increasing attention.1–3 It has been found that nearly 40% of protein-protein interactions are mediated by short peptides.1 Therefore, determining the complex structures of protein-peptide interactions is crucial for studying their molecular mechanisms and thus for developing drugs that target these interactions.4, 5 However, due to the high cost and technical difficulties, only a small number of protein-peptide complex structures have been experimentally determined,6 compared to the huge number of peptides involved in cellular functions.7, 8 As such, a variety of computational methods3, 9, 10 have been developed to predict the structures of protein-peptide complexes. These methods can be roughly classified into two broad categories: template-based modeling11–16 and molecular docking.17–26

Despite its many successes, template-based modeling relies on the availability of protein-peptide complex templates that are homologous to the protein-peptide complex to be modeled.12–16 Therefore, there is an inherent limitation in template-based modeling for general applications, especially for predicting novel protein-peptide interactions.2 In contrast, docking-based algorithms do not depend on the availability of protein-peptide templates. Given a protein and a peptide, docking-based methods predict their complex structure by sampling and ranking putative peptide binding conformations around the binding site if it is known, or globally around the whole protein if the binding site is unknown.
The peptide binding site may also be predicted by other programs.27–32 Therefore, docking-based approaches are more robust than template-based modeling in terms of general applicability, and consequently have recently obtained considerable advancements.17–26

A big challenge in docking-based methods is how to consider the peptide flexibility because peptides are much more flexible than small molecules.33–37 One way is to fully sample the conformations of a peptide on-the-fly guided by its binding energy score, as used in traditional protein-ligand docking.38–40 However, given the large number of rotatable bonds in peptides, such sampling processes are computationally prohibitive. Therefore, current docking-based approaches often dock a few 3D structures of a peptide against a protein and then refine a certain number of top binding modes with molecular dynamics (MD) simulations.17, 18, 21 Nevertheless, this type of docking + MD protocol is computationally expensive and may take days to dock a single peptide.17, 18, 21 Therefore, with the computational efficiency of current protein-peptide docking methods, it is challenging to screen a large number of peptides through docking.

Another way to consider peptide flexibility is through ensemble docking.41–43 Namely, an ensemble of peptide conformations is first generated by a conformational sampling method and then docked against the protein by regular rigid-ligand docking approaches.41 A few top fits are selected as the predicted binding modes, which can be further refined by a post-docking approach. Because of its high computational efficiency, ensemble docking has been widely used to consider molecular flexibility in both protein-protein and protein-small molecule docking.9, 44, 45

Utilizing our efficient peptide conformer generation algorithm ModPep,46 we have developed a fast hierarchical protein-peptide docking algorithm, which is referred to as HPepDock. HPepDock first generates an ensemble of peptide conformations from sequence with ModPep and then docks the generated peptide conformations into the binding site on the protein with a modified version of MDock.41, 42 HPepDock was evaluated on 53 diverse protein-peptide complexes of the LEADS-PEP benchmark data set34 and 19 bound/unbound and 10 unbound/unbound protein-peptide complexes of the Glide SP-PEP test set for its performance in pose prediction, and compared with other peptide docking approaches. It was shown that HPepDock performed significantly better than the other peptide docking algorithms in predicting near-native conformations.
Lessons were also learned from those successful and failed predictions of HPepDock by closely examining their predicted binding modes and native protein-peptide structures.


2 MATERIALS AND METHOD

Given a protein structure and a peptide sequence, HPepDock predicts their protein-peptide complex structure through a hierarchical procedure of two stages: modeling of peptide conformations from a sequence and docking of the modeled peptide conformations, which are detailed as follows.

2.1 Peptide conformer generation

Our ModPep algorithm was used to generate the peptide 3D structures from a sequence.46 Only linear peptides are currently supported; cyclic peptides are not supported yet. Specifically, given a peptide sequence, the program PSIPRED is first used to predict its secondary structure information (i.e. coil, sheet, or helix).47 Our benchmarking results showed that PSIPRED achieved average success rates of 85.1%, 78.9%, and 53.5% in the secondary structure prediction for coil, helix, and sheet, respectively, on a test set of 910 peptides.46 These success rates in secondary structure prediction support the accuracy of the predicted peptide 3D structures. In addition, ModPep is also robust in conformer generation due to its stochastic building process and may still be able to generate a correct model through an ensemble of peptide conformations even with a wrongly predicted secondary structure.46

Then, a rotamer is randomly selected from the single-residue rotamer library for the first amino acid. If three or more consecutive amino acids including the current one all have a secondary structure type of helix, a helix fragment will be selected from the helix library according to its probability in the library and added to the peptide structure. The corresponding side chains for the helix fragment are constructed according to the probability of their residue types in the single-residue rotamer library. For all other cases, the residue structure will be built by selecting a rotamer from the two-residue rotamer library according to its probability in the library. The newly added residue or helix fragment is then checked for atomic clashes. If the newly added residue or fragment has severe clashes with the existing peptide structure, it is discarded and rebuilt. The structure building process is repeated until the last amino acid of the sequence is reached. ModPep is very fast and can generate 100 peptide conformations within one second. The specific commands to generate peptide conformations can be found in Supporting Information.
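The building procedure described above can be illustrated with a minimal sketch. This is not the ModPep implementation: the function names (`plan_blocks`, `weighted_pick`, `grow_peptide`), the clash threshold, the retry limit, and the rule that a helix run is counted forward from the current residue are all assumptions made for illustration, and the geometry is supplied by a caller-provided `propose` function rather than by real rotamer libraries.

```python
import math
import random


def plan_blocks(secondary_structure):
    """Split a per-residue secondary-structure string (H/E/C) into build steps.

    Assumption: the first residue always comes from the single-residue library,
    a run of >= 3 helix residues starting at the current position is built as
    one helix fragment, and everything else uses the two-residue library.
    """
    n = len(secondary_structure)
    if n == 0:
        return []
    blocks = [("single_residue", 0, 1)]
    i = 1
    while i < n:
        j = i
        while j < n and secondary_structure[j] == "H":
            j += 1
        if j - i >= 3:
            blocks.append(("helix_fragment", i, j))
            i = j
        else:
            blocks.append(("two_residue", i, i + 1))
            i += 1
    return blocks


def weighted_pick(library, rng):
    """Pick an entry from [(item, probability), ...] according to its probability."""
    items = [item for item, _ in library]
    weights = [prob for _, prob in library]
    return rng.choices(items, weights=weights, k=1)[0]


def has_clash(new_atoms, old_atoms, min_dist):
    """True if any newly added atom lies closer than min_dist to an existing atom."""
    return any(math.dist(a, b) < min_dist for a in new_atoms for b in old_atoms)


def grow_peptide(blocks, propose, rng, min_dist=3.0, max_tries=100):
    """Add blocks one by one; discard and rebuild a block that clashes.

    `propose(block, atoms, rng)` is a hypothetical callback returning the
    coordinates of the new block; min_dist and max_tries are illustrative.
    """
    atoms = []
    for block in blocks:
        for _ in range(max_tries):
            candidate = propose(block, atoms, rng)
            if not has_clash(candidate, atoms, min_dist):
                atoms.extend(candidate)
                break
        else:
            return None  # no clash-free placement found for this block
    return atoms
```

In such a sketch, `propose` would typically call `weighted_pick` on the appropriate library for the block type, and calling `grow_peptide` repeatedly with different random states would yield the conformer ensemble.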

2.2 Docking algorithm

A modified version of MDock was used to dock the generated peptide conformations against the target protein. MDock is a fast protein-ligand docking algorithm for docking ligands against an ensemble of protein structures.41, 42 It uses a shape matching algorithm similar to that implemented in the molecular docking program DOCK to sample initial ligand binding orientations within the binding site,40 and then evaluates the binding energy scores of the sampled binding modes with an iterative knowledge-based scoring function.48, 49 Rather than adjusting ligand conformations on-the-fly during docking, MDock considers the ligand flexibility by docking multiple conformations of a ligand and thus achieves a good balance between accuracy and speed.43 Specifically, the molecular surface for the protein atoms within 5 Å from the given ligand was first calculated using the DMS program.50 Next, the sphere points that represent the negative images of the molecular shape were generated using the SPHGEN algorithm.40 About 50 sphere points were selected to represent the binding site on the protein. Then, the putative binding modes were sampled by matching the ligand atoms with the sphere points.40 The sampled binding orientations were further optimized and ranked according to their binding energy scores.

Since peptides differ significantly from small molecules in size, flexibility, and binding mechanism, we customized the MDock program for peptide docking. Firstly, we used a reduced peptide model in the matching process for generating putative peptide binding orientations. Namely, each residue was represented by two pseudo atoms corresponding to the CA atom and the center of mass of the other atoms of the residue, in order to reduce the memory consumption and accelerate the sampling process. After the peptide orientations were generated with the reduced model, the all-atom peptide was used to evaluate the binding energy score.
Secondly, we replaced the original scoring function for protein-ligand interactions with an iterative knowledge-based scoring function for protein-protein interactions in evaluating the binding energy scores of protein-peptide binding modes.51

During the peptide docking process, an ensemble of 1000 peptide conformations was generated by ModPep for each peptide. Then, all the 1000 generated conformations were docked against the protein into the binding site and ranked according to their binding energy scores. For computational efficiency, the protein was treated as a rigid body in this study. The docking parameters and specific commands to dock a peptide are detailed in Supporting Information.
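The two peptide-specific ingredients of this section, the reduced two-pseudo-atom residue model and the ranking of rigidly docked conformers by score, can be sketched as follows. This is an illustrative sketch only, not MDock code: the data layout (a residue as a dictionary of atom name to mass and coordinates), the function names, the `dock_and_score` callback, and the convention that a lower score is better are assumptions.

```python
def center_of_mass(atoms):
    """Mass-weighted center of a list of (mass, (x, y, z)) entries."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[axis] for mass, xyz in atoms) / total_mass
        for axis in range(3)
    )


def reduce_residue(residue):
    """Represent a residue by two pseudo atoms: CA and the center of mass of the rest.

    `residue` maps atom names to (mass, (x, y, z)); this layout is hypothetical.
    """
    ca_position = residue["CA"][1]
    others = [entry for name, entry in residue.items() if name != "CA"]
    if not others:
        return [ca_position]
    return [ca_position, center_of_mass(others)]


def rank_ensemble(conformers, dock_and_score):
    """Dock every conformer rigidly and sort the resulting poses by score.

    `dock_and_score(conformer)` is a hypothetical callback returning a dict
    with a "score" key; lower scores are assumed to be more favorable.
    """
    poses = [dock_and_score(conformer) for conformer in conformers]
    return sorted(poses, key=lambda pose: pose["score"])
```

With 1000 conformers per peptide, `rank_ensemble` would be called once per target and the first few entries of the returned list taken as the top predictions.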

2.3 Test sets

The LEADS-PEP data set, constructed by Hauser and Windshügel,34 was used to evaluate our HPepDock algorithm. The protein and peptide structures of the benchmark were directly downloaded from the authors’ website. LEADS-PEP is a non-redundant data set for assessing the performance of peptide docking protocols. It consists of 53 diverse protein-peptide complexes with peptide length ranging from 3 to 12 residues. The complexes in the benchmark are all X-ray crystal structures with a resolution of better than 2.0 Å. Proteins in those complexes have a sequence identity below 30%. A total of ten protocols from four well-known docking programs, namely AutoDock,38 AutoDock Vina,52 Surflex-Dock,53 and GOLD,54 have been evaluated on the benchmark. Therefore, abundant data are available for their peptide docking performances, which can be used as references to assess our protein-peptide docking algorithm.

In addition, we also tested our docking algorithm on the Glide SP-PEP benchmark data set, which has been used to evaluate the Glide SP-PEP + MM-GBSA peptide docking protocol by Tubert-Brohman and his colleagues.37 The data set contains 19 protein-peptide complexes with non-α-helical peptides, 10 of which have an unbound or apo protein structure. Extensive comparisons between Glide SP-PEP + MM-GBSA and FlexPepDock have been conducted on this data set.37


2.4 Evaluation criteria

We adopted the same criteria as those in the LEADS-PEP study to evaluate the docking performance of our peptide docking algorithm,34 so that our results are comparable with those in the LEADS-PEP study. The quality of a predicted protein-peptide binding mode was measured by its root-mean-square deviation (RMSD) from the native structure. Here, the RMSD was calculated based on the backbone atoms (N, CA, C) of the peptide. A prediction was considered a near-native conformation if it had a backbone RMSD of ≤2.5 Å.55 The success rate was defined as the percentage of the cases with at least one near-native conformation among all 53 protein-peptide complex cases in the LEADS-PEP data set when a certain number of top poses were considered.
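The two criteria above can be written down compactly. The following is a minimal sketch, not the evaluation script used in the study: the data layout (each residue as a dictionary of atom name to coordinates) and the function names are hypothetical, and it assumes that the predicted and native peptides are compared directly in the receptor frame without superposition.

```python
import math

BACKBONE_ATOMS = ("N", "CA", "C")


def backbone_rmsd(predicted, native):
    """RMSD over the N, CA, C atoms of matching residues.

    Both arguments are lists of residues, each a dict atom name -> (x, y, z).
    Assumption: no superposition is applied before comparing the poses.
    """
    if len(predicted) != len(native):
        raise ValueError("peptides must have the same number of residues")
    squared_sum = 0.0
    count = 0
    for pred_res, nat_res in zip(predicted, native):
        for name in BACKBONE_ATOMS:
            squared_sum += math.dist(pred_res[name], nat_res[name]) ** 2
            count += 1
    return math.sqrt(squared_sum / count)


def success_rate(ranked_rmsds_per_target, top_n, cutoff=2.5):
    """Percentage of targets with at least one pose <= cutoff among the top_n poses.

    `ranked_rmsds_per_target` holds, for each target, the RMSDs of its poses
    ordered from best-scored to worst-scored.
    """
    hits = sum(
        1
        for rmsds in ranked_rmsds_per_target
        if any(rmsd <= cutoff for rmsd in rmsds[:top_n])
    )
    return 100.0 * hits / len(ranked_rmsds_per_target)
```

Under this definition, a method with a near-native pose among the considered top poses for 28 of 53 targets has a success rate of 100 × 28/53 ≈ 52.8%.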

3 RESULTS AND DISCUSSION

3.1 Docking performance

Table 1 shows that HPepDock predicted near-native peptide binding conformations for 28 and 39 of the 53 protein-peptide complexes in the LEADS-PEP data set when the top 1 and 10 binding modes were considered, resulting in success rates of 52.8% and 73.6%, respectively. When all the 1000 binding modes were considered, the number of successfully predicted complexes increased to 48, resulting in a high success rate of 90.6%. Therefore, the performance of HPepDock may be further improved by using a post-docking approach to increase the rankings of near-native predictions. From Table 1, one can also see that with the increase of peptide length, predicted protein-peptide structures tend to have a larger RMSD. This can be understood because the peptide conformation is more difficult to predict for longer peptides due to the larger number of rotatable bonds.46 Another notable feature in Table 1 is that the predicted binding modes for the targets before 4WLB all had an RMSD of ≤ 2.5 Å when the top 10 binding modes were considered. That means that HPepDock is accurate enough to predict near-native binding conformations for peptides of up to 8 residues when the top 10 predictions were considered. For peptides with more than 8 residues, the performance of HPepDock varies between different targets. For example, HPepDock wrongly predicted the peptide binding mode for 1NTV, but identified a near-native conformation with an RMSD of 2.1 Å for 2B9H when the top 10 predictions were considered.

To further evaluate the performance of HPepDock, we calculated the success rates of HPepDock in predicting the binding modes at different RMSD criteria from ≤0.5 Å to ≤2.5 Å when the top 1, 10, 20, 100, and 1000 predictions were considered (Figure 1). The results show that the success rates become higher as the RMSD cutoff increases, as expected. However, there is a turning point at the criterion of RMSD ≤ 2.0 Å, after which the increase becomes slower. Another interesting observation is that the success rates do not increase linearly with the number of top considered predictions and the top 10 predictions seem to be the crossover point (Figure 1). Therefore, the RMSD cutoff of 2.0 Å for top 10 predictions may be used as a reasonable metric to evaluate the performance of a peptide docking algorithm. In addition, Figure 1 also shows that the success rate for predicting sub-angstrom models (i.e. ≤ 1.0 Å) is still not high (only 15.1%) when the top prediction was considered. Part of the reason is the rigid-docking nature of MDock. Refinement of top predictions can be conducted to achieve more sub-angstrom models in our future study.

Figure 2 shows a comparison between the predicted near-native peptide binding modes and the native structures for nine peptides with different lengths and secondary structures when the top 10 predictions were considered. It can be seen from the figure that all the predicted peptide binding conformations overlap with their native structures very well for these nine cases.
Nevertheless, our peptide docking algorithm failed in some cases, especially for long peptides with extended structures, which may provide insights into the further improvement of our docking algorithm. Figure 3 shows nine protein-peptide complexes for which our peptide docking algorithm failed to predict any near-native conformations within the top 10 predictions. It can be seen from the figure that all the failed cases contain long peptides with more than 8 residues. In general, the failures can be attributed to several reasons or a combination of them. One reason is that there exists a wide binding pocket which can hold the peptide well at multiple positions in terms of shape complementarity (Figures 3C, 3D, and 3E). Therefore, it is challenging for a scoring function to place a ligand or peptide in the right position in these cases. Another reason is the quality of the peptide conformations modeled by ModPep. As HPepDock treats the input peptide conformations as rigid bodies during docking, the quality of peptide conformations will directly affect the accuracy of predicted peptide binding modes. For example, the poorly modeled peptide conformations result in an RMSD of 5.0 Å and 6.3 Å for the failed cases of 4J8S and 4DGY (Figures 3H and 3I). In addition, the binding pockets for some targets are formed by a narrow and long groove where the native peptide structure fits tightly (Figures 3B, 3F, and 3G). In these cases, it is difficult to obtain near-native conformations, as a slight difference in the peptide conformation may induce atomic clashes and push the peptide away from the native binding position. To overcome these limitations, improvements are required in both the scoring function and peptide conformational sampling. A post-docking process for initially docked results will also be needed to remove the atomic clashes between the peptide and the binding pocket.

3.2 Comparison with other approaches

To further validate our peptide docking algorithm, we also compared HPepDock with other peptide docking programs. For a fair comparison of different docking approaches, two criteria should be considered. One is that the docking approaches to be compared should belong to the same category, i.e. local peptide docking or global peptide docking, as it would be unfair for a global docking approach to be compared with a local docking method. The other is that the comparison should be conducted on the same benchmark or data set. Therefore, we have performed comparative evaluations on two benchmark data sets, the LEADS-PEP benchmark data set34 and the Glide SP-PEP test set,37 on which various peptide docking protocols have been evaluated previously.34, 37


3.2.1 On the LEADS-PEP data set

Tables 2 and 3 list the docking results of HPepDock, DOCK (version 6.8),56 and the ten peptide docking protocols from four other docking programs, AutoDock,38 AutoDock Vina,52 Surflex,53 and GOLD,54 when the top 1 and 20 predictions were considered, respectively. Here, the results of HPepDock and DOCK were obtained from our own docking calculations. The results for the other ten docking protocols were obtained from the literature.34 For a fair comparison, the docking performance of a specific docking method is represented by the best one among its docking protocols. It can be seen from Table 2 that HPepDock showed a significantly better docking performance than the other five docking programs and predicted near-native peptide conformations for 28 cases when the top prediction was considered, followed by 20 cases for Surflex, 16 cases for GOLD, and 10 cases each for DOCK, AutoDock, and Vina. Similar trends can be observed in the docking results for top 20 predictions (Table 3). Here, HPepDock also showed a significantly better docking performance than the other five docking programs and resulted in near-native peptide binding conformations for 39 protein-peptide complexes, followed by 29 complexes for Surflex, 28 complexes each for Vina and GOLD, 15 complexes for DOCK, and 12 complexes for AutoDock (Table 3).

Figure 4 shows that HPepDock obtained a lower success rate than some docking protocols like Surflex:SA and GOLD:GS at the criteria of RMSD ≤ 0.5 Å and RMSD ≤ 1.0 Å when the top prediction was considered (Figure 4A). However, at an RMSD cut-off of 1.5 Å, HPepDock performs better than all the other docking protocols (Figure 4A). These results can be understood because HPepDock considers the peptide flexibility by rigidly docking multiple peptide conformations. In such cases, it is challenging to obtain an excellent fit between the protein and the peptide for the top prediction.
When more binding conformations are considered, the peptide flexibility is better described and the possibility of achieving sub-angstrom poses (i.e. RMSD ≤ 1.0 Å) will become higher. Correspondingly, the performance of HPepDock in predicting sub-angstrom models will become relatively better compared to those of the other docking protocols when more binding predictions are considered, as shown in Figure 4B vs. Figure 4A.

The significantly worse performance of the other five docking algorithms compared to HPepDock can be understood because the five docking programs, DOCK, AutoDock, Vina, Surflex, and GOLD, were originally developed for protein-ligand docking. Given that peptides significantly differ from ligands in size, flexibility, and composition, traditional protein-ligand docking algorithms may suffer from several difficulties when docking peptides. One is in sampling peptide conformations because peptides have many more rotatable bonds and thus are much more flexible than ligand compounds. The other is in evaluating the binding energy score between the protein and the peptide because protein-peptide interactions are more like protein-protein interactions than protein-ligand interactions. In addition, because ligands are smaller than peptides, protein-ligand docking tends to depend on the shape of the binding pocket much more than protein-peptide docking. In contrast, HPepDock has been designed to address these challenges by taking advantage of our efficient peptide conformer generation algorithm ModPep and an iterative knowledge-based scoring function for protein-protein interactions.

Consequently, HPepDock predicted near-native conformations on significantly more cases than the other five docking programs did. For example, HPepDock predicted a near-native peptide binding conformation with an RMSD of 1.0 Å for 1N7F, while the other five docking algorithms failed on this case when the top 20 predictions were considered (Table 3). A close examination of this complex shows that the native peptide binds to a relatively open binding pocket on the protein and some residues of the peptide are located outside of the binding pocket (Figure 2E). In such a case, it is difficult to determine the binding mode of the peptide.
Therefore, a scoring function that is designed for protein-ligand interactions may not be able to describe such protein-peptide interactions, which led to the failure of the five protein-ligand docking algorithms.

In addition, the advantages of our protein-peptide docking algorithm can be further supported by comparing the docking performances of HPepDock and DOCK, as both programs use a similar shape matching algorithm for generating peptide binding orientations. The same set of sphere points has been used to represent the binding pocket for HPepDock and DOCK. Therefore, the better performance of HPepDock compared to DOCK is likely due to the unique peptide docking features in HPepDock: efficient conformer generation for peptides and an iterative knowledge-based scoring function for protein-protein interactions.

Despite the relatively worse performance of the five protein-ligand docking algorithms compared to HPepDock, protein-ligand docking algorithms still possess advantages in certain cases. For example, HPepDock failed to give any near-native peptide binding conformations within the top 20 predictions for 1H6W and 1N12, while Surflex and GOLD:GS obtained successful predictions in these two cases (Figures 3B and 3F). Close examination of these two cases shows that both binding pockets are very long and narrow, where the native peptide structure fits tightly. In such cases, it is a challenge for HPepDock to obtain a near-native conformation because a slight difference from the native peptide structure will cause severe atomic clashes and lead to a failure due to the rigid docking feature of MDock. In contrast, docking programs like Surflex and GOLD adjust the peptide binding conformations within the binding pocket on-the-fly and therefore can optimize the shape fit between the peptide and the binding pocket during the docking process, resulting in near-native predictions. In other words, protein-ligand docking algorithms like Surflex and GOLD may work relatively well on such shape-dominant peptide binding pockets.

3.2.2 On the Glide SP-PEP data set

In addition, we compared the performance of our HPepDock algorithm with those of the Glide SP-PEP docking protocols and FlexPepDock for both bound/unbound and unbound/unbound docking on the Glide SP-PEP test set. Here, bound/unbound docking means docking between a bound protein structure and a peptide sequence, and unbound/unbound docking stands for docking between an unbound protein structure and a peptide sequence. In order to compare our results with the literature data,37 we used the interface RMSD (iRMSD) to measure the accuracy of a peptide binding prediction and the RMSD cutoffs of 2.0 Å and 3.0 Å to define the success of a prediction.37 Here, the interface RMSD was calculated based on those peptide residues within 8 Å from the protein.
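The interface RMSD can be sketched along the following lines. The text only states that peptide residues within 8 Å of the protein are used, so several details here are assumptions made for illustration: the interface is selected on the native peptide, a residue counts as interfacial if any of its atoms lies within the cutoff of any protein atom, and the deviation is computed over the N, CA, and C atoms. The function names and data layout are hypothetical.

```python
import math


def interface_residues(native_peptide, protein_atoms, cutoff=8.0):
    """Indices of peptide residues with any atom within `cutoff` Å of the protein.

    `native_peptide` is a list of residues (dict atom name -> (x, y, z)) and
    `protein_atoms` is a list of (x, y, z) coordinates.
    """
    selected = []
    for index, residue in enumerate(native_peptide):
        if any(
            math.dist(xyz, protein_xyz) <= cutoff
            for xyz in residue.values()
            for protein_xyz in protein_atoms
        ):
            selected.append(index)
    return selected


def interface_rmsd(predicted, native, protein_atoms, cutoff=8.0,
                   atom_names=("N", "CA", "C")):
    """RMSD restricted to interface residues (atom selection is an assumption)."""
    squared_sum = 0.0
    count = 0
    for index in interface_residues(native, protein_atoms, cutoff):
        for name in atom_names:
            squared_sum += math.dist(predicted[index][name], native[index][name]) ** 2
            count += 1
    if count == 0:
        raise ValueError("no peptide residue lies within the interface cutoff")
    return math.sqrt(squared_sum / count)
```

A prediction would then count as a success when `interface_rmsd(...)` is at most 2.0 Å or 3.0 Å, depending on the criterion used.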


Table 4 lists the best interface RMSDs of the peptide binding modes within the top 10 predictions by HPepDock on the test set of 19 bound/unbound protein-peptide complexes. As a reference, the corresponding results for three Glide peptide docking protocols and FlexPepDock are also shown in the table. It can be seen from Table 4 that HPepDock predicted the peptide binding modes with iRMSD ≤ 2.0 Å for 12 of 19 bound/unbound cases, compared to 10 cases for the GlideScore and Emodel protocols of Glide, 11 cases for the MM-GBSA protocol of Glide, and 13 cases for FlexPepDock. However, when the criterion of iRMSD ≤ 3.0 Å was used, HPepDock achieved a higher performance than the other four peptide docking protocols, and obtained correct predictions for 15 of 19 test cases, compared to 13 cases for each of the other four protocols.

Table 5 shows the best interface RMSDs of the peptide binding modes within the top 10 predictions by HPepDock on the test set of 10 unbound/unbound protein-peptide complexes. The corresponding results for Glide SP-PEP+MM-GBSA and FlexPepDock are also listed in the table. It can be seen from Table 5 that HPepDock predicted correct binding modes for 5 of the 10 unbound/unbound cases at iRMSD ≤ 2.0 Å, compared to 4 cases for Glide SP-PEP+MM-GBSA and 6 cases for FlexPepDock. When the criterion of iRMSD ≤ 3.0 Å was used, HPepDock obtained a better performance than the other two peptide docking protocols, and predicted correct binding modes for 8 of 10 test cases, compared to 4 cases for Glide SP-PEP+MM-GBSA and 6 cases for FlexPepDock.

Considering both the bound/unbound and unbound/unbound docking results on the Glide SP-PEP test sets, HPepDock performs better than the Glide SP-PEP peptide docking protocols and slightly worse than FlexPepDock when the criterion of iRMSD ≤ 2.0 Å was used. However, HPepDock outperformed all docking protocols when the criterion of iRMSD ≤ 3.0 Å was used.

In addition, it should be noted that Glide SP-PEP+MM-GBSA consumes a total computational time of approximately 24 hours on a single CPU and FlexPepDock takes months of CPU time per polypeptide.37 As a comparison, HPepDock finishes a peptide docking calculation on average within ∼15 mins. Overall, HPepDock achieved the best performance among the three peptide docking programs when taking both the docking performance and computational efficiency into account.
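
The success counts quoted above can be re-derived from the per-target iRMSD values in Tables 4 and 5. The short Python sketch below does this; the iRMSD values are transcribed from those tables, while the dictionary names, the method labels, and the helper functions `count_successes` and `summarize` are illustrative choices, not part of HPepDock or Glide.

```python
# Best iRMSD (angstroms) within the top 10 poses, transcribed from Table 4
# (19 bound/unbound cases) and Table 5 (10 unbound/unbound cases).
BOUND_UNBOUND = {
    "HPepDock": [0.87, 1.17, 0.89, 1.78, 1.38, 2.00, 2.72, 3.95, 1.34, 1.85,
                 2.38, 3.81, 1.61, 6.74, 1.85, 1.73, 3.84, 1.49, 2.55],
    "Glide:GlideScore": [0.33, 0.70, 2.12, 1.90, 0.85, 1.81, 2.54, 1.22, 2.18, 3.39,
                         7.78, 10.66, 5.45, 1.69, 3.78, 0.64, 5.31, 0.52, 1.73],
    "Glide:Emodel": [0.33, 0.70, 1.92, 1.43, 0.85, 1.81, 2.54, 1.78, 2.18, 3.39,
                     4.91, 3.48, 5.45, 1.69, 3.78, 0.64, 5.31, 1.26, 2.62],
    "Glide:MM-GBSA": [0.34, 0.63, 1.65, 1.12, 0.85, 1.31, 1.79, 1.78, 1.96, 4.00,
                      4.91, 10.74, 5.45, 7.40, 2.05, 0.41, 4.77, 0.75, 2.43],
    "FlexPepDock": [5.90, 0.50, 1.50, 0.90, 1.10, 1.00, 1.50, 1.40, 1.40, 6.00,
                    8.60, 5.70, 1.60, 8.80, 0.60, 2.00, 0.70, 3.10, 0.60],
}
UNBOUND_UNBOUND = {
    "HPepDock": [1.07, 1.42, 1.56, 2.34, 1.49, 3.38, 2.41, 1.46, 4.28, 2.22],
    "Glide:SP-PEP+MM-GBSA": [0.54, 1.99, 0.86, 5.22, 1.64, 3.62, 5.02, 11.23, 3.94, 4.45],
    "FlexPepDock": [0.40, 1.90, 1.40, 1.40, 1.00, 3.70, 10.10, 0.60, 4.30, 3.20],
}

def count_successes(irmsds, cutoff):
    """Number of cases whose best top-10 iRMSD is within the cutoff (inclusive)."""
    return sum(1 for value in irmsds if value <= cutoff)

def summarize(table):
    """Map each method to its (count at 2.0, count at 3.0) success tuple."""
    return {
        name: (count_successes(values, 2.0), count_successes(values, 3.0))
        for name, values in table.items()
    }

for label, table in (("bound/unbound", BOUND_UNBOUND),
                     ("unbound/unbound", UNBOUND_UNBOUND)):
    for name, (n2, n3) in summarize(table).items():
        total = len(table[name])
        print(f"{label:16s} {name:22s} <=2.0: {n2}/{total}  <=3.0: {n3}/{total}")
```

Run as is, the counts agree with the summary rows of Tables 4 and 5, for example 12 and 15 of 19 for HPepDock on the bound/unbound set.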


3.3 Computational efficiency

HPepDock is computationally efficient. The conformer generation algorithm for peptides is very fast and can normally construct 100 peptide conformations in less than one second. Therefore, the computing time is mostly consumed in the docking step. Figure 5 shows the average running time for docking 1000 conformations of a peptide on the LEADS-PEP data set, where the docking calculations were conducted on a single CPU core (Intel® Xeon® CPU E5-2690 v4 @ 2.60 GHz) of a Linux x86_64 cluster. As expected, the running time increases with increasing peptide length. The running time varies between ∼1 min for the peptide 2OY2 and ∼40 mins for the peptides of 11 residues. In addition, since HPepDock uses the shape matching algorithm for orientational sampling during docking, it tends to consume more time on narrow pockets, where more orientations must be generated before enough of them pass the atomic clash check. For example, the running time for the peptides of 11 residues is significantly higher than that for the peptides of 12 residues (Figure 5), because of the much narrower binding sites of the former, as shown for targets 3BFW and 1N12 (Figures 2H and 3F). On average, HPepDock is able to dock a peptide within ∼15 minutes.
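
To put the ∼15 minute average into perspective, the following back-of-the-envelope Python sketch estimates the wall-clock time of a screening campaign. The library size (1000 peptides) and the core count (28) are hypothetical values chosen only for illustration, and the estimate assumes independent single-core jobs that each take the average running time; real runs range from about 1 to about 40 minutes depending on peptide length and pocket shape.

```python
import math

def wall_clock_hours(n_peptides, minutes_per_peptide=15.0, n_cores=1):
    """Rough wall-clock estimate for independent, single-core docking jobs.

    Assumes every job takes the same (average) time, so the jobs run in
    ceil(n_peptides / n_cores) successive batches.
    """
    batches = math.ceil(n_peptides / n_cores)
    return batches * minutes_per_peptide / 60.0

# Hypothetical campaign: 1000 peptides on 28 cores.
print(f"{wall_clock_hours(1000, n_cores=28):.1f} h on 28 cores")
print(f"{wall_clock_hours(1000, n_cores=1):.1f} h on a single core")
```

Because the per-peptide jobs are independent, the estimate scales almost linearly with the number of cores until the batch count reaches one.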

4 CONCLUSION

We have developed a fast hierarchical algorithm for flexible peptide-protein docking, referred to as HPepDock, which first models an ensemble of peptide conformations from the sequence and then docks the modeled peptide conformations into the binding site on the protein. Benefiting from our efficient peptide conformation modeling method ModPep and the iterative knowledge-based scoring function for protein-protein interactions, HPepDock achieved a significantly better performance than the other five protein-ligand docking programs (DOCK, AutoDock, AutoDock Vina, Surflex, and GOLD) in predicting near-native peptide binding conformations on the LEADS-PEP benchmark data set of 53 diverse protein-peptide complexes. When the top 1 and top 10 predictions were considered, HPepDock obtained success rates of 52.8% and 73.6% in binding mode prediction, respectively. The results also showed that HPepDock was successful in predicting near-native conformations for the peptides of eight or fewer residues. Tested on the 19 bound/unbound and 10 unbound/unbound protein-peptide complexes of the Glide SP-PEP benchmark, HPepDock also showed an overall better performance than Glide SP-PEP+MM-GBSA and FlexPepDock in both bound and unbound cases. HPepDock is computationally efficient and docks a flexible peptide on average in ∼15 minutes, with a range from one minute for short peptides to 40 minutes for long peptides. Due to its fast speed and good accuracy, HPepDock will be beneficial for high-throughput peptide docking studies.


Supporting Information

Supporting Information Available: The detailed commands and parameters used for conformer generation and docking by HPepDock.

ACKNOWLEDGEMENTS

This work is supported by the National Key Research and Development Program of China (grant Nos. 2016YFC1305800 and 2016YFC1305805), the National Natural Science Foundation of China (grant No. 31670724), and the startup grant of Huazhong University of Science and Technology (grant No. 3004012104).

Conflict of interest statement. None declared.


References

(1) Petsalaki, E.; Russell, R. B. Peptide-Mediated Interactions in Biological Systems: New Discoveries and Applications. Curr. Opin. Biotechnol. 2008, 19, 344-350.
(2) Lensink, M. F.; Velankar, S.; Wodak, S. J. Modeling Protein-Protein and Protein-Peptide Complexes: CAPRI 6th Edition. Proteins 2017, 85, 359-377.
(3) London, N.; Raveh, B.; Schueler-Furman, O. Peptide Docking and Structure-Based Characterization of Peptide Binding: from Knowledge to Know-How. Curr. Opin. Struct. Biol. 2013, 23, 894-902.
(4) Fosgerau, K.; Hoffmann, T. Peptide Therapeutics: Current Status and Future Directions. Drug Discov. Today 2015, 20, 122-128.
(5) Craik, D. J.; Fairlie, D. P.; Liras, S.; Price, D. The Future of Peptide-Based Drugs. Chem. Biol. Drug Des. 2013, 81, 136-147.
(6) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-242.
(7) Rey, J.; Deschavanne, P.; Tuffery, P. BactPepDB: a Database of Predicted Peptides from a Exhaustive Survey of Complete Prokaryote Genomes. Database (Oxford) 2014, 2014, bau106.
(8) Vetter, I.; Davis, J. L.; Rash, L. D.; Anangi, R.; Mobli, M.; Alewood, P. F.; Lewis, R. J.; King, G. F. Venomics: a New Paradigm for Natural Products-Based Drug Discovery. Amino Acids 2011, 40, 15-28.
(9) Huang, S.-Y. Search Strategies and Evaluation in Protein-Protein Docking: Principles, Advances and Challenges. Drug Discov. Today 2014, 19, 1081-1096.
(10) Huang, S.-Y. Exploring the Potential of Global Protein-Protein Docking: an Overview and Critical Assessment of Current Programs for Automatic Ab Initio Docking. Drug Discov. Today 2015, 20, 969-977.
(11) Yan, Y.; Zhang, D.; Zhou, P.; Li, B.; Huang, S.-Y. HDOCK: a Web Server for Protein-Protein and Protein-DNA/RNA Docking Based on a Hybrid Strategy. Nucleic Acids Res. 2017, 45, W365-W373.
(12) Lee, H.; Heo, L.; Lee, M. S.; Seok, C. GalaxyPepDock: a Protein-Peptide Docking Tool Based on Interaction Similarity and Energy Optimization. Nucleic Acids Res. 2015, 43(W1), W431-W435.
(13) Yan, Y.; Wen, Z.; Wang, X.; Huang, S.-Y. Addressing Recent Docking Challenges: a Hybrid Strategy to Integrate Template-Based and Free Protein-Protein Docking. Proteins 2017, 85, 497-512.
(14) Lee, H.; Baek, M.; Lee, G. R.; Park, S.; Seok, C. Template-Based Modeling and Ab Initio Refinement of Protein Oligomer Structures Using GALAXY in CAPRI Round 30. Proteins 2017, 85, 399-407.
(15) Xue, L. C.; Rodrigues, J. P. G. L. M.; Dobbs, D.; Honavar, V.; Bonvin, A. M. J. J. Template-Based Protein-Protein Docking Exploiting Pairwise Interfacial Residue Restraints. Brief. Bioinform. 2017, 18, 458-466.


(16) Szilagyi, A.; Zhang, Y. Template-Based Structure Modeling of Protein-Protein Interactions. Curr. Opin. Struct. Biol. 2014, 24, 10-23.
(17) Yan, C.; Xu, X.; Zou, X. Fully Blind Docking at the Atomic Level for Protein-Peptide Complex Structure Prediction. Structure 2016, 24, 1842-1853.
(18) Schindler, C. E.; de Vries, S. J.; Zacharias, M. Fully Blind Peptide-Protein Docking with Pepattract. Structure 2015, 23, 1507-1515.
(19) Pallara, C.; Jimenez-Garcia, B.; Romero, M.; Moal, I. H.; Fernandez-Recio, J. Pydock Scoring for the New Modeling Challenges in Docking: Protein-Peptide, Homo-Multimers, and Domain-Domain Interactions. Proteins 2017, 85, 487-496.
(20) Kurcinski, M.; Jamroz, M.; Blaszczyk, M.; Kolinski, A.; Kmiecik, S. CABS-Dock Web Server for the Flexible Docking of Peptides to Proteins without Prior Knowledge of the Binding Site. Nucleic Acids Res. 2015, 43(W1), W419-W424.
(21) Trellet, M.; Melquiond, A. S.; Bonvin, A. M. A Unified Conformational Selection and Induced Fit Approach to Protein-Peptide Docking. PLoS One 2013, 8, e58769.
(22) Ben-Shimon, A.; Niv, M. Y. AnchorDock: Blind and Flexible Anchor-Driven Peptide Docking. Structure 2015, 23, 929-940.
(23) Raveh, B.; London, N.; Zimmerman, L.; Schueler-Furman, O. Rosetta Flexpepdock Ab-Initio: Simultaneous Folding, Docking and Refinement of Peptides onto Their Receptors. PLoS One 2011, 6, e18934.
(24) Donsky, E.; Wolfson, H. J. Pepcrawler: A Fast RRT-Based Algorithm for High-Resolution Refinement and Binding Affinity Estimation of Peptide Inhibitors. Bioinformatics 2011, 27, 2836-2842.
(25) Antes, I. DynaDock: A New Molecular Dynamics-Based Algorithm for Protein-Peptide Docking Including Receptor Flexibility. Proteins 2010, 78, 1084-1104.
(26) Pierce, B. G.; Weng, Z. A Flexible Docking Approach for Prediction of T Cell Receptor-Peptide-Mhc Complexes. Protein Sci. 2013, 22, 35-46.
(27) Taherzadeh, G.; Zhou, Y.; Liew, A. W.; Yang, Y. Structure-Based Prediction of Protein-Peptide Binding Regions Using Random Forest. Bioinformatics 2018, 34, 477-484.
(28) Yan, C.; Zou, X. Predicting Peptide Binding Sites on Protein Surfaces by Clustering Chemical Interactions. J. Comput. Chem. 2015, 36, 49-61.
(29) Lavi, A.; Ngan, C. H.; Movshovitz-Attias, D.; Bohnuud, T.; Yueh, C.; Beglov, D.; Schueler-Furman, O.; Kozakov, D. Detection of Peptide-Binding Sites on Protein Surfaces: The First Step toward the Modeling and Targeting of Peptide-Mediated Interactions. Proteins 2013, 81, 2096-2105.


(30) Saladin, A.; Rey, J.; Thevenet, P.; Zacharias, M.; Moroy, G.; Tuffery, P. PEP-Sitefinder: A Tool for the Blind Identification of Peptide Binding Sites on Protein Surfaces. Nucleic Acids Res. 2014, 42, W221-W226.
(31) Trabuco, L. G.; Lise, S.; Petsalaki, E.; Russell, R. B. Pepsite: Prediction of Peptide-Binding Sites from Protein Surfaces. Nucleic Acids Res. 2012, 40, W423-W427.
(32) Petsalaki, E.; Stark, A.; Garcia-Urdiales, E.; Russell, R. B. Accurate Prediction of Peptide Binding Sites on Protein Surfaces. PLoS Comput. Biol. 2009, 5, e1000335.
(33) Rentzsch, R.; Renard, B. Y. Docking Small Peptides Remains a Great Challenge: An Assessment Using Autodock Vina. Brief. Bioinform. 2015, 16, 1045-1056.
(34) Hauser, A. S.; Windshugel, B. LEADS-PEP: A Benchmark Data Set for Assessment of Peptide Docking Performance. J. Chem. Inf. Model. 2016, 56, 188-200.
(35) Sacquin-Mora, S.; Prevost, C. Docking Peptides on Proteins: How to Open a Lock, in The Dark, with a Flexible Key. Structure 2015, 23, 1373-1374.
(36) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004, 47, 1739-1749.
(37) Tubert-Brohman, I.; Sherman, W.; Repasky, M.; Beuming, T. Improved Docking of Polypeptides with Glide. J. Chem. Inf. Model. 2013, 53, 1689-1699.
(38) Morris, G. M.; Goodsell, D. S.; Halliday, R. S.; Huey, R.; Hart, W. E.; Belew, R. K.; Olson, A. J. Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function. J. Comp. Chem. 1998, 19, 1639-1662.
(39) Staneva, I.; Wallin, S. All-Atom Monte Carlo Approach to Protein-Peptide Binding. J. Mol. Biol. 2009, 393, 1118-1128.
(40) Ewing, T. J.; Makino, S.; Skillman, A. G.; Kuntz, I. D. DOCK 4.0: Search Strategies for Automated Molecular Docking of Flexible Molecule Databases. J. Comput. Aided Mol. Des. 2001, 15, 411-428.
(41) Huang, S.-Y.; Zou, X. Ensemble Docking of Multiple Protein Structures: Considering Protein Structural Variations in Molecular Docking. Proteins 2007, 66, 399-421.
(42) Huang, S.-Y.; Zou, X. Efficient Molecular Docking of NMR Structures: Application to Hiv-1 Protease. Protein Sci. 2007, 16, 43-51.
(43) Huang, S.-Y.; Zou, X. Construction and Test of Ligand Decoy Sets Using Mdock: Community Structure-Activity Resource Benchmarks for Binding Mode Prediction. J. Chem. Inf. Model. 2011, 51, 2107-2114.


(44) Huang, S.-Y.; Zou, X. Advances and Challenges in Protein-Ligand Docking. Int. J. Mol. Sci. 2010, 11, 3016-3034.
(45) Huang, S.-Y.; Grinter, S. Z.; Zou, X. Scoring Functions and Their Evaluation Methods for Protein-Ligand Docking: Recent Advances and Future Directions. Phys. Chem. Chem. Phys. 2010, 12, 12899-12908.
(46) Yan, Y.; Zhang, D.; Huang, S.-Y. Efficient Conformational Ensemble Generation of Protein-Bound Peptides. J. Cheminform. 2017, 9, 59.
(47) Jones, D. T. Protein Secondary Structure Prediction Based on Position-Specific Scoring Matrices. J. Mol. Biol. 1999, 292, 195-202.
(48) Huang, S.-Y.; Zou, X. An Iterative Knowledge-Based Scoring Function to Predict Protein-Ligand Interactions: I. Derivation of Interaction Potentials. J. Comput. Chem. 2006, 27, 1866-1875.
(49) Huang, S.-Y.; Zou, X. An Iterative Knowledge-Based Scoring Function to Predict Protein-Ligand Interactions: II. Validation of the Scoring Function. J. Comput. Chem. 2006, 27, 1876-1882.
(50) Richards, F. M. Areas, Volumes, Packing and Protein Structure. Annu. Rev. Biophys. Bioeng. 1977, 6, 151-176.
(51) Huang, S.-Y.; Zou, X. An Iterative Knowledge-Based Scoring Function for Protein-Protein Recognition. Proteins 2008, 72, 557-579.
(52) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization and Multithreading. J. Comput. Chem. 2010, 31, 455-461.
(53) Jain, A. N. Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine. J. Med. Chem. 2003, 46, 499-511.
(54) Jones, G.; Willett, P.; Glen, R. C. Molecular Recognition of Receptor Sites Using a Genetic Algorithm with a Description of Desolvation. J. Mol. Biol. 1995, 245, 43-53.
(55) Audie, J.; Swanson, J. Recent Work in the Development and Application of Protein-peptide Docking. Future Med. Chem. 2012, 4, 1619-1644.
(56) Allen, W. J.; Balius, T. E.; Mukherjee, S.; Brozell, S. R.; Moustakas, D. T.; Lang, P. T.; Case, D. A.; Kuntz, I. D.; Rizzo, R. C. DOCK 6: Impact of New Features and Current Docking Performance. J. Comput. Chem. 2015, 36, 1132-1156.


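Table 1: The RMSDs (Å) of the predicted peptide binding modes by HPepDock when several numbers of top predictions were considered, where the RMSDs are shown in a gradient color code from green (0 Å) to red (≥10 Å). The columns 1 to 1000 give the number of top predictions considered.

| Res. | PDB | 1 | 10 | 20 | 50 | 100 | 1000 |
|---|---|---|---|---|---|---|---|
| 3 | 1B9J | 0.5 | 0.4 | 0.4 | 0.4 | 0.4 | 0.3 |
| 3 | 2OY2 | 0.9 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
| 3 | 3GQ1 | 2.9 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 3 | 3BS4 | 2.2 | 1.2 | 1.2 | 1.2 | 1.2 | 0.7 |
| 3 | 2OXW | 1.3 | 1.0 | 0.8 | 0.8 | 0.4 | 0.4 |
| 3 | 2B6N | 6.1 | 1.6 | 1.6 | 0.9 | 0.7 | 0.3 |
| 4 | 1TW6 | 0.7 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| 4 | 3VQG | 0.7 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
| 4 | 1UOP | 6.1 | 1.3 | 1.3 | 0.6 | 0.6 | 0.6 |
| 4 | 4C2C | 1.3 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 4 | 4J44 | 0.9 | 0.6 | 0.6 | 0.5 | 0.5 | 0.3 |
| 5 | 2HPL | 1.3 | 1.2 | 1.2 | 1.2 | 0.7 | 0.7 |
| 5 | 2V3S | 1.8 | 1.4 | 1.4 | 1.1 | 1.1 | 0.9 |
| 5 | 3NFK | 1.4 | 0.8 | 0.8 | 0.7 | 0.5 | 0.5 |
| 5 | 1NVR | 9.9 | 1.5 | 0.9 | 0.9 | 0.9 | 0.8 |
| 5 | 4V3I | 4.5 | 1.8 | 1.8 | 1.5 | 1.5 | 0.9 |
| 5 | 3T6R | 1.9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 6 | 1SVZ | 2.3 | 1.9 | 1.8 | 1.8 | 1.8 | 1.7 |
| 6 | 3D1E | 2.2 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1 |
| 6 | 3IDG | 1.8 | 1.8 | 1.8 | 1.6 | 1.6 | 1.4 |
| 6 | 3LNY | 0.8 | 0.8 | 0.7 | 0.7 | 0.7 | 0.7 |
| 6 | 4NNM | 2.1 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 |
| 6 | 4Q6H | 2.9 | 1.1 | 1.1 | 0.8 | 0.5 | 0.5 |
| 7 | 3MMG | 1.6 | 1.4 | 1.4 | 1.4 | 1.4 | 1.4 |
| 7 | 3Q47 | 6.9 | 1.9 | 1.9 | 1.3 | 1.3 | 1.1 |
| 7 | 3UPV | 5.1 | 2.4 | 1.7 | 1.7 | 1.7 | 1.7 |
| 7 | 4QBR | 1.3 | 1.3 | 1.3 | 0.9 | 0.9 | 0.9 |
| 7 | 3NJG | 1.2 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 8 | 1ELW | 2.5 | 1.7 | 1.4 | 0.9 | 0.8 | 0.8 |
| 8 | 3CH8 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1 | 1.1 |
| 8 | 4WLB | 0.9 | 0.7 | 0.6 | 0.5 | 0.5 | 0.3 |
| 8 | 1OU8 | 2.9 | 2.9 | 2.9 | 2.2 | 2.2 | 1.9 |
| 8 | 1N7F | 4.3 | 1.4 | 1.0 | 1.0 | 1.0 | 0.8 |
| 9 | 3OBQ | 2.8 | 2.0 | 2.0 | 1.4 | 1.4 | 1.4 |
| 9 | 4BTB | 11.4 | 3.1 | 3.1 | 1.9 | 1.1 | 1.1 |
| 9 | 2W0Z | 1.6 | 1.1 | 1.1 | 1.1 | 1.1 | 1.0 |
| 9 | 4N7H | 14.1 | 2.7 | 2.7 | 2.7 | 2.2 | 2.2 |
| 9 | 2QAB | 1.9 | 1.2 | 0.6 | 0.5 | 0.5 | 0.5 |
| 10 | 1H6W | 3.6 | 3.6 | 3.6 | 3.6 | 3.6 | 3.6 |
| 10 | 3BRL | 8.4 | 5.4 | 3.8 | 3.8 | 2.1 | 1.9 |
| 10 | 1NTV | 15.7 | 7.8 | 4.9 | 3.3 | 3.1 | 1.8 |
| 10 | 4DS1 | 6.3 | 1.6 | 1.6 | 1.6 | 1.6 | 1.6 |
| 10 | 2O02 | 7.6 | 4.9 | 4.2 | 4.0 | 3.4 | 2.5 |
| 11 | 1N12 | 23.3 | 4.3 | 4.3 | 4.3 | 4.3 | 4.3 |
| 11 | 2XFX | 7.0 | 4.5 | 3.5 | 3.5 | 3.5 | 3.5 |
| 11 | 3BFW | 2.0 | 1.0 | 1.0 | 0.9 | 0.9 | 0.9 |
| 11 | 4EIK | 5.5 | 3.1 | 3.1 | 2.9 | 2.9 | 1.9 |
| 11 | 3DS1 | 1.1 | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 |
| 12 | 4J8S | 5.0 | 5.0 | 5.0 | 3.9 | 3.9 | 3.8 |
| 12 | 2W10 | 15.4 | 2.8 | 2.8 | 2.8 | 2.4 | 2.4 |
| 12 | 3JZO | 13.0 | 4.2 | 2.6 | 2.6 | 1.9 | 1.6 |
| 12 | 4DGY | 7.1 | 6.3 | 5.7 | 5.0 | 5.0 | 4.6 |
| 12 | 2B9H | 2.1 | 2.1 | 2.1 | 2.1 | 2.1 | 2.1 |
| Near-natives | | 28 | 39 | 39 | 41 | 45 | 48 |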
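
The success rates quoted in the text follow directly from the near-native counts in the last row of Table 1, divided by the 53 complexes of the LEADS-PEP benchmark. The small Python sketch below makes this conversion explicit; the names `NEAR_NATIVES` and `success_rate` are illustrative only.

```python
N_COMPLEXES = 53  # size of the LEADS-PEP benchmark

# Near-native counts from the last row of Table 1, keyed by the number of
# top predictions considered.
NEAR_NATIVES = {1: 28, 10: 39, 20: 39, 50: 41, 100: 45, 1000: 48}

def success_rate(n_success, n_total=N_COMPLEXES):
    """Success rate in percent."""
    return 100.0 * n_success / n_total

for top_n, n_success in NEAR_NATIVES.items():
    print(f"top {top_n:>4}: {n_success}/{N_COMPLEXES} = {success_rate(n_success):.1f}%")
```

The top 1 and top 10 entries reproduce the 52.8% and 73.6% reported for HPepDock.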


Table 2: Comparison of the docking results of HPepDock, DOCK, and the ten protocols of four other docking programs on the LEADS-PEP benchmark data set when the best-scored prediction was considered, where the RMSDs (Å) are shown in a gradient color code from green (0 Å) to red (≥10 Å). Docking protocols: SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore.34

| Res. | PDB | HPepDock | DOCK | AutoDock:SA | AutoDock:HA | Vina:SA | Vina:HA | Surflex:SA | Surflex:HA | GOLD:ASP | GOLD:CP | GOLD:CS | GOLD:GS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1B9J | 0.5 | 1.4 | 1.3 | 1.1 | 0.9 | 1.0 | 0.3 | 0.4 | 0.3 | 0.6 | 0.5 | 0.4 |
| 3 | 2OY2 | 0.9 | 8.0 | 2.4 | 0.5 | 1.1 | 7.2 | 7.1 | 7.1 | 0.7 | 0.6 | 0.6 | 0.4 |
| 3 | 3GQ1 | 2.9 | 0.9 | 0.9 | 2.5 | 1.4 | 1.6 | 0.9 | 0.9 | 1.3 | 1.4 | 1.3 | 4.4 |
| 3 | 3BS4 | 2.2 | 8.0 | 3.3 | 0.5 | 0.6 | 0.7 | 0.5 | 0.4 | 5.1 | 0.3 | 0.7 | 0.9 |
| 3 | 2OXW | 1.3 | 9.5 | 1.8 | 3.4 | 6.9 | 6.8 | 7.1 | 7.1 | 9.0 | 7.2 | 9.3 | 6.8 |
| 3 | 2B6N | 6.1 | 8.7 | 8.1 | 8.4 | 7.9 | 7.8 | 8.0 | 7.9 | 7.4 | 7.7 | 2.2 | 8.6 |
| 4 | 1TW6 | 0.7 | 2.1 | 1.3 | 1.3 | 0.7 | 1.0 | 0.8 | 0.9 | 0.7 | 0.6 | 0.9 | 0.4 |
| 4 | 3VQG | 0.7 | 7.4 | 2.9 | 2.8 | 0.3 | 0.6 | 2.1 | 0.7 | 7.7 | 6.4 | 0.8 | 0.7 |
| 4 | 1UOP | 6.1 | 8.4 | 5.4 | 0.6 | 6.3 | 6.4 | 6.5 | 6.5 | 4.7 | 5.1 | 4.9 | 0.4 |
| 4 | 4C2C | 1.3 | 1.1 | 1.2 | 1.0 | 0.5 | 0.6 | 0.7 | 0.7 | 0.9 | 0.9 | 0.8 | 1.0 |
| 4 | 4J44 | 0.9 | 2.4 | 1.5 | 1.0 | 0.8 | 0.8 | 1.2 | 0.9 | 0.4 | 0.6 | 0.5 | 0.8 |
| 5 | 2HPL | 1.3 | 7.6 | 1.7 | 6.9 | 3.2 | 2.8 | 2.4 | 7.4 | 8.1 | 8.5 | 2.5 | 7.3 |
| 5 | 2V3S | 1.8 | 9.2 | 7.3 | 3.9 | 5.6 | 5.6 | 10.9 | 11.2 | 1.4 | 1.3 | 4.3 | 1.6 |
| 5 | 3NFK | 1.4 | 8.6 | 3.6 | 4.2 | 8.7 | 8.4 | 0.5 | 6.7 | 2.9 | 8.8 | 7.8 | 3.4 |
| 5 | 1NVR | 9.9 | 3.4 | 6.8 | 7.3 | 5.7 | 9.0 | 8.3 | 9.1 | 9.2 | 9.1 | 9.3 | 5.2 |
| 5 | 4V3I | 4.5 | 10.2 | 4.8 | 9.7 | 6.3 | 6.4 | 4.8 | 3.1 | 7.5 | 6.5 | 4.0 | 7.3 |
| 5 | 3T6R | 1.9 | 8.5 | 7.3 | 4.4 | 6.6 | 7.2 | 7.3 | 7.4 | 3.2 | 0.8 | 1.7 | 0.7 |
| 6 | 1SVZ | 2.3 | 6.4 | 6.9 | 5.1 | 8.0 | 8.0 | 2.7 | 6.5 | 9.1 | 8.4 | 4.3 | 6.4 |
| 6 | 3D1E | 2.2 | 9.9 | 1.3 | 9.4 | 12.2 | 10.6 | 11.0 | 9.5 | 5.7 | 1.4 | 0.4 | 9.6 |
| 6 | 3IDG | 1.8 | 8.5 | 6.2 | 6.3 | 4.9 | 7.2 | 1.4 | 5.0 | 4.8 | 7.1 | 6.9 | 9.7 |
| 6 | 3LNY | 0.8 | 10.5 | 5.0 | 7.4 | 2.5 | 11.3 | 10.7 | 11.3 | 0.6 | 1.1 | 0.6 | 3.9 |
| 6 | 4NNM | 2.1 | 1.3 | 9.2 | 9.7 | 0.9 | 0.8 | 1.8 | 3.3 | 3.3 | 1.5 | 0.9 | 1.5 |
| 6 | 4Q6H | 2.9 | 10.3 | 5.1 | 10.7 | 10.0 | 9.9 | 9.3 | 8.7 | 9.5 | 5.2 | 1.8 | 2.8 |
| 7 | 3MMG | 1.6 | 0.8 | 11.5 | 10.4 | 6.7 | 1.2 | 1.3 | 2.1 | 9.0 | 6.6 | 7.7 | 1.3 |
| 7 | 3Q47 | 6.9 | 3.8 | 9.4 | 6.7 | 6.8 | 9.9 | 6.3 | 7.7 | 6.1 | 5.1 | 8.7 | 7.7 |
| 7 | 3UPV | 5.1 | 4.0 | 5.3 | 4.7 | 5.0 | 4.9 | 3.2 | 2.5 | 7.0 | 3.5 | 10.6 | 4.6 |
| 7 | 4QBR | 1.3 | 7.5 | 7.9 | 8.6 | 2.5 | 11.3 | 1.0 | 1.2 | 11.0 | 7.2 | 13.7 | 1.9 |
| 7 | 3NJG | 1.2 | 0.8 | 11.1 | 2.5 | 5.7 | 2.6 | 3.4 | 0.4 | 3.2 | 3.3 | 1.2 | 2.7 |
| 8 | 1ELW | 2.5 | 3.7 | 5.3 | 3.2 | 8.2 | 9.2 | 2.5 | 2.5 | 5.6 | 3.4 | 3.0 | 3.5 |
| 8 | 3CH8 | 1.1 | 0.7 | 12.7 | 7.8 | 5.2 | 5.4 | 12.6 | 5.4 | 7.5 | 10.5 | 12.2 | 6.5 |
| 8 | 4WLB | 0.9 | 6.2 | 9.7 | 5.3 | 6.0 | 6.3 | 6.4 | 5.0 | 5.7 | 5.2 | 4.3 | 6.5 |
| 8 | 1OU8 | 2.9 | 4.6 | 10.8 | 10.2 | 8.9 | 7.5 | 1.7 | 4.7 | 9.2 | 11.4 | 6.1 | 3.9 |
| 8 | 1N7F | 4.3 | 10.3 | 7.1 | 11.9 | 13.6 | 14.3 | 9.5 | 9.2 | 6.0 | 9.0 | 8.9 | 8.4 |
| 9 | 3OBQ | 2.8 | 5.3 | 12.2 | 12.5 | 14.8 | 14.5 | 2.2 | 2.2 | 5.1 | 4.6 | 4.9 | 5.5 |
| 9 | 4BTB | 11.4 | 1.9 | 1.9 | 14.7 | 15.5 | 15.5 | 8.6 | 8.6 | 16.3 | 16.0 | 9.5 | 9.7 |
| 9 | 2W0Z | 1.6 | 10.4 | 9.4 | 11.3 | 14.3 | 14.3 | 1.3 | 4.4 | 3.2 | 5.9 | 4.3 | 14.3 |
| 9 | 4N7H | 14.1 | 4.9 | 7.5 | 7.4 | 12.2 | 12.1 | 11.8 | 6.8 | 9.6 | 5.9 | 9.8 | 2.2 |
| 9 | 2QAB | 1.9 | 8.6 | 3.8 | 4.3 | 8.7 | 4.4 | 5.5 | 4.5 | 10.7 | 5.1 | 10.4 | 4.8 |
| 10 | 1H6W | 3.6 | 10.0 | 17.1 | 13.9 | 5.6 | 3.2 | 1.1 | 2.6 | 20.0 | 19.5 | 20.2 | 1.5 |
| 10 | 3BRL | 8.4 | 2.5 | 15.6 | 11.3 | 6.9 | 4.4 | 17.4 | 3.1 | 3.4 | 6.0 | 8.5 | 2.5 |
| 10 | 1NTV | 15.7 | 7.9 | 10.9 | 4.9 | 7.1 | 4.7 | 15.4 | 15.3 | 3.8 | 8.4 | 13.4 | 13.9 |
| 10 | 4DS1 | 6.3 | 12.4 | 6.7 | 5.5 | 11.6 | 17.5 | 4.6 | 1.6 | 17.3 | 18.0 | 4.5 | 5.4 |
| 10 | 2O02 | 7.6 | 8.1 | 3.7 | 5.0 | 8.8 | 4.9 | 11.0 | 12.0 | 12.9 | 12.0 | 12.5 | 4.0 |
| 11 | 1N12 | 23.3 | 19.2 | 17.6 | 12.6 | 17.9 | 16.8 | 0.6 | 1.3 | 17.6 | 18.7 | 13.8 | 4.5 |
| 11 | 2XFX | 7.0 | 6.3 | 11.5 | 15.6 | 12.1 | 2.0 | 14.5 | 1.4 | 13.8 | 15.6 | 12.4 | 7.0 |
| 11 | 3BFW | 2.0 | 19.4 | 10.1 | 11.5 | 9.1 | 18.5 | 0.3 | 0.4 | 6.0 | 20.7 | 4.8 | 19.8 |
| 11 | 4EIK | 5.5 | 12.1 | 8.1 | 5.3 | 5.5 | 7.8 | 4.8 | 4.6 | 7.0 | 3.3 | 3.9 | 4.1 |
| 11 | 3DS1 | 1.1 | 8.1 | 11.5 | 5.8 | 5.6 | 5.2 | 12.5 | 12.6 | 9.2 | 7.6 | 10.8 | 8.0 |
| 12 | 4J8S | 5.0 | 8.5 | 5.7 | 7.3 | 7.7 | 11.4 | 17.7 | 13.9 | 4.9 | 11.5 | 13.6 | 14.2 |
| 12 | 2W10 | 15.4 | 16.7 | 15.2 | 15.1 | 15.5 | 15.4 | 4.7 | 4.8 | 16.2 | 17.7 | 17.4 | 5.6 |
| 12 | 3JZO | 13.0 | 9.8 | 7.0 | 5.8 | 9.3 | 6.1 | 13.9 | 13.4 | 13.1 | 13.1 | 13.4 | 9.8 |
| 12 | 4DGY | 7.1 | 10.3 | 8.0 | 9.3 | 9.7 | 8.9 | 10.4 | 7.8 | 9.3 | 9.3 | 12.6 | 8.9 |
| 12 | 2B9H | 2.1 | 15.8 | 6.1 | 13.2 | 16.4 | 9.5 | 15.6 | 10.2 | 5.0 | 7.6 | 12.2 | 4.3 |
| Near-natives | | 28 | 10 | 10 | 9 | 10 | 10 | 20 | 16 | 8 | 12 | 16 | 16 |


Table 3: Comparison of the docking results of HPepDock, DOCK, and the ten protocols of four other docking programs on the LEADS-PEP benchmark data set when the top 20 predictions were considered, where the RMSDs (Å) are shown in a gradient color code from green (0 Å) to red (≥10 Å). Docking protocols: SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore.34

| Res. | PDB | HPepDock | DOCK | AutoDock:SA | AutoDock:HA | Vina:SA | Vina:HA | Surflex:SA | Surflex:HA | GOLD:ASP | GOLD:CP | GOLD:CS | GOLD:GS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1B9J | 0.4 | 0.7 | 0.8 | 0.5 | 0.9 | 0.4 | 0.2 | 0.3 | 0.3 | 0.2 | 0.3 | 0.4 |
| 3 | 2OY2 | 0.4 | 3.4 | 0.5 | 0.4 | 1.1 | 0.8 | 1.3 | 5.5 | 0.5 | 0.5 | 0.4 | 0.4 |
| 3 | 3GQ1 | 0.5 | 0.6 | 0.9 | 1.4 | 1.3 | 0.9 | 0.5 | 0.3 | 0.8 | 0.4 | 0.6 | 0.6 |
| 3 | 3BS4 | 1.2 | 4.2 | 3.3 | 0.5 | 0.6 | 0.4 | 0.4 | 0.4 | 0.7 | 0.3 | 0.4 | 0.4 |
| 3 | 2OXW | 0.8 | 0.5 | 1.3 | 0.8 | 1.1 | 1.1 | 2.5 | 5.3 | 0.7 | 1.0 | 0.7 | 2.2 |
| 3 | 2B6N | 1.6 | 7.5 | 7.6 | 7.7 | 0.5 | 5.2 | 7.5 | 7.8 | 2.3 | 2.7 | 2.2 | 1.3 |
| 4 | 1TW6 | 0.3 | 1.1 | 1.0 | 0.8 | 0.7 | 1.0 | 0.7 | 0.7 | 0.6 | 0.4 | 0.4 | 0.4 |
| 4 | 3VQG | 0.4 | 5.4 | 2.6 | 2.8 | 0.3 | 0.6 | 0.4 | 0.3 | 0.6 | 5.8 | 0.8 | 0.5 |
| 4 | 1UOP | 1.3 | 4.5 | 1.1 | 0.6 | 2.9 | 0.4 | 0.4 | 2.6 | 0.7 | 3.7 | 4.3 | 0.4 |
| 4 | 4C2C | 0.5 | 1.1 | 0.9 | 0.8 | 0.5 | 0.6 | 0.5 | 0.6 | 0.6 | 0.8 | 0.5 | 0.4 |
| 4 | 4J44 | 0.6 | 1.9 | 0.9 | 0.7 | 0.4 | 0.3 | 0.8 | 0.7 | 0.3 | 0.4 | 0.3 | 0.4 |
| 5 | 2HPL | 1.2 | 4.3 | 1.3 | 2.3 | 2.1 | 1.9 | 1.1 | 0.8 | 2.8 | 4.5 | 2.5 | 4.8 |
| 5 | 2V3S | 1.4 | 4.3 | 3.6 | 3.7 | 1.0 | 0.9 | 7.5 | 7.7 | 1.4 | 1.3 | 1.5 | 0.7 |
| 5 | 3NFK | 0.8 | 5.9 | 2.8 | 3.3 | 4.8 | 1.2 | 0.3 | 0.3 | 1.0 | 2.7 | 3.3 | 1.8 |
| 5 | 1NVR | 0.9 | 1.7 | 4.1 | 3.9 | 2.7 | 1.1 | 2.7 | 2.4 | 4.5 | 6.2 | 6.2 | 4.2 |
| 5 | 4V3I | 1.8 | 4.5 | 4.8 | 5.0 | 5.2 | 5.2 | 3.4 | 2.8 | 2.0 | 1.6 | 1.2 | 2.0 |
| 5 | 3T6R | 1.0 | 5.0 | 4.8 | 2.4 | 4.0 | 4.5 | 4.1 | 2.7 | 2.1 | 0.6 | 1.7 | 0.7 |
| 6 | 1SVZ | 1.8 | 1.3 | 3.4 | 5.1 | 3.7 | 0.7 | 2.3 | 2.3 | 6.2 | 3.5 | 4.3 | 2.9 |
| 6 | 3D1E | 1.1 | 4.0 | 1.3 | 6.1 | 3.3 | 4.0 | 1.7 | 1.8 | 4.6 | 1.4 | 0.4 | 3.1 |
| 6 | 3IDG | 1.8 | 4.6 | 4.9 | 4.5 | 3.5 | 2.3 | 1.2 | 5.0 | 4.8 | 5.8 | 3.8 | 4.9 |
| 6 | 3LNY | 0.7 | 6.5 | 5.0 | 6.2 | 0.6 | 0.9 | 6.7 | 11.0 | 0.6 | 0.7 | 0.6 | 1.2 |
| 6 | 4NNM | 0.9 | 1.3 | 5.7 | 5.4 | 0.9 | 0.8 | 0.7 | 0.9 | 2.3 | 1.5 | 0.9 | 1.4 |
| 6 | 4Q6H | 1.1 | 9.9 | 5.1 | 6.0 | 7.8 | 6.8 | 6.3 | 1.3 | 3.4 | 2.9 | 1.6 | 2.6 |
| 7 | 3MMG | 1.4 | 0.8 | 7.6 | 8.2 | 5.7 | 1.2 | 0.7 | 0.5 | 7.7 | 4.8 | 2.9 | 1.3 |
| 7 | 3Q47 | 1.9 | 3.8 | 4.3 | 6.0 | 3.3 | 1.6 | 6.1 | 3.2 | 5.4 | 4.5 | 0.7 | 1.4 |
| 7 | 3UPV | 1.7 | 3.9 | 3.0 | 4.7 | 1.5 | 0.6 | 1.6 | 2.2 | 3.8 | 3.5 | 5.1 | 3.4 |
| 7 | 4QBR | 1.3 | 7.1 | 7.9 | 4.5 | 2.1 | 1.2 | 0.6 | 1.0 | 5.5 | 3.2 | 8.6 | 1.8 |
| 7 | 3NJG | 0.5 | 0.8 | 9.2 | 2.5 | 1.2 | 1.3 | 0.9 | 0.4 | 3.2 | 2.6 | 1.2 | 1.1 |
| 8 | 1ELW | 1.4 | 3.4 | 5.2 | 3.2 | 3.8 | 2.6 | 2.0 | 2.5 | 3.5 | 3.4 | 3.0 | 1.7 |
| 8 | 3CH8 | 1.1 | 0.7 | 7.9 | 6.3 | 5.2 | 0.5 | 1.5 | 4.2 | 6.3 | 8.5 | 7.7 | 3.9 |
| 8 | 4WLB | 0.6 | 4.8 | 6.0 | 5.3 | 5.2 | 5.2 | 5.1 | 4.7 | 3.5 | 5.2 | 3.7 | 4.0 |
| 8 | 1OU8 | 2.9 | 3.4 | 6.0 | 4.1 | 5.3 | 4.5 | 1.3 | 1.4 | 3.6 | 3.6 | 4.9 | 3.6 |
| 8 | 1N7F | 1.0 | 5.2 | 4.6 | 6.3 | 8.5 | 11.7 | 9.1 | 8.9 | 6.0 | 7.5 | 6.1 | 8.1 |
| 9 | 3OBQ | 2.0 | 1.1 | 5.9 | 10.0 | 7.4 | 5.5 | 1.8 | 2.1 | 2.9 | 2.2 | 3.0 | 1.4 |
| 9 | 4BTB | 3.1 | 1.7 | 1.9 | 4.7 | 2.4 | 2.2 | 8.6 | 8.5 | 4.3 | 6.8 | 3.4 | 5.9 |
| 9 | 2W0Z | 1.1 | 7.9 | 6.3 | 6.8 | 5.1 | 3.8 | 1.3 | 4.1 | 3.2 | 1.4 | 3.0 | 3.3 |
| 9 | 4N7H | 2.7 | 3.6 | 3.8 | 5.0 | 4.8 | 4.7 | 3.4 | 4.5 | 3.8 | 5.3 | 6.8 | 2.2 |
| 9 | 2QAB | 0.6 | 5.5 | 3.8 | 4.3 | 4.0 | 4.0 | 4.6 | 4.5 | 3.6 | 4.1 | 3.7 | 4.6 |
| 10 | 1H6W | 3.6 | 9.5 | 11.9 | 13.1 | 5.6 | 3.2 | 0.6 | 1.3 | 13.8 | 16.4 | 13.2 | 1.5 |
| 10 | 3BRL | 3.8 | 2.4 | 8.1 | 7.8 | 6.0 | 4.4 | 3.2 | 2.4 | 3.4 | 4.0 | 3.6 | 2.5 |
| 10 | 1NTV | 4.9 | 6.6 | 5.9 | 3.8 | 5.4 | 3.7 | 15.1 | 13.4 | 2.9 | 5.7 | 4.1 | 4.3 |
| 10 | 4DS1 | 1.6 | 11.6 | 6.7 | 5.5 | 5.9 | 2.8 | 2.2 | 1.6 | 3.5 | 4.6 | 2.8 | 2.3 |
| 10 | 2O02 | 4.2 | 7.1 | 3.7 | 5.0 | 4.3 | 4.2 | 4.6 | 3.8 | 4.6 | 3.6 | 4.2 | 3.7 |
| 11 | 1N12 | 4.3 | 8.8 | 11.9 | 11.4 | 10.5 | 9.6 | 0.5 | 1.3 | 8.9 | 11.7 | 9.3 | 2.3 |
| 11 | 2XFX | 3.5 | 6.3 | 7.4 | 8.0 | 4.7 | 2.0 | 6.4 | 1.3 | 4.3 | 6.1 | 4.7 | 4.5 |
| 11 | 3BFW | 1.0 | 19.0 | 7.9 | 10.0 | 6.1 | 5.8 | 0.3 | 0.3 | 6.0 | 5.8 | 4.8 | 3.5 |
| 11 | 4EIK | 3.1 | 11.9 | 7.9 | 5.3 | 2.2 | 1.5 | 4.0 | 2.5 | 3.6 | 3.2 | 3.9 | 3.5 |
| 11 | 3DS1 | 0.7 | 5.4 | 5.0 | 5.0 | 4.5 | 4.5 | 11.5 | 10.6 | 4.2 | 6.2 | 5.4 | 6.7 |
| 12 | 4J8S | 5.0 | 6.0 | 4.9 | 5.1 | 4.7 | 8.4 | 5.7 | 13.2 | 4.4 | 5.5 | 4.1 | 8.7 |
| 12 | 2W10 | 2.8 | 6.1 | 8.5 | 5.8 | 2.1 | 4.6 | 3.5 | 3.7 | 7.3 | 2.3 | 3.5 | 5.6 |
| 12 | 3JZO | 2.6 | 6.2 | 5.7 | 5.8 | 5.2 | 5.1 | 5.8 | 5.9 | 6.4 | 4.9 | 5.1 | 5.2 |
| 12 | 4DGY | 5.7 | 8.2 | 5.7 | 5.6 | 7.4 | 1.1 | 7.7 | 7.7 | 8.9 | 7.9 | 5.0 | 5.8 |
| 12 | 2B9H | 2.1 | 12.3 | 6.1 | 5.2 | 6.0 | 5.5 | 12.7 | 10.0 | 3.4 | 4.2 | 3.8 | 3.8 |
| Near-natives | | 39 | 15 | 11 | 12 | 20 | 28 | 29 | 28 | 17 | 17 | 20 | 28 |


Table 4: Performance of HPepDock, the three peptide docking protocols of Glide, and FlexPepDock on the 19 bound/unbound cases of the PeptiDB when the top 10 poses were considered. The interface RMSDs (iRMSDs, in Å) are shown in a gradient color code from green (0 Å) to red (≥10 Å). The docking results for the methods other than HPepDock were taken from the literature.37

| Res. | PDB | HPepDock | Glide: GlideScore | Glide: Emodel | Glide: MM-GBSA | FlexPepDock |
|---|---|---|---|---|---|---|
| 4 | 1TW6 | 0.87 | 0.33 | 0.33 | 0.34 | 5.90 |
| 5 | 1NVR | 1.17 | 0.70 | 0.70 | 0.63 | 0.50 |
| 5 | 1W9E | 0.89 | 2.12 | 1.92 | 1.65 | 1.50 |
| 6 | 1AWR | 1.78 | 1.90 | 1.43 | 1.12 | 0.90 |
| 6 | 3D1E | 1.38 | 0.85 | 0.85 | 0.85 | 1.10 |
| 7 | 2FNT | 2.00 | 1.81 | 1.81 | 1.31 | 1.00 |
| 7 | 2VJ0 | 2.72 | 2.54 | 2.54 | 1.79 | 1.50 |
| 8 | 1ER8 | 3.95 | 1.22 | 1.78 | 1.78 | 1.40 |
| 8 | 1N7F | 1.34 | 2.18 | 2.18 | 1.96 | 1.40 |
| 8 | 2C3I | 1.85 | 3.39 | 3.39 | 4.00 | 6.00 |
| 8 | 2FGR | 2.38 | 7.78 | 4.91 | 4.91 | 8.60 |
| 8 | 2J6F | 3.81 | 10.66 | 3.48 | 10.74 | 5.70 |
| 9 | 1Z9O | 1.61 | 5.45 | 5.45 | 5.45 | 1.60 |
| 10 | 1QKZ | 6.74 | 1.69 | 1.69 | 7.40 | 8.80 |
| 10 | 2O9V | 1.85 | 3.78 | 3.78 | 2.05 | 0.60 |
| 11 | 1NLN | 1.73 | 0.64 | 0.64 | 0.41 | 2.00 |
| 11 | 1RXZ | 3.84 | 5.31 | 5.31 | 4.77 | 0.70 |
| 11 | 1SSH | 1.49 | 0.52 | 1.26 | 0.75 | 3.10 |
| 11 | 2P1K | 2.55 | 1.73 | 2.62 | 2.43 | 0.60 |
| iRMSD ≤ 2 Å | | 12 | 10 | 10 | 11 | 13 |
| iRMSD ≤ 3 Å | | 15 | 13 | 13 | 13 | 13 |


Table 5: Performance of HPepDock, Glide/SP-PEP+MM-GBSA, and FlexPepDock on the 10 unbound/unbound cases of the PeptiDB when the top 10 poses were considered. The interface RMSDs (iRMSDs, in Å) are shown in a gradient color code from green (0 Å) to red (≥10 Å). The docking results for the methods other than HPepDock were taken from the literature.37

| Res. | PDB | HPepDock | Glide: SP-PEP+MM-GBSA | FlexPepDock |
|---|---|---|---|---|
| 5 | 1NVR | 1.07 | 0.54 | 0.40 |
| 5 | 1W9E | 1.42 | 1.99 | 1.90 |
| 6 | 1AWR | 1.56 | 0.86 | 1.40 |
| 7 | 2VJ0 | 2.34 | 5.22 | 1.40 |
| 8 | 1N7F | 1.49 | 1.64 | 1.00 |
| 8 | 2C3I | 3.38 | 3.62 | 3.70 |
| 8 | 2FGR | 2.41 | 5.02 | 10.10 |
| 10 | 2O9V | 1.46 | 11.23 | 0.60 |
| 11 | 1RXZ | 4.28 | 3.94 | 4.30 |
| 11 | 1SSH | 2.22 | 4.45 | 3.20 |
| iRMSD ≤ 2 Å | | 5 | 4 | 6 |
| iRMSD ≤ 3 Å | | 8 | 4 | 6 |


Figure 1: The success rates of HPepDock in predicting near-native peptide binding conformations on the LEADS-PEP data set at different RMSD criteria for top 1, 10, 20, 50, 100, and 1000 predictions. [Plot: success rate (%) versus RMSD criteria (Å), from 0.5 to 2.5 Å; one curve each for Top 1, Top 10, Top 20, Top 50, Top 100, and Top 1000.]


Figure 2: The predicted peptide binding modes by HPepDock for nine successfully predicted examples when the top 10 predictions were considered, where the protein is represented by molecular surface, the native peptide structure is colored in green, and the predicted binding mode is colored in magenta. The RMSDs between the predicted binding modes and the native structures are listed in the parentheses. Panels: A. 3VQG (0.5 Å); B. 3NFK (0.8 Å); C. 3IDG (1.8 Å); D. 3MMG (1.4 Å); E. 1N7F (1.4 Å); F. 2QAB (1.2 Å); G. 4DS1 (1.6 Å); H. 3BFW (1.0 Å); I. 2B9H (2.1 Å).


Figure 3: The predicted peptide binding modes by HPepDock for nine failed cases when the top 10 predictions were considered, where the protein is represented by molecular surface, the native peptide structure is colored in green, and the predicted binding mode is colored in magenta. The RMSDs between the predicted binding modes and the native structures are listed in the parentheses. Panels: A. 4BTB (3.1 Å); B. 1H6W (3.6 Å); C. 3BRL (5.4 Å); D. 1NTV (7.8 Å); E. 2O02 (4.9 Å); F. 1N12 (4.3 Å); G. 2XFX (4.5 Å); H. 4J8S (5.0 Å); I. 4DGY (6.3 Å).


Figure 4: Comparison of the success rates of HPepDock and 11 other peptide docking methods in predicting near-native peptide binding conformations on the LEADS-PEP data set at different RMSD criteria for top 1 (A) and top 20 (B) predictions. [Plots A and B: success rate (%) versus RMSD criteria (Å), from 0.5 to 2.5 Å; one curve each for HPepDock, DOCK 6, AutoDock:SA, AutoDock:HA, Vina:SA, Vina:HA, Surflex:SA, Surflex:HA, GOLD:ASP, GOLD:CP, GOLD:CS, and GOLD:GS.]


Figure 5: The running times of HPepDock for docking the 53 peptides in the LEADS-PEP benchmark data set, where the complex No. is consistent with the order in Table 1. The dashed line stands for the average of the running times. [Plot: running time (min), from 0 to 45, versus complex No.]
