CSAR Benchmark of Flexible MedusaDock in Affinity Prediction and

Aug 7, 2015 - E-mail: [email protected]. ... In CSAR 2013–2014 binding prediction benchmark exercises, the new scoring function MScomplex was found ...
2 downloads 0 Views 3MB Size
Article pubs.acs.org/jcim

CSAR Benchmark of Flexible MedusaDock in Affinity Prediction and Nativelike Binding Pose Selection Praveen Nedumpully-Govindan,† Domen B. Jemec,‡ and Feng Ding*,† Department of Physics and Astronomy and ‡Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina 29634, United States

Downloaded by RUTGERS UNIV on August 24, 2015 | http://pubs.acs.org Publication Date (Web): August 17, 2015 | doi: 10.1021/acs.jcim.5b00303



ABSTRACT: While molecular docking with both ligand and receptor flexibilities can help capture conformational changes upon binding, correct ranking of nativelike binding poses and accurate estimation of binding affinities remains a major challenge. In addition to the commonly used scoring approach with intermolecular interaction energies, we included the contribution of intramolecular energies changes upon binding in our flexible docking method, MedusaDock. In CSAR 2013−2014 binding prediction benchmark exercises, the new scoring function MScomplex was found to better recapitulate experimental binding affinities and correctly identify ligandbinding sequences from decoy receptors. Our further analysis with the DUD data sets indicates significant improvement of virtual screening enrichment using the new scoring function when compared to the previous intermolecular energy based scoring method. Our postanalysis also suggests a new approach to select nativelike poses in the clustering-based pose ranking approach by MedusaDock. Since the calculation of intramolecular energy changes and clustering-based pose ranking and selection are not MedusaDock specific, we expect a broad application in force-field based estimation of binding affinities and pose ranking using flexible ligand−receptor docking.



ranking of the near-native poses.26 For efficient assessment of a large number of docked poses, simple intermolecular interaction energies have often been used as the structure-based scoring functions.26−28 However, due to conformational changes in flexible docking, inclusion of intramolecular energy changes upon binding may help improve the estimation of binding affinities in addition to intermolecular interaction energies. In MedusaDock, Medusa force field28,29 is used in conformational sampling of receptor side chains and ligand by Monte Carlo (MC) based minimization of the total energy, including both ligand and receptor. It is, therefore, straightforward to estimate both intra- and intermolecular energies. Here, we extended the simple intermolecular interaction energy, MedusaScore (MS),28 by including the total energy of the ligand−receptor complex, MScomplex. In the CSAR 2013−2014 blind prediction exercises, we benchmarked the updated MedusaDock approach in terms of identifying nativelike poses, recapitulating experimentally determined binding affinities, and predicting the ligandrecognizing protein sequences. Our postanalysis suggests a new approach to select the nativelike poses from the clusteringbased pose ranking approach.23 Our results indicate that the new scoring approach better recapitulates CSAR ligand binding affinities compared to the MS score, which allowed us to identify ligand-binding protein sequences out of a set of decoy sequences. Our further virtual screening (VS) analysis using a subset of DUD (a Directory of Useful Decoys)30 also highlights

INTRODUCTION Modeling conformational flexibilities of both small molecule ligands and protein receptors upon their specific binding is a grand challenge in computational ligand−receptor docking. The difficulties arise from the large number of degrees of freedom, including the flexibility of protein backbone and side chains, ligand conformation, and the interligand−receptor rigidbody motion. While the flexibility of ligands in molecular docking can be efficiently modeled by a variety of algorithms, receptor flexibility remains a daunting challenge.1,2 Commonly used flexible-receptor docking methods3,4 include using soft receptors,5 selection of a few critical degrees of freedom in the binding sites,6−8 ensemble docking with multiple receptor conformations derived from either experiments9−12 or computational modeling,13−20 and recently rotamer-based extensive sampling of protein side chain conformations near the binding sites.21,22 Different approaches can also be integrated together for enhanced sampling of protein conformational changes upon ligand binding. For example, ensemble docking with multiple protein backbone conformations has been combined with the flexible MedusaDock, where conformational flexibility of both ligands and protein side chains were modeled simultaneously.23 CSAR 2011−2012 benchmark suggested that the combined approach increased the predictive power of MedusaDock in terms of predicting the nativelike poses. Despite many successes in sampling the nativelike poses using flexible docking, accurate prediction of the binding affinities as well as correct ranking of the nativelike poses still appears to be a difficult task.24,25 In some cases, increased receptor flexibility resulted in reduced accuracy in correct © XXXX American Chemical Society

Special Issue: Community Structure Activity Resource (CSAR) Received: May 22, 2015

A

DOI: 10.1021/acs.jcim.5b00303 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

protein backbone conformations using pairwise root-meansquare-deviation (RMSD) and grouped similar poses together using a cutoff of 2.5 Å. For each cluster, the average score was recomputed for ranking as described previously.23 Preparation of the Initial 3D Structure of a Ligand. In MedusaDock, the generation of STROLL is based on rotation of all rotatable dihedral angles, which requires an initial 3D structure of the ligand. When the input of a ligand is in SMILE format, we used the “--gen3d” option in Open Babel38 to generate a preliminary 3D structure and assigned the protonation state corresponding to the pH of 7 (i.e., the option of “-h -p 7” in Open Babel). In this study, we did not vary the protonation states of ligands. We utilized the “mengine” program implemented in smi23d39 to minimize the structure initially. Further energy minimization was performed by the “obminimize” program in Open Babel, where the steepest descent method was applied. The MMFF9440 force field was used during the minimization process. Scoring Functions. Two different scoring functions were tested in the benchmark, including the original MedusaScore (MS)28 and a newly defined score that considers the energy of the whole ligand−receptor complex, MScomplex. MS score considers only the intermolecular interaction energies, which are computed directly with the ligand−receptor complex structure from docking simulations. In other words, the score is based on the assumption that intramolecular energy changes associated with receptor and ligand conformation changes upon binding are negligible. Such assumption is only true for the rigid docking approach. In flexible docking approaches, both ligand and receptor undergo conformational changes, and there may be energy costs associated with conformation change, contributing to the binding affinity. A more accurate estimate of the binding affinity can thus be obtained by finding the difference in energy between bound and unbound states. Thus, we calculate the averaged energy of the protein receptor (P) and ligand (L) in isolation and subtract them from the average energy of the receptor−ligand complex (P+L) in the bound state:

the advantage of the new scoring method in recovering active compounds from a large library of decoys.

Downloaded by RUTGERS UNIV on August 24, 2015 | http://pubs.acs.org Publication Date (Web): August 17, 2015 | doi: 10.1021/acs.jcim.5b00303



METHODS MedusaDock. We used our flexible docking method, MedusaDock,22 to model ligand−receptor binding structures. Briefly, in MedusaDock, flexibilities of both receptor side chains and ligand are treated simultaneously. Protein side chains near the binding pocket can sample all their corresponding rotamers included in the backbone-dependent rotamer library, where the average value and standard deviation of each side chain dihedral angle were tabulated from high-resolution protein structures.31 For a given side chain rotamer, the subrotameric space of each dihedral angle was stochastically sampled according to the Gaussian distribution around the average value within one standard deviation.29 A stochastic rotamer library of ligands (STROLL) was used to model ligand conformational flexibility the same as protein rotamers.31,32 For each rotatable dihedral angle of a ligand, we determined its average and standard deviation according to the hybridization of two atoms forming the rotatable bond.22 The docking protocol consists of two steps. The first step is coarse docking, which involves generation of ligand conformations and their clustering based on structural similarities. Representatives of each cluster are then rapidly docked onto the receptor by rigid-body minimization. In the second fine-docking step, energy minimization is performed by sampling ligand and receptor side chain rotamers and by allowing the rigid-body movement of the ligand. A MC-based simulated annealing is used for the energy minimization. Backbone flexibility was considered in MedusaDock using an ensemble-based docking approach.23 In phase 2 exercise of CSAR 2014, we used multiple receptor structures cocrystallized with different ligands that were provided by the organizers. In MedusaDock simulations, we used the extended Medusa force field28,29 to evaluate total interaction potential energies. The energy terms included van der Waals (Evdw), solvation (Esolv), hydrogen bond (Ehbond), electrostatics (Ees) with a distance dependent dielectric constant, and a statistical potential energy for backbone-dependent rotamers (Erot). The parameters for VDW interactions in the Medusa force field were taken from CHARMM 19.33 The solvation energy was estimated using the Lazaridis-Karplus implicit solvent model.34,35 A distance and angular-dependent hydrogen bond model was used.36 The total energy of a given conformation of a molecule or molecular complex was the linear combination of all these continuous, interatomic potential energy terms: Etotal = WvdwEvdw + WsolvEsolv + WesEes + WrotErot (1)

MScomplex = ⟨EP + L⟩bound − ⟨EP ⟩unbound − ⟨EL⟩unbound

(2)

Here, the average for the bound state was taken over a large ensemble of docked poses. Some of these poses may have unfavorable receptor−ligand contacts and consequently large energies. Hence, a simple average of the binding score over all the docked poses would underestimate the binding affinity. To circumvent this issue, we used the Boltzmann average ⟨E⟩ =

Here, the weights (W) were initially determined by recapitulating the native packing of side chains29 and protein stability changes upon mutations.37 In this work, we did not further train or tune the weights and parameters for either docking, pose selection, or affinity estimation. Depending on the flexibility of a given ligand, each MedusaDock run results in multiple poses. Due to the stochastic nature of simulated annealing, we usually perform multiple independence runs against the receptor (100xNp with Np as the number of receptor backbone structures) with different random seeds. Each MedusaDock run took on average approximately 3−5 min on an Intel 2.6 GHz Xeon processor. Independent runs were performed in parallel. We performed clustering analysis for all the poses generated with different

∑ Ei exp( −βEi) ∑ exp( −βEi)

(3)

Here, β = KBT, KB is the Boltzmann constant, and T = 300 K, and the summation was over all the docked poses. For the unbound protein, repacking of protein side chains with simulated annealing was performed separately.29,37 Conformations from 100 independent repacking simulations were used to compute ⟨EP⟩unbound. We used the STROLL library, where a set of nonself-overlapping ligand conformations was generated by random sampling of all rotatable dihedral angles,22 to compute the energy of unbound ligand, ⟨EL⟩unbound. For ligands with large degrees of freedom, the MC based sampling of STROLL was stopped when either 1000 different conformations were generated or the maximum of 1 million B

DOI: 10.1021/acs.jcim.5b00303 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

nonmatching residues from the template protein were mutated to those of the given sequence, and all side chains were repacked using Medusa energy minimization (Figure 1).29,37 Using the homology model structure, MedusaDock simulations were performed, and the binding affinities were estimated as described above using the MScomplex score.

Downloaded by RUTGERS UNIV on August 24, 2015 | http://pubs.acs.org Publication Date (Web): August 17, 2015 | doi: 10.1021/acs.jcim.5b00303

MC trials were reached. In all cases including protein−ligand complexes, isolated proteins, and ligands, the same Medusa scoring function was used to compute the corresponding conformational energies, and the average energy was computed by the above Boltzmann summation. For the ligands, the statistical potential of a rotamer was not computed. Therefore, MScomplex is not used to rank individual binding poses but rather to rank each ligand by estimating its bind affinity to a receptor using all predicted poses. Docking with Homology Modeling. For a given primary sequence to be evaluated for ligand binding, the 3D structure was first reconstructed using homology modeling (Figure 1).



RESULTS AND DISCUSSION CSAR 2013 benchmark exercise had three phases, where phase 1 requested the selection of ligand-binding sequences, phase 2 corresponded to selecting nativelike poses from decoys, and phase 3 required the prediction of binding affinities. CSAR 2014 exercises comprised of two phases with phase 1 for identifying nativelike poses and phase 2 for affinity predictions. Next, we discuss the benchmark results according to the task types. Identify the near-Native Pose within a Set of Docking Decoys. In CSAR2013 Phase 2 (2 ligand−receptor pairs) and CSAR2014 Phase 1 (22 pairs) exercises, near-native poses (RMSD < 1 Å) mixed with decoys (RMSD > 1 Å) were provided for each ligand−receptor pair. For each provided pose, we performed energy minimization using MedusaDock. The ligand was kept rigid with translational and rotational degrees of freedom. We used two different approaches to model protein receptor side chain flexibilities. In one approach, we fixed each residue in its native rotamer but allowed sampling in the corresponding subrotameric space (Methods). In another approach, we allowed each residue to sample all possible rotamers. Because of the stochastic nature of the MedusaDock approach, we performed 10 independent minimization runs for each input decoy pose and computed the average MS score. For each energy minimization approach, we ranked all decoys according to the average MS score (Table 2). For the final submission, we used the combined rank score − the sum of the rank indexes from the two different minimization approaches − to rerank the decoy poses. In only 1 out of the total 24 receptor−ligand pairs, ranking with the subrotamer minimization approach failed to predict the nativelike decoy as the top one (07_SYK_gtc249). In 2 out of 24 cases, minimization with an all-rotamer approach missranked th e n ativelik e p oses (11_TRMD_gtc447, 14_TRMD_gtc452). In all the cases where the nativelike poses were not top-ranked, the corresponding nativelike poses were ranked within the top 3. Interestingly, the three cases have no overlap with each other, i.e. the nativelike poses were topranked in at least one of two of the approaches. As a result, ranking with the combined rank score was able to predict the nativelike poses for 07_SYK_gtc249 and 11_TRMD_gtc447. However, in the case of 14_TRMD_gtc452, the combined ranking was not able to select the nativelike pose as the top one but as the second. Therefore, all three ranking approaches have similar performance in terms of predicting the nativelike pose. Although it is unknown a priori which ranking approach is more accurate, consensus about the top decoy poses between two minimization approaches resulted in correction predictions. For the three receptor−ligand pairs without consensus (Table 3), we also performed retrospective flexible docking using MedusaDock, where both ligand and receptor side chains were flexible. We used the clustering-based ranking approach that has been benchmarked in previous CSAR exercise to identify the nativelike poses (Methods). We calculated the RMSD between poses top-ranked by each minimization approach and the MedusaDock-predicted pose (Table 3). In all cases, the

Figure 1. The flowchart of predicting steroid-binding sequences in the phase 1 of CSAR 2013 benchmark. Given each input protein sequence, we built a homology model structure and performed MedusaDock simulations. The binding affinity was estimated with the MScomplex score.

First, sequence alignment of the query sequence against PDB41 (i.e., sequences with known structures) was carried out using PSI-BLAST.42 The sequence with the highest significance score was chosen as the template for structural determination of the given sequence (Table 1). In all cases, there was no gap in the sequence alignment, probably because these sequences were originally designed from these template PDB structures.43 Based on the sequence alignment from PSI-BLAST, the Table 1. List of Designed Receptor Sequences in the Phase 1 of CSAR 2013 Benchmark Exercisea sequence ID

PDB ID

MScomplex, kcal/mol

rank

binder

DIG1 DIG2 DIG3 DIG4 DIG5 DIG6 DIG7 DIG8 DIG9 DIG10 DIG12 DIG13 DIG14 DIG17 DIG18 DIG19

1GY7 3AXD 1PVX 4BRO 1Z1S 3CU3 3GWR 3HKR 1I60 1Z1S 2OWP 2OX1 3E5Z 3CU3 1Z1S 1Z1S

−16.16 −38.85 −41.73 −40.38 −40.45 −37.78 −31.50 −40.81 −31.94 −42.69 −37.05 −38.75 −35.65 −37.78 −43.79 −43.06

16 8 4 7 6 11 15 5 14 3 12 9 13 10 1 2

0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1

a

For each sequence, the PDB ID of the template selected for homology modeling, the computational binding score MScomplex, and the corresponding rank are listed. In terms of ligand binding, we use 1 for true binder and 0 otherwise. C

DOI: 10.1021/acs.jcim.5b00303 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Table 2. Ranking of the Nativelike Decoy Posesa benchmark exercise

CSAR 2013

Downloaded by RUTGERS UNIV on August 24, 2015 | http://pubs.acs.org Publication Date (Web): August 17, 2015 | doi: 10.1021/acs.jcim.5b00303

CSAR 2014

RMSD of 0.66 Å to the nativelike pose (decoy 52) and 1.95 Å to the decoy (decoy 79) selected by the more flexible allrotamer minimization approach and also the combined score (Figure 2). The similarity between the decoy and native pose

rank index of the nativelike pose allrotamer

name: native pose index

subrotamer

combined

01_DIG18: 25 02_DIG20: 39 1_FXA_gtc101: 146 02_FXA_gtc398: 190 03_FXA_gtc401: 37 04_SYK_gtc224: 116 05_SYK_gtc225: 51 06_SYK_gtc233: 71 07_SYK_gtc249: 48 08_SYK_gtc250: 141 09_TRMD_gtc445: 126 10_TRMD_gtc446: 76 11_TRMD_gtc447: 18 12_TRMD_gtc448: 3 13_TRMD_gtc451: 5 14_TRMD_gtc452: 79 15_TRMD_gtc453: 32 16_TRMD_gtc456: 185 17_TRMD_gtc457: 88 18_TRMD_gtc458: 116 19_TRMD_gtc459: 195 20_TRMD_gtc460: 27 21_TRMD_gtc464: 118 22_TRMD_gtc465: 154

1 1 1 1 1 1 1 1 2 (7) 1 1

1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1

1 2 (92) 1 1 3 (52) 1 1

1 1 1 1 2 (52) 1 1

1 1

1 1

1 1

1

1

1

1 1

1 1

1 1

1

1

1

Figure 2. The binding poses of ligand GTC000452 with TRMD. Decoy 79 is the nativelike pose, while Decoy 52 was selected by the all-rotamer minimization approach. The major difference between two poses is the orientation of the phenol group. The top pose selected by the flexible docking with MedusaDock has a smaller RMSD (in the bracket) to the nativelike Decoy 79 than Decoy 52.

possibly resulted in the missed ranking. Taken together, our analysis suggests that combining predictions with flexible docking using MedusaDock may help improve the accuracy in identifying the near-native pose within a set of docking decoys. Predict Binding Poses. In CSAR 2013 phase 3 (10 steroid ligands against one designed protein) and CSAR 2014 phase 2 exercises (106 for FXA, 276 for SYK, and 33 for TRMD), participants were asked to predict the binding poses (top three poses) and corresponding binding affinities. Ligands were provided in either SMILE (CSAR 2013) or 3D mol2 formats (CSAR 2014). One (CSAR 2013) or multiple (CSAR 2014) receptor conformations, determined experimentally, were provided. For ligands in SMILE format, we used Open Babel38 to generate the initial 3D structure, followed by energy minimization with the MMFF94 force field (Methods). In the CSAR 2013 exercise, only one input structure was used since the protein was de novo designed. We used the ensemble-based docking for CSAR 2014 benchmark. For each ligand−receptor pair, we performed flexible docking with MedusaDock to

a The poses were ranked by the MS score using minimization approaches with different levels of protein side chain flexibility, including sampling of either subrotameric space of native rotamers (subrotamer) or all possible rotamers (all-rotamer). Using the combined rank score, all decoys were reranked for the final submission. The rank indexes of the nativelike poses are listed, where the rank of 1 corresponds to the correct selection of the nativelike pose. The cases where the nativelike pose were not topranked are highlighted in italic font, and the corresponding pose index of the top-ranked decoy pose is given in parentheses.

nativelike decoys were found to have subangstrom RMSD with respect to the predicted poses by MedusaDock. The high accuracy in predicting the nativelike poses by MedusaDock is possibly because of high-resolution input structure of receptors provided by the organizer. For example, in the case of 14_TRMD_gtc452, the pose predicted by MedusaDock has a Table 3. Analysis of the Cases without Consensusa top-ranked pose index receptor_ligand name: native pose index 07_SYK_gtc249: 48

subrotamer

all-rotamer

7 48

11_TRMD_gtc447: 18

18

14_TRMD_gtc452: 79

79

92 52

RMSD with respect to MedusaDock prediction (Å) 4.39 0.41 0.70 3.81 0.66 1.95

a

For each of the top-ranked poses identified from different minimization approaches, RMSD with respect to the MedusaDock-predicted native pose was computed. The cases where the nativelike poses were not top-ranked are highlighted in italic font. D

DOI: 10.1021/acs.jcim.5b00303 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Table 4. Summary Table of the Predicted Ligand−Receptor Poses and Binding Affinities in the CSAR 2014 Phase 2 Exercisesa RMSD (Å) centroid receptors FXA

SYK

Downloaded by RUTGERS UNIV on August 24, 2015 | http://pubs.acs.org Publication Date (Web): August 17, 2015 | doi: 10.1021/acs.jcim.5b00303

TRMD

average

minimum MS score

affinities and scores

ligands

pose #1

pose #2

pose #3

pose #1

pose #2

pose #3

pIC50

MS

MScomplex

GTC000006A GTC000101A GTC000102A GTC000222A GTC000223A GTC000224A GTC000225A GTC000226A GTC000233A GTC000249A GTC000250A GTC000444A GTC000445A GTC000446A GTC000447A GTC000448A GTC000449A GTC000450A GTC000451A GTC000452A GTC000453A GTC000454A GTC000455A GTC000456A GTC000457A GTC000458A GTC000459A GTC000460A GTC000461A GTC000462A GTC000463A GTC000464A GTC000465A GTC000466A GTC000467A GTC000468A GTC000469A GTC000470A GTC000471A GTC000472A GTC000473A GTC000474A

1.46 7.42 6.72 0.77 1.78 2.02 2.12 1.75 0.71 3.19 2.05 0.82 0.27 0.37 0.74 1.86 0.99 7.31 5.04 0.75 0.64 0.60 1.10 0.56 6.25 1.16 0.60 0.71 4.69 0.53 1.57 2.05 1.20 1.35 2.33 1.28 2.26 3.50 1.14 0.70 0.90 0.31 1.99

7.09 2.13 3.81 8.01 2.63 3.33 3.32 3.41 2.32 3.97 4.18 6.16 4.33 4.45 4.61 7.10 5.75 2.02 6.78 8.02 6.01 6.87 7.68 6.06 1.02 6.54 6.48 3.33 7.87 6.70 6.86 6.65 5.69 8.32 7.15 6.93 7.47 1.54 3.60 5.42 5.96 7.43

4.29 9.96 9.34 3.17 3.75 3.10 8.49 2.80 2.51 6.49 4.97 6.66 3.35

1.37 7.04 7.65 0.71 1.89 0.51 1.61 0.94 0.56 1.74 0.99 0.92 0.29 0.37 0.72 1.92 0.45 6.98 4.96 1.65 1.96 0.64 1.54 0.46 6.17 1.07 0.51 0.54 4.72 0.52 1.68 0.51 1.13 1.00 2.45 1.13 2.12 3.47 1.00 0.76 0.82 0.79 1.86

7.18 2.12 3.73 8.09 2.15 3.29 3.40 3.56 2.18 2.64 3.96 6.08 4.36 4.47 4.63 6.96 5.82 1.39 6.76 8.00 5.93 6.65 7.63 6.07 0.86 6.48 6.34 2.93 8.15 6.71 6.83 6.37 5.96 8.46 7.11 6.93 7.39 1.65 3.60 5.42 6.02 7.43

4.64 10.05 9.35 2.44 3.62 3.29 8.18 2.88 2.49 6.77 4.09 6.41 3.08

8.9 8.8 7.4 7.9 8.5 7.9 7.7 8.7 8.6 8.3 8.3 5.5 4.6 3.8 3.5 4.4 8.3 8.3 6.8 5.7 5.4 6.1 5.2 5.1 6.7 6.3 5.9 6.7 5.1 5.7 4.8 6.2 4.7 4.8 4.8 6.2 5.8 6.3 5.7 6.1 4 5.6

−52.5 −49.3 −43.6 −51.0 −50.3 −56.9 −54.0 −53.2 −53.0 −49.8 −57.4 −43.7 −30.2 −29.7 −29.7 −34.6 −58.5 −53.2 −49.1 −46.0 −49.6 −45.5 −48.5 −31.1 −40.9 −41.4 −45.1 −38.5 −62.0 −35.7 −45.0 −38.3 −29.9 −50.0 −44.4 −41.2 −44.2 −49.5 −53.4 −34.8 −37.3 −39.2

−41.1 −38.3 −34.0 −39.5 −38.9 −42.2 −42.5 −39.4 −40.0 −35.2 −43.5 −34.5 −22.3 −22.5 −22.7 −27.6 −43.6 −41.1 −39.5 −35.7 −39.6 −35.6 −35.9 −23.2 −30.9 −31.9 −35.3 −28.0 −42.3 −27.2 −32.4 −27.1 −22.0 −31.6 −34.5 −29.0 −31.8 −36.5 −40.2 −27.4 −28.0 −28.8

6.59 6.51 6.31 6.59 6.56 8.37 7.49 6.96 7.05 6.53 6.66 7.61 6.62 5.93 4.36 7.10 6.10 5.18 2.61 7.14 7.33 6.56 6.57 7.11 5.25 5.80 6.07

6.60 6.52 6.47 5.62 7.01 8.35 7.66 6.96 7.07 6.25 6.07 7.60 6.57 5.72 4.07 7.12 6.12 5.15 2.58 7.20 6.38 7.76 6.55 6.00 5.25 5.80 5.90

a

For each ligand−receptor pair, we used the clustering-based approach and submitted the centroid poses for each of the top three clusters. In the retrospective analysis, we also selected the poses with the lowest MS score within each cluster and computed the RMSD with respect to the released experimental structures. The RMSD values smaller than 2.5 Å are highlighted in italic bold font. The cases where none of the top three poses are within 2.5 Å RMSD are highlighted in italic font. The experimental binding affinities (pIC50) and the corresponding computational MS and MScomplex scores are also listed.

RMSD to the rest of poses within the cluster − for each of the top three clusters. Out of the total 325 ligand−receptor pairs provided for blind predictions in CSAR 2013−2014 benchmarks, 42 cocrystallized structures were released afterward for CSAR 2014 targets (Table 4). In 38 out of the 42 (∼90%) cases, the nativelike poses with RMSD < 2.5 Å were within the submitted top three poses. Within these 38 cases, 34 of our top predictions were nativelike (81%). For three out of the four challenging cases,

generate an ensemble of binding poses (Methods). For the MedusaDock generated ensemble of ligand−receptor binding poses, we applied clustering analysis to group similar poses together with a cutoff RMSD of 2.5 Å. We ranked each cluster according to the ensemble-averaged binding energy described previously, where the cluster population was used to approximate the entropy.23 For the final submission, we chose the centroid poses − the one with minimal average E

DOI: 10.1021/acs.jcim.5b00303 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling Table 5. Rank of Nativelike Poses for the Challenging Casesa best rank of nativelike pose centroid

minimum MedusaScore

receptors

ligands