Template-Based Method for Conformation Generation and Scoring for

May 2, 2019 - Physics-based prediction of protein–ligand binding affinities for a congeneric series of ligands in lead optimization requires their g...
2 downloads 0 Views 4MB Size
Article Cite This: J. Chem. Inf. Model. 2019, 59, 2690−2701

pubs.acs.org/jcim

Template-Based Method for Conformation Generation and Scoring for Congeneric Series of Ligands E. Prabhu Raman* BIOVIA, Dassault Systemes, 5005 Wateridge Vista Drive, San Diego, California 92121, United States

Downloaded via UNIV OF SOUTHERN INDIANA on July 17, 2019 at 07:59:50 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

S Supporting Information *

ABSTRACT: Physics-based prediction of protein−ligand binding affinities for a congeneric series of ligands in lead optimization requires their geometries as a first step. In this paper, we report a method that uses the 3D conformation of a lead compound in complex with a protein as a template to generate conformations of a series of related analog compounds. The method uses the Maximal Common Substructure (MCS) computed between lead and analog ligands to assign coordinates for the atoms shared between the ligands. For the differing atoms, a conformation generation procedure is implemented that results in a diversity of conformations. The generated conformations are sorted using a score based on the Molecular Mechanics and Generalized Born with Solvent Accessible Surface Area contribution (MM-GBSA) method. The accuracy of the generated conformations is tested retrospectively using a cross-validation approach applied to four data sets obtained from the Drug Design Data Resource (D3R) by measuring the RMSD of the top scored conformation with respect to the crystallographic pose. The scoring ability of the method is independently assessed using data for the same protein targets to test the rank ordering ability and separating active and inactive ligands. We tested the effect of protein flexibility during structural optimization and scoring approaches with and without strain energies. Retrospective validation on data sets comprising 4 targets shows that the method outperforms random selection for all targets and outperforms a molecular weight-based null model in 3 out of 4 targets in separating active and inactive compounds. Therefore, the presented method is expected to be of utility in lead optimization for rapidly screening analog ligands and generating initial conformations for use in more detailed physics-based binding affinity prediction methods.



energy methods3,4 offer an accurate route to calculating relative binding affinities, which is desired in LO, but the computational cost of tens of nanoseconds of MD simulation is formidable for large sets of compounds, even with the current advanced state of CPU and GPU computing. It is thus desirable to have methods that can screen large numbers of analogs in the early stages of LO to allow for a relatively larger exploration of the chemical space around the lead compound. A smaller list of compounds shortlisted this way using an approximate method could subsequently be subject to more accurate but time-consuming free energy methods for prioritization with higher accuracy. To enable such a two-step physics-based workflow for congeneric virtual screening, there is another requirement, which is described as follows. For computation intensive free energy methods, it is critical that the initial protein−ligand complex geometry be accurate. This is especially the case for relative binding free energy calculations involving a few tens of

INTRODUCTION Small molecule Lead Optimization (LO) is the most expensive and time-consuming of the different stages involved in preclinical drug discovery and design.1 LO involves chemically modifying a lead compound to improve binding affinity and a host of other druglike properties including specificity, bioavailability, solubility, and ADMET properties. While affinity is just one of the many properties, its accurate and efficient estimation in response to a changing chemotype provides opportunities to optimize the other properties while monitoring affinity. Toward this goal, physics-based methods have seen increasing adoption over the past decade particularly for binding affinity optimization.2,3 Unlike knowledge-based methods, physics-based methods do not require target-specific experimental training data and offer physical insights that could be used for rational optimization. The chemical space exploration needed to achieve the desired improvements in druglike properties may require screening a large number of compounds. Thus, it is desirable to have efficient methods that can screen tens to hundreds of thousands of ligands. Among physics-based methods, free © 2019 American Chemical Society

Received: January 7, 2019 Published: May 2, 2019 2690

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

chosen; this arbitrary choice ended up having an effect on the interpretation of results which is discussed later in this paper. The protein preparation protocol assigns amino acid protonation states at pH of 7.4 using the Generalized Born electrostatics model, assuming a protein dielectric constant of 10 and ionic strength of 0.145 M, as detailed previously.14 Upon preparation, for each data set, all protein structures were aligned using the “Superimpose Proteins” tool in Discovery Studio. Ligand Preparation. Ligands obtained from D3R were in SMILES format. A “rule-based” method was used to assign a single protonation state for the ligands, followed by assignment of 3D coordinates, and although the presented method is capable of handling molecules with chiral centers, chiral ligands were not included in order to keep comparisons with experimental affinity data unambiguous. The “Prepare Ligands” protocol in Discovery Studio was used to accomplish this task. Analog Structure Generation. In a lead optimization project, there typically are one or more protein−ligand cocrystal structures available. We term the ligands in these structures as “lead” in the following discussion. The goal of the analog structure generation step is to obtain the 3D conformations of ligands similar to the lead compounds, which we refer to as “analogs”. For each analog compound, the structure generation method involves six steps: (1) MCS Generation. The maximal common substructure (MCS) method is used to obtain the common substructure between the analog and each of the lead compounds. The details of the MCS algorithm are described elsewhere.7 Hydrogens are removed prior to MCS computation. The overlap between the analog and lead compound is quantified by the fraction of common atoms defined as

nanoseconds of sampling or less, which assume that the overall binding mode of the ligands involved does not change significantly. This assumption enables shorter simulation times and when true can result in accurate predictions. However, it requires that the starting binding mode be similar or kinetically accessible to the thermodynamically important state. One approach to obtaining initial geometries is docking, which is suited to the problem as it often identifies the correct binding mode5,6 and is automatable for high-throughput applications. However, as observed in recently conducted Drug Design Data Resource (D3R) open docking challenges, it is not very uncommon for docking methods to fail in identifying the crystallographic pose as the top scoring pose among many generated for each ligand.5,6 This poses a challenge to obtaining predictive results. For LO applications that involve screening of a congeneric series involving an invariant scaffold, one possible alternative approach to obtaining initial geometries is to use the crystallographic binding mode of the lead compound where available as a template. In this paper, we describe such a method, which we term Generate Analog Conformations (GAC). Our approach detailed in this paper circumvents the docking problem and uses information from an experimentally known binding mode of one ligand in the series to generate 3D conformations of analogous ligands. The method detects the set of common atoms between lead and analog ligands using the Maximal Common Substructure (MCS) method.7 The coordinates for the common set in the analog ligand are directly assigned from the lead ligand; for the different set a limited conformational search is implemented (see below). Such a minimalistic approach was chosen keeping in mind small chemical changes that characterize a congeneric series. While being template-based, our approach is distinct from other template-based methods that use 3D structural alignment to generate geometries8,9 that have shown promise in recent D3R challenges in generating accurate geometries.5,6 We validate the accuracy of the generated conformations by comparing them with crystallographic geometries in crossdocking calculations using data from D3R. Then, using the binding score that is obtained as a byproduct of the conformation sorting procedure, we test the ability of the method to rank order compounds and separate active ligands from inactive ones in congeneric subdata sets constructed from data sets collected from D3R. The results show that the template-based conformation generation method is sufficiently accurate when 65% or more of the analog ligand is shared with the lead. Second, affinity scoring applied to the generated poses outperforms random selection and the molecular weight (MW)-based null model in separating active compounds from inactive ones and therefore is expected to be of utility in lead optimization.

Fc =

NMCS Nanalog

(1)

where NMCS and Nanalog are the number of heavy atoms in the MCS and the analog ligand, respectively. Thus, this metric quantifies the fraction of the analog ligand that exactly matches the lead. For the subsequent steps, the lead compound with the largest Fc is selected as that which would provide the best possible template for conformation generation. The MCS also contains the atom−atom mapping between the lead and the analog. At this stage, if the MCS is found to contain atoms that differ in chirality between the chosen lead and the analog, a minimal number of smallest possible groups are removed from the mapping that would enable the structure of the generated analog to have the correct chirality. (2) Assign Coordinates of Common Atoms. The coordinates of atoms in the common set are assigned exactly as the corresponding lead atom coordinates. (3) Generate Coordinates of Different Atoms. The coordinates of atoms that do not belong to the MCS are generated using internal coordinates. Internal coordinates (IC) for each such atom are calculated from a single 3D conformation of the analog that is independently generated. At this stage, the geometry of the analog compound is fully obtained. For certain molecules involving rings, it is possible that the IC-based conformation generation may result in incorrect chirality; such molecules are rejected by obtaining the chirality for each stereocenter in the molecule and



METHODS Protein Structure Preparation. Protein structures were downloaded from the Protein Data Bank (PDB).10 Crystallographic waters and cosolvents were removed, and the ligand was retained. The structure was then prepared using the “Prepare Protein” protocol in BIOVIA Discovery Studio.11 Missing loop regions of less than or equal to 8 residues were initially built using MODELER12 and further refined using the LOOPER method.13 Longer loops were not built; the disconnected residues in such loops were capped with neutral patches. For atoms with multiple occupancy, the first one was 2691

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

and the solvent-accessible surface area energy computed as follows

comparing it against the same from the aforementioned newly generated 3D conformation. (4) Selective Diverse Conformation Generation. To introduce diversity in the coordinates of non-MCS atoms, the generated conformation is subjected to a “Diverse Conformation Generator” method that uses the CAESAR algorithm,15 with the core MCS atoms being constrained. Constraints on MCS atoms within 2-bonds connectivity distance from any non-MCS atoms are removed to allow for additional conformational flexibility. The constraints ensure that the binding mode of the MCS region is preserved. A maximum of 15 conformations are generated. A diverse subset is selected from the generated conformations such that no two conformations have pairwise RMSD less than 1 Å, which is computed only for the nonconstrained atoms. After this step, hydrogens are added to the molecule. The protein structure is not used in this step. (5) Binding Energy-Based Ranking. The conformations are merged with the protein, and a binding energy is calculated for each using the MM-GBSA method which uses the Generalized Born (GB) module16 implemented in CHARMm.17 Conformations are sorted in the increasing order of binding energy, from which the top 10 are chosen. Since the goal of this step is to efficiently screen out high energy conformations, the faster GB variant that does not involve the molecular volume term was used. Other details of the GB calculation used in this step are the same as described in the end point binding energy scoring section below. (6) In Situ Minimization and Reranking. The 10 selected conformations are subject to an in situ energy minimization with a distance-dependent dielectric to treat electrostatic interactions. In one variant of the method, the protein is held rigid, whereas in another the binding site residues are unconstrained. The flexible binding site residues are chosen based on proximity to the lead compound in the initial geometry; residues with any atom within 3 Å of the lead compound are designated as flexible. This way the same set of atoms is chosen to be flexible for all analog ligands, so as to not have the trends in computed binding energies be influenced by the choice of the flexible region. The minimized structure is subject to a single point MM-GBSA calculation that uses the more accurate GBMV module 13,14 implemented in CHARMm.16−19 Details of the GBMV parameters used are described in the end point binding energy scoring section below. For each conformation, the total energy is calculated for the protein−ligand complex, the free protein, and the free ligand. The scoring approach is described next. The combined structure generation and scoring protocol is implemented in BIOVIA Discovery Studio as a protocol named “Generate Analog Conformations”. End Point Binding Energy Scoring. The energy components computed in the previous step are combined using the following two expressions to obtain the final binding energy score. In one approach, a binding energy is calculated for each conformation, and the minimum of the binding energy over all conformations is termed as the binding energy score (BE) BE = min r{E PL(r ) − E P(r ) − E L(r )}

E(r ) = EMM(r ) + EGB(r ) + ESA (r )

(3)

Nonbonded interactions were switched off smoothly in the range 10−12 Å. The key parameters used in the Generalized Born Molecular Volume (GBMV) solvation energy calculation are as follows. The dielectric constant used for the solute and solvent were 2 and 80, respectively. The van der Waals radii were used as input for computing SASA and molecular volume. The salt concentration was set to 0 mM. The term corresponding to the solvent accessible surface area (SASA) is computed as follows ESA (r ) = αSASA + β

(4) −1

−2

where α = 0.00542 kcal mol Å , and β = 0.92 kcal mol−1. An alternative approach was tested for scoring that was aimed at accounting for reorganization energy change upon binding. In this approach, the minimum energy among the different conformations was calculated separately for each molecule P, L, and PL. Accordingly, the Rescored Binding Energy (RBE) is calculated as RBE = min r{E PL(r )} − min r{E P(r )} − min r{E L(r )} (5)

By choosing the minimum energy in each ensemble, RBE does not involve canceling of the intramolecular energies as in BE and therefore is aimed at including the reorganization energy in the free P and L states. The approach described in eq 2 above is implemented in BIOVIA Discovery Studio as a protocol named “Generate Analog Conformations”. Validation Metrics. Root mean square deviation (RMSD) was used to monitor the structural deviation between the predicted and crystallographic geometries. The RMSD was computed over all heavy atoms including both the common and different atoms and accounted for topological symmetry. For validation of scoring capability, the Area Under Curve (AUC) of the Receiver Operator Characteristic (ROC) curve was calculated. In addition, Predictive Index (PI) was computed to judge the directional correctness quantitatively. PI is defined as follows20 PI =

∑j > i ∑i wijCij ∑j > i ∑i wij

(6)

where the sum in the numerator and denominator is over all unique pairs of ligands (i,j) in a data set. Each pair has an associated weight defined as wij = |E(j) − E(i)|

(7)

where E(i) and P(i) correspond to experimental and predicted binding affinities of ligand i, respectively. The directional correctness for each pair of ligands (i,j) is computed as | l [E(j) − E(i)] o o o o o o > 1, if 0 o o o o o o [ − ] P( j ) P( i ) o o o o o o o o o o [ − ] E ( j ) E ( i ) Cij = m } o o o o − < 1, if 0 o o o o [ − ] o o P( j ) P( i ) o o o o o o o o o o 0, if [P(j) − P(i)] = 0 o o n ~

(2)

where EX(r) is the total energy of a system X in conformation r. X can be either the protein−ligand complex PL, the free protein P, or the free ligand L. The energy of each molecule is a sum of intramolecular terms, the Generalized Born energy, 2692

(8)

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling Table 1. Data Set Used for Retrospective Validationa protein CDK2 Urokinase CHK1 HSP90

ligands

active

source

PDB IDs

33 (101)

24

CSAR

36 (39) 95 (95) 148 (172)

28 69 113

Abbott Abbott AbbVie

4EK4, 4EK5, 4FKG, 4FKI, 4EK6, 4FKL, 4EK8, 3SW4, 3SW7, 4FKO, 4FKP, 4FKQ, 4FKR, 4FKS, 4FKT, 4FKU, 4FKV, 4FKW 4FUD, 4FUE, 4FU7, 4FU8, 4FU9, 4FUB, 4FUC 4FSM, 4FSW, 4FT5, 4FSN, 4FSZ, 4FT7, 4FT0, 4FT9, 4FTA, 4FTC, 4FSQ, 4FSR, 4FST 4YKR, 4YKQ, 4YKT, 4YKW, 4YKX, 4YKZ, 4YKY, 4YKU

a

The protein target, number of ligands in the data set, number of active ligands, source of data, and the complexes with solved crystal structures are listed. The number in parentheses in the second column indicates the total number of ligands in the data set, some of which were not processed by the method due to lower than 50% similarity with any of the template ligands.

for congeneric ligands to provide sufficient data to validate the structure generation method. Another factor was the availability of both active and inactive compounds in the data set so as to test the enrichment ability of the scoring method. In particular, we chose 4 data sets that had at least 30 ligands and more than 6 inactive compounds. For all sets except HSP90, inactive compounds were already identified in the data sets. For HSP90, an affinity cutoff of −6 kcal/mol was chosen to identify inactive molecules. Analog Structure Generation. The primary goal of our study was to validate the presented template-based protein− ligand 3D structure generation method in the context of lead optimization. Since the subsequent scoring for binding affinity involves many approximations, it is important to assess the accuracy of the generated conformations independent of binding affinities. To this end, we tested our method’s ability to reproduce crystallographic geometries of protein−ligand complexes. A leave-one-out cross docking approach was undertaken where, for each target, one protein structure and the cognate lead ligand were used as a template to generate the geometries of the remainder of the ligands, referred to here as “test ligands”. The RMSD of the top scoring geometry for these test ligands was computed with respect to their crystallographic geometries, which were obtained from the alignment of all protein structures described above. For a protein with M crystal structures, this test repeated M times would result in a maximum of M × (M−1) RMSD values. Some tests involved template ligands that belonged to a different chemical series than the test ligands where the fraction of common atoms, Fc, would be less than 0.5, thus making those ligands outside the scope of the method. A combined total of 177 RMSD values were obtained from the cross docking tests performed for the four proteins. We tested the sensitivity of the results for two variations of the method: (i) rigid vs flexible local residues and (ii) BE and RBE used for scoring (see the Methods section). Keeping the protein rigid during in situ minimization (step 6 of the analog structure generation algorithm) results in a significant speed increase for the calculations, but this approach would not model induced-fit effects. On the other hand, a structure minimization would not cross conformational barriers that may exist between relevant conformational states involved in induced fit binding. Therefore, the benefit of including protein flexibility especially in ranking a congeneric series of ligands is not a priori apparent, which motivated our testing of its effect on the data sets. Second, in many applications of MM-GBSAbased end-point scoring, the unbound ensemble is not modeled separately, which is motivated by the prospect of benefiting from error cancellation. In practice this is implemented either using eq 2 or a similar approach involving computing averages24 instead of choosing the minimum

PI quantifies the directional correctness of predictions in a data set weighted by the difference in experimental values and ranges from −1 to 1. Docking. Two docking programs namely LibDock21 and CDOCKER22 were used to compare against the performance of the presented method. LibDock uses the alignment of physicochemical features computed for protein binding sites (“hotspots”) with the corresponding properties of ligand groups to guide docking. By contrast the program CDOCKER uses a force-field representation and molecular dynamics (MD)-based simulated annealing method to obtain docked poses. Default parameters as implemented in BIOVIA Discovery Studio were used for the docking calculations. For LibDock, key parameters used are as follows. A total of 100 polar and apolar hotspots were calculated for the protein. A RMSD tolerance of 0.25 Å was used to select structurally distinct conformations. A maximum of 1000 starting conformations were used for posing. For CDOCKER, key parameters used are as follows. Ten random initial conformations were generated using high temperature (1000 K) MD. For each conformation, 10 random rotations were performed for posing. The poses were subject to 2000 steps of heating MD to a target temperature of 700 K, followed by simulated annealing MD cooling over 5000 steps to a target temperature of 300 K. The binding site was defined as a sphere that encompasses all atoms of the ligand that the protein was cocrystallized with. For a given cross-docking calculation, a single top ranked pose was obtained for each method, and its RMSD was computed against the corresponding crystal geometry.



RESULTS AND DISCUSSION Data Set. Our primary goal in this study was to test the ability of the Generate Analog Conformations (GAC) method in generating geometries of a congeneric series of ligands given the geometry of one member in the series as a template. A secondary goal was to test the scoring ability of the method to reproduce experimental trends in binding affinity in congeneric sets. To achieve this 2-fold validation goal, we selected 4 data sets (Table 1) with structure and binding affinity data that are publicly available in the Drug Design Data Resource (D3R, https://drugdesigndata.org), which is a result of a communitywide data sharing initiative involving industrial and academic groups.5,6 This resource also contains data sets collected through the Community Structure Activity Resource (CSAR) initiative.23 These data sets contain structure and affinity data for a large number of ligands, some of which fit the description of “congeneric” sets such that the ligands involve variations of groups on a common scaffold as typically is the case in LO. Our choice of the data sets among those available was motivated by the availability of a number of crystal structures 2693

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

Figure 1. Probability distribution of RMSD computed for the top scoring conformation with respect to crystal. The three sets of results correspond to restricting the data set to different values of similarity between the template ligand and the analog ligand, quantified by the fraction of common atoms, Fc.

Figure 2. RMSD (Å) computed for the generated conformations with respect to crystal, as a function of the fraction of common atoms Fc. Panel a shows the RMSD distribution for the top ranked pose only. Panel b shows the same for all conformations generated by the method. Values are color coded based on the protein data set.

(data not shown). Figure S1 in the Supporting Information shows the same data plotted in Figure 1 (left panel) but separately for each of the four targets. Figure 2a shows the RMSD as a function of Fc for the Flex RBE variant for the four data sets. The results for the other three methods are similar and not shown here. This analysis reinforces the observation made here and shows the dependence of reliability of pose prediction on the similarity between the lead and analog compounds quantified by the fraction of common atoms, Fc. Based on this analysis, the applicability domain of the method is in the Fc ≥ 0.65 range. Figure 2b shows the RMSD distribution of all conformations generated including the top scoring one. The deviation from experimental pose increases with decreasing Fc as expected. However, a comparison of panels a and b of Figure 2 shows that the scoring function deprioritizes high RMSD conformations and assigns favorable scores to ones that are 2 Å was 45, 45, 47, and 43 out of 177 for RigidBE, Rigid-RBE, Flex-BE, and Flex-RBE, respectively. For RMSD > 3 Å, the corresponding numbers were 25, 27, 26, and 24. Thus, the prediction quality of the four different methods is similar. A significant number of predictions diverge from the crystal geometry. The RMSD distribution for the Flex-RBE set is depicted in the leftmost panel of Figure 1. Given the fact that our method relies on chemical similarity to the lead compound for structure generation, it is relevant to observe the accuracy of the generated top pose as a function of similarity. Therefore, we restricted the RMSD comparison to Fc ≥ 0.6 and Fc ≥ 0.65; these results are shown in the center and right panels of Figure 1. The number of structures with a large deviation from the crystal pose significantly drops when the Fc floor is raised from 0.5 to 0.65. For example, for the Flex RBE set, the number of RMSD values >3 Å reduces from 24/177 to 1/97, and the number of RMSD values >2 Å reduces from 43/177 to 10/97 (these cases are discussed in greater detail below). The results for Rigid BE, Rigid RBE, and Flex BE follow the same trend 2694

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

Figure 3. Two examples showing the lead and analog ligands chemical structures and the conformations generated. Panel a shows the conformation generation for ligand 3SW7 using 3SW4 as a template, and panel b shows the conformation generation of ligand 4FKV using 4FKQ as a template. The image on the extreme right in both rows shows the overlay of the top scoring conformation with the crystal geometry.

generation method, it avoids the denovo docking problem, which has a greater potential to lead to incorrect binding modes. The GAC method takes a conservative approach to conformation generation where the conformational search is limited to the chemical groups that are different between the analog and the lead compound. This aspect is illustrated in Figure 3a where the lead (3SW4) and analog (3SW7) ligands differ by a nitro and an amine group. The nitro group being in the meta position in 3SW7 can map in two unique ways with the 3SW4 ligand, which are detected by the method resulting in two generated conformations as shown in the third column. A comparison of the top scoring conformation shows the overlap between the 3SW7 crystal geometry and the conformation generated by the method showing that the GBMV-based scoring correctly ranks the experimentally observed conformer among the two generated. Another case illustrated in Figure 3b exemplifies a scenario where the chemical similarity between the lead 4FKQ and analog 4FKV is much lower at Fc = 0.69, and therefore many of the conformations generated are very different from each other, with the diversity largely arising from the parts that are uncommon between two ligands. The right most image in Figure 3b shows the comparison of the top ranked conformation with the crystal geometry (in ball and stick), with an RMSD of 1.49 Å, showing that the score correctly ranks the pose that is closest to the experimental one. These two examples thus illustrate two features of the method that include finding alternative mappings and conformational sampling of the analog chemical group that differs from the lead ligand. In the RMSDs computed for the data set described above, although a majority of the geometries in the Fc < 0.65 range still result in low RMSDs, a significant number of values >3 Å show lack of reliable prediction for dissimilar ligands. We now analyze the 10 out of 97 predictions that involve Fc ≥ 0.65 and RMSD > 2 Å as these cases would be within the applicability

the template-based method using a rigid receptor; though the latter results were found not to be sensitive to modeling protein flexibility. Below, we compare the results obtained with the docking programs for the cross dockings which fall into the applicability domain of the template-based method, i.e. for cases where Fc > 0.65. For the Rigid BE set, the template-based approach resulted in cross docking with RMSD values greater than 2 and 3 Å for 12/97 and 1/97 cases, respectively. The corresponding rates of large RMSDs were much higher for both LibDock- and CDOCKER-based cross dockings. For LibDock, the number of RMSD values above 2 and 3 Å were 47/97 and 32/97, respectively. For CDOCKER the number of RMSD values above 2 and 3 Å were 39/97 and 28/97, respectively. Thus, CDOCKER has a higher cross-docking success rate than LibDock. However, while the docking methods generate accurate poses in the majority of cases, there are a large number of cases where poses significantly deviate from experiment. Figure S2 in the Supporting Information plots the RMSD as a function of Fc (analogous to Figure 2a) for LibDock and CDOCKER, which reinforces the analysis above. Figure S2 also shows that the number of high RMSD values increases with decreasing Fc; though unlike the template-based method, the high RMSD values are not exclusively restricted to the low similarity regime. This observation suggests that docking can be more accurate when using a protein structure that was cocrystallized with a ligand that is similar to the one being docked. While the template-based approach benefits from both having the ligand template and a hololike protein structure, docking only benefits from the latter. It is thus evident that for Fc > 0.65, there is a clear benefit to using the template-based method to generate reliable geometries. These findings are consistent with other recent studies that used a template-based approach and outperformed docking when using a similar template.25,26 We now further elaborate on certain key features and limitations of the method. Being a template-based structure 2695

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

Figure 4. Illustration of the cases for which deviation from crystal geometry is greater than 2 Å. Panel a shows the ligands discussed. Panel b shows the overlay of the alternate crystal geometries of ligand 4EK8 in ball and stick representation and the generated conformation in stick representation. Panel c shows the conformation generation of ligand 4FKT using 4FKP as a template. The protein surface is depicted in yellow.

The deviations for the other 6 predictions in the CDK2 data set are due to ambiguities in the crystal structures. RMSDs in the range 2.40−2.63 Å for 3SW4−4EK8, 3SW7−4EK8, 4FKO−4EK8, and 4EK8−3SW7 are a result of alternate crystallographic geometries for the 4EK8 ligand that place the nitro group differently owing to the rotation of the phenyl ring. Figure 4b shows the two crystallographic geometries of the 4EK8 ligand in ball and stick representation. The predicted geometry shown in stick representation overlaps with one of the alternate crystal structures (cyan) but not with the one arbitrarily chosen as reference for RMSD computation. The results for all four of the cases can be explained by the alternate crystallographic conformations. Another ambiguity in the crystal structures explains the deviations involved in the results for the 4FKT ligand, observed in 4FKP−4FKT and 4FKR− 4FKT predictions, for which the RMSDs are 2.51 and 2.06 Å, respectively. The method generates 10 conformations that place the flexible amino-alkyl group in the 4FKT ligand in different locations as shown in Figure 4c. Analysis of the Bfactors of the atoms associated with the 4FKT ligand shows the flexible nature of the amino alkyl group; the average B-factor of the 5 heavy atoms comprising the group is 54.8, whereas the average value for the remainder of the ligand is only 23.1. Therefore, this discrepancy is expected. The 2.06 Å RMSD for the 4FKR−4FKT case is explained by the same reason. Thus, these deviations are not due to a limitation in the method. Finally, for the one case in the CHK1 data set namely, 4FT7− 4FSW, the RMSD is 4.17 Å. This pair has an Fc exactly equal to 0.65, and the two ligands involved belong to entirely different chemical series. Being a template-based method conformational sampling is adopted only for the differing regions. As described earlier, this approach avoids sampling of groups that are common between the lead compound and the analog. In certain cases, this approach may prove to be restrictively conservative and potentially miss generating relevant binding modes. One example is 3SW7−4EK8 (Figure 4a), which share the nitro group. The generated conformation for the 4EK8 ligand will

domain of the method. Nine of these are from the CDK2 data set, and one is from the CHK1 data set. In the following discussion, we refer to each prediction using PDB IDs of the lead and analog ligands. For the 9 CDK2 cases, RMSDs range from 2.06 to 2.80 Å. Interestingly, for 4EK8−4FKL, 3SW7− 4FKL, 4FKO−4FKL, and 3SW7−4EK8, Fc = 1, and so the relatively high RMSDs of 2.4−2.8 are unexpected. The chemical structures of these are depicted in Figure 4a. The 4EK8−4FKL, 3SW7−4FKL, and 4FKO−4FKL cases involve the 4FKL ligand, which is a substructure of the template lead ligands 4EK8, 3SW7, and 4FKO and is significantly smaller. For the 4FKL ligand, this leads to a full substructure match, and therefore Fc = 1. This applies to 3SW4−4FKL also for which Fc = 0.93. The template lead ligand in all three cases has an additional substituted phenyl ring, which apparently leads to a significantly different binding mode. We computed a score Fc* which quantifies the overlap of the lead compound with the MCS. This metric, which is analogous to Fc, is defined as Fc* =

NMCS Nlead

(9)

Fc* quantifies the extent of the lead compound that is covered by the MCS. For the predictions discussed here, F*c is computed as 0.61, 0.54, 0.58, and 0.57 for 4EK8−4FKL, 3SW7−4FKL, 4FKO−4FKL, and 3SW4−4FKL, respectively. Thus, the relatively high RMSD is explained due to an insufficient overlap of the MCS with the lead compound. The use of Fc to determine the applicability of the method to a given analog ligand is appropriate because the metric directly quantifies the fraction of the ligand for which the structure is not determined (1 − Fc); such a direct relation does not exist with Fc*, which relates to the lead compound instead. While it may be possible to impose a threshold based on F*c , it would reduce the applicability domain of the method, because of which this additional restriction was not imposed. Besides, for the few cases where the fraction of the lead ligand covered is low, the RMSDs are still below 3 Å. 2696

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling Table 2. Prediction Metrics for the 39 Congeneric Data Setsa

a

Area Under Curve (AUC) and Predictive Index (PI) are computed and reported using the four different calculation methods. For comparison, the AUC and PI are calculated for the Molecular Weight-based null model. The cells containing AUC values that are above 0.5 are colored in shades of green, whereas the cells containing AUC values that are below 0.5 are colored in shades of red. The cells containing PI values that are above 0 are colored in shades of green, whereas the cells containing AUC values that are below 0 are colored in shades of red. nLigands indicates the total number of ligands in each subset. nActive indicates the number of ligands designated as Active.

potential inaccuracies may arise from the lack of conformational sampling of the shared groups. In summary, this analysis shows that when there is a modest amount of similarity between the lead ligand and those in the congeneric series, such that at least 65% of analog ligands have atoms common with the lead ligand, the method generates accurate geometries.

not explore the alternate location of the nitro group since it is shared with the lead compound. In this particular example, the 4EK8 ligand was found with two occupancies representing the rotation of the nitrophenyl ring, one of which matches the generated geometry. However, one could envision cases where the template-based approach may miss the true conformation. Thus, while the conservative approach to conformer generation may be accurate as shown in the above discussions, 2697

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

Figure 5. Weighted averages of AUC (left) and PI (right) calculated over congeneric data sets for each target. Predictions from the random and null model are also included for comparison.

Binding Affinity Scoring. To test the ability of the method to score congeneric ligands in accordance with binding affinity, data from the same set of four target proteins was used as above. Table 1 lists the proteins in the data set with the number of ligands with affinity data and the number designated as active in each set. Only a subset of the ligands (in parentheses in Table 1, column 2) was processed in the calculations since a number of ligands in each set were far too different from the template ligands with a solved crystal structure, thus putting them out of the scope of the method (Fc < 0.5). In total, 312 ligands across the four targets were processed by the method for which results are discussed below. Since there are multiple cocrystal structures present in each set (Table 1, rightmost column) that could potentially be used as a template for structure generation and scoring, all of them were used in separate runs. For each template protein−ligand cocrystal structure, the subset of the ligands chosen to be scored was based on the Fc > 0.5 criterion. Each such test is representative of a lead optimization scenario where only one cocrystal structure is available, and virtual screening of analogs related to the lead compound crystallized with the protein is desired. On the other hand, oftentimes in projects there are multiple cocrystal structures available, in which case predictions could be combined from using different protein structures. Here, we analyze both of these scenarios. Scores were calculated for 39 congeneric sets, which are composed of 13, 7, 11, and 7 sets for CHK1, Urokinase, CDK2, and HSP90, respectively. The number of compounds in the sets ranged from 5 to 82, with the average set size being 28.5. Table 2 lists the Area Under Curve (AUC) and the Predictive Index (PI) metrics computed for each set. The AUC can be used to assess the ability of the method to separate active compounds from inactive ones. In addition, since these sets are composed of analogous compounds, it is reasonable to expect the approximate scoring method used here to rank order ligands in accordance with binding affinity. To test these predictions, we used the Predictive Index (PI) metric as described in the Methods section to judge the quality of rank ordering. The PI metric ranges from −1 to 1 and is a measure of directional correctness of pairwise affinity predictions with respect to experiment weighted by the experimental difference in affinity. A detailed analysis of the AUC and PI values computed for each set is presented in Table 2. The AUC and PI computed using the null model and the four different variations of the method are tabulated for all the sets and are color-coded.

Predictions better or worse than random are depicted in green or red, respectively. Of 39 congeneric sets, the Rigid BE, Rigid RBE, Flex BE, and Flex RBE methods outperform the random model when comparing AUC (PI) in 35 (35), 36 (37), 33 (34), and 37 (36) sets, respectively. When comparing AUC (PI) against the MW-based null model, 28 (26), 30 (29), 24 (25), and 27 (27) sets outperform using the Rigid BE, Rigid RBE, Flex BE, and Flex RBE methods, respectively. We also compared the predictions using the Kendall Tau metric as reported in multiple studies.5,6 When comparing Kendall Tau for the 39 data sets, the Rigid BE, Rigid RBE, Flex BE, and Flex RBE methods outperform the random model for 35, 36, 34, and 35 data sets, respectively. When comparing against the MW-based null model, 24, 26, 25, and 26 sets outperform using the Rigid BE, Rigid RBE, Flex BE, and Flex RBE methods, respectively. Table S1 in the Supporting Information reports the individual Kendall Tau values for the data sets and color codes them analogous to Table 2. The qualitative similarity of the comparisons obtained from PI and Kendall Tau shows that the results are not very sensitive to the choice of rank ordering metric. To summarize the predictions and observe target dependence, we computed the weighted average of these metrics over all sets for each target and present them in Figure 5. The average AUC values range from 0.60 to 0.98 and are data set dependent. Judged by the weighted average of AUC (Figure 5, left panel), all four of the methods on all four data sets perform better than random model and the MWbased null model. Similarly, as judged by PI (Figure 5, right panel), all comparisons with the exception of one (Flex BE for HSP90) perform better than the null model. While this comparison involves averaged metrics, the direct comparison across the 39 data sets shows that the method almost always outperforms random selection and in 62−77% of cases outperforms the MW-based null model. Thus, based on these analyses the method shows potential for prioritizing congeneric design ideas. Second, the performance of the four methods is similar; no single method outperforms others across all 4 data sets. However, within the flexible and rigid classes, the RBE metric yields marginally better prediction than BE. Next, we test the second scenario where predictions from multiple template structures, when available, could be combined. It is not uncommon in LO projects to have more than one chemical series of compounds as drug candidates. Such a scenario can be modeled by the use of multiple template structures evaluated here. In the following analysis, 2698

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

Figure 6. Area under the ROC curve (AUC, left) and Predictive Index (PI, right) calculated for the four data sets combined from predictions obtained from multiple template structures. Error bars depict 95% confidence interval obtained from 5000-fold bootstrap sampling. Predictions from the random and null model are also included for comparison.

However, when assessing the method by quantitatively rank ordering ligands by affinity, the performance is not better than the MW-based null model. In Figure 6, the right panel shows the Predictive Index (PI) obtained for the four methods on all four data sets. While the predictions are significantly better than random, they are statistically similar when compared to the null model for Urokinase, CDK2, and HSP90. The worse performance as judged by PI in these heterogeneous data sets is not surprising for an approximate scoring method as used due to the presence of multiple ligand series which does not allow for error cancelation and inclusion of noise in predictions, as noted above. Based on this analysis it can be concluded that the rapid scoring which is obtained as a byproduct of the conformation generation procedure is expected to be of utility in choosing active compounds from the pool of compound ideas with a modest level of additional enrichment as compared to the intuitive MW-based null model. More detailed calculations that incorporate the physics of protein−ligand interactions such as explicit solvent and dynamics were not undertaken in this study, which was aimed at establishing a baseline level of predictability. The reliable geometry generation validated in this work suggests that these can serve as a starting point for implementing detailed calculations including free energy methods for accurate binding affinity estimation.

we combine predictions obtained from the different template cocrystal structures. The score associated with the highest value of Fc was chosen for each ligand. This was done to maximize the overlap in ligand similarity and therefore increase accuracy of pose predictions. Additionally, this would result in the best matching protein structure and therefore the most reliable score. Combination of the predictions this way results in the sets becoming larger: CHK1, Urokinase, CDK2, and HSP90 to 95, 36, 33, and 148 ligands, respectively. However, this also leads to multiple chemical series of compounds being present in the same set, which makes the set challenging to predict. The previous tests described earlier involved trends in affinity prediction for closely related analogous compounds, which led to better opportunities for error cancellation. This may not be the case here with different scaffolds present in the same scoring set. Thus, it is reasonable to expect worse predictive ability for the heterogeneous data sets considered here. The left panel in Figure 6 shows the AUC computed using the four approaches. The AUC values resulting from the predictions range from 0.64 to 0.98 and are data set dependent. Also shown are the predictions from the MWbased null and random model. To judge the statistical significance of the results, 5000 bootstrapping samples were generated by random selection of data points, and 95% confidence interval error bars are depicted. As with the predictions in the individual sets, the results for the combined sets are target dependent but significantly better than random for all four methods and proteins. For Urokinase, CDK2, and HSP90, all four methods outperform the MW-based null model. For Urokinase, the difference is statistically significant; for CDK2, 3 out of 4 methods are significantly better. For HSP90, the flexible protein-based predictions are statistically separated from the MW-based null model, but this is not the case for the rigid-protein-based predictions. Unlike these three target proteins, for CHK1, there is not a significant difference between our prediction and the null model. Therefore, while the scoring from all four variants is significantly better than random, this does not hold true for the MW-based null model. Thus, in heterogeneous data sets that are comprised of many different congeneric series of compounds, the method performs better than the null model in 3 out of 4 targets and as good in one, when assessed in the ability to separate active from inactive compounds.



CONCLUSIONS The presented Generate Analog Conformations (GAC) method is designed to be used in lead optimization for the purpose of generating 3D geometries of a congeneric series of compounds that share a scaffold with a lead compound with determined geometry in complex with the protein. The accuracy of the generated conformations as described above implies that the method can be used for accurate pose generation as long as 65% of the analog structure is shared with the lead compound. It was shown that within this applicability domain, the method outperforms docking. Additionally, the score calculated during the process of conformer generation can be of utility in selecting active compounds from a set of compound ideas. Validation performed in this study using four data sets shows that for closely related congeneric compound data sets, the method almost always outperforms random selection, and in 62 to 77% of such data sets, it outperforms an intuitive molecular weight-based null model. Thus, it is 2699

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling

(7) Hassan, M.; Brown, R. D.; Varma-O’brien, S.; Rogers, D. Cheminformatics Analysis and Learning in a Data Pipelining Environment. Mol. Diversity 2006, 10, 283−299. (8) Kumar, A.; Zhang, K. Y. J. A Cross Docking Pipeline for Improving Pose Prediction and Virtual Screening Performance. J. Comput.-Aided Mol. Des. 2018, 32, 163−173. (9) Duan, R.; Xu, X.; Zou, X. Lessons Learned from Participating in D3R 2016 Grand Challenge 2: Compounds Targeting the Farnesoid X Receptor. J. Comput.-Aided Mol. Des. 2018, 32, 103−111. (10) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242. (11) BIOVIA Discovery Studio Modeling Environment, Release 2018; BIOVIA Dassault Systemes: San Diego, CA. https://www.3dsbiovia. com/products/collaborative-science/biovia-discovery-studio/ (accessed May 7, 2019). (12) Eswar, N.; Webb, B.; Marti-Renom, M. A.; Madhusudhan, M. S.; Eramian, D.; Shen, M. Y.; Pieper, U.; Sali, A. Comparative Protein Structure Modeling Using Modeller. In Curr. Protoc Bioinformatics; John Wiley & Sons. Inc.: 2006; Vol. 15, pp 5.6.1−5.6.30, DOI: 10.1002/0471250953.bi0506s15. (13) Spassov, V. Z.; Flook, P. K.; Yan, L. LOOPER: A Molecular Mechanics-Based Algorithm for Protein Loop Prediction. Protein Eng., Des. Sel. 2008, 21, 91−100. (14) Spassov, V. Z.; Yan, L. A Fast and Accurate Computational Approach to Protein Ionization. Protein Sci. 2008, 17, 1955−1970. (15) Li, J.; Ehlers, T.; Sutter, J.; Varma-O’brien, S.; Kirchmair, J. CAESAR: A New Conformer Generation Algorithm Based on Recursive Buildup and Local Rotational Symmetry Consideration. J. Chem. Inf. Model. 2007, 47, 1923−1932. (16) Dominy, B. N.; Brooks, C. L., 3rd Methodology for ProteinLigand Binding Studies: Application to a Model for Drug Resistance, the HIV/FIV Protease System. Proteins: Struct., Funct., Genet. 1999, 36, 318−331. (17) Brooks, B. R.; Brooks, C. L., III; MacKerell, A. D., Jr.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30, 1545−1614. (18) Im, W.; Lee, M. S.; Brooks, C. L., 3rd Generalized Born Model with a Simple Smoothing Function. J. Comput. Chem. 2003, 24, 1691−1702. (19) Lee, M. S.; Feig, M.; Salsbury, F. R.; Brooks, C. L., III New Analytic Approximation to the Standard Molecular Volume Definition and Its Application to Generalized Born Calculations. J. Comput. Chem. 2003, 24, 1348−1356. (20) Pearlman, D. A.; Charifson, P. S. Are Free Energy Calculations Useful in Practice? A Comparison with Rapid Scoring Functions for the P38 MAP Kinase Protein System. J. Med. Chem. 2001, 44, 3417− 3423. (21) Rao, S. N.; Head, M. S.; Kulkarni, A.; LaLonde, J. M. Validation Studies of the Site-Directed Docking Program LibDock. J. Chem. Inf. Model. 2007, 47, 2159−2171. (22) Wu, G.; Robertson, D. H.; Brooks, C. L., 3rd; Vieth, M. Detailed Analysis of Grid-Based Molecular Docking: A Case Study of CDOCKER-a Charmm-Based Md Docking Algorithm. J. Comput. Chem. 2003, 24, 1549−1562. (23) Carlson, H. A. Lessons Learned over Four Benchmark Exercises from the Community Structure-Activity Resource. J. Chem. Inf. Model. 2016, 56, 951−954. (24) Brown, S. P.; Muchmore, S. W. Large-Scale Application of High-Throughput Molecular Mechanics with Poisson-Boltzmann Surface Area for Routine Physics-Based Scoring of Protein-Ligand Complexes. J. Med. Chem. 2009, 52, 3159−3165.

expected that our method can be useful in shortlisting congeneric design ideas, and the resultant geometries can be used as initial conformation for computationally intensive free energy methods to obtain more accurate affinity predictions.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.9b00032.



Protein structures in mol2 format, downloaded from PDB and prepared using BIOVIA Discovery Studio and used in calculations, SDF files containing crystallographic conformation of lead ligands used as a template for structure generation and validation, and SDF files containing set of ligands for binding affinity validation (ZIP) Probability distribution of RMSD, Figure S1; RMSD computed for docked conformations with respect to crystal, Figure S2; and Table S1, prediction metrics for 39 congeneric data sets (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

E. Prabhu Raman: 0000-0003-2303-7757 Notes

The author declares no competing financial interest. The method as described in this article is implemented as a protocol named “Generate Analog Conformations” in BIOVIA Discovery Studio package version 2018 or later. For further details, please see https://www.3dsbiovia.com/products/ collaborative-science/biovia-discovery-studio/.



ACKNOWLEDGMENTS The author thanks Lisa Yan, Hugues-Olivier Bertrand, and Eric Hurd for a critical review of the manuscript and Tedman Ehlers for stimulating discussions.



REFERENCES

(1) Paul, S. M.; Mytelka, D. S.; Dunwiddie, C. T.; Persinger, C. C.; Munos, B. H.; Lindborg, S. R.; Schacht, A. L. How to Improve R&D Productivity: The Pharmaceutical Industry’s Grand Challenge. Nat. Rev. Drug Discovery 2010, 9, 203−214. (2) De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035−4061. (3) Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57, 2911−2937. (4) Knight, J. L.; Brooks, C. L., 3rd Lambda-Dynamics Free Energy Simulation Methods. J. Comput. Chem. 2009, 30, 1692−1700. (5) Gaieb, Z.; Liu, S.; Gathiaka, S.; Chiu, M.; Yang, H.; Shao, C.; Feher, V. A.; Walters, W. P.; Kuhn, B.; Rudolph, M. G.; Burley, S. K.; Gilson, M. K.; Amaro, R. E. D3R Grand Challenge 2: Blind Prediction of Protein-Ligand Poses, Affinity Rankings, and Relative Binding Free Energies. J. Comput.-Aided Mol. Des. 2018, 32, 1−20. (6) Gathiaka, S.; Liu, S.; Chiu, M.; Yang, H.; Stuckey, J. A.; Kang, Y. N.; Delproposto, J.; Kubish, G.; Dunbar, J. B., Jr.; Carlson, H. A.; Burley, S. K.; Walters, W. P.; Amaro, R. E.; Feher, V. A.; Gilson, M. K. D3r Grand Challenge 2015: Evaluation of Protein-Ligand Pose and Affinity Predictions. J. Comput.-Aided Mol. Des. 2016, 30, 651−668. 2700

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701

Article

Journal of Chemical Information and Modeling (25) Xu, X.; Ma, Z.; Duan, R.; Zou, X. Predicting Protein-Ligand Binding Modes for CELPP and GC3: Workflows and Insight. J. Comput.-Aided Mol. Des. 2019, 33, 367−374. (26) Ignatov, M.; Liu, C.; Alekseenko, A.; Sun, Z.; Padhorny, D.; Kotelnikov, S.; Kazennov, A.; Grebenkin, I.; Kholodov, Y.; Kolosvari, I.; Perez, A.; Dill, K.; Kozakov, D. Monte Carlo on the Manifold and MD Refinement for Binding Pose Prediction of Protein-Ligand Complexes: 2017 D3R Grand Challenge. J. Comput.-Aided Mol. Des. 2019, 33, 119−127.

2701

DOI: 10.1021/acs.jcim.9b00032 J. Chem. Inf. Model. 2019, 59, 2690−2701