Article pubs.acs.org/JPCB
Multi-Conformer Ensemble Docking to Difficult Protein Targets Sally R. Ellingson,†,§,∥,⊥ Yinglong Miao,‡,§,# Jerome Baudry,‡,§ and Jeremy C. Smith*,‡,§ †
Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, United States Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee, United States § UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States ∥ Division of Biomedical Informatics, UK College of Public Health, University of Kentucky, Lexington, Kentucky, United States ‡
S Supporting Information *
ABSTRACT: Large-scale ensemble docking is investigated using five proteins from the Directory of Useful Decoys (DUD, dud.docking.org) for which docking to crystal structures has proven difficult. Molecular dynamics trajectories are produced for each protein and an ensemble of representative conformational structures extracted from the trajectories. Docking calculations are performed on these selected simulation structures and ensemble-based enrichment factors compared with those obtained using docking in crystal structures of the same protein targets or random selection of compounds. Simulation-derived snapshots are found with improved enrichment factors that increase the chemical diversity of docking hits for four of the five selected proteins. A combination of all the docking results obtained from molecular dynamics simulation followed by selection of top-ranking compounds appears to be an effective strategy for increasing the number and diversity of hits when using docking to screen large libraries of chemicals against difficult protein targets.
■
INTRODUCTION Virtual docking of small organic molecules to protein targets is an important drug discovery tool. Predicted small-molecule hits can be prioritized for subsequent experimental validation in binding or functional assays or subjected to more sophisticated computational investigation. When large databases of drug candidates are considered for hit discovery, fast, efficient, and robust high-throughput docking programs are required. For speed, protein receptors are often modeled as rigid or very selectively flexible (by specifying in advance one or a few side chains that can rotate and the states they can adopt). However, proteins are far from static structures. The lock-and-key model of ligand binding first proposed in 1894 by Fischer1 was subsumed by Koshland’s induced-fit model,2 in which ligand binding induces conformational change in the protein. More recently, the conformational selection model3,4 has developed, in which a ligand binds to a specific conformation of the protein target, leading to a stabilized, low-energy protein−ligand complex. Conformational selection and induced fit models are not mutually exclusive, and both effects could contribute to binding; for example, a target protein may fluctuate between multiple low-energy conformational states, a ligand may bind and stabilize one of these states, and small induced-fit changes may further increase the stability of the complex. In some cases, conformational rearrangements are so large that binding can be close to protein refolding.5 High-throughput docking has been used very successfully with single structures for protein targets,6,7 including for offtarget binding and toxicity or repurposing.8 In some cases, © XXXX American Chemical Society
protein targets lend themselves very well to this approach and may be then labeled “easy targets”, such as, for instance, the human estrogen receptor alpha, leading to very successful virtual docking results.9,10 Very recent approaches that integrate computational and X-ray crystallography demonstrate that conformational selection can be leveraged for the discovery of novel protein ligands in large databases of chemicals.11 However, many other protein targets do not lead to such good database enrichments and hit discovery.12,13 What can one do in such cases? We investigate here explicit consideration of the conformational flexibility not only of the potential ligands but also of the entire protein target of interest in these “difficult cases”. Similarly, the use of virtual screening beyond hit discovery, for polypharmacology, toxicity and potency predictions, and repurposing,14 also requires a large number of protein structures to be investigated in order to quantify potential cross-reactivity of drug candidates with several targets. While virtual screening approaches are available to leverage the massive computational power of the cloud and supercomputers, the amount and the complexity of generated data represent a challenge.14 Incorporating the conformational flexibility of several targets increases these difficulties in a combinatorial fashion. Special Issue: William L. Jorgensen Festschrift Received: June 30, 2014 Revised: September 1, 2014
A
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
AmpC β-lactamase (PDB ID: 1XGJ), (2) FGFr1: fibroblast growth factor receptor kinase (PDB ID: 1AGW), (3) GR: glucocorticoid receptor (PDB ID: 1M2Z), (4) PDGFrb: platelet derived growth factor receptor kinase (model), and (5) SRC: inactive conformation of tyrosine kinase SRC (PDB ID: 2SRC) and active conformation (used for docking only) of tyrosine kinase SRC (PDB ID: 1Y57). The first four proteins were selected, since they have been shown to perform poorly with a number of docking algorithms when using crystal structures and protein flexibility may therefore play an important role for drug recognition.13 Human tyrosine kinase SRC was chosen to investigate novel binding sites, as suggested in the literature.28−30 Molecular Dynamics. The structures of five proteins (1− 4) were obtained from DUD and processed using MOE31 (Molecular Operating Environment, Chemical Computing Group, Montréal, Canada) version 2012’s “Structure Preparation” facility to add missing atoms and protonate the structures according to pKa estimates (“Protonate 3D” facility in MOE). The proteins were solvated in a periodic box using the Solvate facility in MOE with NaCl as neutralizing counterions. The resulting models were equilibrated at 300 K, and 100 ns production trajectories were generated using the program NAMD2 32 in the NVT thermodynamic ensemble, an integration time step of 2 fs, and a fixed length of covalent bonds containing the hydrogen atoms. A GPU accelerated version of NAMD232 was used on the Department of Energy’s Titan supercomputer located in the Oak Ridge National Laboratory. Trajectory frames were saved every 100 ps, resulting in 1000 frames for each protein for further analysis. Selection of Protein Conformations. Water molecules were removed from each MD frame, and the protein structures in each frame were aligned to remove translational and rotational movement of the entire protein using the VMD RMSD Trajectory Tool33 in order to only consider internal conformational fluctuations. Residues in close contact with the known cocrystallized ligands (any residue with at least one atom within 4.5 Å) were selected using MOE for protein structure clustering. Since novel potential binding sites were investigated for SRC, all residues were considered. The aligned MD trajectories and residue selections were then uploaded to the National Biomedical Computation Resource’s Opal Server34 to use the Gromos35 RMSD-based clustering tool. A pairwise RMSD value was calculated for every frame in the trajectory, and frames were considered neighbors if their RMSD value is below a provided cutoff. The frame with the largest number of neighbors was taken as the representative conformation of the first cluster. This frame and all its neighbors were removed from the pool of frames and the clustering continued until no frames were left. The cutoff values were set to obtain ∼40 clusters for each protein. The cutoff values were 0.202 nm for AmpC, giving 38 clusters, 0.142 nm for FGFr1, giving 38 clusters, 0.141 nm for GR, giving 39 clusters, 0.153 nm for PDGFrb, giving 37 clusters, and 0.206 nm for SRC, giving 39 clusters. This represents a total of 196 protein conformations in the multiple target ensemblescreening described below (191 snapshots from the molecular dynamics simulations plus 5 crystal structures). Selection of Binding Sites. The binding sites were defined for AmpC, FGFr1, GR, and PDGFrb snapshots by extracting the coordinates of a selected atom which is near the cocrystallized ligands in each of the crystal structures (namely, AmpC: “CA GLY 307”, FGFr1: “N ALA 92”, GR: “NE2 GLN
Molecular flexibility can contribute to a favorable change in free energy of binding in at least two different ways: a favorable change in enthalpy by optimizing noncovalent interactions between the ligand and receptor or a minimization in the decrease in entropy by increasing the flexibility in regions of the protein or ligand.15−17 Protein−ligand complexes undergo a wide range of motions ranging from small changes in binding site residues to large-scale motions of entire protein domains. Some of the approaches used to incorporate protein flexibility in docking include (i) soft docking,18 (ii) rotamer exploration,19 and (iii) generation of MPS (multiple protein structures).20,21 Soft docking uses a rigid protein but allows for some overlap of the ligand and protein atoms in the scoring function by reducing van der Waals penalties at short distances. This is computationally efficient, since only the scoring function is involved in this approach, but allows only for minor side-chain movements. The ligand and receptor sometimes interact too tightly and the “soft” region becomes too large, inhibiting the true complex from being able to form.22 The use of rotamer libraries to model the side-chain movements restricts possible conformations to those in the library, with challenges reported, in even very complete rotamer libraries, in sampling side-chain conformations finely enough to produce collision free structures in test complexes.23 However, it has also been reported that biasing smaller rotations underestimates changes in rotameric states.24 Finally, using MPS for one protein, either multiple experimentally observed or computationally predicted structures, allows for any kind of conformational change but is also limited by the discrete number of structures chosen.25 Molecular dynamics (MD) simulation is an appealing method for generating an ensemble of protein structures. MD is typically limited to the low-microsecond time scale, which may not be enough to sample all conformations and increases the computational complexity of the problem. Multiple structures can be averaged to find a consensus structure or individually used in docking, with the latter considerably increasing computational time. MPS generation and its use in docking is the conformational sampling method that is the closest to the conformational selection binding mechanism, since protein structures are generated independently from ligand binding, and energy refinement techniques allow for a full spectrum of motions postdocking to be investigated. This makes MPS generation an appealing approach for improving the realism of virtual screening. MD derived multiple protein structures are used in the relaxed complex scheme21 which has been used to successfully identify potential drug candidates that would not have been identified using crystal structures alone.26,27 However, it remains to be seen (i) if an improvement over the results obtained with a single crystal structure in the accuracy of virtual screening of large or very large databases of compounds is indeed achieved through MPS generation and docking in multiple protein structures, (ii) if this improvement of accuracy is worth the associated increase in computational cost, and (iii) if the additional number of false-positive drug leads overwhelms the number of novel true leads, “hiding” useful data in a large amount of noise that is very difficult to mine. The present work examines these questions.
■
METHODS Protein Targets. Five proteins included in the directory of useful decoys (DUD)12 were selected to be used in ensemblebased virtual screening. The selected proteins are (1) AmpC: B
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
48”, and PDGFrb: “CA CYS 167”) and using this as the center of the docking box. The dimensions of the boxes were kept consistent (30 Å × 30 Å × 30 Å). Novel binding sites were identified in the SRC snapshots by utilizing the Site Finder facility in MOE. Any identified site in SRC with a propensity value for ligand binding36 greater than 2 was kept (anywhere between 1 and 3 sites were identified in each snapshot). VinaMPI docking boxes were defined by obtaining the center point and unit vectors (with VMD) of a box containing the residues surrounding the identified sites as reported by the Site Finder facility in MOE. Docking and Scoring. The DUD database of ligands and decoys was used to investigate whether docking calculations could identify known ligands preferentially over nonbinding molecules in specific protein targets. The DUD database contains, for each protein target, a set of experimentally validated ligands and a much larger set of decoy molecules that exhibit a degree of chemical similarity with the known ligands (but with a Tanimoto similarity coefficient of less than 0.9 as calculated from the CACTVS structural keys). The entire nonredundant DUD database was docked into the snapshots and crystal structures of AmpC, FGFr1, GR, and PDGFrb (98,164 ligands × 156 conformations = 15,313,584 individual docking calculations) and the SRC specific docking set was docked into each identified SRC binding site (5,948 ligands × 70 sites = 416,360 dockings), totaling 15,729,944 dockings. Results were analyzed for performance in each protein against either (i) the entire DUD database of 98,134 molecules (including decoys and ligands nonspecific to a given target) or (ii) the protein-specific subset of ligand and decoys, as given in Table 1. The program VinaMPI37 was used to perform the
screening of large chemical databases in one single calculation run.37 Enrichment Factor Calculations. The enrichment factor of a database is calculated as EF(subset size) =
where hitss is the number of active compounds identified at the specified subset size and Ns is the total number of compounds at that subset size and hitst and Nt are the total number of actives and total number of compounds in the entire docked library.12 Therefore, an enrichment factor is a characteristic of a ranked list at a given subset size of the ranked list. An enrichment factor of 1 would correspond to identifying the number of active compounds expected at random, an enrichment factor of 2 would correspond to identifying twice the number of compounds expected randomly, and so on. It is important to note here that “active” is defined as a molecule from the “ligand” subset in DUD for its respective protein target. The “decoy” DUD compounds, to the best of our knowledge, have not been tested experimentally for their binding in the DUD protein targets. It would be entirely possible that “decoy” compounds end up being identified as actual ligands of the protein target, if such experimental validation were to be performed. However, in the present work, we consider only the “ligands” subset compounds in DUD as real positives for all enrichment and hit discovery calculations.
■
RESULTS AND DISCUSSION I. Does Using Ensemble Docking Improve Docking Enrichment Factors over Using a Crystal Structure? Table 2 provides the average enrichment factor over all snapshots for each protein at the top 0.5, 1, 2, and 5% of the ranked lists obtained using the decoy subsets for each target protein when docking in the crystal structure (column a), an average of the enrichment factors obtained over all MD snapshots (column b), and the best enrichment factor obtained using any one of the MD snapshots (column c), and a table using the entire DUD database of ligands (98,164 ligands) is given in Table 1-SI (Supporting Information). There does not appear to be a systematic increase in the enrichment factors when using an average of ensemble docking results compared with using the crystal structure. While in some cases the “MD average” (values in column b of Table 2) factor is twice as high as using the crystal structure (for instance, in PDGFrb, 1 and 2% EFs), in other cases the values in column b are an order of magnitude less than the values in column a (for instance, AmpC, 1%, where only one compound is identified through docking). The difference between enrichment factors calculated
Table 1. Total Number of Compounds in Each ReceptorSpecific DUD Library, Number of Ligands in the DUD Subset, and Number of Snapshots Obtained from MD Simulations total actives snapshots
AmpC
FGFr1
GR
PDGFrb
SRC
753 21 38
4323 118 38
2875 78 39
5771 157 37
5948 155 39 (69a)
hitss /Ns hitst /Nt
a
Total number of binding sites, since each of the 39 snapshots may have multiple binding sites.
docking calculations on the Department of Energy’s Titan supercomputer. VinaMPI utilizes the Autodock Vina algorithms for docking and scoring.38 VinaMPI can efficiently process many conformations of many protein targets in virtual Table 2. Enrichment Factorsa AmpC 0.5% 1% 2% 5%
FGFr1
GR
PDGFrb
SRC
a
b
c
a
b
c
a
b
c
a
b
c
0.0 4.8 2.4 1.0
0.0 0.1 0.1 0.1
0.0 0.0 0.0 1.0
0.0 0.0 0.8 1.0
0.8 1.0 1.0 1.0
8.5 4.2 3.4 2.5
18.1 9.1 5.8 2.6
7.1 6.2 5.1 3.7
28.4 20.7 16.2 8.0
3.6 1.8 1.8 1.3
5.4 4.4 3.4 2.3
10.9 8.9 6.2 4.2
a 2.7 2.0 1.6 3.0
0.0 0.0 0.6 0.8
b
c
2.0 2.1 1.9 1.5
9.3 9.1 6.5 4.1
For each of the five protein targets studied here, the enrichment factors at the given subsets of the ranked databases (0.5, 1, 2, and 5%) are reported in the following columns: (a) enrichment factors obtained using the crystal structures of the targets; (b) average enrichment factor of all conformations in each ensemble; (c) best enrichment factor obtained from any of the molecular dynamics snapshots. For SRC, the left “a” column is the active crystal structure and the right “a” column is the inactive crystal structure. a
C
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
Figure 1. Enrichment plots for AmpC. Number of known active compounds identified at every percent of the ranked chemical database after docking when using just the AmpC decoy and active sets (A, for entire database; B, for top 10% of the database).
Figure 2. Enrichment plots for FGFr1. Number of known active compounds identified at every percent of the ranked chemical database after docking when using just the FGFr1 decoy and active sets (A, for entire database; B, for top 5% of the database).
Figure 3. Enrichment plots for GR. Number of known active compounds identified at every percent of the ranked chemical database after docking when using just the GR decoy and active sets (A, for entire database; B, for top 5% of the database).
one of the ∼40 snapshots yields much higher enrichment factors than when using the crystal structure. This improvement can be as high as ∼5 times better in column c of Table 1 than in column a (e.g., pfgfrb 1%). In two cases (FGFr1 0.5 and 1%), using the crystal structure identifies no ligands, whereas one of
using a crystal structure and an average enrichment from MD simulation structures therefore appears to be protein-dependent. The values shown in columns c of Table 2, however, indicate that, for four out of the five proteins investigated here, at least D
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
Figure 4. Enrichment plots for PDGFrb. Number of known active compounds identified at every percent of the ranked chemical database after docking when using just the PDGFrb decoy and active sets (A, for entire database; B, for top 5% of the database).
Figure 5. Enrichment plots for SRC. Number of known active compounds identified at every percent of the ranked chemical database after docking. (A, B) when using just the SRC decoy and active sets (A, for entire database; B, for top 5% of the database).
from those identified in the crystal structure at 10% of the sorted docked database (and 66% at 5% of the sorted database). This indicates that, even in cases where MD snapshots do not identify more ligands than the crystal structure, the chemical diversity of the ligands may be increased; i.e., compounds are identified that would not have scored well in the crystal structure and would have therefore not been considered for experimental validation. Indeed, this diversity enrichment was also observed in the case of a small library in ref 26. In the case of the other protein targets, the advantage of using MD snapshots is twofold: (i) docking in MD snapshots identifies more ligands than docking in the crystal structures (Figures 1−5); (ii) as a corollary, the diversity of the ligands is increased over that obtained using the crystal structure (Figures 5−9-SI, Supporting Information). II. Are the Ligands Identified through Ensemble Docking against MD Snapshots Promiscuous? The numbers of identified active compounds in the top portion of the ranked lists for SRC are shown in Figure 9 (Supporting Information), together with a breakdown of which binding site they were identified in. The first identified binding site generally recovers most of the active compounds, but this is not always the case. There are many snapshots in which the
the MD snapshots identified ligands 8.5 times better than random compound selection. The above finding is illustrated in Figures 1−5. In all cases, enrichment plots exhibit lines obtained from docking on specific MD snapshots that are found to be higher than that (thick line) obtained with the crystal structure, and above the random selection performance. This is particularly the case for low % values (up to 5%). For instance, Figure 3 shows one GR conformation identifies over 40% of the active compounds in the top 5% of the ranked database. The case of the AmpC protein target is different: the values in Table 1 (c) are zero up to 5% of the ranked database, at which point it then has the same value calculated from docking in the crystal structure. Figure 1 details the enrichment factor for the AmpC protein target: Figure 1b shows that three snapshots identify ∼4.8% of the protein’s ligands (y axis) in the top 3−4% of the ranked docked database (x axis). In contrast, using the crystal structure identifies the same percentage of the protein’s ligands in the top ∼0.9% of the ranked docked database, although docking only identifies one ligand. This appears to suggest that, for this target, there appears to be no specific advantage in compound recognition by any MD snapshot over the crystal structure. However, Figure 5-SI (Supporting Information) shows that 50% of the ligands identified in the MD snapshots are different E
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
Table 3. Crystal Structure Ranks 1% 5% 10%
AmpC (/39)
FGFr1 (/39)
GR (/40)
PDGFrb (/38)
SRCactive (/71)
SRCinactive (/71)
1 1 1
33 19 18
12 23 8
32 34 35
37 6 4
68 58 67
(Supporting Information). These figures indicate that improvement of database enrichment (obtained by using MD snapshots rather than crystal structures) in the cases of these large databases is qualitatively similar to what is obtained using ligands and decoys specific to a given protein, suggesting relatively low cross-reactivity. III. Is It Possible to Identify a Priori the MD Snapshots Performing Better than the Crystal Structure? The above results show that protein conformations sampled by MD simulation of a crystal structure can lead to binding site geometries capable of binding more and more diverse ligands than the crystal structure itself. This does not mean that every snapshot is capable of enriching databases. Generally, many MD snapshots are not capable of identifying more ligands than the crystal structure (which is shown for the top percentages of the ranked lists in Table 3)and often actually identify fewer. This means that MD simulation can produce many conformations of a protein target, some of which will significantly improve database enrichment and others which would not lead to any improvement. Is there a way to identify a priori which snapshots lead to such improvement, i.e., is it possible to identify the signal (MD snapshots improving enrichment factors) in an otherwise large amount of noise (the ensemble of all conformations generated by the MD simulations)? Several physicochemical and thermodynamic properties and descriptors of MD-obtained conformations were investigated to determine if they were correlated to enrichment factors achieved. These properties included some obtained from the GROMOS clustering algorithm used to extract distinct conformations from the trajectory, such as cluster size (number of frames that make up the cluster), frame number (trajectory frame number of the largest neighbor conformation that is used as the representative conformation), RMSD of the cluster (largest pairwise RMSD within the cluster), and RMSD from the middle (largest RMSD within the cluster to the representative conformation). Other properties of the SRC conformations and binding sites were extracted using MOE and compared to the rankings of each site/conformation. Characteristics pertaining to the identified binding sites were obtained from the Site Finder tool used to identify the binding sites, including the site size (number of contact atoms in the binding site), PLB (propensity for ligand binding score36), number of hydrophobic contact atoms in the binding site, number of side chain contact atoms in the binding site, and number of contact residues in the binding site. Some additional properties of the conformations were also obtained using the Protein Property tool in MOE, such as the VdW surface area, hydrophilic surface area, hydrophobic surface area, and VdW volume. Unfortunately, no clear correlation was found between these structural properties and their improvement of enrichment factors compared to that obtained with a crystal structure. IV. Can the Results Obtained from Several Snapshots Be Combined to Optimize the Ensemble-Docking Results? Table 4 indicates how MD snapshot results could potentially be combined together in the attempt to improve the
second identified binding site uniquely recovers the majority of the identified active compounds, such as in snapshot numbers 6, 22, and 39. The second identified binding site in these snapshots corresponds to one that is formed when the SH2 regulatory domain and kinase domain of Src are in close contact and the protein adopts an inactive conformation. The importance of this binding site is discussed in more detail later in the manuscript. When three binding sites are present, about half of the identified active compounds are uniquely identified in different binding sites (such as in snapshots 3 and 16) and sometimes the majority of them are uniquely identified in different binding sites (such as in snapshots 28 and 39). This indicates that multiple binding sites add diversity to the identified compounds, and not just redundant information, and suggests that multiple binding sites may play a role in the function of Src by being able to bind multiple ligands. Parts D and E of Figure 9-SI (Supporting Information) compare the number of these compounds that are also identified in the crystal structure of the active conformation (D) or not (E). At the top 10% of the ranked databases, a large number of the active compounds identified in each snapshot are also recovered in the crystal structure. However, at the top 1% of the ranked databases, almost every snapshot uniquely identifies active compounds that would not be found in the crystal structure. Parts F and G of Figure 9-SI (Supporting Information) compare the number of these compounds that are also identified in the crystal structure of the inactive conformation (F) or not (G). Many more ligands are identified in the snapshots than in the crystal structure of the inactive conformation. To better understand if these are results of promiscuous ligand binding to many conformations/sites or uniquely identified ligands, for each known active ligand, a count was made of how many times it appears in the top portion of each ranked database, shown in Table 3-SI (Supporting Information). Counts were made for each time a ligand was scored as a hit in a given binding site (70 binding site occurrences in 39 conformations), and anytime it is scored as a hit for a snapshot (any binding site in it), in which case there are 39. In Table 3-SI (Supporting Information), it is evident that there are some very promiscuous ligands (or at least biased by the docking program), such as ZINC03815412 which binds in 31 different binding site occurrences at the top 1% of the ranked lists and in almost every snapshot at the top 10% of the ranked lists. However, it can also be seen by the fact that 10 different compounds are identified in only one site at the top 1%, that the snapshots are indeed adding diversity to the identified compounds. Good enrichments from docking studies may arise from the group of active compounds for any given protein scoring very well across many protein classes (biased by the docking algorithm), and conversely, bad enrichment in docking studies may arise from many different decoy molecules scoring well for a particular protein. The enrichment factors obtained when using the entire ∼96K molecules of the DUD database are reported in Tables 1-SI and 2-SI and in Figures 1−4, 10−12-SI F
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
of the database, improves the results (compared to using only the crystal structure or randomly selecting compounds) in 3 out of the 5 targets investigated here, i.e., in the case of the GR, PDGFrb, and SRC targets, and for selection of 5% or more of the docked database (rows labeled “total”). In the case of the AmpC and FGFr1 targets, using the crystal structure does not improve the results over a random selection of compounds, suggesting the possibility that the crystal structures are not necessarily adapted to docking and that MD simulations of these structures do not alleviate these issues. Combination method “B” in Table 4, i.e., poling all results from all snapshots together and selecting the best-ranking n%, using MD snapshots further improves the performance of docking over clustering method “A”, and does so at lower percentages of the databases. Table 4 “combination C” indicates the results obtained with the “best snapshot” yielding the top EFs in Table 2. The “best snapshot” systematically outperforms crystalstructure-based docking and random selection of compounds even for the top 0.5% of the database. Table 4 also indicates that many of the compounds identified using ensemble-based docking in MD snapshots would not have been found using the crystal structure at similar percentages of the database (rows labeled “not in crystal”). These results indicate that using MD snapshots in ensemblebased docking and selecting the best ranking compounds from all snapshots results polled together can improve docking results obtained using crystal structures for a similar number of compounds selected for experimental validation: ensemble docking results lead to more and more diverse ligands identified than when using crystal structures. The number of compounds that would need to be experimentally validated to achieve these results is target-dependent: as little as 0.5% of the database in the case of the PDGFrb target and as much as 5% of the database in the case of the SRC target. However, when the crystal structure itself does not perform better than random (the case of the AmpC and FGFr1 targets), such pooling together of all the snapshot results may not improve the results. In the case of the FGFr1 target, very well performing snapshots are identified but without a priori knowledge of what these snapshots are it is not possible to accumulate the results in a way that outperforms a random selection of compounds. V. A Specific Case: Tyrosine Kinase SRC as a Cancer Drug Target. Finally, we discuss the results in one specific case: SRC. Kinases are involved in many biological pathways and processes; therefore, a disturbance in their regulation can be the cause of many diseases including cancer. Tyrosine kinase SRC is known to be overexpressed and upregulated in many tumor types.39,40 Many kinase inhibitors (such as dasatinib) target a highly conserved ATP pocket and thus suffer from a lack of selectivity and efficacy and may fail due to the development of drug resistance. Mutation from a smaller residue to a larger hydrophobic residue in the back of the ATP binding pocket of kinases has been associated with drug resistance in cancer therapies.41 In particular, a T338 mutation causes dasatinib resistance in c-Src and it has been proposed that finding drugs that target positions outside of the ATPbinding pocket can increase target selectivity and give new opportunities to overcome drug resistance.40 The greatly improved docking results presented here obtained from docking into novel binding sites identified in snapshots from an MD trajectory in which c-Src remains in an inactive conformation, over docking into the known ATP-binding site of the crystal structure of the inactive conformation, suggests
Table 4. Number of Active Compounds Recovered at Each Percent of the Ranked List from Docking to the Crystal Structure, at Random, and from Methods A, B, and C after Docking to the MD Snapshotsa subset of ranked database in crystal random A B C
in crystal random A B C
in crystal random A B C
in crystal random A B C
in crystalinactive random A B C
0.5%
1%
AmpC (753 Compounds) 0 1 0 0 total 0 0 not in crystal 0 0 total 0 0 not in crystal 0 0 total 0 0 not in crystal 0 0 FGFr1 (4323 Compounds) 0 0 1 1 total 0 1 not in crystal 0 1 total 0 0 not in crystal 0 0 total 5 5 not in crystal 5 5 GR (2875 Compounds) 7 7 0 1 total 0 0 not in crystal 0 0 total 5 9 not in crystal 5 9 total 11 16 not in crystal 11 16 PDGFrb (5771 Compounds) 3 3 1 2 total 0 3 not in crystal 0 1 total 8 13 not in crystal 5 10 total 9 15 not in crystal 6 12 SRC (5948 Compounds) 0 0 1 2 total 0 0 not in crystal 0 0 total 0 1 not in crystal 0 1 total 7 14 not in crystal 7 14
5%
10%
1 1 0 0 0 0 1 1
1 2 0 0 0 0 1 1
6 6 2 2 2 0 15 13
12 12 6 4 7 4 29 22
10 4 14 14 21 20 31 28
14 8 22 19 30 27 40 37
11 8 14 8 19 10 36 26
16 16 25 11 30 15 52 39
6 8 4 4 12 12 32 31
8 16 10 9 21 17 45 38
a A, an equal portion of the ranked list is extracted for each snapshot (i.e., total percentage/number of snapshots); B, a unified list is made taking the best docking score for each compoundl; C, the best scoring snapshot is used.
“signal-to-noise” ratio of docking calculations, in the case of results considering the target-specific compounds. A similar table obtained from considering all the ∼98K DUD compounds is shown in the Supporting Information (Table 4-SI). Combination method “A” in Table 4, i.e., selecting the very top of each compound list from each snapshot until the total number of compounds corresponds to the top 0.5, 1, 5, or 10% G
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
variations of the protein:ligand complexes will be a very important research axis in the future of massive docking for structure-based drug discovery. Ideally, a successful approach could identify the MD snapshots that increase the diversity of hit compounds over those identified using crystal structures or random selections. Molecular dynamics does generate such extremely useful conformations; the problem is now to identify them.
this approach has the potential to identify compounds that can selectively lock c-Src in an inactive conformation and be of potential pharmaceutical interest.
■
CONCLUSIONS In this work, we find that the use of multiple protein conformations always increases the diversity of identified active compounds, illustrating the usefulness of ensemble docking. This is particularly true for the very top of database ranks postdocking. This is an important finding, since, as computing capacities become larger, larger databases of compounds will be screened against a larger number of protein structures. This increased diversity could be more marked in cases where the chemical diversity of the compounds being screened is large, while in the present work only DUD compounds, chemically similar to each other, were used. Computational docking of a database containing millions of compounds can be useful only if the number of compounds prioritized for experimental validation remains manageable in the experimental laboratory; i.e., only the top first few percent of a docked database would be selected. MD simulations produce conformations that perform better or much better than other conformations and crystal structures. However, no set of rules of conformational or structural properties were identified here for choosing the best protein conformations without prior knowledge of the active compounds. This does not mean that such properties do not exist. As always, conformational sampling could be different when running longer molecular dynamics simulations, and could potentially identify other relevant conformations. The most efficient way to use the ensemble-based docking results appears to be combining all the docking results together and selecting the best ranking compounds (clustering method “B” in Table 4). Thanks to the available supercomputing power, this approach can be used before experimental screening and does reduce the time, cost, and complexity of the hit discovery process. It will be interesting to compare those results with thecomputationally expensiveresults obtained using flexible side chain docking to characterize the side chains’ and backbone dynamics’ respective contributions to ensemble docking results. Ensemble Docking and Conformational Selection Mechanism. Cluster size (i.e., how common a particular conformation is in the molecular dynamics trajectory) is a promising property because the cluster size corresponds to relative free energies of specific conformations, with a lower free energy cluster being more likely to be “seen” by a ligand. This however is not truly a representation of the conformational selection process. What docking mostly evaluates is the binding free energy (or at least the binding enthalpy) of a ligand in a protein. Improvements of scoring functions and of the treatment of protein:ligand interactions, such as pioneered by the Jorgensen group, are a fundamental development pathway for the future of structure-based drug discovery. In the conformational selection model, the ligand could conceivably select a protein conformation of a relatively high free energy, but the complex then would be stabilized to a lower free energy than other bound conformations. While the parametrization of a docking scoring function includes experimental results from protein:ligand complexes in their crystal structure, our presently unsuccessful attempts at identifying conformational properties determining the “binding-enabled” protein conformations suggest that calculationor estimationsof free energy
■
ASSOCIATED CONTENT
S Supporting Information *
Enrichment factors and enrichment plots, details for screening of the entire DUD database, and details of all “ligand” compounds docked in the SRC target. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
*E-mail:
[email protected]. Present Addresses ⊥
S.R.E.: Division of Biomedical Informatics, UK College of Public Health, University of Kentucky, 2365 Harrodsburg Rd, Suite A230, Lexington, KY 40504-3381. # Y.M.: Howard Hughes Medical Institute, University of California San Diego, La Jolla, CA 92093-0365. Author Contributions
S.E.R. and Y.M. performed the calculations. J.B. and J.C.S. codirected the project. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS We thank Loukas Petridis for helpful discussions regarding molecular docking simulations using NAMD2 and Jim Phillips (UIUC) for providing the GPU-enabled binaries of NAMD2. This work was partially funded through the National Center for Computational Sciences at Oak Ridge National Laboratory, and computing time was provided through grant number bio026.
■
ABBREVIATIONS DUD, directory of useful decoys; MD, molecular dynamics; AmpC, AmpC β-lactamase; FGFr1, fibroblast growth factor receptor kinase; GR, glucocorticoid receptor; PDGFrb, platelet derived growth factor receptor kinase; SRC, tyrosine kinase SRC; RMSD, root mean squared deviation; MOE, Molecular Operating Environment; PDB, Protein Data Bank; NaCl, sodium chloride; ns, nanosecond; VMD, visual molecular dynamics; PLB, propensity for ligand binding; GPU, graphical processing unit
■
REFERENCES
(1) Fischer, E. Einfluss Der Configuration Auf Die Wirkung Der Enzyme. Ber. Dtsch. Chem. Ges. 1894, 27, 2985−2993. (2) Koshland, D. E. Application of a Theory of Enzyme Specificity to Protein Synthesis. Proc. Natl. Acad. Sci. U. S. A. 1958, 44, 98−104. (3) Straub, F. B. Formation of the Secondary and Tertiary Structure of Enzymes. Adv. Enzymol. Relat. Areas Mol. Biol. 1964, 89−114. (4) Ma, B.; Shatsky, M.; Wolfson, H. J.; Nussinov, R. Multiple Diverse Ligands Binding at a Single Protein Site: A Matter of PreExisting Populations. Protein Sci. 2002, 11, 184−197.
H
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX
The Journal of Physical Chemistry B
Article
(5) Durrant, J. D.; McCammon, J. A. Computer-Aided DrugDiscovery Techniques That Account for Receptor Flexibility. Curr. Opin. Pharmacol. 2010, 10, 770−774. (6) Vidler, L. R.; Filippakopoulos, P.; Fedorov, O.; Picaud, S.; Martin, S.; Tomsett, M.; Woodward, H.; Brown, N.; Knapp, S.; Hoelder, S. Discovery of Novel Small-Molecule Inhibitors of BRD4 Using Structure-Based Virtual Screening. J. Med. Chem. 2013, 56, 8073− 8088. (7) Hu, G.; Li, X.; Zhang, X.; Li, Y.; Ma, L.; Yang, L.-M.; Liu, G.; Li, W.; Huang, J.; Shen, X.; et al. Discovery of Inhibitors to Block Interactions of HIV-1 Integrase with Human LEDGF/p75 via Structure-Based Virtual Screening and Bioassays. J. Med. Chem. 2012, 55, 10108−10117. (8) Lounkine, E.; Keiser, M. J.; Whitebread, S.; Mikhailov, D.; Hamon, J.; Jenkins, J. L.; Lavan, P.; Weber, E.; Doak, A. K.; Côté, S.; et al. Large-Scale Prediction and Testing of Drug Activity on SideEffect Targets. Nature 2012, 486, 361−367. (9) Schapira, M.; Abagyan, R.; Totrov, M. Nuclear Hormone Receptor Targeted Virtual Screening. J. Med. Chem. 2003, 46, 3045− 3059. (10) Nose, T.; Tokunaga, T.; Shimohigashi, Y. Exploration of Endocrine-Disrupting Chemicals on Estrogen Receptor Alpha by the Agonist/antagonist Differential-Docking Screening (AADS) Method: 4-(1-Adamantyl)phenol as a Potent Endocrine Disruptor Candidate. Toxicol. Lett. 2009, 191, 33−39. (11) Fischer, M.; Coleman, R. G.; Fraser, J. S.; Shoichet, B. K. Incorporation of Protein Flexibility and Conformational Energy Penalties in Docking Screens to Improve Ligand Discovery. Nat. Chem. 2014, 6, 575−583. (12) Huang, N.; Shoichet, B. K.; Irwin, J. J. Benchmarking Sets for Molecular Docking. J. Med. Chem. 2006, 49, 6789−6801. (13) Docking at UTMB: http://docking.utmb.edu/dudresults/. (14) Ellingson, S. R.; Smith, J. C.; Baudry, J. Polypharmacology and Supercomputer-Based Docking: Opportunities and Challenges. Mol. Simul. 2014, 1−7. (15) Balog, E.; Becker, T.; Oettl, M.; Lechner, R.; Daniel, R.; Finney, J.; Smith, J. Direct Determination of Vibrational Density of States Change on Ligand Binding to a Protein. Phys. Rev. Lett. 2004, 93, 028103. (16) Moritsugu, K.; Njunda, B. M.; Smith, J. C. Theory and NormalMode Analysis of Change in Protein Vibrational Dynamics on Ligand Binding. J. Phys. Chem. B 2010, 114, 1479−1485. (17) Balog, E.; Perahia, D.; Smith, J. C.; Merzel, F. Vibrational Softening of a Protein on Ligand Binding. J. Phys. Chem. B 2011, 115, 6811−6817. (18) Jiang, F.; Kim, S. H. Soft Docking”: Matching of Molecular Surface Cubes. J. Mol. Biol. 1991, 219, 79−102. (19) Anderson, A. C.; O’Neil, R. H.; Surti, T. S.; Stroud, R. M. Approaches to Solving the Rigid Receptor Problem by Identifying a Minimal Set of Flexible Residues during Ligand Docking. Chem. Biol. 2001, 8, 445−457. (20) Damm, K. L.; Carlson, H. a. Exploring Experimental Sources of Multiple Protein Conformations in Structure-Based Drug Design. J. Am. Chem. Soc. 2007, 129, 8225−8235. (21) Lin, J.-H.; Perryman, A. L.; Schames, J. R.; McCammon, J. A. Computational Drug Design Accommodating Receptor Flexibility: The Relaxed Complex Scheme. J. Am. Chem. Soc. 2002, 124, 5632− 5633. (22) Cavasotto, C.; Orry, A.; Abagyan, R. The Challenge of Considering Receptor Flexibility in Ligand Docking and Virtual Screening. Curr. Comput.-Aided Drug Des. 2005, 1, 423−440. (23) Dunbrack, R. L., Jr; Karplus, M. Backbone-Dependent Rotamer Library for Proteins Application to Side-Chain Prediction. J. Mol. Biol. 1993, 230, 543−574. (24) Zavodszky, M. I.; Kuhn, L. A. Side-Chain Flexibility in Protein − Ligand Binding: The Minimal Rotation Hypothesis. Protein Sci. 2005, 14, 1104−1114.
(25) Totrov, M.; Abagyan, R. Flexible Ligand Docking to Multiple Receptor Conformations: A Practical Alternative. Curr. Opin. Struct. Biol. 2009, 18, 178−184. (26) Amaro, R. E.; Schnaufer, A.; Interthal, H.; Hol, W.; Stuart, K. D.; McCammon, J. A. Discovery of Drug-like Inhibitors of an Essential RNA-Editing Ligase in Trypanosoma Brucei. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 17278−17283. (27) Wassman, C. D.; Baronio, R.; Demir, Ö .; Wallentine, B. D.; Chen, C.-K.; Hall, L. V.; Salehi, F.; Lin, D.-W.; Chung, B. P.; Hatfield, G. W.; et al. Computational Identification of a Transiently Open L1/ S3 Pocket for Reactivation of Mutant p53. Nat. Commun. 2013, 4, 1407. (28) Burnham, M. R.; Bruce-staskal, P. J.; Mary, T.; Weidow, C. L.; Ma, A.; Weed, S. A.; Bouton, H.; Harte, M. T.; Ma, A. M. Y. Regulation of c-SRC Activity and Function by the Adapter Protein CAS Regulation of c-SRC Activity and Function by the Adapter Protein CAS. Mol. Cell. Biol. 2000, 20, 5865−5878. (29) Pérez, Y.; Maffei, M.; Igea, A.; Amata, I.; Gairí, M.; Nebreda, A. R.; Bernadó, P.; Pons, M. Lipid Binding by the Unique and SH3 Domains of c-Src Suggests a New Regulatory Mechanism. Sci. Rep. 2013, 3, Article number: 1295. (30) Ritchie, S.; Boyd, F. M.; Wong, J.; Bonham, K. Transcription of the Human c-Src Promoter Is Dependent on Sp1, a Novel Pyrimidine Binding Factor SPy, and Can Be Inhibited by Triplex-Forming Transcription of the Human c-Src Promoter Is Dependent on Sp1, a Novel Pyrimidine Binding Factor SPy, and Can. J. Biol. Chem. 2000, 275, 847−854. (31) Molecular Operating Environment (MOE), 2012. (32) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781− 1802. (33) Humphrey, W.; Dalke, a; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graphics 1996, 14 (33−38), 27−28. (34) Stearn, B.; Bhatia, K.; Baldridge, K. K.; Li, W. W.; Arzberger, P.; Chem, O.; Zurich, C. Opal: Simple Web Services Wrappers for Scientific Applications. In ICWS 2006, IEEE International Conference on Web Services; 2006. (35) Daura, X.; Gademann, K.; Jaun, B.; Seebach, D.; van Gunsteren, W. F.; Mark, A. E. Peptide Folding: When Simulation Meets Experiment. Angew. Chem., Int. Ed. 1999, 38, 236−240. (36) Soga, S.; Shirai, H.; Kobori, M.; Hirayama, N. Use of Amino Acid Composition to Predict Ligand-Binding Sites. J. Chem. Inf. Model. 2007, 47, 400−406. (37) Ellingson, S. R.; Smith, J. C.; Baudry, J. VinaMPI: Facilitating Multiple Receptor High-Throughput Virtual Docking on HighPerformance Computers. J. Comput. Chem. 2013, 34, 2212−2221. (38) Trott, O.; Olson, A. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization and Multithreading. J. Comput. Chem. 2011, 31, 455− 461. (39) Yeatman, T. J. A Renaissance for SRC. Nat. Rev. Cancer 2004, 4, 470−480. (40) Getlik, M.; Grütter, C.; Simard, J. R.; Klüter, S.; Rabiller, M.; Rode, H. B.; Robubi, A.; Rauh, D. Hybrid Compound Design to Overcome the Gatekeeper T338M Mutation in cSrc. J. Med. Chem. 2009, 52, 3915−3926. (41) Daub, H.; Specht, K.; Ullrich, A. Strategies to Overcome Resistance to Targeted Protein Kinase Inhibitors. Nat. Rev. Drug Discovery 2004, 3, 1001−1010.
I
dx.doi.org/10.1021/jp506511p | J. Phys. Chem. B XXXX, XXX, XXX−XXX