Molecular Simulation-Driven Fragment Screening for the Discovery of

Fragment-based drug discovery (FBDD) has become a mainstream approach in drug design because it allows the reduction of the chem- ical space and ...
0 downloads 0 Views 6MB Size
Article Cite This: J. Chem. Inf. Model. 2018, 58, 683−691

pubs.acs.org/jcim

Molecular-Simulation-Driven Fragment Screening for the Discovery of New CXCL12 Inhibitors Gerard Martinez-Rosell,† Matt J. Harvey,‡ and Gianni De Fabritiis*,†,§ †

Computational Biophysics Laboratory (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C/Doctor Aiguader 88, 08003 Barcelona, Spain ‡ Acellera, Barcelona Biomedical Research Park (PRBB), C/Doctor Aiguader 88, 08003, Barcelona, Spain § Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona 08010, Spain S Supporting Information *

ABSTRACT: Fragment-based drug discovery (FBDD) has become a mainstream approach in drug design because it allows the reduction of the chemical space and screening libraries while identifying fragments with high protein−ligand efficiency interactions that can later be grown into drug-like leads. In this work, we leverage high-throughput molecular dynamics (MD) simulations to screen a library of 129 fragments for a total of 5.85 ms against the CXCL12 monomer, a chemokine involved in inflammation and diseases such as cancer. Our in silico binding assay was able to recover binding poses, affinities, and kinetics for the selected library and was able to predict 8 mM-affinity fragments with ligand efficiencies higher than 0.3. All of the fragment hits present a similar chemical structure, with a hydrophobic core and a positively charged group, and bind to either sY7 or H1S68 pockets, where they share pharmacophoric properties with experimentally resolved natural binders. This work presents a large-scale screening assay using an exclusive combination of thousands of short MD adaptive simulations analyzed with a Markov state model (MSM) framework.



library usually consists of only 1000−5000 compounds,10 while a drug-like library usually comprises between 0.5 and 3 million compounds.1 Second, fragment libraries have been reported to yield higher hit rates than HTS.11,12 The rationale behind this observation is that, as molecules grow, there is more probability that a chemical group causing an unfavorable interaction is included in the molecule and that the introduction of this group ruins completely the affinity for the target. Conversely, fragments, due to their small size, establish less interactions with the target and should be able to bind to a greater number of sites. Moreover, the quality of interactions between a fragment and a protein is usually high, as supported by the conservation of the binding mode as the fragments are grown into larger molecules.13,14 These characteristics make FBDD especially appealing to tackle difficult targets such as allosteric sites or protein−protein interaction interfaces (PPIs). Several experimental techniques are routinely used in FBDD.3 These have to be sensitive enough to detect low affinity fragments and, when possible, shed light on the binding mode of the ligand to enable structure-based drug design (SBDD). The most predominant ones are surface plasmon resonance (SPR), nuclear magnetic resonance (NMR) spectroscopy, and X-ray crystallography, of which only the last two yield structural hints of the binding mode. On the other hand, computational methods such as docking or pharmacophoric

INTRODUCTION Fragment-based drug discovery (FBDD) started to get popular in the early 2000s as an alternative to high-throughput screening1 (HTS) or virtual screening2 (VS) of drug-like molecules. This tendency has continued until our days to the point that FBDD has become a mainstream technique and the driver technology of more than 30 drug candidates.3 The main characteristic that differentiates FBDD from a typical drug-like HTS approach is the size of the ligands employed in the screening phase. In particular, fragments are usually defined as having less than 20 non-hydrogen (or “heavy”) atoms, while drug-like molecules can go up to 30 heavy atoms or more3 and can be understood as a combination of two or more fragments.4 Additionally, fragments have less functionality and are correspondingly weaker than most drug-like hits in HTS, with binding affinities in the range of mM to 30 μM.5 Therefore, while the objective of a HTS technique is to find directly a drug-like lead, the approach used in FBDD is to discover small millimolar-binding fragments with high ligand efficiency6,7 (LE) that can later be extended or linked together to form a drug-sized lead.8 Several advantages characterize FBDD. First, the smaller size of the ligands reduces the accessible chemical space. A study calculated that each heavy atom adds roughly 1 order of magnitude to the number of possible chemical combinations.9 This implies that the chemical space of drug-like molecules is many orders of magnitude bigger than the fragment chemical space. A practical consequence of this fact is that a fragment © 2018 American Chemical Society

Received: October 24, 2017 Published: February 26, 2018 683

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling

Figure 1. (A) APBS analysis for the CXCL12 monomer (chain A of PDB 4UAI42). The positively charged surface is colored in blue and negatively charged in red. (B) CXCL12 monomer (chain A of PDB 2K0538) depicted in gray surface bound to CXCR4 receptor chains (colored in green and yellow) with sulfo-tyrosines depicted in VDW-style. (C) CXCL12 monomer (chain A of PDB 2NWG46) represented in gray surface and heparin (H1S) residues highlighted in VDW-style. (D) CXCL12 monomer (chain A of PDB 4UAI42) depicted in cartoon-style and colored by secondary structure. (E) CXCL12 monomer (chain A of PDB 2N5547) displayed as gray surface and CXCR4 chain in transparent green cartoon. Highlighted residues of CXCR4 in VDW-style correspond to key CXCR4−CXCL12 interaction residues: tyrosines 12 and 21 and isoleucines 4 and 6. (F) CXCL12 monomer (chain A of PDB 4UAI42) with in silico screening hits superposed in different colors. Ten representative frames per hit are shown in line-style smoothed with a moving average window of 3.

screening have been sparsely used in the literature.15 However, in general, the application of computational methods to FBDD has been quite limited to date, partially due to promiscuity of fragments binding mode16,17 and docking limited ability to correctly describe protein conformational plasticity18,19 or to rank fragments.15 Molecular dynamics (MD) simulation has been proposed as a computational alternative able to characterize the binding of fragments with atomic detail and femtosecond time resolution. Early work on the benzamidine−trypsin system20 or the development of methods to map fragment binding hotspots such as SILCs21,22 or MDmix23 exemplify the potential of the technique to recover fragment occupancies and even binding affinities. Additionally, recent validation work on factor Xa24 and on FKBP25 further supports the ability of MD to correctly rank affinities and kinetics for a set of fragments given a protein system. We present here an MD-driven fragment screening study using a library of 129 compounds and a total simulation time of 5.85 ms against the chemokine CXCL12. We obtained the compound library by filtering the ZINC26 database using a combination of chemical property filters, docking to MDsampled protein conformations, and a scaffold filter. Then, for each fragment, we built systems containing the protein and one ligand embedded in a water box and simulated them using an adaptive sampling scheme to explore the binding space efficiently. CXCL12 (stromal cell-derived factor-1/SDF-1) is a chemokine, a small dimerizable soluble protein that stimulates chemotactic cell migration via activation of a G-protein coupled receptor (GPCR).27 Its structure consists of a C-terminal αhelix, three antiparallel β-sheets, and an N-terminal flexible loop

(Figure 1D). CXCL12 and its receptor CXCR4 are particularly well studied, and their participation in physiological processes28 (e.g., embriogenesis, wound healing, stem cell homing) as well as morbid processes (e.g., autoimmune diseases,29 cancer,30−32 HIV33,34) is known. In particular, the significant role of the CXCR4/CXCL12 axis in metastasis, tumor survival, and tumor angiogenesis has raised the interest in developing targeted drug therapies.30 The selection of CXCL12 as the test system responds to (a) the small size of the protein (89−140 residues depending on the isoform and 32,000 atoms in a complete solvated protein− ligand system), which allows a reduction of the computational cost, (b) the difficulty of the target, which is known to present a very “soft” surface, and (c) its involvement in an array of morbid processes, which makes it very clinically relevant. Even though CXCL12 monomer is known to be a difficult target in terms of druggability due to the shallowness of its surface, we were able to characterize the binding mode and affinity of at least 8 hit fragments (∼6% of the total) with high predicted ligand efficiency (LEpred > 0.3). These fragments bound repeatedly to two small cavities (Figure 1F) detected experimentally, namely, sY7 and H1S68, and the pharmacophoric properties of their binding modes are consistent with experimentally resolved natural binders.



RESULTS AND DISCUSSION CXCL12 Surface Characterization. While most attempts to block the CXCR4/CXCL12 axis have been focused on inhibiting the receptor CXCR4,35 which presents a clear druggable cavity where the chemokine CXCL12 docks, targeting CXCL12 has traditionally been deemed “undruggable” due to its shallow surface.36 However, recent studies 684

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling have shown that the CXCL12 surface is not completely flat. In fact, scientists have learned about CXCL12 druggability by studying the interaction between the chemokine and its receptor. This protein−protein interface has been resolved via NMR in several cases, some of which are displayed in Figure 1. From the inspection of these structures and mutation studies, we learned that the CXCL12−CXCR4 interaction and affinity is mediated by key CXCR4 residues,37−39 some of the most important being tyrosines 7, 12, and 21 (Figure 1B and E) and isoleucines 4 and 6 (Figure 1E). The o-sulfation of the aforementioned tyrosines (Figure 1B) in the Golgi apparatus seem to selectively enhance the affinity of CXCR4 for CXCL12.39 Furthermore, CXCL12 has also been resolved bound to heparin40 (Figure 1C). The binding of all of these residues to CXCL12 involves the formation of small pockets and therefore reveals potential binding hot spots that can be leveraged to design and dock specific inhibitors. Consistent with this hypothesis, recent studies report small molecules binding to sY21,36,41,42 sY12,43 and I4/I643 binding pockets. In order to characterize the electrostatic properties of the CXCL12 protein surface, we ran Adaptive Poisson−Boltzmann Solver44 (APBS) to study the charge distribution over the protein surface. The overlap of the binding hot spots previously described with a generally positively charged surface (Figure 1A) helped us to take the decision of selecting a negatively charged compound library for screening. Furthermore, in order to study the dynamics of the CXCL12 monomer and generate protein conformations for docking, we produced an ensemble of 57.6 μs of simulation data of the protein alone solvated in a box of water. Using as metric the dihedral angles of the backbone, we built a Markov state model (MSM) and clustered the data into eight macrostates. A representative from each macrostate was picked (Figure S2) and further used in docking. Interestingly, one of the macrostates (Figure S2 (1)) displays a transient pocket involving CXCL12 first and second beta-sheets that was already described in another study.45 Compound Library Selection. The compounds used for the in silico binding assay were selected following a stepwise filtering protocol (Figure 2) down to the final list of 129

fragments. The initial library was obtained by selecting all ZINC26 database ligands with “on stock” availability. We used the structures provided by default ZINC, whose protonation state is set at pH 7. Subsequently, we selected only negatively charged ligands, with a number of torsions equal or smaller than 2, absence of halogen atoms, and the number of heavy atoms bigger than 6 and smaller than 17. This particular selection allowed us to obtain rather small fragments without halogen atoms, for which reliable parameters are hard to obtain, and with a limited number of torsions in order to reduce the computational expense related to the parametrization procedure (see Methods). Furthermore, the negative charge of the ligands was chosen to fit the positively charged CXCL12 surface, as suggested by applying the APBS analysis. Afterward, we used AutoDock VINA48 with default parameters to dock the remaining 2442 fragments to all of the surface (grid box as big as the protein) of each of the eight CXCL12 conformations obtained from MD simulations. Only fragments yielding any binding pose with a score lower than −5 kcal/mol were kept. In order to ensure chemical diversity, the remaining 1380 fragments were clustered using the SHED scaffold filter49 and 129 random fragments were picked from the 389 different clusters obtained by SHED for subsequent in silico binding assay. In Silico Binding Assay and Hit Discovery. For each of the 129 ligands, we ran what we call an “in silico binding assay”. Essentially, this method consists of the production of a series of short MD simulations, in our case 70 ns long, of the protein and a single ligand molecule solvated in a water box following an adaptive sampling scheme. This scheme consists of periodically respawning simulations taking into consideration previous simulations to enhance the sampling of unexplored areas based on a Markov state model (MSM). The effectiveness of this adaptive sampling scheme was recently assessed for the system benzamidine−trypsin50 where it was shown to reduce the amount of sampling time necessary for statistics convergence in an order of magnitude. A minimum of 29 μs and an average of 45 μs of aggregated simulation time was produced for each system following this scheme. A final analysis using MSMs allowed us to define a bulk state (i.e., unbound) and a number of sink states (i.e., bound) and associated equilibrium probabilities, as well as binding free energies by applying the Boltzmann distribution (see Methods). Predicted ligand efficiencies (LEpred) were obtained by dividing the binding free energy by the number of heavy atoms (Figure 3), and those ligands with LEpred higher than 0.3 were labeled as hits. Eight fragments were found to yield a ligand efficiency higher than 0.3. Note that, although 0.3 is an arbitrary LEpred threshold to define a hit, it is a widespread rule of thumb in FBDD.7 Chemical structure, kinetic, and affinity data for all of the fragment library can be found in the Supporting Information. Additionally, MSMs allowed us to calculate kinetic rates (kon and koff) between bulk and sink states (Table 1). All ligands had kon of 107−108 s−1 M−1 and koff of 105−106 M−1. Note that the predicted dissociation constant (Kd), a common measure of protein−ligand binding affinity, can be calculated as koff divided by kon or directly from the probability of bulk and bound states using the Boltzmann distribution. Interestingly, results may differ depending on whether kinetic rates or equilibrium probabilities are used. This explains the differences between the predicted Kd of the ligands shown in Figure 3C, calculated from the kinetic rates, and the predicted Kd shown in Table 1,

Figure 2. Step-wise filtering protocol employed to select the 129 fragments for further in silico binding assay starting from the ZINC database library. 685

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling

Figure 3. (A) Distribution of heavy atoms number for the in silico screening library. (B) Kernel distribution plot for ligand efficiencies with the hit threshold (0.3) defined in discontinuous red line. (C) Log−log plot for kon and koff for the screened fragment library with hits marked in red.

Table 1. Description of the Fragment hitsa ID

ChemSpider ID

1 2 3 4 5 6 7 8

81494 5537302 134073 68076 71271 472468 5363628 10037905

ΔG0 (kcal/mol) −2.9 −2.96 −3.31 −2.12 −2.4 −2.89 −2.31 −2.72

± ± ± ± ± ± ± ±

0.19 0.11 0.06 0.04 0.19 0.07 0.38 0.19

LEpred (kcal/mol) 0.322 0.329 0.331 0.353 0.343 0.321 0.33 0.302

± ± ± ± ± ± ± ±

0.021 0.012 0.006 0.007 0.027 0.007 0.055 0.021

kon (s−1 M−1) 1.18e+09 1.15e+09 6.60e+08 7.07e+08 4.96e+08 1.50e+09 3.85e+08 4.61e+08

± ± ± ± ± ± ± ±

6.90e+08 6.74e+08 3.91e+07 1.03e+08 2.40e+07 3.36e+07 1.63e+08 2.26e+08

koff (M−1) 2.14e+07 4.71e+07 4.87e+06 4.09e+07 3.20e+07 2.26e+07 3.51e+07 2.69e+07

± ± ± ± ± ± ± ±

2.32e+06 2.12e+06 2.29e+05 4.75e+06 1.16e+06 1.04e+06 5.44e+06 6.26e+06

Kd (mM)

sim. length (μs)

± ± ± ± ± ± ± ±

45.92 40.32 78.21 40.6 40.23 40.79 42.84 79.99

8.08 7.09 3.89 28.86 18.93 7.87 24.4 10.96

2.33 1.42 0.37 2.06 7.3 0.88 11.58 3.35

a

Predicted binding free energy, ligand efficiency (LEpred), binding kinetics (kon and koff), predicted dissociation constant (Kd) calculated from equilibrium probability, and total simulation length for each fragment is shown. The ID column shows the identifiers used throughout the text. The ChemSpider ID column shows the unique identifier in the popular ChemSpider51 database.

first, all of the hits contain a hydrophobic core and a negatively charged group, either a carboxyl or a sulfate; (b) second, all of the hits bind to either sY7 (fragments 1−3) or H1S68 cavities (fragments 4−8); (c) third, all of the hits have a similar binding mode, with the hydrophobic core buried in the CXCL12 surface and with the negatively charged group pointing outside; (d) the negatively charged groups of all the hits establish an electrostatic interaction with an arginine residue (Arg41 for the H1S68 pocket and Arg20 for the sY7 pocket). The chemical properties and binding mode of the hits support the initial hypothesis that CXCL12 has a positively charged surface, represented by arginines 41 and 20, and that negatively charged residues should effectively bind to it. Furthermore, the pharmacophoric properties of the hits are consistent with experimentally resolved ligand-like moieties. In

calculated from the equilibrium probabilities. However, in the present study, we give priority to the binding affinity calculated from the equilibrium probabilities, as the kinetic rates usually bear a higher standard deviation. It is interesting to note that in FBDD the absolute binding free energy is not as relevant as the binding free energy relative to the number of heavy atoms (i.e., ligand efficiency) to categorize a fragment as a good hit. This is the reason why we can find fragments with lower absolute binding free energy but they are not considered hits because, in relative terms, the free energy contribution per atom is not higher than 0.3, which indicates that the protein−fragment interactions are, on average, of a poorer quality. For each of the fragment hits, we selected a representative frame of the bound macrostate (Figure 4, 1−8). The analysis of their binding mode leads us to a number of conclusions: (a) 686

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling

Figure 4. Representative frames of the bound macrostates (1−8 left) for each of the fragment hits (1−8 right). Fragments are represented in thick licorice. Residues around the fragment are represented in thin licorice. Hydrogen bonds established with each respective arginine are represented in discontinuous black lines. Parts A and B show the minimum distance from Arg20 to fragment 1 and Arg41 to fragment 5, respectively, for five representative simulations each where we can observe binding and unbinding events of the fragments with the respective pockets.

particular, (a) the sulfo-tyrosine 7 in the PDB 2K05,38 which contains a hydrophobic benzene core and a negatively charged o-sulfate group interacting with Arg20 (Figure 5A), and (b) the heparin molecule H1S68 in PDB 2NWG,46 whose sulfate interacts with Arg41 and this interaction seems to stabilize the opening of the hydrophobic pocket that the fragments 4−8 leverage to dock their respective hydrophobic cores (Figure 5B). Note, however, that, in both pockets, the respective arginines had to undergo a lateral chain flip in order to establish a double hydrogen bond with the carboxyl/sulfate group of the fragments.

by predicting their binding poses, kinetics, and binding affinities using MSM analysis technology. In particular, our in silico binding assay applied to the CXCL12 chemokine was able to predict at least 8 mM affinity fragments with high predicted ligand efficiency (LEpred > 0.3) specifically binding to sY7 and H1S68 pockets. Interestingly, all fragment hits share similar chemical properties, a hydrophobic core and a carboxyl/sulfate group, and a similar binding mode, with the hydrophobic part buried and the negatively charged group interacting with an arginine (Arg41 for the H1S68 pocket and Arg20 for the sY7 pocket). The pharmacophoric properties of the fragments are consistent with the experimentally resolved binding mode of natural binders: the sulfated tyrosine 7 of the CXCR4 receptor for the sY7 pocket and the sulfate group from heparin for the H1S68 pocket. Furthermore, the relative proximity between sY7 and H1S68 pockets supports the feasibility of a fragment linking and/or fragment extension strategy.



CONCLUSIONS FBDD offers an efficient alternative to HTS by allowing the use of reduced screening libraries that span more effectively across the chemical space and present a higher hit rate. In this study, we applied high-throughput MD simulation in a FBDD context and demonstrated its ability to screen a library of 129 fragments 687

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling

Figure 5. Experimentally resolved pockets and ligand-like moieties that hold analogy with the predicted hits. The discontinuous line in green outlines the cavity that docks the hydrophobic part of the fragments. The discontinuous line in red outlines the occupancy of the negatively charged part of the fragments. (A) PDB 2K0538 with a close detail of the sulfo-tyrosine 7 (sY7) binding in the sY7 pocket. (B) PDB 2NWG46 with a close detail of the heparin 68 (H1S68) interacting with Arg41 and promoting the opening of the H1S68 pocket.

The current study paves the way for the use of MD in the early screening phase of compounds in FBDD. While at the current stage it is impractical to run this type of in silico assay due to computational restrictions, the evolution of software implementations such as ACEMD and the increase of GPU computational power52 may make the computation affordable enough to run full fragment set screenings (in the order of thousands of compounds) in the coming years and eventually introduce MD-based methods such as the present in silico binding assay in mainstream drug discovery pipelines.



METHODS Electrostatic and Conformational Characterization of CXCL12. APBS44 as implemented in VMD53 was used for the electrostatic characterization. Protein conformational analysis (Figure 6 (1)) was produced by running and analyzing MD simulations of the CXCL12 monomer solvated in a water box. The system was built with HTMD54 using chain A from structure 4UAI42 obtained from the PDB55,56 database and adding counterions until neutralization of the total charge. Four 40 ns long equilibrations were performed locally using the software ACEMD57 and following the protocol described elsewhere.58 The Charmm22*59,60 force field was used throughout. Production runs of 100 ns were performed using the NVT ensemble in the distributed network GPUGRID61 using ACEMD. Up to a total of 57.6 μs of simulations were produced using an adaptive sampling scheme50 based on protein self-contacts metric. This adaptive sampling scheme consists of the periodic building of a MSM and respawning from undersampled states in order to optimize the exploration of the conformational landscape. Final analysis was made by building a MSM using backbone dihedrals as metric, tICA62

Figure 6. Graphical description of the pipeline used in the current study. First (1), we ran MD simulations of the chain A of PDB 4UAI42 to explore the conformational space. Second (2), we docked the filtered ZINC library against eight representative protein conformations. Then (3), we built, equilibrated, and ran MD simulations of CXCL12 in the presence of one fragment molecule for each fragment of the final compound library. We ran an adaptive scheme for a number of epochs, consisting of (5) running a MSM analysis on previous simulations and (6) respawning simulations from unexplored areas of the protein−ligand binding space. Finally (7), we ran a final MSM analysis on the total ensemble of the simulations produced to obtain binding poses, kinetics, and binding affinities.

dimensionality reduction down to three dimensions, 200 microstates clustering using the Kmeans algorithm imple688

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling mented in scikit-learn,63 and a 60 ns lag time. Microstates were clustered into eight macrostates using PCCA,64 and one representative from each macrostate was picked for subsequent docking. Both adaptive sampling and analysis was performed using HTMD.54 In Silico Binding Assay. Given a protein and a ligand, an in silico binding assay consists of the following: Building and Equilibration. The system was created by placing the CXCL12 protein (chain A of PDB 4UAI42) in the center of a water box and the fragment in a random location around the protein. The system was then neutralized and equilibrated as specified in the previous section (Figure 6 (3)) using HTMD.54 The Charmm22*59,60 force field was used. Ligand parameters were obtained using the GAAMP65 parametrization tool, which includes a computationally expensive quantum mechanics (QM) dihedral scan using Gaussian66 and a QM data fitting procedure to obtain Charmm-compatible parameters. Adaptive Sampling. Simulations of 70 ns were launched in two independent batches for a number of epochs up to a total average of 54 μs (Figure 6 (4)). Each of the aforementioned epochs consists of the analysis of previous simulations by building a MSM (Figure 6 (5)) and the respawning of simulations from undersampled states in order to optimize the exploration of the protein−ligand contact space (Figure 6 (6)). MSMs were built using contact maps between protein alpha carbons and ligand heavy atoms, five tICA dimensions, and the rest of the default parameters used in HTMD.54 The computational work has been performed using ACEMD57 in the GPUGRID61 distributed network, which aggregates hundreds of GPUs with diverse computational performance offered by independent volunteers. The computation spanned along a wall-clock time of around 6 months, which corresponds to an approximate amount of 380,000 GPU/h (based on a GTX1080 performance). Final Analysis. Final analysis (Figure 6 (7)) was performed using HTMD.54 Contact maps between the alpha carbons of the protein and the heavy atoms of the ligand were calculated for each frame of all of the simulations. Contact maps were clustered into 1000 microstates using the Kmeans clustering algorithm. Five independent MSMs with random 75% of the total simulation data (i.e., bootstraps) were built using three tICA dimensions and a lag time of 10 ns (implied time scales available in Figure S1). Finally, microstates were joined into five macrostates using PCCA, and the equilibrium distribution was calculated. The bulk (i.e., the unbound state) was automatically assigned to the macrostate with the least number of protein− ligand contacts, and the sink (i.e., bound state) was automatically assigned to be the macrostate with the highest equilibrium population excluding the bulk state. Means and standard deviations for each of the measurables (ΔG0, LEpred, kon, koff, Kd) were calculated from the five bootstraps. Results for all of the ligands can be found in the Supporting Information. The equilibrium binding free energy was calculated using the Boltzmann distribution ⎛ P ⎞ ΔG = −KBT ln⎜ sink ⎟ ⎝ Pbulkc ⎠

the equilibrium probability of the bulk or unbound state, and c is the concentration of the fragment.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00625. Fragments used in the screening (MOL) Figures showing an implied time scales plot for the selected fragments with high ligand efficiency, protein conformations used for docking, and correlation between fragment binding free energy and number of heavy atoms (PDF) Results of the screening (XLSX)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Gerard Martinez-Rosell: 0000-0001-6277-6769 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS G.D.F. acknowledges support from MINECO (BIO201782628-P) and FEDER. The authors acknowledge funding from ACCIO (RDIS14-1-0002) and from Acellera Ltd. The authors thank all GPUGRID users.



REFERENCES

(1) Macarron, R.; Banks, M. N.; Bojanic, D.; Burns, D. J.; Cirovic, D. A.; Garyantes, T.; Green, D. V. S.; Hertzberg, R. P.; Janzen, W. P.; Paslay, J. W.; Schopfer, U.; Sittampalam, G. S. Impact of HighThroughput Screening in Biomedical Research. Nat. Rev. Drug Discovery 2011, 10, 188−195. (2) Cheng, T.; Li, Q.; Zhou, Z.; Wang, Y.; Bryant, S. H. StructureBased Virtual Screening for Drug Discovery: a Problem- Centric Review. AAPS J. 2012, 14, 133−141. (3) Erlanson, D. A.; Fesik, S. W.; Hubbard, R. E.; Jahnke, W.; Jhoti, H. Twenty Years On: the Impact of Fragments on Drug Discovery. Nat. Rev. Drug Discovery 2016, 15, 605−619. (4) Ursu, O.; Rayan, A.; Goldblum, A.; Oprea, T. I. Understanding Drug-Likeness. WIREs Comput. Mol. Sci. 2011, 1, 760−781. (5) Rees, D. C.; Congreve, M.; Murray, C. W.; Carr, R. FragmentBased Lead Discovery. Nat. Rev. Drug Discovery 2004, 3, 660−672. (6) Hopkins, A. L.; Groom, C. R.; Alex, A. Ligand Efficiency: a Useful Metric for Lead Selection. Drug Discovery Today 2004, 9, 430−431. (7) Murray, C. W.; Rees, D. C. The Rise of Fragment-Based Drug Discovery. Nat. Chem. 2009, 1, 187−192. (8) Ferenczy, G. G.; Keseru, G. M. How are Fragments Optimized? A Retrospective Analysis of 145 Fragment Optimizations. J. Med. Chem. 2013, 56, 2478−2486. (9) Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864−2875. (10) Erlanson, D. A.; McDowell, R. S.; O’Brien, T. Fragment-Based Drug Discovery. J. Med. Chem. 2004, 47, 3463−3482. (11) Hann, M. M.; Leach, A. R.; Harper, G. Molecular Complexity and its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Model. 2001, 41, 856−864. (12) Leach, A. R.; Hann, M. M. Molecular Complexity and Fragment-Based Drug Discovery: Ten Years On. Curr. Opin. Chem. Biol. 2011, 15, 489−496.

(1)

where ΔG is the Gibbs free energy, KB is the Boltzmann constant in kcal/(mol·K), T is the temperature (300 K), Psink is the equilibrium probability of the sink or bound state, Pbulk is 689

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling

(34) Arenzana-Seisdedos, F. SDF-1/CXCL12: A Chemokine in the Life Cycle of HIV. Front. Immunol. 2015, 6, 256. (35) Choi, W.-T.; Yang, Y.; Xu, Y.; An, J. Targeting Chemokine Receptor CXCR4 for Treatment of HIV-1 Infection, Tumor Progression, and Metastasis. Curr. Top. Med. Chem. 2014, 14, 1574− 1589. (36) Veldkamp, C. T.; Ziarek, J. J.; Peterson, F. C.; Chen, Y.; Volkman, B. F. Targeting SDF-1/CXCL12 with a Ligand that Prevents Activation of CXCR4 through Structure-Based Drug Design. J. Am. Chem. Soc. 2010, 132, 7242−7243. (37) Veldkamp, C. T.; Seibert, C.; Peterson, F. C.; Sakmar, T. P.; Volkman, B. F. Recognition of a CXCR4 Sulfotyrosine by the Chemokine Stromal Cell-Derived Factor-1alpha (SDF-1alpha/ CXCL12). J. Mol. Biol. 2006, 359, 1400−1409. (38) Veldkamp, C. T.; Seibert, C.; Peterson, F. C.; Cruz, N. B. D. l.; Haugner, J. C.; Basnet, H.; Sakmar, T. P.; Volkman, B. F. Structural Basis of CXCR4 Sulfotyrosine Recognition by the Chemokine SDF-1/ CXCL12. Sci. Signaling 2008, 1, ra4−ra4. (39) Ziarek, J. J.; Getschman, A. E.; Butler, S. J.; Taleski, D.; Stephens, B.; Kufareva, I.; Handel, T. M.; Payne, R. J.; Volkman, B. F. Sulfopeptide Probes of the CXCR4/CXCL12 Interface Reveal Oligomer-Specific Contacts and Chemokine Allostery. ACS Chem. Biol. 2013, 8, 1955−1963. (40) Ziarek, J. J.; Veldkamp, C. T.; Zhang, F.; Murray, N. J.; Kartz, G. A.; Liang, X.; Su, J.; Baker, J. E.; Linhardt, R. J.; Volkman, B. F. Heparin Oligosaccharides Inhibit Chemokine (CXC Motif) Ligand 12 (CXCL12) Cardioprotection by Binding Orthogonal To The Dimerization Interface, Promoting Oligomerization, and Competing with the Chemokine (CXC Motif) Receptor 4 (CXCR4) N Terminus. J. Biol. Chem. 2013, 288, 737−746. (41) Ziarek, J. J.; Liu, Y.; Smith, E.; Zhang, G.; Peterson, F. C.; Chen, J.; Yu, Y.; Chen, Y.; Volkman, B. F.; Li, R. Fragment-Based Optimization of Small Molecule CXCL12 Inhibitors for Antagonizing the CXCL12/CXCR4 Interaction. Curr. Top. Med. Chem. 2012, 12, 2727−2740. (42) Smith, E. W.; Liu, Y.; Getschman, A. E.; Peterson, F. C.; Ziarek, J. J.; Li, R.; Volkman, B. F.; Chen, Y. Structural Analysis of a Novel Small Molecule Ligand Bound to the CXCL12 Chemokine. J. Med. Chem. 2014, 57, 9693−9699. (43) Smith, E. W.; Nevins, A. M.; Qiao, Z.; Liu, Y.; Getschman, A. E.; Vankayala, S. L.; Kemp, M. T.; Peterson, F. C.; Li, R.; Volkman, B. F.; Chen, Y. Structure-Based Identification of Novel Ligands Targeting Multiple Sites within a Chemokine-G-Protein-Coupled-Receptor Interface. J. Med. Chem. 2016, 59, 4342−4351. (44) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; McCammon, J. A. Electrostatics of Nanosystems: Application to Microtubules and the Ribosome. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 10037−10041. (45) Bernini, A.; Henrici De Angelis, L.; Morandi, E.; Spiga, O.; Santucci, A.; Assfalg, M.; Molinari, H.; Pillozzi, S.; Arcangeli, A.; Niccolai, N. Searching for Protein Binding Sites from Molecular Dynamics Simulations and Paramagnetic Fragment-Based NMR Studies. Biochim. Biophys. Acta, Proteins Proteomics 2014, 1844, 561− 566. (46) Murphy, J. W.; Cho, Y.; Sachpatzidis, A.; Fan, C.; Hodsdon, M. E.; Lolis, E. Structural and Functional Basis of CXCL12 (Stromal CellDerived Factor-1 Alpha) Binding to Heparin. J. Biol. Chem. 2007, 282, 10018−10027. (47) Ziarek, J. J.; et al. Structural Basis for Chemokine Recognition by a G Protein-Coupled Receptor and Implications for Receptor Activation. Sci. Signaling 2017, 10, eaah5756. (48) Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization and Multithreading. J. Comput. Chem. 2010, 31, 455− 461. (49) Gregori-Puigjané, E.; Mestres, J. SHED: Shannon Entropy Descriptors from Topological Feature Distributions. J. Chem. Inf. Model. 2006, 46, 1615−1622.

(13) Murray, C. W.; Verdonk, M. L.; Rees, D. C. Experiences in Fragment-Based Drug Discovery. Trends Pharmacol. Sci. 2012, 33, 224−232. (14) Kozakov, D.; Hall, D. R.; Jehle, S.; Jehle, S.; Luo, L.; Ochiana, S. O.; Jones, E. V.; Pollastri, M.; Allen, K. N.; Whitty, A.; Vajda, S. Ligand Deconstruction: Why Some Fragment Binding Positions are Conserved and Others are Not. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, E2585−E2594. (15) Mortier, J.; Rakers, C.; Frederick, R.; Wolber, G. Computational Tools for In Silico Fragment-Based Drug Design. Curr. Top. Med. Chem. 2012, 12, 1935−1943. (16) Chen, Y.; Shoichet, B. K. Molecular Docking and Ligand Specificity in Fragment-Based Inhibitor Discovery. Nat. Chem. Biol. 2009, 5, 358−364. (17) Davis, B. J.; Erlanson, D. A. Learning from our Mistakes: the ’Unknown Knowns’ in Fragment Screening. Bioorg. Med. Chem. Lett. 2013, 23, 2844−2852. (18) Fischer, M.; Coleman, R. G.; Fraser, J. S.; Shoichet, B. K. Incorporation of Protein Flexibility and Conformational Energy Penalties in Docking Screens to improve Ligand Discovery. Nat. Chem. 2014, 6, 575−583. (19) Zoete, V.; Grosdidier, A.; Michielin, O. Docking, Virtual High Throughput Screening and In Silico Fragment-Based Drug Design. J. Cell. Mol. Med. 2009, 13, 238−248. (20) Buch, I.; Giorgino, T.; De Fabritiis, G. Complete Reconstruction of an Enzyme-Inhibitor Binding Process by Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 10184−10189. (21) Guvench, O.; MacKerell, A. D. Computational Fragment-Based Binding Site Identification by Ligand Competitive Saturation. PLoS Comput. Biol. 2009, 5, e1000435. (22) Faller, C. E.; Raman, E. P.; MacKerell, A. D.; Guvench, O. Site Identification by Ligand Competitive Saturation (SILCS) simulations for fragment-based drug design. Methods Mol. Biol. 2015, 1289, 75−87. (23) Alvarez-Garcia, D.; Barril, X. Molecular Simulations with Solvent Competition Quantify Water Displaceability and Provide Accurate Interaction Maps of Protein Binding Sites. J. Med. Chem. 2014, 57, 8530−8539. (24) Ferruz, N.; Harvey, M. J.; Mestres, J.; De Fabritiis, G. Insights from Fragment Hit Binding Assays by Molecular Simulations. J. Chem. Inf. Model. 2015, 55, 2200−2205. (25) Pan, A. C.; Xu, H.; Palpant, T.; Shaw, D. E. Quantitative Characterization of the Binding and Unbinding of Millimolar Drug Fragments with Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017, 13, 3372−3377. (26) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52, 1757−1768. (27) Zlotnik, A.; Yoshie, O. The Chemokine Superfamily Revisited. Immunity 2012, 36, 705−716. (28) Karpova, D.; Bonig, H. Concise Review: CXCR4/CXCL12 Signaling in Immature Hematopoiesis-Lessons From Pharmacological and Genetic Models. Stem Cells 2015, 33, 2391−2399. (29) Karin, N. The Multiple Faces of CXCL12 (SDF-1alpha) in the Regulation of Immunity during Health and Disease. J. Leukocyte Biol. 2010, 88, 463−473. (30) Sun, X.; Cheng, G.; Hao, M.; Zheng, J.; Zhou, X.; Zhang, J.; Taichman, R. S.; Pienta, K. J.; Wang, J. CXCL12/CXCR4/CXCR7 chemokine axis and cancer progression. Cancer Metastasis Rev. 2010, 29, 709−722. (31) Domanska, U. M.; Kruizinga, R. C.; Nagengast, W. B.; TimmerBosscha, H.; Huls, G.; de Vries, E. G. E.; Walenkamp, A. M. E. A Review on CXCR4/CXCL12 Axis in Oncology: No Place to Hide. Eur. J. Cancer 2013, 49, 219−230. (32) Chatterjee, S.; Azad, B. B.; Nimmagadda, S. The Intricate Role of CXCR4 in Cancer. Adv. Cancer Res. 2014, 124, 31−82. (33) Patrussi, L.; Baldari, C. T. The CXCL12/CXCR4 Axis as a Therapeutic Target in Cancer and HIV-1 Infection. Curr. Med. Chem. 2011, 18, 497−512. 690

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691

Article

Journal of Chemical Information and Modeling (50) Doerr, S.; De Fabritiis, G. On-the-Fly Learning and Sampling of Ligand Binding by High-Throughput Molecular Simulations. J. Chem. Theory Comput. 2014, 10, 2064−2069. (51) Pence, H. E.; Williams, A. ChemSpider: An Online Chemical Information Resource. J. Chem. Educ. 2010, 87, 1123−1124. (52) Martinez-Rosell, G.; Giorgino, T.; Harvey, M. J.; de Fabritiis, G. Drug Discovery and Molecular Dynamics: Methods, Applications and Perspective Beyond the Second Timescale. Curr. Top. Med. Chem. 2017, 17, 2617−2625. (53) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33−38. (54) Doerr, S.; Harvey, M. J.; Noé, F.; De Fabritiis, G. HTMD: HighThroughput Molecular Dynamics for Molecular Discovery. J. Chem. Theory Comput. 2016, 12, 1845−1852. (55) Bernstein, F. C.; Koetzle, T. F.; Williams, G. J.; Meyer, E. F.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; Tasumi, M. The Protein Data Bank: a Computer-Based Archival File for Macromolecular Structures. Arch. Biochem. Biophys. 1978, 185, 584− 591. (56) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242. (57) Harvey, M. J.; Giupponi, G.; Fabritiis, G. D. ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale. J. Chem. Theory Comput. 2009, 5, 1632−1639. (58) Doerr, S.; Giorgino, T.; Martínez-Rosell, G.; Damas, J. M.; De Fabritiis, G. High-Throughput Automated Preparation and Simulation of Membrane Proteins with HTMD. J. Chem. Theory Comput. 2017, 13, 4003−4011. (59) MacKerell, A. D.; et al. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (60) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. How Robust Are Protein Folding Simulations With Respect to Force Field Parameterization? Biophys. J. 2011, 100, L47−49. (61) Buch, I.; Harvey, M. J.; Giorgino, T.; Anderson, D. P.; De Fabritiis, G. High-Throughput All-Atom Molecular Dynamics Simulations using Distributed Computing. J. Chem. Inf. Model. 2010, 50, 397−403. (62) Pérez-Hernández, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification of Slow Molecular Order Parameters for Markov Model Construction. J. Chem. Phys. 2013, 139, 015102. (63) Pedregosa, F.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825−2830. (64) Deuflhard, P.; Weber, M. Robust Perron Cluster Analysis in Conformation Dynamics. Linear Algebra Appl. 2005, 398, 161−184. (65) Huang, L.; Roux, B. Automated Force Field Parameterization for Non-polarizable and Polarizable Atomic Models based on Ab Initio Data. J. Chem. Theory Comput. 2013, 9, 3543−3556. (66) Frisch, M.; et al. Gaussian 09, revision B.01; Gaussian, Inc.: Wallingford, CT, 2009.

691

DOI: 10.1021/acs.jcim.7b00625 J. Chem. Inf. Model. 2018, 58, 683−691