MM Docking with Attracting Cavities - Journal of

Dec 5, 2016 - We developed a hybrid quantum mechanical/molecular mechanical (QM/MM) on-the-fly docking algorithm to address the challenges of treating...
1 downloads 7 Views 2MB Size
Article pubs.acs.org/jcim

On-the-Fly QM/MM Docking with Attracting Cavities Prasad Chaskar, Vincent Zoete,* and Ute F. Röhrig* SIB Swiss Institute of Bioinformatics, Molecular Modeling Group, CH-1015 Lausanne, Switzerland S Supporting Information *

ABSTRACT: We developed a hybrid quantum mechanical/ molecular mechanical (QM/MM) on-the-fly docking algorithm to address the challenges of treating polarization and selected metal interactions in docking. The algorithm is based on our classical docking algorithm Attracting Cavities and relies on the semiempirical self-consistent charge density functional tight-binding (SCC-DFTB) method and the CHARMM force field. We benchmarked the performance of this approach on three very diverse data sets: (1) the Astex Diverse set of 85 common noncovalent drug/target complexes formed both by hydrophobic and electrostatic interactions; (2) a zinc metalloprotein data set of 281 complexes, where polarization is strong and ligand/protein interactions are dominated by electrostatic interactions; and (3) a heme protein data set of 72 complexes, where ligand/protein interactions are dominated by covalent ligand/iron binding. Redocking performance of the on-the-fly QM/MM docking algorithm was compared to the performance of classical Attracting Cavities, AutoDock, AutoDock Vina, and GOLD. The results demonstrate that the QM/MM code preserves the high accuracy of most classical scores on the Astex Diverse set, while it yields significant improvements on both sets of metalloproteins at moderate computational cost.

1. INTRODUCTION Molecular docking has become a routine application in drug design projects when structural information about the target is available.1 Its task is to predict the binding mode of a ligand within the binding site of a macromolecule, usually an enzyme or a receptor. A docking algorithm consists of a conformational sampling engine to generate putative ligand poses and a scoring function to rank those poses. Ideally, the scoring function should be computationally inexpensive so that a large number of ligand poses can be treated, and it should be able to identify the native binding mode as its global minimum. Today, as computer performance continues to grow exponentially, there is an increasing interest in high-accuracy docking algorithms. The description of protein flexibility, solvation, covalent interactions, polarization, and charge transfer are main challenges for docking.2 The latter three are quantum chemical in nature, and including information from quantum mechanical (QM) calculations in scoring functions has been shown to increase docking accuracy.3−5 Since it would be computationally inefficient to describe large biological systems entirely with a traditional QM method, hybrid quantum mechanical/ molecular mechanical (QM/MM) methods and linear-scaling semiempirical QM methods are gaining importance.6−14 In the QM/MM approach,15 only the region of interest is described with a QM method, while its environment is described classically. A docking algorithm should rely on a QM method that provides a good compromise between accuracy and efficiency, allowing to score a large number of ligand binding poses. Semiempirical approaches are the methods of choice for © 2016 American Chemical Society

docking applications. Inclusion of dispersion corrections can further enhance their accuracy. Docking algorithms can profit from QM calculations at different levels, for example to improve the force field employed in the scoring function for a specific case. If the interaction between the ligand and the receptor is mainly of an electrostatic nature, the ligand and or the active site charges can be determined by QM or QM/MM calculations and used for subsequent docking. This approach is used for example in the “QM-polarized ligand docking” (QPLD) approach in Glide/ QSite,3 where the ligand charges used for classical docking are obtained from QM/MM calculations. If the ligand−protein interactions are not only of electrostatic but also of a covalent nature, such as in the case of hemeproteins, we have shown that a Morse-like metal binding potential (MMBP) fitted to QM calculations yields increased success rates in docking.16 This approach has a relatively low computational cost, but transferability and accuracy are important issues. Alternatively, the complex can be described in a QM/MM framework during the docking run, so that the charge distribution and covalent binding contributionsthat depend on the relative position of the ligand and macromoleculeare determined on-the-fly for each conformation of the complex during the docking. Here, we explore this solution, which is computationally more demanding and has not yet been fully exploited. Received: July 14, 2016 Published: December 5, 2016 73

DOI: 10.1021/acs.jcim.6b00406 J. Chem. Inf. Model. 2017, 57, 73−84

Article

Journal of Chemical Information and Modeling

Figure 1. Flowchart of QM/MM docking with Attracting Cavities. (A) QM/MM Rescoring approach.17 (B) On-the-fly QM/MM Docking Scheme 1: pose selection for the QM/MM input ensemble is based on full classical score. (C) On-the-fly QM/MM Docking Scheme 2: calculation of solvation energy is skipped for pose selection.

widely used docking programs, only GOLD has been parametrized and validated for docking to heme proteins.27 We have previously introduced MMBP, which were fitted to reproduce density functional theory calculations, to successfully describe heme-bound ligand complexes during docking.16 In the present study, we restrict ourselves to the study of proteins containing heme B, because they are the most abundant in nature and commonly have one free iron coordination site available for ligand binding. Since currently no SCC-DFTB parameters for simultaneous treatment of sulfur and iron are available, we were constrained to exclude cysteine-bound heme complexes from our test set. Since a high success rate for hemebinding ligands could artificially be obtained by overemphasizing ligand−iron interactions, we did not only consider complexes with a ligand directly coordinated to the iron ion (heme-binding data set), but also complexes with a ligand not in direct contact with the iron ion, although a free coordination site exists (heme-nonbinding data set). In the on-the-fly QM/MM implementation (Figure 1B,C), we generate a classical ensemble of poses using Attracting Cavities.21 Filtering and clustering of this ensemble yields the QM/MM input ensemble, which is subject to SCC-DFTB/ CHARMM-based QM/MM minimization to generate the QM/MM output ensemble. The score of each pose is computed using the FACTS solvation model22 and QM/MM charges, and docking results are obtained after a final filtering and clustering step. On the Astex data set of common noncovalent ligand/ protein complexes without special interactions, the on-the-fly QM/MM docking success rate using recommended parameters remained close to the classical success rate (76.5% vs 80.0%). We found an improved success rate for the zinc-binding data set using the QM/MM algorithm (69.9% vs 57.5%). For the heme binding data set, the improvement due to the QM/MM treatment was even more striking (success rate of 75.0% vs 37.5%) than in the zinc metalloproteins. These results demonstrate that our QM/MM approach provides results on par with the classical force-field approach for the common protein/ligand complexes present in the Astex Diverse data set.

We previously developed a semiempirical QM/MM scoring function17 based on the SCC-DFTB/CHARMM interface,18 which can easily be integrated into our CHARMM19-based docking algorithms EADock DSS20 and Attracting Cavities.21 We benchmarked this scoring function on the description of zinc metalloproteins using a rescoring approach (Figure 1A) and obtained a significantly improved success rate for zincbinding ligands (65.5% on docking ensembles, 77.0% when including the native pose in the ensembles). The description of correct zinc-binding geometries also improved substantially with the hybrid classical/quantum scoring. The use of a QM description of the active site caused a major charge redistribution, resulting in a much reduced positive charge on the zinc ion in agreement with earlier studies.5 Inclusion of the coordinating protein side chains in the QM subsystem proved to be important in describing this charge redistribution and significantly improved the success rate for the nonbinding test set. Application of the fast analytical treatment of solvation (FACTS) model22 improved docking results as compared to vacuum calculations. Here, we implement this scoring function into our in-house docking code Attracting Cavities21 to perform on-the-fly QM/ MM docking (Figure 1B,C). We test the implementation on the Astex Diverse data set,23 our previously published zinc data set,17 and on a newly developed challenging test set of 72 heme proteins. The manually curated high-quality protein−ligand complexes of the zinc and heme data sets are freely available online24,25 in CHARMM format ready to be used for docking with the SwissDock web server (http://www.swissdock.ch/).26 Heme proteins include many drug targets and carry out diverse biological functions such as oxygen transport (e.g., hemoglobin, myoglobin, neuroglobin, cytoglobin, leghemoglobin), enzyme catalysis (e.g., cytochrome P450, peroxidases, cytochrome c oxidase, ligninase, catalase, indoleamine 2,3-dioxygenase), active membrane transport (cytochromes), electron transfer (cytochrome c), and sensory functions (FixL, sGC, CooA). In most benchmarking data sets for in silico studies, iron-bound heme−ligand complexes were removed as covalently bonded systems that cannot be treated with docking algorithms. Among 74

DOI: 10.1021/acs.jcim.6b00406 J. Chem. Inf. Model. 2017, 57, 73−84

Article

Journal of Chemical Information and Modeling

obtain a sufficient number of test cases, the crystal structure resolution threshold was set to 2.9 Å, and no DPI cutoff was used. Because of the lack of SCC-DFTB parameters for iron in combination with sulfur and halogens, all complexes with iron− sulfur or iron−halogen interactions had to be excluded from the data set, effectively eliminating all hemeproteins with an iron− cysteine bond. We also removed lactoperoxidase structures, because most of these were not described in the scientific literature and had bad wwPDB X-ray validation reports.37 The complexes were divided into a heme-binding (Fe-ligand separation 2.0−3.0 Å, 32 cases) and a nonbinding set (4.0− 10.0 Å, 40 cases). Relaxation of the filters led to the inclusion of eight additional complexes in the heme-binding subset. Setups for the ligand−protein complexes were done using the same procedure as previously described.17 In brief, coordinates were obtained from the RCSB Protein Data Bank.38 We extracted ligand coordinates from these files, added hydrogen atoms with UCSF Chimera (http://www.cgl.ucsf. edu/chimera),39 and carefully adjusted bond orders and protonation states by visual inspection. All water molecules and heteroatoms were removed, except for the ligand and monatomic ions. In the employed CHARMM22 force field,30 the heme iron carries a charge of +0.24. The curated structures with their CHARMM force-field parameters are freely available.25 2.2. Classical Docking with AutoDock 4.2 and AutoDock Vina. We performed classical docking with AutoDock 4.240,41 and AutoDock Vina,42 using the same parameters as in our previous benchmark study.17 The search space was defined as a cube of edge length 20 Å centered on the ligand for the binding data sets or on the geometric center between the metal ion and the ligand for the nonbinding data sets. AutoDock Tools scripts40 with default settings were used to convert the PDB files of the receptors and the MOL2 files of the ligands into PDBQT format. For AutoDock, the charge on Fe was set to 0 and on Zn to +2, and the grid point spacing was set to 0.375 Å. Docking was performed with 200 independent runs of the Lamarckian Genetic Algorithm, using a maximum of 25 × 106 energy evaluations, a maximum number of 27 × 103 generations, a gene mutation rate of 0.02, and a crossover rate of 0.8. For AutoDock Vina, the exhaustiveness was set to 500 (default 8), and the maximum number of binding modes in the output was set to 20 (default 9). 2.3. Classical Docking with GOLD. For classical docking with Chemscore/GOLD,43 the radius of the spherical search space (12.4 Å) was chosen such as to provide the same volume as a cubic box with an edge length of 20 Å. The origin of the search space was defined as for AutoDock. Ten docking runs were performed, and the solution with the best score was retained. All other parameters were set to their default value in GOLD 5.4.1. For the heme data sets, we used the hemebinding parameters for Chemscore based on PDB-derived metal parameters (chemscore.p450_pdb.params).27 2.4. Classical Docking with Attracting Cavities. In Attracting Cavities,21 the search space was defined the same way as in AutoDock and AutoDock Vina. We used an NThr value of 60 and a rotation step of 60° for the initial ligand positioning procedure. Although the present version of Attracting Cavities does not include any explicit sampling of dihedral angles, it has been shown to yield very good redocking sucess rates also when starting from randomized ligand conformations. The scoring function of Attracting Cavities (ScoreAC) relies on the free energy estimate calculated

For metalloproteins, which pose a severe challenge to classical docking codes due to strong polarization or the formation of bonds of covalent nature, the QM/MM docking yields substantially improved results.

2. METHODS 2.1. Data Sets. We benchmarked the performance of our on-the-fly QM/MM approach on three very diverse data sets, namely the widely used Astex Diverse data set (85 cases),23 our previously developed zinc data set (281 cases),17,24 and our newly developed hemeprotein data set (72 cases).25 Ligand force-field parameters were derived based on the Merck Molecular Force Field28 using the SwissParam algorithm,29 which has been parametrized in combination with the all-atom CHARMM22 force field30 for the protein. 2.1.1. Astex Diverse Data Set. The Astex Diverse set is a high-quality set of 85 structures developed for the validation of protein−ligand docking performance.23 Protein−ligand complexes in the Astex data set are highly diverse as they do not over-represent any particular enzyme class, ligand functionality, or specific interactions like ligand−metal interactions. Approximately 35% of its complexes are proven or potential drug targets, and ligands were selected for their druglikeness.23 This data set has recently been used for benchmarking many docking algorithms.21,31−33 Here, we used the same manually curated structures and force-field parameters as in the classical Attracting Cavities benchmarking study,21 including the randomized ligand conformations generated by Open Babel34 starting from the SMILES of the ligands. The configurations of asymmetrical carbon atoms and double bonds were visually checked and corrected when necessary to preserve the ones found in the X-ray structures. The Astex Diverse set contains one complex, in which the ligand is in direct contact with a heme iron, and 11 complexes, in which the ligand is in direct contact with a zinc ion. 2.1.2. Zinc Data Set. In our previous study, we developed a high-quality database of protein−ligand complexes containing zinc metal ions for benchmarking the QM/MM scoring function.17 Complexes were selected on the basis of X-ray crystal structure resolution (