Efficient Approximation of Ligand Rotational and Translational

As to the validation, to the best of our knowledge, estimates of the reduction in translational and rotational freedoms of the ligand upon protein–l...
0 downloads 7 Views 4MB Size
Article pubs.acs.org/jcim

Efficient Approximation of Ligand Rotational and Translational Entropy Changes upon Binding for Use in MM-PBSA Calculations Ido Y. Ben-Shalom,† Stefania Pfeiffer-Marek,‡ Karl-Heinz Baringhaus,§ and Holger Gohlke*,† †

Institute for Pharmaceutical and Medicinal Chemistry, Department of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany ‡ LGCR/Pharmaceutical Sciences Operations, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926 Frankfurt am Main, Germany § R&D Resources/Site Direction, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926 Frankfurt am Main, Germany S Supporting Information *

ABSTRACT: A major uncertainty in binding free energy estimates for protein−ligand complexes by methods such as MM-PB(GB)SA or docking scores results from neglecting or approximating changes in the configurational entropies (ΔSconfig.) of the solutes. In MM/PB(GB)SA-type calculations, ΔSconfig. has usually been estimated in the rigid rotor, harmonic oscillator approximation. Here, we present the development of a computationally efficient method (termed BEERT) to approximate ΔSconfig. in terms of the reduction in translational and rotational freedom of the ligand upon protein−ligand binding (ΔSR/T), starting from the flexible molecule approach. We test the method successfully in binding affinity computations in connection with MM-PBSA effective energies describing changes in gas-phase interactions and solvation free energies. Compared to related work by Ruvinsky and co-workers, clustering bound ligand poses based on interactions with the protein rather than structural similarity of the poses, and an appropriate averaging over single entropies associated with an individual well of the energy landscape of the protein−ligand complex, were found to be crucial. Employing three data sets of protein−ligand complexes of pharmacologically relevant targets for validation, with up to 20, in part related ligands per data set, spanning binding free energies over a range of ≤7 kcal mol−1, reliable and predictive linear models to estimate binding affinities are obtained in all three cases (R2 = 0.54−0.72, p < 0.001, root mean squared error S = 0.78−1.44 kcal mol−1; q2 = 0.34−0.67, p < 0.05, root mean squared error sPRESS = 1.07−1.36 kcal mol−1). These models are markedly improved compared to considering MM-PBSA effective energies alone, scoring functions, and combinations with ΔSconfig. estimates based on the number of rotatable bonds, rigid rotor, harmonic oscillator approximation, or interaction entropy method. As a limitation, our method currently requires a target-specific training data set to identify appropriate scaling coefficients for the MM-PBSA effective energies and BEERT ΔSR/T. Still, our results suggest that the approach is a valuable, computationally more efficient complement to existing rigorous methods for estimating changes in binding free energy across structurally (weakly) related series of ligands binding to one target.



INTRODUCTION While in late stages of drug development rigorous free energy methods such as thermodynamic integration1 or free energy perturbation2 are applied, computationally more efficient (free) energy estimations such as MM-PBSA,3 MM-GBSA,4,5 or docking scores6 are preferably used in the early stages to rank potential ligands in the course of virtual screening.7−9 The increased computational efficiency of the latter methods results from several approximations, including the use of implicit solvent models,3−5,10 neglecting conformational changes of the receptor upon binding,11−13 (free) energy estimates considering a single protein−ligand configuration only,6,14−16 and/or neglecting or approximating contributions due to changes in the configurational entropies (ΔSconfig.) of the solutes.17,18 As to the latter contributions, in MM/PB(GB)SA-type calculations, they are usually estimated in the rigid rotor, harmonic oscillator (RRHO) approximation.19 This approximation allows © 2016 American Chemical Society

one to treat translational, rotational, and vibrational motions as uncoupled and, hence, to compute translational, rotational, and vibrational contributions to ΔSconfig. for each species in the binding equilibrium by gas-phase statistical mechanics.20 In particular, changes in the vibrational entropy (ΔSvib.) upon complex formation are estimated by normal-mode analysis (NMA).21 However, estimating ΔSvib. that way probably leads to systematic errors because anharmonic contributions are neglected.22,23 In addition, variations in ΔSvib. upon complexation of ∼5 kcal mol−1 have been observed for individually minimized structures from the same trajectory.24 Finally, the large computational burden precludes applying such calculations to more than a few protein−ligand complexes in general.25 As a result, alternative methods for computing ΔSvib. have been Received: June 22, 2016 Published: December 20, 2016 170

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling applied including the use of quasi-harmonic analysis23 or variations of the NMA approach.26,27 Yet, even with these efforts determining ΔSconfig. remains challenging due to, e.g., issues of convergence11 or the question how to appropriately treat the mass dependence of the single entropy contributions in the RRHO approximation.20 In scoring functions for docking, entropy contributions are frequently considered implicitly for changes in solvent degrees of freedom in terms of estimates of (de)solvation contributions to the binding affinity.28,29 Not considering ΔSconfig., however, may give rise to a marked dependence of computed scores on the ligand size.30,31 One way to counteract this dependence is to estimate the reduction in the number of low-energy wells accessible to a ligand on binding, i.e., the change in the conformational entropy of the ligand ΔSconform., from the number of rotatable bonds in the molecule.14,32 However, apart from the fact that varying ΔSconform. contributions per bond ranging between 0.2 kcal mol−1 33 and 1.4 kcal mol−1 34 have been suggested, explicitly evaluating systems’ predominant low-energy conformations for 13 binding reactions revealed that computed drops in configurational entropy are not primarily due to drops in the number of energy wells.35,36 In contrast, ΔSconfig. primarily arose from the narrowing of energy wells, including the reduction in translational and rotational freedom upon complex formation from two free molecules.35 The above provided the incentive for us to develop a computationally efficient method to approximate ΔSconfig. upon protein− ligand binding by estimating the reduction in translational and rotational freedom of the ligand (ΔSR/T) and to test it in binding affinity computations with MM-PBSA. Rigorous estimates of the effect of the partial loss of these degrees of freedom have been performed by Hermans and Wang37 employing free energy perturbation techniques for benzene binding to a lysozyme-T4 mutant. Gilson and co-workers suggested a statistical thermodynamics method combining MD simulations with double annihilation38 for calculating the free energy components of protein−ligand complexes.39 Gilson and co-workers also applied the second generation form of the Mining Minima algorithm40,41 to implicitly estimate the contribution due to narrowing energy wells on the binding affinity of host−guest systems.35,42 For a single protein−ligand complex, they provided a decomposition of ΔSconfig. using the same algorithm.36 Fogolari et al. used MD simulations together with the nearest neighbor method for estimating the configurational entropy of transthyretin dimerization.43 In the context of MM-PBSA, Swanson et al. suggested a novel method for calculating the association free energy arising from a molecule’s loss of translational and rotational freedom based on the quasi-harmonic model and the use of quaternions and applied it to a small molecule/FK506 binding protein complex.44 Also in the context of MM-PBSA, Saini et al. analyzed configurational entropy changes in the course of antibiotics binding to the ribosome using MD simulations together with quasi-harmonic analysis.45 MD simulations were also used by Carlsson and Aqvist to estimate the reduction in translational and rotational freedom for a set of rather rigid molecular fragments binding to a lysozyme-T4 mutant.46 Lu et al. used a harmonic model to calculate the bound rotational and translational freedom of a peptide inhibitor, ATP, and water molecules bound to a protein kinase. In their model, they considered both the protein and the ligands (or water molecules) as rigid and computed only the translational and rotational degrees of freedom.47 Finkelstein and Janin used X-ray scattering to estimate the reduction in translational and rotational freedom for a series of polycyclic

aromatic hydrocarbons bound to lysozyme.48 Ruvinsky presented an estimate of ΔSR/T based on an approximation of the configurational integral through the sizes of clusters of docked ligand poses.49,50 Murray and Verdonk used the concept of multiple binding sites of fragments in order to calculate the rotational and translation entropy change of the fragments as rigid bodies.51 Thus, while estimating the reduction in translational and rotational freedom of a ligand upon binding has gained much attention, none of these methods was evaluated on a series of protein−ligand complexes as it appears in a lead optimization process. Here, we present BEERT (Binding Entropy Estimation for Rotation and Translation), an approach for efficiently approximating ΔSR/T based on the difference between the bound and unbound rotational and translational volumes of a ligand. To validate BEERT, we fit a linear combination of ΔSR/T and MMPBSA effective energies to experimental binding free energies for data sets of HIV-1 protease, FXa, and Hsp90 inhibitors. For all data sets, the obtained regressions are significant and good (R2 = 0.54−0.72, p < 0.001) with leave-one-out validation results of q2 = 0.34−0.67, p < 0.05, and markedly improved compared to using MM-PBSA effective energies alone.



THEORY Formulation Starting from the Flexible Molecule Approach. Our approach to estimate translational and rotational entropy change upon ligand binding to a protein starts from the formulation of the molecular partition function in what has been termed the flexible molecule (FM) approach,20 which distinguishes itself from the RRHO approximation19,52 in that it does not depend on a rigid rotor approximation and, hence, is better suited for flexible molecules. The FM approach relies on classical statistical mechanics, which allows us to omit kinetic energy contributions to the partition function. Furthermore, as applies to the RRHO approximation, in the FM approach in the classical limit, the free energy or entropy of binding becomes independent of mass.20 In the FM approach, the molecular partition function is given by the configurational integral over the n spatial coordinates, which can be separated into three coordinates of translation, three coordinates of rotation, and (3n − 6) internal coordinates (eq 1). (Note that the RRHO approximation also allows one to approximate the translational, rotational, and vibrational motions as uncoupled.20) ⎛ 8π 2 ⎞ Q = QTQ R Q in = V ⎜ ⎟Q ⎝ σext ⎠ in

(1)

Overall translation contributes a factor of V, overall rotation a factor of 8π2/σext, with σext being the symmetry number for external symmetry operations that leave internal molecular coordinates unchanged, and Q in is the internal contribution.20,53 ∂ Considering S = ∂T (kBT ln Q )53 and eq 1, the complete configurational entropy can be decomposed into20 Sconfig. = ST + SR + Sin −

1 2

∑ i , j ∈ {T,R,in}; i ≠ j

Ii , j + IT,R,in (2)



where Si = ∂T (kBT ln Q i) with i = {T, R, in} is the entropy associated with the {translational, rotational, internal} coordinates and Ii,j, IT,R,in are the mutual information terms that account for 171

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling

in the width of its energy wells. Finally, as to the ligand,36,62 this results in

the degree of correlation between two of the three or all three coordinates, respectively.17,54,55 In the FM approach, these terms are zero, however, when they involve overall translation and rotation.20 For our purposes, it is convenient for the protein−ligand complex C to further separate the 3n − 6 internal coordinates (in) into six relative translational (T′) and rotational (R′) coordinates of the ligand L with respect to the protein P, and the remaining 3n − 12 internal coordinates (in′)20

ΔSR/T = (STC′ − STL) + (SRC′ − SRL)

Approximation of the Change in Translational Entropy. The term within the first brackets in eq 5 can be evaluated as29,63 ⎛ V ⎞ STC′ − STL = ΔST = kB ln⎜ bound ⎟ ⎝ Vunbound ⎠

C Sconfig. = STC + SRC + STC′ + SRC′ + SinC′



1 2



(3)

which now incurs likely finite second- and third-order mutual information terms I. As a first approximation, we neglect these terms as they are computationally expensive to evaluate.17,55 However, it has been suggested that these terms can be similar in magnitude to the individual entropy terms, particularly if internal coordinates are involved:36,56 For binding of a tetrapeptide to the protein Tsg101, pairwise mutual information terms accounting for correlations between pairs of conformational variables (bonds, angles, tosions) yielded entropy penalties from 3 to 6 kcal mol−1 for correlations within the peptide or Tsg101, and of 30 kcal mol−1 for correlations of Tsg101 with the peptide. Correlations among the translational and rotational coordinates yielded a penalty of ∼1 kcal mol−1, and those between the translational, rotational, and internal coordinates of the tetrapeptide yielded a penalty of 2.6 kcal mol−1.56 The change in configurational entropy upon formation of the protein−ligand complex from the two independently moving binding partners then is

Vbound = [max(X ) − min(X )] × [max(Y ) − min(Y )] × [max(Z) − min(Z)]

(7)

with min{X, Y, Z} and max{X, Y, Z} being the respective minimal and maximal positions of the ligand’s center of mass along the Cartesian axes (Figure 1C). This procedure has been used previously36,49,65−67 and makes use of the fourth approximation that the ligand in the bound state resides in a square well potential of mean force. The approximation generally overestimates SCT′ compared to binding to a likely more realistic harmonic potential.68 Furthermore, by using Cartesian axes as a reference frame rather than the principal components of the center of mass motion, we may neglect coupled motions in different dimensions,44 which may also contribute to an overestimate of SCT′. Approximation of the Change in Rotational Entropy. By analogy, we compute the term within the second brackets in eq 5, again for each cluster separately, as29,63

ΔSconfig. = STC + SRC + STC′ + SRC′ + SinC′ − (STP + SRP + SinP + STL + SRL + SinL)

(6)

with Vunbound = 1660 Å3 as found by integrating over the translational volume of a ligand in solution at a standard concentration of 1 M and kB being the Boltzmann constant.39,64 Here, Vbound is the effective translational volume accessible to the ligand after binding.20 We determine the configurational space of the bound ligand by docking (Figure 1A). Similar poses are clustered together assuming that a cluster represents a minimum in the energy landscape (Figure 1B). It is important to note that a change in translational entropy defined that way in general depends on how the relative translational coordinates are defined for the complex.39 Treating the protein as rigid as done here (see below) eliminates one source of ambiguity arising from motions of some of the protein atoms used to define the protein’s reference point relative to the reference point of the ligand.20 Here, we compute Vbound separately for each cluster from

Ii , j + IT ′ ,R ′ ,in ′

i , j ∈ {T ′ ,R ′ ,in ′}; i ≠ j

(5)

(4)

Consider that SCT and SPT are identical because the standard volume V (eq 1) applies to both species and that SCR and SPR will likewise cancel if σext of both species are identical (which holds for asymmetric proteins, and which we apply as a second approximation if the protein has rotational symmetry but the binding ligand is asymmetric). As a third approximation, and in deviation from the FM approach, we neglect contributions by Sin and Sin′ for all three species. We do so assuming that considering relative translational and rotational motions (T′, R′) between protein and ligand in the complex (“librational motions”) captures a major contribution to ΔSin. Along these lines, contributions due to restrictions of the conformational space of the binding partners, related to drops in the number of energy wells accessed before and after complex formation, have previously been shown not to be the primary source of ΔSconfig.;35,36 for the ligand, this may be due to conformational preorganization already in the unbound state57,58 and for the protein due to a restricted rotamer space of side chains located in the concave surface of the binding pocket.59,60 We note, however, that initial results from NMR relaxation experiments revealed a large and variable role for conformational entropy in the binding of ligands by proteins.61 Not considering changes in the width of energy wells upon complex formation appears more severe.35,36 Yet, for reasons of computational efficiency, we are going to treat the protein as rigid for evaluating ΔSR/T (see below), excluding per se the possibility to compute changes

⎛ Ω ⎞ SRC′ − SRL = ΔSR = kB ln⎜ bound ⎟ ⎝ Ω unbound ⎠

(8)

with Ωunbound = 8π /σext and Ωbound being the effective rotational volume accessible to the ligand after binding.49,65,69 Following Ruvinsky,65 we approximate Ωbound as 2

49,64,65

Ω bound = [−cos(max(θ )) + cos(min(θ ))] × [max(ψ ) − min(ψ )] × [max(ϕ) − min(ϕ)] (9)

For determining the Euler angles θ, ϕ, and ψ, we treat the ligand as a rigid body. We center the ligand, determine its principal axes and the corresponding eigenvalues, order the eigenvalues according to their magnitude, and calculate the rotation matrix.70 The quaternion is then computed from the rotation matrix according to ref 71 and from it ϕ, ψ, and θ.72 172

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling

Figure 1. Procedure to approximate ΔSR/T (eq 14). (A) Sampling ligand binding poses by docking. (B) Different clusters represent energy wells on the binding (free) energy landscape. (C) For each ligand pose, the center of mass is computed (marked as a green point). For each cluster, the effective translational volume is calculated from the encapsulated volume of the centers of mass of all ligand poses in the cluster (eq 7). (D) For each cluster, the effective rotational volume is calculated from the principal axes of the ligand poses (eq 9).

The values of ϕ and ψ cover the range modulo 2π, those of θ the range π. The cosine function for the angle θ results from the integration over sin(θ) in the rotational partition function as described in eq 20 in ref 50 and ref 73. Multiple Energy Wells in the Bound State. In the bound state, multiple binding modes of a ligand can sometimes be observed, particularly for weakly binding ligands,28,74,75 reflecting an energy surface with energetically similar energy wells. Previous work showed36,76 that the entropy across these wells is the weighted average of the entropies Si associated with an individual well plus an entropy associated with the distribution of the system across the energy wells {i} (“mixing entropy”).20 S=

− kB ∑ pi ln pi ∑ pS i i i

i

and in the second, all n energy wells get the same weight (eq 13)

pi =

e−Ei / kBT ∑j e−Ej / kBT

ΔSR/T = kB ∑ pi × (ΔSR, i + ΔST, i) i

(10)

Ni ∑j Nj

(14)

Note that this way of averaging entropies Si associated with an individual well distinguishes our work from that of Ruvisnky et al.50,77 Sampling of Energy Wells in the Bound State. According to the predominant states approximation introduced by Gilson,78 the largest contributions to the configurational integral are found in or near energy minima.20 We thus approximate eq 10 for the bound state by considering a finite, and usually small, number of well-defined energy wells (see below for how such energy wells are identified). Following a suggestion of Ruvinsky et al.,49,65 we use a global optimization technique combined with a local energy minimization as implemented in the Lamarckian genetic algorithm of AutoDock 3.0515 for generating bound ligand configurations located in energy wells (see the Molecular Docking section in Materials and Methods for details). Note that for smooth energy landscapes or systems with many degrees of freedom, sampling the energy wells will be

(11)

where Ei is the energy of the lowest (best) scored pose in well i and T = 298 K. As pi computed according to eq 11 depends on the accuracy of the docking energy in our approach, which may be limited, we tested two alternatives. In the first, pi is computed from the number of poses Ni in energy well i (eq 12)

pi =

(13)

As we do not consider multiple energy wells, i.e., conformations, for a ligand in the free state, we approximate eq 10 by omitting the “mixing entropy” for the bound state as well. See above with respect to the influence of the drop in the number of energy wells on ΔSconfig.. Considering eqs 6−10, finally, this results in the expression for approximating ΔSR/T used in this work

Here, pi is the probability of finding the system in energy well i and is computed from all bound ligand configurations2 in a well according to eq 1120 pi =

1 n

(12) 173

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling

these residues were assigned such that one of the two aspartic acids is monoprotonated with a proton placed on the oxygen in position OD2 of the side chain (see Table S1 in the Supporting Information (SI) for details).98,99 The protonation states of the ligands were determined using the Epik program.100 Molecular Docking. All docking runs were performed following established procedures.96 We used AutoDock 3.0515 with DrugScore pair potentials101 as an objective function, which has been used successfully previously for generating good docking solutions.102−104 The docking protocol for flexible ligand docking considered 100 independent runs per ligand using an initial population size of 100 individuals, 5.0 × 103 generations, a maximum number of 10.0 × 106 energy evaluations, a mutation rate of 0.02, a crossover rate of 0.8, and an elitism value of 1. The spacing of the precomputed potential grids was set to the default value of 0.375 Å. In order to probe for the convergence of the ΔSR/T approximation, in addition, for each complex, ensembles with 500 and 1000 docking poses were generated. Clustering of Ligand Binding Poses. Subsets of the docked ligand configurations belonging to a well of the energy landscape of the bound state were, first, identified by RMSDbased clustering as implemented in AutoDock 3.05,15 using RMSD thresholds of 1 Å. The RMSD-based clustering considers internal symmetries (e.g., in the case of phenyl substituents).15 Second, we developed a clustering of bound ligand configurations based on the interaction pattern between the ligand and the protein, which was inspired by previous studies.84,85 For this, we identify all pairs of ligand and protein heavy atoms that are closer than a cutoff distance (dcut) for each docking pose (Figure 2) and generate a union set of all protein atoms. We assign “1” only for the pairs of ligand and protein heavy atoms with d < dcut and arrange them in a matrix such that the value of a matrix element becomes 1 if the actual distance is closer than dcut and 0 otherwise. All such matrices contain the same protein and ligand atoms for one protein−ligand complex in the same order in rows i and columns j, respectively. The number of matrix elements (i, j) that are 1 in both matrices then defines the similarity between two docking poses. The poses are clustered according to their similarity by hierarchical clustering105,106 as implemented in R.107 Identifying pairs of ligand and protein atoms in the first step of this approach is efficiently performed making use of a cell data structure108 ubiquitously applied in MD simulations.109,110 For this, the space of the protein is partitioned into cubes {c} of edge length dcut. For a given ligand atom in a cube c, only protein atoms within this cube or within the neighboring 26 cubes are considered for distance calculations. Estimating Binding Affinities by DrugScore Scoring. For comparison, relative binding affinities are estimated using the DrugScore pair potentials for the docked protein−ligand complex. DrugScore is a knowledge-based scoring function and has been previously used for estimating binding affinities on protein−ligand complexes.30,101,102,111,112 DrugScore potentials encode distance-dependent interaction energies between ligand and protein atoms derived from statistical preferences and implicitly include solvation contributions.101 Changes in ΔSR/T are not considered in DrugScore. Estimating Binding Affinities by Surflex Scoring. For comparison, we also used Surflex as an external scoring function developed by Jain and co-workers to estimate relative binding affinities of docked protein−ligand complexes,113 which performed very well in an external evaluation.114 We used Surflex

slow to converge such that non-negligible contributions to the overall ΔSR/T may be missed.41,78,79 A correction has been devised for that case by Gilson and co-workers.41,78 We feel it safe to assume, however, that the energy landscape of the bound state is dominated by a small number of low-energy states only80−82 such that we do not need to consider such a correction. Proper identification of which of the generated bound ligand configurations belong to one energy well is important for appropriately estimating residual translational and rotational mobility according to eqs 6 and 8. An obvious and widely used criterion is to cluster the ligand configurations based on the root mean square deviation (RMSD) of their coordinates.15,49,65,83 We tested this criterion, too (see Clustering of Ligand Binding Poses section in Materials and Methods for details). However, as this criterion can lead to essentially identical binding modes being sorted into differential clusters due to different conformations of ligand parts that remain solvated, in addition, we devised and tested an interaction-based clustering inspired by interaction fingerprints introduced in previous studies.84,85 See the Clustering of Ligand Binding Poses section in Materials and Methods for details. This way of clustering bound ligand configurations for approximating ΔSR/T distinguishes our work from that of Ruvisnky et al. in that only an RMSDbased clustering was used there.49,65 The overall workflow yielding the ΔSR/T approximation (eq 14) (Figure 1) has been termed BEERT (Binding Entropy Estimation for (changes in) Rotation and Translation).



MATERIALS AND METHODS Protein−Ligand Complexes Used for Validation. The BEERT workflow was evaluated on three data sets of protein− ligand complexes of pharmacologically relevant targets, HIV-1 protease, the trypsin-like serine protease, factor Xa (FXa), and the ATP-dependent molecular chaperone, heat shock protein 90 (Hsp90). The structures were retrieved from the Protein Data Bank (PDB),86 using only complexes that contain a wildtype protein and an inhibitor for which experimental binding affinity information is available. We chose data sets with at least 15 crystal structures of the protein with different ligands. For the HIV-1 protease and FXa data sets, we obtained information about the experimental binding free energy from ΔG bind = RT ln K i

(15)

For the Hsp90 data set, we used pIC50 values instead. As these experimental data were retrieved for all Hsp90 ligands using the same experimental settings,87−92 they too can be used for computing relative binding free energies. Here, pKi and pIC50 values were taken from the databases PDBbind,93 Binding MOAD,94 and Binding DB.95 General Preparation of Protein and Ligand Structures. For each protein−ligand complex, the structural coordinates were retrieved from the PDB.86 The ligands were removed from the complexes, as were all water molecules, and the ligands were assigned Sybyl atom types and saved separately.97 For docking, hydrogens and charges are not considered and were therefore not added.30,96 For MD simulations, hydrogens were added using the tleap module of the AMBER11 suite of molecular simulation programs.116 Default protonation states were assigned to all protonatable amino acids, which resulted in all histidines being protonated at Nε. HIV-1 protease contains two aspartic acid residues in the catalytic site, Asp25 and Asp25′. According to several studies, the protonation states of 174

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling

Figure 2. Advantages of interaction-based clustering. (A) Structure of FXa (PDB code: 1NFY) in surface representation with two docked ligand poses (ligand RTR). The RMSD between the two ligand poses is ∼2.5 Å, resulting mainly from the rotation of the solvent-exposed chlorobenzothiophene substituent. (B) Structure of trypsin (PDB code: 1O36) in surface representation with two docked ligand poses (ligand 607). The RMSD between the two ligand poses is ∼2.0 Å, resulting mainly from the rotation of the solvent-exposed aniline substituent. Both ligand poses in panels A and B introduce relatively high RMSD values despite overall similar binding modes. (C) and (D) Identical binding poses of the ligand RTR in the structure of FXa (PDB code: 1NFY) (C) and of ligand 607 in the structure of trypsin (PDB code: 1O36) (D) as in panels A and B, respectively. Protein atoms interacting only with one ligand pose are colored in magenta (24% and 16% of the total interacting atoms for A and B, respectively), protein atoms interacting with the other ligand pose are colored in light blue (20% and 13% of the total interacting atoms for C and D, respectively), and protein atoms interacting with either one of the ligand poses are colored in blue (56% and 71% of the total interacting atoms for C and D, respectively).

Cl−, one Cl−, and one Na+ ion(s) was (were) added, respectively. When the ligand was charged, the number of ions added was changed accordingly. Periodic boundary conditions were applied using the particle mesh Ewald (PME) method109 to treat long-range electrostatic interactions. Bond lengths of bonds involving hydrogen atoms were constrained using the SHAKE algorithm.121,122 The time step for all MD simulations was 2 fs, and a direct-space nonbonded cutoff of 8 Å was applied. Initially, each complex crystal structure was minimized by 50 steps of steepest descent minimization, followed by 500 steps of conjugate gradient minimization. After minimization, the system was heated from 100 to 300 K using canonical ensemble (NVT) MD simulations for 50 ps. Then, the solvent density was adjusted using isothermal−isobaric ensemble (NPT) MD simulations for 250 ps. Positional restraints with a force constant of 5 kcal mol−1 Å−2 applied during thermalization were reduced in a stepwise manner over 50 ps followed by 50 ps of unrestrained NVT-MD simulations at 300 K with a time constant of 2 ps for heat bath coupling. Temperature control was done using the Berendsen thermostat.123 The complexes were then subjected to 250 ns of NVT-MD simulations for production, extracting snapshots in time intervals of 20 ps.

to score our existing docked poses because it incorporates the number of rotatable bonds of the ligand as a measure for the configurational entropy.115 Molecular Dynamics Simulations. MD simulations to generate conformational ensembles of the protein−ligand complexes for post-processing with MM-PBSA (see below) were performed with the AMBER11 suite of molecular simulation programs116 following established procedures.12 The Cornell et al. force field117 with modifications introduced by Hornak et al. (ff99SB)118 and the general AMBER force field (GAFF)119 were used for proteins and ligands, respectively. Atomic partial charges were calculated with the RESP program119,120 in AMBER11,116 employing a two-step fitting procedure to the electrostatic potential (ESP) computed at the HF/6-31G* level with Gaussian09. Fitting to the ESP was performed using a scaling factor defining the vdW surface of 1.4 Å, in total four layers, a distance of 0.2 Å between these surfaces, and 0.28 points per square a.u. The structures were solvated in a rectangular box of TIP3P water molecules where the distance between the edges of the box and the closest solute atom was at least 11 Å, and counterions were added. For the complexes of HIV-1 protease, FXa, and Hsp90, seven 175

DOI: 10.1021/acs.jcim.6b00373 J. Chem. Inf. Model. 2017, 57, 170−189

Article

Journal of Chemical Information and Modeling Effective Energies from MM-PBSA Computations. MM-PBSA11,124 (molecular mechanics Poisson−Boltzmann surface area) is a post-processing end-point free energy calculation method, where the binding free energy is computed as the sum of changes in the effective energy (ΔGeff.) and changes in the configurational entropy of the binding partners. Here, ΔGeff. is computed as the sum of changes in the gas-phase energy (ΔEMM) and changes in the solvation free energy (ΔGsolv.) averaged over a conformational ensemble generated by MD simulations (eq 16). ΔGeff. = ΔEMM + ΔGsolv.

entropies are obtained compared to the common approach of extracting starting structures for receptor and ligand from a nonminimized complex structure. NMA was performed as previously established in our group131 using the program NAB of the AMBER11 suite of molecular simulation programs.132 All atoms of the respective complex, receptor, or ligand conformations were considered using a DDD of ε(r) = 4r. Then, ΔSvib. was calculated as the difference between the entropy of the proten−ligand complex and the entropy of its components. Entropy contributions arising from changes in the translational and rotational degrees of freedom were included applying classical statistical thermodynamics.53 Note that ΔST depends on solute concentration,53,134 leading to the concentration dependence of chemical equilibria that do not conserve the number of molecules (such as binding reactions).39,135 Here, we obtain binding free energies for a standard state of 1 M. This results in a translational entropy for each component that is smaller by 6.4 cal mol−1 K−1 than the entropy value obtained for the standard state of an (ideal) gas (1 atm, 298.15 K → 0.041 M). Interaction Entropies. Recently, Duan and co-workers suggested the interaction entropy (IE) method for calculating the entropy change upon protein−ligand complex formation.136 For the calculation of the IE, the fluctuation of the protein− ligand (gas-phase) interaction energy along an MD trajectory of the complex is exploited according to eq 6 in ref 136. Here, we considered the subset of the snapshots described above, extracted between 10 to 20 ns of the MD trajectories, resulting in a total of 500 structures. For comparison, we replaced ΔSR/T in eq 18 by the IE calculated that way. Multiple Linear Regression. A linear combination of ΔGeff. from MM-PBSA and TΔSR/T from BEERT was used as an approximation to the binding free energy (eq 18). The coefficients in eq 18 were determined by multiple linear regression against experimental ΔGbind.

(16)

As previously described,12 here, ΔEMM was computed by summing contributions from internal energies (including bond, angle, and torsional angle energies), electrostatic energies, and van der Waals energies using the ff99SB force field118 without applying any nonbonded cutoff. The changes in the solvation free energy were computed as sum of a polar (ΔGPB) and a nonpolar (ΔGSA) contribution using a continuum representation of the solvent (eq 17). ΔGsolv. = ΔG PB + ΔGSA

(17)

The polar part of the solvation free energy was determined by solving the linearized Poisson−Boltzmann (PB) equation as implemented in AMBER11116,125 and applying PARSE radii.126 A dielectric constant of 1 and 80 for the interior and exterior of the solute was applied, respectively. The polar contributions were computed at 100 mM ionic strength, with a solvent probe radius of 1.4 Å. The nonpolar part of the solvation free energy was calculated by a solvent accessible surface area-dependent term, using γ = 0.0072 kcal mol−1 Å−2 for the surface tension. Snapshots for the generation of conformational ensembles can either be obtained from a single trajectory of the complex (“single-trajectory approach”) or from separate trajectories of the complex, receptor, and ligand (“separate trajectory approach/three trajectory approach”). Previous studies showed larger noise when pursuing the latter approach.9,13 Hence, we followed the “single-trajectory approach” here. Snapshots of the binding partners were extracted every 20 ps from MD trajectories of the complexes of 250 ns length. A sampling interval of 20 ps is well above the correlation time of the effective energy and results in statistically independent snapshots in that respect.4,11,127 All counterions and water molecules were stripped from the snapshots prior to computing ΔGeff.. Configurational Entropies Computed in the RRHO Approximation. Configurational entropies due to changes in the solutes’ degrees of freedom (translational, rotational, and vibrational) were also computed in the RRHO approximation.19 ΔSvib. was computed by NMA128−130 on the subset of the snapshots described above, extracted between 10 to 20 ns of the MD trajectories, resulting in a total of 500 structures. Each complex conformation from the MD simulations was minimized to a root mean square gradient