Letter pubs.acs.org/JCTC
Rigidity Theory-Based Approximation of Vibrational Entropy Changes upon Binding to Biomolecules Holger Gohlke,*,† Ido Y. Ben-Shalom,† Hannes Kopitz,† Stefania Pfeiffer-Marek,‡ and Karl-Heinz Baringhaus§ †
Institute for Pharmaceutical and Medicinal Chemistry, Department of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany ‡ R&D/Pre-Development Sciences, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926 Frankfurt am Main, Germany § R&D Resources/Site Direction, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926 Frankfurt am Main, Germany S Supporting Information *
ABSTRACT: We introduce a computationally efficient approximation of vibrational entropy changes (ΔSvib) upon binding to biomolecules based on rigidity theory. From constraint network representations of the binding partners, ΔSvib is estimated from changes in the number of low frequency (“spongy”) modes with respect to changes in the networks’ coordination number. Compared to ΔSvib computed by normal-mode analysis (NMA), our approach yields significant and good to fair correlations for data sets of protein−protein and protein−ligand complexes. Our approach could be a valuable alternative to NMA-based ΔSvib computation in end-point (free) energy methods. standard. As to technical challenges, estimating ΔSvib by NMA is computationally very burdensome, which precludes applying such calculations to more than a few protein−ligand complexes in general.21 Consequently, alternative methods for computing ΔSvib have been applied including the use of quasi-harmonic analysis (QHA),22 variations of the NMA approach,23,24 or a solvent-accessible surface area-based model.25 Issues of convergence have been found to pose a limitation to the straightforward applicability of QHA to problems of binding to biomolecules, however.26 Here, we introduce a computationally highly efficient approximation of ΔSvib upon binding to biomolecules based on rigidity theory. We compare its results for data sets of protein− protein and protein−small molecule complexes to those obtained with NMA-based ΔSvib and find significant and good to fair correlations. The principle idea underlying our approach is that, rather than estimating ΔSvib from changes in the vibrational frequencies of normal modes and, hence, the width of energy wells upon binding, we estimate ΔSvib from changes in the variation of the number of low frequency modes. This is described in detail in the following. As to the theoretical background, in normal-mode analysis, a potential energy function V(x) is expanded in a Taylor series expansion about some point x0.27 If x0 denotes the location of a minimum of V(x), the gradient of V(x) vanishes. If also third and higher-order derivatives of V(x) are ignored, the dynamics of the
E
nd-point (free) energy methods such as MM-PBSA1 or MM-GBSA2,3 are widely used in the early stages of drug discovery to rank potential ligands4−6 and for identifying interaction hot spots in epitopes of protein−protein complexes.7,8 The methods’ computational efficiency results from several approximations, including the estimate of changes in the configurational entropies (ΔSconfig) of the binding partners.9,10 However, likely the largest uncertainty in MM-PB(GB)SA-type calculations originates from these estimates.1,11 In MM-PB(GB)SA-type calculations, ΔSconfig is usually estimated in the rigid rotor, harmonic oscillator (RRHO) approximation;12 there, translational, rotational, and vibrational contributions to ΔSconfig are computed separately for each species in the binding equilibrium by gas-phase statistical mechanics.13 Estimating changes in the vibrational entropy (ΔSvib) upon complex formation is particularly challenging. As to fundamental challenges, within the RRHO approximation, changes in the vibrational entropy (ΔSvib) are estimated by normal-mode analysis (NMA).14 For rigid small molecules, compared to reference values, absolute average percentage deviations up to 2.5% were found in this context when using unscaled harmonic frequencies.15 Applying NMA likely also leads to systematic errors because anharmonic contributions are neglected.15,16 This may become prominent when neglecting the anharmonicity of the torsional potential surface because, due to the anharmonicity, energy levels are more congested than predicted by a harmonic potential.16 In addition, for individually minimized structures from the same trajectory, variations in ΔSvib upon complexation of ∼5 kcal mol−1 have been observed.17 Still, NMA-based ΔSvib estimates are widely used18−20 and can be considered the gold © XXXX American Chemical Society
Received: January 7, 2017 Published: March 29, 2017 A
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation system can be described in terms of linearly independent normal modes obtained from diagonalizing the Hessian matrix, each one associated with a frequency νi. From νi, the vibrational contributions to thermodynamic properties can be determined.28,29 For Svib, one obtains (eq 1) 3N − 6
Svib = T −1
∑ i=1
⎡ ⎤ hνi − kB T ln(1 − e−hνi / kBT )⎥ ⎢ hνi / kBT ⎣e ⎦ −1 (1)
Note that Svib is particularly sensitive to the frequencies of the lowest modes of vibration;12,28 in contrast, for νi → ∞, both the first and second term in the square brackets approach zero. The low-frequency modes reflect the presence of weak forces in the biomolecular system, encoded, e.g., as torsion angle and van der Waals potentials in current state-of-the-art biomolecular force fields.30 To now make the connection to approximating Svib based on rigidity theory, we first neglect weak forces in V(x), resulting in a Kirkwood31 or Keating32 potential VK, schematically written as (eq 2)33 VK =
β α (Δl)2 + (Δθ )2 2 2
(2)
Here, VK describes small displacements from an equilibrium structure in a bond-bending network in terms of changes in bond length (Δl) and bond angle (Δθ), with α and β being the force constants for bond stretching and bending, respectively. Diagonalizing the Hessian from eq 2 ascertains a number F of vibrational modes with zero frequency.33 These so-called floppy modes correspond to the ways in which the network can be continuously deformed at no cost in energy by rotations around bonds; F decreases with an increasing mean coordination ⟨r⟩ in the network (Figure 1A).33 For a comprehensive review on rigidity theory for biomolecules see ref 34. As to the methodological development, we now describe how Svib can be estimated from the variation of the number of F with ⟨r⟩. In studies on random central-force networks and general lattices, the negative of the number of floppy modes, −F, has been shown to act as a free energy:39,40 −F is a concave function of ⟨r⟩ (i.e., −F(2) = −d2F/d⟨r⟩2 ≤ 0) (Figure 1A), as required of a free energy, such that if there is an ambiguity, the system will always be in the lowest free energy, i.e., maximum floppy modes, state.35,36 Note that the floppy modes were considered to have zero frequency in this context. However, in “real” molecular systems, floppy modes will become “spongy”,33 i.e., will have a small finite frequency, if weak forces are present in the network.37,38 Assuming that only these “spongy” modes contribute to the estimate of Svib (see the comment below eq 1 for a justification) and that the modes have the same small finite frequency νF (see eqs S1−S8 in the Supporting Information, SI, as to the adequacy of this assumption) then yields for the contribution to the free energy due to vibrations, Gvib (eq 3; see eqs S9−S12 in the Supporting Information for details): ⎛ ⎞ 3 Gvib ≈ −F ⎜NkT − NhυF ⎟ ⎝ ⎠ 2
Figure 1. (A) Number of floppy modes (F) as a function of the mean coordination ⟨r⟩ is shown exemplarily for one MD simulationsgenerated conformation of the trypsin-ligand complex (PDB ID 1K1N, solid line) and the protein only (dashed line). Along the lines, rigid cluster decompositions of both structures computed at identical Ecut values (mentioned under the structures, in kcal mol−1), respectively, are depicted, with each colored body indicating one rigid cluster; the largest rigid cluster is colored in blue. (B) −F(1) = −dF/d⟨r⟩, introduced here as an entropy-like quantity, is shown as a function of the mean coordination ⟨r⟩, corresponding to the curves in panel A. Here, Ecut values corresponding to ⟨r⟩ are depicted along the abscissa for the complex (Ecutcom) and the protein (Ecutrec). To compare −F(1) for different biomolecular systems, the respective ⟨r⟩ at a fixed Ecut value was determined. The vertical dashed and dotted lines depict −F(1) for complex and protein at Ecut = −1.0 kcal mol−1. Note that in this Ecut range, the rigid cluster decompositions (panel A) differ the most.
⎛ ∂G ⎞ Svib = −⎜ vib ⎟ ⎝ ∂T ⎠ N , p ≈
⎞ dF d r ⎛⎜ 3 NkT − NhυF ⎟ + F(Nk) ⎠ d r dT ⎝ 2
(4)
Here, ⟨r⟩ has been considered a temperature-related quantity as in flexibility and rigidity analysis before;39 in thermal unfolding simulations of constraint network representations of biomolecules, ⟨r⟩ was decreased with increasing temperature.40,41 To proceed with eq 4, we thus make the assumption (eq 5)
(3)
Employing the defining equation for entropy S = −(∂G/∂T)N,p29 and considering that F = f(⟨r⟩) (see above) and ⟨r⟩ = f(T) (see below) results in (eq 4)
dr = −c dT B
(5) DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation that is, ⟨r⟩ decreases linearly with increasing T. This assumption was inspired by a consensus view on protein dynamics, which revealed that proteins as a whole behave like dense liquids42 and that the mean coordination in the liquid water has been proposed to decrease linearly with increasing temperature.43 Furthermore, Figure 1 exemplarily depicts that −dF/d⟨r⟩ is larger by a factor O(101) than F. We feel it is safe to assume that this example can be generalized for the type of constraint networks considered in this work, proteins and protein−ligand complexes, given that for 26 proteins with diverse architectures, including monomers and oligomers, very similar F(⟨r⟩) and −dF/d⟨r⟩(⟨r⟩) curves were found, leading to the notion of a universal behavior in terms of rigidity loss upon unfolding.44 Finally, for the “spongy” modes considered here, NkT ≫ 3/2NhνF at T ≈ 300 K and vF ≲ 150 cm−1;45 accordingly, (NkT − 3/2NhνF) is larger by a factor O(102 to 103) than Nk. If |c| (eq 5) > 1/(101 × (102 to 103)) K−1, the second term on the right side of eq 4 can then be neglected compared to the first one. For the liquid water, |c| = 0.004 K−1 has been reported,43 fulfilling this condition. Taken together, we introduce here that the negative of the change in the number of floppy modes with respect to a change in the mean coordination is related to the vibrational entropy Svib ≈ −
dF c′ = −F (1)c′ dr
where com, rec, and lig refer to the complex, receptor, and ligand, respectively. As to the computational ef f iciency of approximating the change in vibrational entropy upon binding by eq 8, three comments are in order: (I) If α and β become (infinitely) large in eq 2, the bond stretching and bending forces become bond and angle constraints, leading to a constraint network. A representation of biomolecules in terms of constraint networks has been successfully used in the analysis of bimolecular rigidity and flexibility previously; there, in addition to covalent interactions, noncovalent interactions (hydrogen bonds, salt bridges, and hydrophobic tethers) are modeled via bond and angle constraints.47−49 (II) Rather than by diagonalizing the Hessian from eq 2, F can also be determined by an advanced constraint (Maxwell) counting33,50 on the constraint network as implemented in the combinatorial “pebble game” algorithm.51,52 This algorithm performs with a time complexity of, on average, O(N), providing for a dramatic speed up for large (biomolecular) systems compared to the time complexity of O(N3) for matrix diagonalization. In fact, for a normal-sized protein (∼250 residues), the computing time for determining F is ∼8 s on a single CPU core with 2.5 GHz. (III) For the description of a system’s dynamics by NMA, the system must reside at a local minimum on the potential energy hypersurface. In MM/PB(GB)SA-type applications, typically, structures have been minimized to a root mean-square gradient ( R MS G ) o f t h e p o t e n t i al e n er g y f r o m 10 − 5 t o 10−3 kcal mol−1 Å−1 before applying NMA.26 Performing minimizations of biomolecules to such low RMSG is computationally demanding. For a protein of ∼250 residues, energy minimization and diagonalization of the Hessian require ∼2.5 h on a single CPU core with 2.5 GHz. In contrast, no minimization is required prior to applying the “pebble game” algorithm to constraint networks to compute F. As to computing −F(1) for eq 8, three steps are followed (see SI for details). (I) A structural ensemble of the complex is generated by all-atom molecular dynamics (MD) simulations. Performing the subsequent analyses on an ensemble rather than a single structure overcomes the problem that results from constraint counting are sensitive to the input structural information.53,54 (II) A constraint network is generated for each complex conformation, as done in previous studies of biomolecular rigidity and flexibility.47−49 In addition, a constraint network is generated for the receptor conformation extracted from the respective complex, as is for the extracted “ligand” in the case of protein−protein complexes. In contrast, small molecule ligands lack the typical network character and, thus, are not suitable for evaluation by constraint counting. For such ligands, −F(1)lig in eq 8 is replaced by a scaled s × Svib,lig value, as detailed in the SI. In all, a so-called single-trajectory approach is pursued, as often applied in end-point (free) energy methods, which neglects possible conformational changes of the unbound structures but usually gives less noisy results than the three-trajectory alternative.1 (III) A “constraint dilution trajectory” of network states {σ} is generated from each initial constraint network by successively removing noncovalent constraints.40,41,44,55,56 Here, hydrogen bond constraints (including salt bridges) are removed in the order of increasing strength40,44,57 such that for network state σ only those hydrogen bonds are retained that have an energy EHB ≤ Ecut. The hydrogen bond energy EHB is determined from an empirical energy function58 successfully used by us59−62 and others41,44,55,63,64 in this context. For each σ, F(⟨r⟩) is computed
(6)
with c′ = c × (NkT − 3/2NhνF). Considering that the idea behind vibrational entropy is that a phase space is explored by atoms as they vibrate and that the vibrational entropy is the larger the larger the phase space is, eq 6 yields that −F(1) is a measure for the size of the phase space. Notably, it thereby matters how the network reacts to changes (F(1)), rather than how the network’s floppyness is in absolute terms (F) (see also next paragraph). We note that, in addition to −F, a thermodynamic interpretation has been provided for F(2) in that F(2) has been regarded as a specific heat and used to characterize the order of transitions of constraint networks switching between rigid and flexible states.35 However, to the best of our knowledge, no thermodynamic interpretation of F(1) has yet been presented. For F(1), a relation with respect to F and the total number of bonds NB in constraint networks with a f ixed number of nearest neighbors has been derived (eq 7)38,39 −F (1) ∼ (3N − F )/NB ≥ 0
(7)
Equation 7 yields that −F(1) ≥ 0, as required of an entropy. Furthermore, −F(1) depends on the actual network state (Figure 1B): If NB is low, related to a very flexible network (Figure 1A), − F(1) approaches a positive limit, indicating the maximum entropy of the system; if NB is high, related to a very rigid network (Figure 1A), −F(1) approaches zero, as expected for a system for which only one state of realization exists. Note that adding constraints, e.g., due to binding of a ligand, to a constraint network representation of a biomolecule with either low or high NB will not lead to marked changes in −F(1) (Figure 1B). In contrast, marked changes are to be expected when constraints are added to a network with intermediate NB (see vertical lines in Figure 1B and the large change in the related rigid cluster decompositions in Figure 1A), reminiscent of a biomolecule with marginal stability.46 The change in vibrational entropy upon binding to a biomolecule is then approximated as (eq 8) Δ−F (1) = (−F (1)com) − (−F (1)rec) − (−F (1)lig)
(8) C
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation
Figure 2. (A) Correlation of Δ−F(1) versus ΔSvib computed for 10 protein−protein complexes. The average standard errors of the mean (SEM) of Δ−F(1) and ΔSvib are ∼210 and ∼10.0 cal mol−1 K−1, respectively. (B) Correlation of ΔΔ−F(1) versus ΔΔSvib computed for 30 alanine mutations in the interface of Ras-Raf (PDB ID 1GUA) 2 using Ecut = −0.2 kcal mol−1. The red symbols denote mutations E31Ras, E37Ras, and T68Raf considered outliers. The average SEM values of ΔΔ−F(1) and ΔΔSvib are ∼70 and ∼1.3 cal mol−1 K−1, respectively. Dashed lines indicate Δ−F(1), ΔSvib = 0; the correlation line is represented by a straight line.
Ecut was set to −0.6 or −1.4 kcal mol−1 (R2 = 0.63, 0.83; Figure S6 in the SI), demonstrating that our approach is robust with respect to the choice of Ecut. Finally, we used structures directly extracted from the MD trajectories for computing Δ−F(1), rather than the minimized ones used as input for NMA (see SI for details). Not considering PDB ID 2JEL, Δ−F(1) of which deviates most between nonminimized and minimized structures, resulted in a correlation with R2 = 0.54 (95% CI: 0.01 < R2 < 0.96; p = 0.02; Figure S7 in the SI). Thus, despite structural deviations between respective conformations used for constraint counting and NMA, still a good correlation is obtained. Computational alanine scanning allows from a single MD simulation an estimate of the individual contribution of each residue of a protein−protein complex to the binding and has proven valuable for identifying “hot spot” residues in protein− protein epitopes.2,7,65−67 For the vibrational entropy contribution, the difference in the change in Svib upon binding (ΔΔSvib) was computed by NMA from the wild type and an alanine mutant; the mutant is generated from the wild type conformations by removing respective atoms. Here, ΔΔ−F(1) was computed analogously. The correlation of ΔΔ−F(1) versus ΔΔSvib for 30 alanine mutations in the interface of Ras-Raf (PDB ID 1GUA)2 is significant and weak (R2 = 0.24; 95% CI: 0.01 < R2 < 0.58; p = 0.01; Figure S8 in the SI) if Ecut = −1.0 kcal mol−1 is used and three outliers are disregarded. The correlation can be markedly improved (R2 = 0.51; 95% CI: 0.22 < R2 < 0.72; p = 2.7 × 10−5; Figure 2B) if Ecut = −0.2 kcal mol−1 is used, again disregarding three outliers (residues E31Ras, E37Ras, and T68Raf). Apparently, analyzing stiffer constraint networks is favorable here, likely because the ΔΔ−F(1) values for side chains on the protein surface become less noisy when contributions from the protein core become less pronounced. Side chains of residues E37Ras and T68Raf are involved in salt bridges or a polar hydrogen bond across the center of the epitope, and their influence on the vibrational entropy change is underestimated by constraint counting (Figure S9 in the SI). Residue E31Ras engages in a salt bridge interaction at the edge of the epitope, and its influence on the vibrational entropy change is overestimated by constraint counting (Figure S9 in the SI). Neglecting solvent influences or cooperative effects on the strength of the polar interactions in the energy function EHB might cause these deviations. Note that the
by constraint counting with the program f irst,47,51 and from that −F(1) at a given ⟨r⟩ by numerical differentiation. Using eq 7 instead is not possible because atoms in constraint networks generated from biomolecules have a variable number of nearest neighbors. To compare −F(1) for different biomolecular systems, the respective ⟨r⟩ at a fixed Ecut value was determined. Here, Ecut = −1.0 kcal mol−1 was used unless otherwise noted, motivated from previous studies.53 As to results, the validity of eqs 6 and 8 was assessed on one data set of protein−protein complexes and four data sets of protein− small molecule complexes by comparison to vibrational entropies computed by NMA (see SI for details). The data set of protein−protein complexes comprises four antibody−antigen, four protease−protease inhibitor, and two signaling complexes, with diverse folds, protein sizes between 775 and 8398 atoms, and binding affinities from the μM to pM range. Initially, we probed if −F(1) behaves as an extensive property, as required of an entropy-like quantity, and confirmed in the case of insulin dimerization for Svib computed by NMA.12 We computed −F(1) for each complex, receptor, and ligand of the data set members, resulting in 3 × 10 values. When plotted against the mass of the proteins, a significant and very good correlation results (R2 = 0.92; bootstrapped 95% confidence interval (CI): 0.84 < R2 < 0.96; p = 2.2 × 10−15; Figure S5 in the SI), demonstrating a strong dependence of −F(1) on the system size, indicative of an extensive property. Not surprisingly, −F(1) and Svib of the 3 × 10 complexes, receptors, and ligands, respectively, also yield a very good correlation (R2 = 0.95; data not shown). More importantly, we next correlated Δ−F(1) and ΔSvib (eq S13 in the SI), i.e., estimates of changes in the vibrational entropy upon binding, for the 10 protein−protein complexes, yielding a significant and good correlation (R2 = 0.80; 95% CI: 0.19 < R2 < 0.96; p = 0.0005; c′ (eq 6) = d(ΔSvib)/d(Δ−F(1)) = 0.045 cal mol−1 K−1; Figure 2A). In contrast, ΔSvib correlated against the area of the protein−protein epitope buried upon complex formation yields a weak correlation (R2 = 0.36; data not shown). This demonstrates that the good correlation of Δ−F(1) versus ΔSvib does not have a trivial, i.e., size-dependent, origin; rather, Δ−F(1) describes alterations in the density of vibrational states of the complexes relative to the binding partners apparently with good accuracy. Note that significant and good correlations were also obtained if D
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation
Figure 3. Correlation of Δ−F(1) versus ΔSvib computed for protein−small molecule complexes of (A) the trypsin data set and (B) the factor Xa data set. Dashed lines indicate Δ−F(1), ΔSvib = 0; the correlation line is represented by a straight line. Data points of complexes that result in maximal favorable or disfavorable vibrational entropy changes are circled, and the respective crystal structures of the complex are depicted: (C) ligands CRC200 (taken from PDB ID 1K1N) and (D) nicotinamide (taken from PDB ID 2OTV); (E) ligands IMA (taken from PDB ID 1LPG) and (F) IIA (taken from PDB ID 2BOH). The location of the S1 pocket in trypsin and factor Xa is indicated by a black arc; polar interactions between protein and ligand are depicted by red dashed lines. The average SEM of Δ−F(1) and ΔSvib are 0.1 and 1.0 cal mol−1 K−1, respectively.
0.46; 95% CI: 0.06 < R2 < 0.74; p = 0.001; c′ (eq 6) = d(ΔSvib)/ d(Δ−F(1)) = 0.243 cal mol−1 K−1; Figure 3B and Table S2 in the SI). Again, both Δ−F(1) and ΔSvib distinguish between ligand binding that leads to favorable versus disfavorable vibrational entropy contributions to the binding affinity (Figure 3B). A ligand of the former group is IIA (PDB ID 2BOH), which places a chlorothiophen moiety into the S1 pocket (Figure 3F); one of the latter group is IMA (PDB ID 1LPG), which places a benzamidine moiety there (Figure 3E). As the ligands are otherwise similar in size and interaction pattern with the protein, one can speculate that it is the locking-in of protein and ligand by a salt bridge that leads to disfavorable vibrational entropy contributions in contrast to the less restrictive interactions of the chlorothiophen moiety. Finally, we investigated two additional data sets of Hsp90 and HIV-1 protease−small molecule complexes. These data sets were rather similar to the trypsin and factor Xa data sets with respect to the number of data points and the range of ligand sizes and binding affinities (Hsp90 data set: 16 complex structures with small molecule ligands ranging from 150−500 Da and binding affinities spanning 4.7 log units; HIV-1 protease data set: 20/500−750 Da/4.2 log units). As a major difference, however, the width of the distribution of ΔSvib computed by NMA across each data set is only ∼1/3 of that of the trypsin and factor Xa data sets (∼22−25 cal mol−1 K−1), being very similar in magnitude to the average standard deviation of the computed ΔSvib (∼23−29 cal mol−1 K−1). Therefore, according to Kramer et al.,70 the maximum possible squared Pearson correlation coefficient (R2max) on these data sets vanishes; in agreement, Δ−F(1) versus ΔSvib did not yield
vibrational entropy contributions of most of the side chains in the Ras-Raf epitope disfavor binding, as indicated by ΔΔSvib > 0. The ΔΔ−F(1) values mirror this finding for all but seven of the side chains (disregarding the three outliers). Regarding the protein−small molecule complexes, the trypsin data set encompasses 23 complexes with ligands ranging in size from filling only the S1 pocket to those capturing the entire active site and binding affinities covering a range of ∼3.4 log units. The correlation of Δ−F(1) versus ΔSvib is significant and fair (R2 = 0.40; 95% CI: 0.09 < R2 < 0.66; p = 0.001; c′ (eq 6) = d(ΔSvib)/ d(Δ−F(1)) = 0.182 cal mol−1 K−1; Figure 3A and Table S1 in the SI). Some ligands lead to ΔSvib > 0, whereas others show ΔSvib < 0. Ligands of the former group are usually small and make few interactions with the protein (Figure 3D), allowing for librational motions of the ligand;68 in contrast, those of the latter group usually make many interactions with different parts of the protein (Figure 3C), stiffening the protein.69 Notably, this distinction between ligands is almost perfectly reflected in the Δ−F(1) values (Figure 3A), revealing that constraint counting can distinguish between ligand binding that leads to favorable versus disfavorable vibrational entropy contributions to the binding affinity. This property is of high importance when ranking potential ligands.12 The factor Xa data set contains 20 complex structures with small molecule ligands that are more similar in size (400−600 Da) and show a narrower distribution of binding affinities (range: ∼2.7 log units). As an additional challenge, the data set contains both ligands that form the well-known salt bridge with Asp189 in the S1 pocket and those that place nonpolar moieties there. The correlation of Δ−F(1) versus ΔSvib is significant and fair (R2 = E
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation
for critically reading the manuscript. This work was supported by funds from Sanofi-Aventis Deutschland GmbH. We are grateful to the “Zentrum für Informations- und Medientechnologie” (ZIM) at the Heinrich Heine University Düsseldorf for providing computational support.
significant correlations (Tables S3 and S4 in the SI). Note that these last results restate the fundamental challenge of computing precise changes in vibrational entropy in general,17 rather than showing a limitation of approximating them by Δ−F(1). In summary, we presented a computationally highly efficient approximation of changes in the vibrational entropy (ΔSvib) upon binding to biomolecules based on rigidity theory. Compared to ΔSvib computed by NMA as a gold standard, our approach yields significant and good to fair correlations for data sets of protein−protein and protein−small molecule complexes as well as in alanine scanning. The slopes c′ = dΔSvib/dΔ−F(1) (eq 6) for the data sets of protein−protein and protein−ligand complexes differ by a factor of ∼5, and the ones for the two protein−ligand data sets are almost equal. While clearly further data sets need to be investigated before a general conclusion can be drawn, these results suggest that for the type of biomolecular complexes investigated here, a uniform c′ might exist. With further data sets, it may also be worth investigating if including the term on the right side of the sum in eq 4 will lead to better correlations with ΔSvib. However, then linear regression needs to be applied in order to determine the magnitude of c′. Overall, our results suggest that our approach is a valuable, computationally efficient alternative to NMA-based ΔSvib computation, with a potential scope of application in end-point (free) energy methods.
■
■
COMPUTATIONAL METHODS Details on the adequacy of the assumption that modes contributing to the estimate of Svib have the same frequency, the contribution to the free energy due to “spongy” modes, the data set of protein−protein complexes, the data sets of protein− ligand complexes, the calculation of Svib by normal-mode analysis, the constraint network generation and constraint counting, and statistical evaluation are provided in the Supporting Information. ASSOCIATED CONTENT
S Supporting Information *
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.7b00014. Supplemental tables: ΔSvib computed by NMA and Δ-F(1) computed by constraint counting for the protein−ligand data sets. Supplemental figures: Structures of the small molecules of the protein−ligand complexes and additional correlations. (PDF)
■
REFERENCES
(1) Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics Poisson-Boltzmann Surface Area Method. Mol. Inf. 2012, 31, 114−122. (2) Gohlke, H.; Kiel, C.; Case, D. A. Insights into protein-protein binding by binding free energy calculation and free energy decomposition for the Ras-Raf and Ras-RalGDS complexes. J. Mol. Biol. 2003, 330, 891−913. (3) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin. Drug Discovery 2015, 10, 449−61. (4) Schneider, G.; Bohm, H. J. Virtual screening and fast automated docking methods. Drug Discovery Today 2002, 7, 64−70. (5) Leach, A. R.; Shoichet, B. K.; Peishoff, C. E. Prediction of proteinligand interactions. Docking and scoring: successes and gaps. J. Med. Chem. 2006, 49, 5851−5. (6) Homeyer, N.; Stoll, F.; Hillisch, A.; Gohlke, H. Binding Free Energy Calculations for Lead Optimization: Assessment of Their Accuracy in an Industrial Drug Design Context. J. Chem. Theory Comput. 2014, 10, 3331−3344. (7) Metz, A.; Pfleger, C.; Kopitz, H.; Pfeiffer-Marek, S.; Baringhaus, K. H.; Gohlke, H. Hot Spots and Transient Pockets: Predicting the Determinants of Small-Molecule Binding to a Protein-Protein Interface. J. Chem. Inf. Model. 2012, 52, 120−133. (8) Metz, A.; Ciglia, E.; Gohlke, H. Modulating protein-protein interactions: from structural determinants of binding to druggability prediction to application. Curr. Pharm. Des. 2012, 18, 4630−47. (9) Killian, B. J.; Yundenfreund Kravitz, J.; Gilson, M. K. Extraction of configurational entropy from molecular simulations via an expansion approximation. J. Chem. Phys. 2007, 127, 024107. (10) Brooijmans, N.; Kuntz, I. D. Molecular recognition and docking algorithms. Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 335−73. (11) Hou, T. J.; Wang, J. M.; Li, Y. Y.; Wang, W. Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51, 69−82. (12) Tidor, B.; Karplus, M. The Contribution of Vibrational Entropy to Molecular Association - the Dimerization of Insulin. J. Mol. Biol. 1994, 238, 405−414. (13) Zhou, H. X.; Gilson, M. K. Theory of Free Energy and Entropy in Noncovalent Binding. Chem. Rev. 2009, 109, 4092−4107. (14) Go̅, N.; Scheraga, H. A. Analysis of the contribution of internal vibrations to the statistical weights of equilibrium conformations of macromolecules. J. Chem. Phys. 1969, 51, 4751−4767. (15) Cervinka, C.; Fulem, M.; Ruzicka, K. Evaluation of Uncertainty of Ideal-Gas Entropy and Heat Capacity Calculations by Density Functional Theory (DFT) for Molecules Containing Symmetrical Internal Rotors. J. Chem. Eng. Data 2013, 58, 1382−1390.
■ ■
ABBREVIATIONS CI: confidence interval MD: molecular dynamics MM-GBSA: molecular mechanics generalized Born surface area MM-PBSA: molecular mechanics Poisson−Boltzmann surface area NMA: normal-mode analysis RMSG: root mean-square gradient RRHO: rigid rotor harmonic oscillator QHA: quasi-harmonic analysis
AUTHOR INFORMATION
Corresponding Author
*Phone: +49(0)211-81-13662. E-mail:
[email protected]. ORCID
Holger Gohlke: 0000-0001-8613-1447 Notes
The authors declare the following competing financial interest(s): S.P.-M. and K.-H.B. are employees of Sanofi-Aventis Deutschland GmbH.
■
ACKNOWLEDGMENTS We are grateful to Dr. Natasja Brooijmans for providing conformational ensembles of the four antibody−antigen and four protease-protease inhibitor complexes, and to Dr. Christopher Pfleger, Dr. Denis Schmidt, and Christina Nutschel F
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation (16) East, A. L. L.; Radom, L. Ab initio statistical thermodynamical models for the computation of third-law entropies. J. Chem. Phys. 1997, 106, 6655−6674. (17) Kuhn, B.; Kollman, P. A. Binding of a diverse set of ligands to avidin and streptavidin: an accurate quantitative prediction of their relative affinities by a combination of molecular mechanics and continuum solvent models. J. Med. Chem. 2000, 43, 3786−91. (18) Goldstein, H.; Poole, C.; Safko, J.; Addison, S. R. Classical mechanics. Am. J. Phys. 2002, 70, 782−783. (19) Skjaerven, L.; Hollup, S. M.; Reuter, N. Normal mode analysis for proteins. J. Mol. Struct.: THEOCHEM 2009, 898, 42−48. (20) Xu, B.; Shen, H.; Zhu, X.; Li, G. Fast and accurate computation schemes for evaluating vibrational entropy of proteins. J. Comput. Chem. 2011, 32, 3188−93. (21) Chong, S. H.; Ham, S. Dissecting Protein Configurational Entropy into Conformational and Vibrational Contributions. J. Phys. Chem. B 2015, 119, 12623−12631. (22) Wang, J.; Morin, P.; Wang, W.; Kollman, P. A. Use of MM-PBSA in reproducing the binding free energies to HIV-1 RT of TIBO derivatives and predicting the binding mode to HIV-1 RT of efavirenz by docking and MM-PBSA. J. Am. Chem. Soc. 2001, 123, 5221−30. (23) Genheden, S.; Kuhn, O.; Mikulskis, P.; Hoffmann, D.; Ryde, U. The normal-mode entropy in the MM/GBSA method: effect of system truncation, buffer region, and dielectric constant. J. Chem. Inf. Model. 2012, 52, 2079−88. (24) Bahar, I.; Atilgan, A. R.; Erman, B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding Des. 1997, 2, 173−181. (25) Wang, J.; Hou, T. Develop and test a solvent accessible surface area-based model in conformational entropy calculations. J. Chem. Inf. Model. 2012, 52, 1199−212. (26) Gohlke, H.; Case, D. A. Converging free energy estimates: MMPB(GB)SA studies on the protein-protein complex Ras-Raf. J. Comput. Chem. 2004, 25, 238−50. (27) Case, D. A. Normal-Mode Analysis of Protein Dynamics. Curr. Opin. Struct. Biol. 1994, 4, 285−290. (28) Brooks, B. R.; Janezic, D.; Karplus, M. Harmonic-Analysis of Large Systems. 1.Methodology. J. Comput. Chem. 1995, 16, 1522−1542. (29) McQuarrie, D. A. Statistical Thermodynamics; Harper and Row, New York, 1973. (30) Ponder, J. W.; Case, D. A. Force fields for protein simulations. Adv. Protein Chem. 2003, 66, 27−85. (31) Kirkwood, J. G. The dielectric polarization of polar liquids. J. Chem. Phys. 1939, 7, 911−919. (32) Keating, P. Effect of invariance requirements on the elastic strain energy of crystals with application to the diamond structure. Phys. Rev. 1966, 145, 637. (33) Thorpe, M. F. Continuous Deformations in Random Networks. J. Non-Cryst. Solids 1983, 57, 355−370. (34) Hermans, S. M. A.; Pfleger, C.; Nutschel, C.; Hanke, C. A.; Gohlke, H. Rigidity theory for biomolecules: Concepts, software, and applications. WIREs Comput. Mol. Sci. 2017, DOI: 10.1002/wcms.1311. (35) Duxbury, P. M.; Jacobs, D. J.; Thorpe, M. F.; Moukarzel, C. Floppy modes and the free energy: Rigidity and connectivity percolation on Bethe lattices. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1999, 59, 2084−2092. (36) Jacobs, D. J.; Thorpe, M. F. Generic rigidity percolation in two dimensions. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1996, 53, 3682−3693. (37) Naumis, G. G. Energy landscape and rigidity. Phys. Rev. E 2005, 71, 026114. (38) Boolchand, P.; Enzweiler, R. N.; Cappelletti, R. L.; Kamitakahara, W. A.; Cai, Y.; Thorpe, M. F. Vibrational Thresholds in Covalent Networks. Solid State Ionics 1990, 39, 81−89. (39) Thorpe, M. F.; Jacobs, D. J.; Chubynsky, M. V.; Phillips, J. C. Selforganization in network glasses. J. Non-Cryst. Solids 2000, 266-269, 859−866.
(40) Radestock, S.; Gohlke, H. Exploiting the Link between Protein Rigidity and Thermostability for Data-Driven Protein Engineering. Eng. Life Sci. 2008, 8, 507−522. (41) Rader, A. J. Thermostability in rubredoxin and its relationship to mechanical rigidity. Phys. Biol. 2010, 7, 016002. (42) Rueda, M.; Ferrer-Costa, C.; Meyer, T.; Perez, A.; Camps, J.; Hospital, A.; Gelpi, J. L.; Orozco, M. A consensus view of protein dynamics. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 796−801. (43) Teixeira, J.; Bellissentfunel, M. C. Dynamics of Water Studied by Neutron-Scattering. J. Phys.: Condens. Matter 1990, 2, Sa105−Sa108. (44) Rader, A. J.; Hespenheide, B. M.; Kuhn, L. A.; Thorpe, M. F. Protein unfolding: rigidity lost. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 3540−5. (45) Go, N.; Noguti, T.; Nishikawa, T. Dynamics of a Small Globular Protein in Terms of Low-Frequency Vibrational-Modes. Proc. Natl. Acad. Sci. U. S. A. 1983, 80, 3696−3700. (46) Taverna, D. M.; Goldstein, R. A. Why are proteins marginally stable? Proteins: Struct., Funct., Genet. 2002, 46, 105−9. (47) Jacobs, D. J.; Rader, A. J.; Kuhn, L. A.; Thorpe, M. F. Protein flexibility predictions using graph theory. Proteins: Struct., Funct., Genet. 2001, 44, 150−165. (48) Pfleger, C.; Rathi, P. C.; Klein, D. L.; Radestock, S.; Gohlke, H. Constraint Network Analysis (CNA): A Python Software Package for Efficiently Linking Biomacromolecular Structure, Flexibility, (Thermo)Stability, and Function. J. Chem. Inf. Model. 2013, 53, 1007−1015. (49) Rathi, P. C.; Pfleger, C.; Fulle, S.; Klein, D. L.; Gohlke, H. Statics of Biomacromolecules; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2011; pp 281−299. (50) Maxwell, J. C. XLV. On reciprocal figures and diagrams of forces. London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1864, 27, 250−261. (51) Jacobs, D. J.; Thorpe, M. F. Generic rigidity percolation: The pebble game. Phys. Rev. Lett. 1995, 75, 4051−4054. (52) Lee, A.; Streinu, I. Pebble game algorithms and sparse graphs. Discrete Math. 2008, 308, 1425−1437. (53) Gohlke, H.; Kuhn, L. A.; Case, D. A. Change in protein flexibility upon complex formation: analysis of Ras-Raf using molecular dynamics and a molecular framework approach. Proteins: Struct., Funct., Genet. 2004, 56, 322−37. (54) Mamonova, T.; Hespenheide, B.; Straub, R.; Thorpe, M. F.; Kurnikova, M. Protein flexibility using constraints from molecular dynamics simulations. Phys. Biol. 2005, 2, S137−S147. (55) Hespenheide, B. M.; Rader, A. J.; Thorpe, M. F.; Kuhn, L. A. Identifying protein folding cores from the evolution of flexible regions during unfolding. J. Mol. Graphics Modell. 2002, 21, 195−207. (56) Radestock, S.; Gohlke, H. Protein rigidity and thermophilic adaptation. Proteins: Struct., Funct., Genet. 2011, 79, 1089−108. (57) Livesay, D. R.; Jacobs, D. J. Conserved quantitative stability/ flexibility relationships (QSFR) in an orthologous RNase H pair. Proteins: Struct., Funct., Genet. 2006, 62, 130−43. (58) Dahiyat, B. I.; Gordon, D. B.; Mayo, S. L. Automated design of the surface positions of protein helices. Protein Sci. 1997, 6, 1333−1337. (59) Pfleger, C.; Gohlke, H. Efficient and Robust Analysis of Biomacromolecular Flexibility Using Ensembles of Network Topologies Based on Fuzzy Noncovalent Constraints. Structure 2013, 21, 1725− 1734. (60) Rathi, P. C.; Radestock, S.; Gohlke, H. Thermostabilizing mutations preferentially occur at structural weak spots with a high mutation ratio. J. Biotechnol. 2012, 159, 135−44. (61) Rathi, P. C.; Jaeger, K. E.; Gohlke, H. Structural Rigidity and Protein Thermostability in Variants of Lipase A from Bacillus subtilis. PLoS One 2015, 10, e0130289. (62) Dick, M.; Weiergraber, O. H.; Classen, T.; Bisterfeld, C.; Bramski, J.; Gohlke, H.; Pietruszka, J. Trading off stability against activity in extremophilic aldolases. Sci. Rep. 2016, 6, 17908. (63) Rader, A. J.; Anderson, G.; Isin, B.; Khorana, H. G.; Bahar, I.; Klein-Seetharaman, J. Identification of core amino acids stabilizing rhodopsin. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 7246−51. G
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Letter
Journal of Chemical Theory and Computation (64) Tan, H. P.; Rader, A. J. Identification of putative, stable binding regions through flexibility analysis of HIV-1 gp120. Proteins: Struct., Funct., Genet. 2009, 74, 881−894. (65) Massova, I.; Kollman, P. A. Computational alanine scanning to probe protein-protein interactions: A novel approach to evaluate binding free energies. J. Am. Chem. Soc. 1999, 121, 8133−8143. (66) Kortemme, T.; Kim, D. E.; Baker, D. Computational alanine scanning of protein-protein interfaces. Sci. Signaling 2004, 2004, pl2. (67) Wichmann, C.; Becker, Y.; Chen-Wichmann, L.; Vogel, V.; Vojtkova, A.; Herglotz, J.; Moore, S.; Koch, J.; Lausen, J.; Mantele, W.; Gohlke, H.; Grez, M. Dimer-tetramer transition controls RUNX1/ETO leukemogenic activity. Blood 2010, 116, 603−13. (68) Fischer, S.; Smith, J. C.; Verma, C. S. Dissecting the vibrational entropy change on protein/ligand binding: Burial of a water molecule in bovine pancreatic trypsin inhibitor. J. Phys. Chem. B 2001, 105, 8050− 8055. (69) Moorman, V. R.; Valentine, K. G.; Wand, A. J. The dynamical response of hen egg white lysozyme to the binding of a carbohydrate ligand. Protein Sci. 2012, 21, 1066−73. (70) Kramer, C.; Kalliokoski, T.; Gedeck, P.; Vulpetti, A. The experimental uncertainty of heterogeneous public K(i) data. J. Med. Chem. 2012, 55, 5165−73.
H
DOI: 10.1021/acs.jctc.7b00014 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX