A Comparative Linear Interaction Energy and MM ... - ACS Publications

In MM/PBSA, Gpolar is obtained by solving the Poisson–. Boltzmann equation while the ..... expressed as follows:68. Gnonpolar = γsurf SASA + b. (13...
0 downloads 0 Views 5MB Size
Subscriber access provided by Nottingham Trent University

Pharmaceutical Modeling

A Comparative Linear Interaction Energy and MM/PBSA Study on SIRT1–Ligand Binding Free Energy Calculation Eko Aditya Rifai, Marc van Dijk, Nico P.E. Vermeulen, Arry Yanuar, and Daan P. Geerke J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.9b00609 • Publication Date (Web): 28 Aug 2019 Downloaded from pubs.acs.org on August 29, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

A Comparative Linear Interaction Energy and MM/PBSA Study on SIRT1–Ligand Binding Free Energy Calculation Eko Aditya Rifai,† Marc van Dijk,† Nico P. E. Vermeulen,† Arry Yanuar,‡ and Daan P. Geerke∗,† †AIMMS Division of Molecular and Computational Toxicology, Department of Chemistry and Pharmaceutical Sciences, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, the Netherlands ‡Faculty of Pharmacy, Universitas Indonesia, Depok 16424, Indonesia E-mail: [email protected]

Abstract Binding free energy (∆Gbind ) computation can play an important role in prioritizing compounds to be evaluated experimentally on their affinity for target proteins, yet fast and accurate ∆Gbind calculation remains an elusive task. In this study, we compare the performance of two popular end–point methods, i.e., linear interaction energy (LIE) and molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA), with respect to their ability to correlate calculated binding affinities of 27 thieno[3,2– d]pyrimidine–6–carboxamide–derived sirtuin 1 (SIRT1) inhibitors with experimental data. Compared with the standard single–trajectory setup of MM/PBSA, our study elucidates that LIE allows to obtain direct (’absolute’) values for SIRT1 binding free energies with lower compute requirements, while the accuracy in calculating relative

1

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

values for ∆Gbind is comparable (Pearson's r = 0.72 and 0.64 for LIE and MM/PBSA, respectively). We also investigate the potential of combining multiple docking poses in iterative LIE models and find that Boltzmann–like weighting of outcomes of simulations starting from different poses can retrieve appropriate binding orientations. In addition, we find that in this particular case study, the LIE and MM/PBSA models can be optimized by neglecting the contributions from electrostatic and polar interactions to the ∆Gbind calculations.

Introduction A quantitative knowledge of protein–ligand binding affinities is essential in understanding molecular recognition, hence efficient and accurate binding free energy (∆Gbind ) calculations are a central goal in computer–aided drug design. 1 An arsenal of ∆Gbind calculation approaches is available, ranging from rigorous alchemical methods such as free energy perturbation (FEP) 2 and thermodynamic integration (TI) 3 to fast empirical scoring functions featured in molecular docking. 4 The latter are prominent in predicting protein–ligand binding poses and in discriminating binders and nonbinders within large chemical databases, 5 but they typically lack accuracy in quantitatively ranking and predicting ∆Gbind values. 6 In contrast, rigorous alchemical methods may provide reliable estimates for ∆Gbind but require extensive sampling of multiple intermediate nonphysical states and are thus computationally more expensive and still impractical for use in high–throughput scenarios. 7 Compared to these counterparts, the alternate end–point methods offer an intermediate in terms of effectiveness and efficiency in ∆Gbind computation by allowing to explicitly explore ligand, ligand–protein and solvent configurational space in the protein–ligand bound and unbound state only. 8 This provides advantages both over rigorous free energy calculations (in terms of efficiency) and empirical scoring functions (in terms of potential accuracy). Frequently applied end–point methods make use of linear interaction energy (LIE) theory 9 and the molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA) approach. 10 2

ACS Paragon Plus Environment

Page 2 of 46

Page 3 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In LIE, ∆Gbind is assumed to be linearly proportional with the differences in van der Waals and electrostatic interactions involving the ligand and its environment, as obtained from simulations of its protein–bound and unbound state in solvent. Differences in these interactions (modeled using Lennard–Jones (LJ) and Coulomb potential–energy functions) are scaled by LIE parameters α and β, respectively. 9 Originally, β was set to several fixed values according to a series of studied systems. 9,11–15 Later it turned out that fixed values for β are usually only suitable for particular systems of interest and not generally transferable between different systems. 16 To mitigate this, a proposal was made to treat both α and β as freely adjustable parameters that can be fitted based on a set of experimentally–determined ∆Gbind values. 17,18 The fitted values of α and β (and optionally offset parameter γ) determine the LIE scoring function to be used for predicting ∆Gbind for ligands with unknown affinity, which is given by: 9 ∆Gbind = α∆V vdw + β∆V ele + γ

(1)

vdw

vdw ∆V vdw = Vlig−surr − Vlig−surr unbound bound

(2)

ele

ele ∆V ele = Vlig−surr − V lig−surr bound unbound

(3)

with

and

The terms on the right hand side in Equation 2 and 3 are the MD–averaged van der Waals (vdw ) and electrostatic (ele) interaction energies, respectively, between the ligand (lig) and its surrounding (surr ) as obtained in complex with the protein bound or unbound in solvent. Another popular end–point method is molecular mechanics combined with Poisson– Boltzmann and surface area (MM/PBSA). MM/PBSA calculations can be performed using either results from protein–ligand complex simulations in a single–trajectory (one–average) setup, or from three separate simulations per compound (i.e., of the complex, the protein, and the unbound ligand) in a multi–trajectory or three–average setup. 19 Use of the single– trajectory approach is more widespread owing to its simplicity, efficiency, precision and 3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 46

accuracy compared to the multi–trajectory setup. 6,20,21 This single–trajectory approach of MM/PBSA resembles LIE in a way that they both do not account explicitly for changes in internal energy and configurational entropy of the ligand and protein upon binding. A difference is that single–trajectory MM/PBSA assumes the conformational distribution for the bound and unbound ligand to be the same, while LIE does not. 22 Molecular dynamics (MD) trajectories of the protein–ligand complex as obtained in the one–average strategy can be used to evaluate each free energy term on the right–hand side of ∆Gbind within the following equation: 10,23

∆Gbind = Gcomplex − (Gprotein + Gligand )

(4)

In Equation 4, Gcomplex represents the free energy of the complex, and Gprotein and Gligand represent the free energies of the unbound protein and ligand, respectively. The separate free energy terms Gx for the protein, ligand or protein–ligand complex are each quantified as: 23 Gx = V bonded + V vdw + V ele + Gpolar + Gnonpolar − T S

(5)

V bonded comprises bond–stretch, angle–bend, torsion and improper–dihedral energies, and V vdw and V ele are the van der Waals and electrostatic nonbonded interaction energies, respectively. Together, the sum of these terms make up the vacuum MM energy terms while Gpolar and Gnonpolar constitute the solvation free energies calculated using a continuum (implicit) solvation model, representing the free energy change due to converting a solute in vacuum into its solvated state. In MM/PBSA, Gpolar is obtained by solving the Poisson– Boltzmann equation while the Gnonpolar term is often approximated by a solvent accessible surface area (SASA) term. 24 Replacing Poisson–Boltzmann with Generalized Born 25 for calculating Gpolar is on the basis of the alternative MM/GBSA approach. 10 T and S denote the temperature of the system and the entropy of the solute in vacuum, respectively. The T S terms entering Equation 4 via Gcomplex , Gprotein and Gligand account for the change in

4

ACS Paragon Plus Environment

Page 5 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

conformational entropy of the protein–ligand complex upon ligand binding. This can be approximated e.g., via normal–mode analysis of the protein–ligand coordinates obtained from the simulation of the complex. Often this term is neglected, 26 justified for example by the typical interest in relative binding free energies of similar compounds. As previously reported by Genheden and Ryde 27 among others, LIE can be more compute efficient up to almost one order of magnitude when compared to MM/GBSA (which is due to the entropy 27 and polar term 28 in Equation 5), while the relative accuracy of both methods in predicting experimental data can be system dependent, and this conclusion also applies to MM/PBSA. 27 In the current work, we evaluate if LIE can achieve similar (or higher) accuracy compared to MM/PBSA in calculating binding affinities of 27 thieno[3,2– d]pyrimidine–6–carboxamide analogs for the sirtuin 1 (SIRT1) receptor. SIRT1 is a member of the sirtuin family which is responsible for regulating and/or deacetylating stress resistance and metabolism processes 29 and inhibition of SIRT1 has been proposed for treating cancer, HIV–1 infections, mental retardation syndrome, and parasitic diseases. 30 Results will also be compared to a MM/GBSA model of Karaman and Sippl 31 for these ligands binding to SIRT1. Although LIE needs experimental data to train the model parameters in Equation 1, their calibration gives access to direct estimates of ∆Gbind for individual (query) compounds. This makes it straightforward to use a Boltzmann–like weighting scheme to combine results of multiple MD simulations starting from different protein conformations and/or ligand binding modes into a single calculation per compound. 32,33 Previously, we successfully used this strategy to compute binding affinities for flexible proteins such as Cytochrome P450s that can bind substrates and/or inhibitors in multiple ways. 33–36 In the current work, we evaluate the potential of this strategy to compute SIRT1 binding free energies by concatenating results of MD simulations starting with binding poses obtained from unsupervised docking and clustering. In addition, we investigate if the Boltzmann–like weighting scheme can identify binding poses as those manually selected by Karaman and Sippl 31 in their MM/GBSA study

5

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 46

as the most probable ones. In a final step, we monitor the contributions from the different terms in the central LIE and MM/PBSA equations to the calculations in (trends in) binding affinity, in order to evaluate the possibility to further increase efficiency and accuracy in computing SIRT1 affinities.

Methods Reference data and input file preparation Reference data for the SIRT1 ligand set (i.e., experimentally determined values ∆Gexp for protein–ligand binding free energies) were taken from Disch et al 37 (Table 1). All 27 compounds have a thieno[3,2–d]pyrimidine–6–carboxamide scaffold, and ∆Gexp ranges between –29.3 and –50.3 kJ mol−1 . Values for ∆Gexp were consistently obtained from the set of IC50 inhibition constants reported by Disch et al., using the Cheng–Prusoff relation 38,39

∆Gexp = RT ln

IC50 2

(6)

where R is the gas constant, T is the temperature, and the concentration of agonist used to measure the IC50 data is assumed to be equal to its EC50 value. 39 Analogous to previous MD studies, 31,40 the SIRT1 crystal structure with PDB ID 4I5I 41 was used as protein template structure for docking and subsequent MD. Along with the protein atom coordinates, the positions of the zinc ion and one of the water molecules in the binding pocket 31 were retained to obtain the template. Protein structure preparation was performed using the QuickPrep function of Molecular Operating Environment (MOE) version 2015.10 42 to add hydrogens, assign protonation states and partial charges, and perform energy minimization prior to docking. During minimization, the protein and zinc ion were described using the Amber14SB force field. 43 To obtain starting structures for ligand binding poses, two alternative strategies were

6

ACS Paragon Plus Environment

Page 7 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

used: ligand–binding poses were either taken directly from Karaman and Sippl 31 or obtained from our docking. An advantage of using the Karaman poses can be that these were carefully selected for all ligands such that their scaffold showed maximal structural overlap with the scaffold of ligands 11c, 28 and 31, which were co–crystallized in PDB structures 4JSR, 4JT8, and 4JT9 of the homologous SIRT3 protein, respectively, 37 after fitting these onto the 4I5I SIRT1 structure. In this way, the superimposed poses of all ligands resemble the pose of the co–crystallized inhibitors, which suggests that they occupy both the acetyl lysine substrate channel and the nicotinamide C–pocket of SIRT1 that overlaps with the NAD+ binding channel. 31,37 Ligand–binding poses were also generated using the PLANTS (Protein–Ligand Ant System) docking software version 1.2 44 with speed 1 setting and evaluated using the ChemPLP scoring function. 45 Coordinates for the docking center were defined as the average of the center–of–mass of ligands in the Karaman binding poses and a docking radius of 0.8 nm was used. For each ligand, 100 docking poses were generated and subjected to hierarchical clustering with the maximum number of clusters set to 3. For each cluster, a representative pose with the smallest deviation to the average cluster coordinate set was selected as initial structure for MD simulations. Ligand topologies for MD were generated according to the general Amber force field (GAFF) 46 at the AM1–BCC 47 level of semi–empirical theory by using the antechamber 48 and sqm packages of AmberTools15. 49 This topology generation was performed and converted to GROMACS format by using ACPYPE (Rev: 7268). 50

MD simulations Prior to MD, the protein–ligand complex was immersed in a dodecahedral periodic box and solvated in approximately 15300 TIP3P 51 water molecules. Nine Na+ counterions were added to neutralize the net charge of the system, and the system was energy minimized using a steepest descent algorithm. Thermal pre–equilibration and (grid–based) pair–list updates during MD were performed as described previously. 34 The equations of motion were 7

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

integrated using a leap–frog algorithm. 52 Particle mesh Ewald (PME) 53 was used to treat the long–range electrostatic interactions during MD. The cutoff for nonbonded interactions was set to 0.9 nm. Hydrogen atoms were assigned a mass of 4.032 amu 54 and all bonds were constrained using the LINCS algorithm, 55 allowing a time–step of 4 fs. Coordinative bonds with a length of approximately 0.2 nm were added between the Zn2+ ion and sulphur atoms of its coordinating residues (Cys131, Cys134, Cys155, Cys158) to ensure coordination during simulation, and kept constant using the LINCS algorithm as well. A Berendsen thermostat and barostat 56 were used to maintain the system temperature and pressure close to their reference values of 300 K and 1.01325 bar, using coupling constants of 0.1 and 0.5 ps, respectively. All MD stages, including 20 ns production runs in explicit solvent were carried out using GROMACS 4.5.5, 57 in which energy and coordinate trajectories were written out to disk every 10 ps and center–of–mass motion was removed every 10 ps. Every ligand pose was used twice as starting structure of replicated simulations 33 starting from different Maxwell–Boltzmann distributions of randomly assigned atomic velocities. Separate duplicated 20 ns production simulations were performed of the unbound ligands in water in periodic dodecahedral boxes with minimal solute atom–box edge distances of 1.8 nm, using identical settings as in the protein–ligand complex simulations described above. Energy minimization and MD simulations were run in an automated fashion as implemented in eTOX ALLIES , 58 and the trajectories obtained from the production runs were used for subsequent LIE and MM/PBSA calculations as described below.

LIE calculations Average electrostatic and van der Waals ligand–environment interaction energies were obtained by averaging over concatenated duplicated simulations performed per protein–ligand binding pose or per ligand free in solvent. LIE model parameters were calibrated based on experimental ∆Gbind values (Table 1) in an ordinary least square (OLS) fitting procedure using the Python scikit–learn 0.17 package. 35,59 8

ACS Paragon Plus Environment

Page 8 of 46

Page 9 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Inspired by the ability to deal with flexible proteins and/or possible multiple ligand– binding modes in LIE binding affinity computation for e.g., Cytochrome P450s (CYPs) 32,34,35 or nuclear receptors, 60,61 we report here also SIRT1 LIE models in which results from MD simulations starting from different binding poses are combined into a single ∆Gbind calculation. For this purpose, the weight Wi of the contribution from simulations starting from an individual binding pose i to the binding free energy is calculated as: 32,62 e−∆Gcalc,i /kB T Wi = P −∆G calc,i /kB T ie

(7)

kB is Boltzmann's constant and T is the temperature of the simulated system, and ∆Gcalc,i is the calculated binding free energy for a given binding pose according to Equation 1. The Wi 's can be used to combine results from simulations starting from N different binding poses by using the following Equation:

∆Gbind = α

N X

Wi ∆Vivdw + β

i

N X

Wi ∆Viele + γ

(8)

i

α and β (and the optional offset parameter γ) are again parameterized based on curated experimental values for ∆Gbind , but they need to be trained in an iterative fashion (in which simulation data for multiple binding modes are combined into individual computations) when N > 1, 32 because the Wi 's depend on the LIE model parameter values.

MM/PBSA calculations MM/PBSA models were generated using g mmpbsa, 24,63 by first PBC correcting the MD trajectories. The Zn2+ ion was included as part of the protein in the MM/PBSA calculations. We used a single–trajectory MM/PBSA protocol which makes only use of protein–ligand complex simulations and assumes the protein and ligand conformation in the bound and unbound state to be identical. Thus, in Equation 5 the Vbonded terms as well as the in-

9

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 46

tramolecular nonbonded interaction energies cancel out, 64 resulting in the final equation:

vdW ele ∆Gbind = Vcomplex + Vcomplex + ∆Gpolar + ∆Gnonpolar

(9)

vdW ele with Vcomplex and Vcomplex being the protein–ligand van der Waals and electrostatic interaction

energies, respectively, while

polar polar ∆Gpolar = Gpolar complex − (Gligand + Gprotein )

(10)

nonpolar ∆Gnonpolar = Gnonpolar + Gnonpolar protein ) complex − (Gligand

(11)

and

The Poisson–Boltzmann equation for the polar term is evaluated using the following equation: 10,63,65     4πρf (r) ∇ · ε(r)∇ · ϕ(r) − ε(r)κ(r)2 sinh ϕ(r) + =0 kB T

(12)

where ϕ(r) is the solute’s electrostatic potential, ε(r) is the dielectric constant, and ρf (r) is the fixed charge density. As the calculation of the polar solvation energy is known to depend on the chosen value for the dielectric constant εsolute of the complex, 66 its value was varied in order to inspect its effect on predicted ∆Gbind values. By default, εsolute was set to 2 and tuned to 1, 4, or 8 to examine the sensitivity of calculated binding free energies to the solute dielectric constant. Other default settings in g mmpbsa were left unaltered (i.e., use of atomic radii as proposed by Bondi, 67 linear PB equation solver, 0.05 nm grid resolution, and smoothed van der Waals surface). 24 To estimate nonpolar solvation energies, g mmpbsa assumes a linear dependence of Gnonpolar on the SASA, expressed as follows: 68

Gnonpolar = γsurf SASA + b 10

ACS Paragon Plus Environment

(13)

Page 11 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

where γsurf is related to the surface tension of the solvent and is set to its default value of 2.26778 kJ mol−1 nm−2 , while a default value (3.84982 kJ mol−1 ) was also used for fitting parameter b. A solvent probe radius of 0.14 nm was employed to determine the SASA. 24 Due to its high computational demand and large standard error compared to other contributing terms, 6,20,31,66,69–72 the entropic term was not included in the MM/PBSA model.

Protein–ligand interaction profiling To analyze protein–ligand interactions that occurred during the MD simulations, we built a dedicated Python–based biomolecular analysis library MDInteract, which we made freely available at https://github.com/MD-Studio/MDInteract. MDInteract adds structure analysis methods to the API of the popular pandas data analysis toolkit. 73 MD trajectory and structure file formats are imported using the MDTraj library. 74 MD trajectories were analyzed for the following biomolecular interaction types at a 100 ps resolution similar to previous studies performed by us: 35,61,75 • Hydrophobic interactions: all contacts between carbon atoms within 0.4 nm having only C or H atoms as neighbors. Interacting pairs from the same source atom to multiple target atoms in the same residue, in both directions, were filtered to retain only the pair with the smallest distance. Note that atoms involved in π– or T–stacking were separately considered (see below). • π– and T–stacking: identified as the geometric centers of two aromatic rings within 0.55 nm, with an offset no more than 0.2 nm after projection of the geometric center of one ring onto the other and an angle between the normals of both rings of 180◦ ± 30◦ for π–stacking or 90◦ ± 30◦ for T–stacking. 76 Aromatic rings were detected using a SSSR algorithm (Smallest Set of Smallest Rings) filtering for aromatic planar rings. • Hydrogen bonds (H–bond): identified as all interactions between H–bond donor–acceptor pairs within 0.41 nm where the donor – H – acceptor angle was within 50◦ from the 11

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ideal in–plane 180◦ orientation and the heavy atom acceptor neighbor – acceptor – H angle did not deviate more than 90◦ . 77 Donor atom SYBYL types: N.3, N.2, N.acid, N.ar, N.am, N.4, N.pl3, N.plc and O.3 with at least one attached hydrogen. Acceptor atom SYBYL types: N.3, N.2, N.1, N.acid, N.ar, O.3, O.co2, O.2, S.m and S.a not of a donor character. H–bonds part of a salt–bridge were labeled as such by the salt bridge detection method. • Salt bridges: identified as the interaction between the heavy atoms representing the geometric centers of two charged groups of opposite signs within 0.55 nm distance. 78 The H–bond component of the salt bridge was added by the H–bond detection method. • Cation–π interaction: identified as the interaction between the center of an aromatic ring and a center of positive charge if the distance was within 0.6 nm and the offset between charge center and ring center after projection on the same plane was less than 0.2 nm. In case the charge center was a tertiary amine the angle between charge center and ring plane should be within 90◦ ± 30◦ . 79 • Halogen bonds: identified as pairs of halogen bond donor (I, Br, Cl, F) and acceptor atoms (O, N, S having a single atom neighbor of type P, C or S) within 0.41 nm where the donor angle (C-X – O) was within 165◦ ± 30◦ and the acceptor angle (Y-O – X) was within 120◦ ± 30◦ . 80 The frequency of interactions per ligand pose to interacting amino acid residues were normalized by the number of replicates (i.e., by dividing interaction counts by two) and were represented as heat map using the seaborn Python package version 0.8. 81 Ligand–residue interaction counts that are less than 20 were discarded for further analysis.

12

ACS Paragon Plus Environment

Page 12 of 46

Page 13 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Results and discussion LIE and MM/PBSA models Tables 2 and 3 summarize the predictive ability of our LIE and MM/PBSA models, which was evaluated in terms of obtained deviations from and correlation with experimental data. The latter has been assessed by calculating r2 , Pearson's r (rP ), and Spearman's ρ (ρS ) coefficients. 82 For the calibrated LIE models, we also report correlation coefficients q 2 obtained from leave–one–out cross–validation (LOO–CV). Deviations of computed ∆Gbind values from experiment are analyzed in terms of root–mean–square errors (RMSEs) and standard deviations of error in prediction based on LOO–CV (SDEPs). In Tables 2 and 3, RMSEs and SDEPs are only reported for our LIE models because the MM/PBSA calculations do not allow to directly compare predicted and experimental ∆Gbind values for individual ligands. 66,83–86 This is exemplified by the range of predicted values obtained from one of the MM/PBSA models discussed below, as presented in Figure 1 (right panel). When calibrating a LIE model using per ligand the binding pose proposed by Karaman and Sippl 31 as (single) starting pose for MD, we obtain a model (LIEsingle in Table 2) with RMSE = 4.21 kJ mol−1 , which is within the typical experimental uncertainty of 4–5 kJ mol−1 . 87,88 Overall this suggests satisfactory performance of the model parameters used. Including the optional offset γ parameter did not improve model performance further: the resulting LIEγ=−10.24 model (with γ = –10.24 kJ mol−1 ) showed a slight reduction in RMSE and SDEP of 0.21 and 0.12 kJ mol−1 , respectively, but it did not show an increase in correlation coefficients compared to LIEsingle (Table 2). Figure 1 illustrates that LIEsingle shows a slightly better correlation between calculated and experimental ∆Gbind values than the MM/PBSA model in which εsolute is set to its default g mmpbsa value of 2 (MM/PBSAε=2 in Table 3). This is confirmed by the slightly higher r2 (0.52 vs. 0.41) and Pearson’s and Spearman’s coefficients (0.72 vs. 0.64 or 0.61) of LIEsingle compared to MM/PBSAε=2 , Tables 2 and 3. It should be kept in mind however that α and β in the LIE equations are fitted

13

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

parameters. q 2 of LIEsingle (0.35) is in fact slightly lower than r2 for MM/PBSAε=2 (0.41), and the value of –0.04 for β hints on possible overfitting. As discussed in more detail below this could be resolved by constraining β to zero in a separate LIEβ=0 model, which has a q 2 value of 0.42 (Table 2). In accordance with previous findings from MM/PBSA studies on other protein systems, 66 we found a clear effect of εsolute on obtained correlations in the MM/PBSA models. Setting its value to 1 or 4 led to slightly and significantly lower correlation coefficients, respectively, while using εsolute =8 further decreased the quality of the model when compared to MM/PBSAε=2 (Table 3). Whereas values of 6–7 or higher have been reported as average estimate for dielectric permittivities of protein binding pockets, 89 εsolute is often set to 1 or 2. 24,90 Our finding that the most successful models of MM/PBSA in predicting SIRT1–ligand binding use a low value of εsolute is in line with the hydrophobic acetyl–lysine binding interface in SIRT1: 91 as discussed by Karaman et al, 31 the binding site of all inhibitors is characterized by six hydrophobic (Ala22, Ile30, Phe33, Phe57, Ile107, Phe174) and two charged side chains (Asp108, His123). Our MM/PBSAε=2 model performs similarly as one of the MM/GBSA models presented by Karaman and Sippl, 31 who used a εsolute value of 1 and reported r2 values ranging between 0.4 and 0.6 in their MM/GBSA study on SIRT binding. In addition, the similar performance of LIEsingle and MM/PBSAε=2 is in line with earlier work of Uciechowska et al. who found comparable performance of LIE and MM/PBSA in reproducing experimental data for a smaller set of SIRT2 binders. 72

Inclusion of multiple binding poses In a next step, we investigated if we could use the iterative LIE scheme to improve calculations and to develop a binding affinity model in an unsupervised manner. For that purpose, multiple binding poses were directly obtained from docking and clustering. These were used as starting structures in MD simulations i in Equation 8. The resulting LIEmult 14

ACS Paragon Plus Environment

Page 14 of 46

Page 15 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

model showed significantly higher RMSE and lower correlation coefficients than the LIEsingle model, Table 2. We analyzed if this decrease in predictive quality was due to either a lack of correct MD starting poses in the LIEmult model or the fact that the protein does not bind its ligands in multiple orientations (or due to a combination of both). We first explored the interactions between the ligands and SIRT1 residues during simulations used in the LIEsingle and LIEmult models. The obtained interaction heat maps are depicted in Figure 2. They show high frequencies in three ligand–SIRT1 hot–spot interactions that were present in the starting poses 31 and persistent during the simulations used in the LIEsingle and MM/PBSA models (i.e., with Phe33, Ile107 and Asp108, Figure 3). Figure 2 shows that these frequencies are significantly lower for most of the LIEmult simulations, especially those involving interactions with Asp108. This finding indicates that the unsupervised selection of starting poses for MD leads to a larger spread in protein–ligand interactions during simulation than in the LIEsingle model. For most ligands we found a relatively large root–mean–square deviation (RMSD) in atomic positions between the MD starting poses used in LIEmult and the LIEsingle pose: for only 8 of the ligands at least one of the three docked starting poses showed an RMSD of less than 0.1 nm, Table S1. Only when combining results of the MD simulations initiated from the starting poses of LIEsingle with those from LIEmult we could train a combined model LIEcomb with similar accuracy and correlation coefficients (and α and β) as LIEsingle , Table 2. These findings indicate that docking and clustering for the purpose of LIEmult training did not lead to sufficiently appropriate binding poses, while the retained predictive quality of the LIEcomb model (compared to LIEsingle ) implies that incorporating too many binding poses does not necessarily lead to lower predictive quality. Thus, when being able to generate and select appropriate binding poses from docking and/or experimental information on protein–ligand interactions, iterative LIE training can subsequently be performed in an unsupervised manner. Here we evaluated if Boltzmann–like weighting (Equation 7) within the LIEcomb model predicted the manually prepared binding poses of Karaman and Sippl as most probable,

15

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

by inspecting per ligand the starting pose used in the MD simulation with highest weight Wi (Figure 4). We found that for 22 of the 27 ligands, the simulation starting from the binding pose used in the LIEsingle model shows the largest weight Wi . For the five ligands for which the largest weight comes from a simulation starting from one of the docked poses used in LIEmult (19, 20, 21, 27, 38), we inspected the corresponding pose (Figure 5) and discovered that the scaffolds of 3 out of 5 ligands (19, 20, 21) show clear structural overlap with the ligand pose used for LIEsingle (cf. Figure 5 and the relatively low RMSD in Table S1). Interestingly, docked starting poses of ligands 27 (pose 1) and 38 (pose 1) for the simulations with highest weight Wi in the LIEcomb model show less overlap (Figure 5) and higher RMSDs (Table S1), but they still share similar interaction profiles with the simulations used for these ligands in LIEsingle , Figure 2. This highlights that similarity in protein–ligand interactions may well be a better measure for comparable modes and/or affinities of binding than structural overlap alone, in agreement with results from previous studies in which we used iterative LIE to evaluate binding to flexible proteins such as CYPs or nuclear receptor FXR. 34,35,61 This is further exemplified by MD simulations starting from poses which show relatively high contributions to LIEcomb calculations such as in case of pose 1 of compound 26, which does not have the lowest RMSD of poses 1–3 when compared to the LIEsingle starting pose (Table S1) but it does show a similar pattern of protein–ligand interactions, Figure 2. Subsequently, we trained a LIE model using per ligand the simulation with highest weight Wi from LIEcomb and we found that this LIEBoltzmann4 model performs similar as LIEsingle and LIEcomb (Table 2), confirming that the latter does not utilize too many different MD starting poses.

Impact of simulation length We also explored the effect of simulation length on ∆Gbind computations by the LIEsingle and MM/PBSAε=2 models. For that purpose, models were developed based on averages for the interaction energy and solvation terms over either the first or the last 1, 5, 10 or 15 ns 16

ACS Paragon Plus Environment

Page 16 of 46

Page 17 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

of the production simulations starting from the Karaman and Sippl poses (and using ε=2 in the MM/PBSA models). The performance of these models is listed in Tables 2 and 3 (using subscripts f irstXns or lastXns) and was compared to the models based on 20–ns averages as described above. MM/PBSA studies in literature often report simulations of tens of ns from which the last few ns are used to compute averages for the different terms in Equation 5 or 9. 64 Our analysis shows that for the current system of interest, simulation lengths of 5 ns (in the MM/PBSAf irst5ns model) are sufficient to obtain comparable correlation coefficients as for MM/PBSAε=2 . However, by running longer simulation times and taking the last 1 or 5 ns to average over gives a slight deterioration in model performance (cf. MM/PBSAlast1ns and MM/PBSAlast5ns in Table 3, respectively). It is also worth mentioning that 1 ns LIE simulations are sufficient to achieve a model (LIEf irst1ns ) that shows even slightly better performance compared to LIEsingle , Table 2. The finding that shorter simulation lengths can improve LIE calculations is in line with previous results in iterative models that showed improved performance when restricting sampling to local parts of conformational space, 62,92 and it can be seen as a clear advantage of LIE in terms of its efficiency. It prompted us to generate a LIE model (LIEequil using the output of our thermal equilibration simulation at 300 K only, but this showed in turn a slightly higher RMSE (4.50 kJ mol−1 ) and lower rP (0.67) than for LIEsingle and LIEf irst1ns , Table 2. The reason that longer equilibration may not necessarily lead to improved correlation for the MM/PBSA model (which is in line with previous findings in literatures 66,93 ) and that longer simulations are required for MM/PBSA than for LIE is due to the relatively slow convergence of the MM/PBSA polar term. 90,94 This is illustrated in Figure 6 which shows an example of typically observed time series of the different MM/PBSA and LIE terms. In conclusion, longer equilibration times do not necessarily lead to an improved agreement with experimental data which makes the selection of the optimal simulation time to be an effective model parameter.

17

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Contributions of different LIE and MM/PBSA terms to the ∆Gbind calculations In the LIE models discussed above, β is close or practically equal to zero, Table 2. This indicates that electrostatic interactions do not significantly contribute to trends in binding affinity, which can be in line with the hydrophobic character of the SIRT1 binding pocket. 91 In a previous LIE study on a protein with hydrophobic active site (i.e., Cytochrome P450 1A2 or CYP1A2), fitted values for β were similarly small and constrained to zero in the final presented model. 95 As for CYP1A2, a SIRT1 LIEβ=0 model with β set to zero shows nearly identical performance as LIEsingle when considering RMSEs and correlation metrics, and a slightly lower SDEP (and higher q 2 ), Table 2. The low contribution of electrostatic interactions to predicted trends in binding affinity can be understood from the relatively low absolute values for ∆V ele obtained from MD (Table S2), suggesting that hot–spot and other hydrogen–bonding interactions with SIRT1 residues compensate for the ligand–solvent interactions in the unbound state. As a result, training a LIE model with α set to zero (and using only β as fitting parameter) is of no meaning at all, cf. LIEα=0 in Table 2. Similarly as for LIE, we tried to simplify the MM/PBSA models by only taking van der Waals interactions or the nonpolar/SASA term into account. Table 3 shows that models vdW that only include the van der Waals Vcomplex interactions (MM/PBSAvdW , rP = 0.77) or the

SASA ∆Gnonpolar term in Equation 9 (MM/PBSASASA , rP = 0.76) show higher predictivity ele than models based on electrostatic Vcomplex interactions (rP = 0.43) or the polar ∆Gpolar term

only (rP = –0.60), and even than the overall MM/PBSAε=2 model. Our finding that LIE and MM/PBSA models with reduced number of model terms can give similar or slightly improved performance would make it of interest to explore the use of extended LIE approaches such as the one recently proposed by He and co–workers, 96 in which MM/PBSA–like terms are included together with six fitting parameters. Based on our current findings, this number may also be effectively reduced in a model for ligand binding to SIRT1. In addition to the small electrostatic contribution to trends in binding to the hydrophobic SIRT1 pocket 18

ACS Paragon Plus Environment

Page 18 of 46

Page 19 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(Table S2), the reason that neglecting electrostatics and polar contributions can improve our MM/PBSA predictions can be explained by the relatively large uncertainty in determining ele and especially ∆Gpolar , cf. Figure 6. The difficulty to reach convergence in the Vcomplex

latter can be due to fluctuations in contributions not only from the protein binding site but also from distant protein residues that may be less relevant for ligand binding. 22 The uncertainty in determining ∆Gpolar was also highlighted in a previous benchmarking study on g mmpbsa, 24 in which it was observed that ∆Gpolar is more prone to be different (by >20 kJ mol−1 ) between results of g mmpbsa and AMBER simulation tools 97,98 than the other MM/PBSA terms. This was attributed to the different algorithms implemented in APBS 63 (used in g mmpbsa) and the PBSA package of AMBER, and/or subtle differences in the implementation of the AMBER force field used in AMBER and GROMACS. As the MM/PBSA method only considers protein–ligand interactions and MM/PBSAvdw was found to be best performing of the MM/PBSA models in Table 3, we conducted further investigation on a LIE model with β set to zero and in which ∆V vdw in Equation 1 is replaced

vdw by Vlig−surr . This LIE model is annotated as vdwbound in Table 2 and thus considers bound van der Waals interaction energies in the protein–ligand bound state only. It shows the highest accuracy of the LIE models presented in Table 2, with RMSE = 3.79 kJ mol−1 and rP = 0.76. Interestingly, when calibrating a LIE model vdwunbound with β = 0 and

vdw ∆V vdw in Equation 1 replaced by Vlig−surr , comparable performance was obtained unbound as for e.g., LIEsingle , Table 2. Comparable correlation coefficients could even be obtained by considering ligand SASAs only (for which r2 = 0.43 and rP = 0.66). In this particular case, binding prediction can apparently be made highly efficient by considering ligand simulations or properties only. This is probably again due to the dominance of nonpolar ligand–protein contacts in determining trends in binding affinity, but we note that the most predictive LIE and MM/PBSA models are the ones based on protein–ligand van der Waals interactions (Tables 2 and 3), as exemplified e.g. by the high q 2 value for the vdwbound LIE model.

19

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Conclusions By comparing fitted LIE models for a set of 27 compounds binding to SIRT1 with single– trajectory based MM/PBSA models, we find calculated ∆Gbind values from our LIE models with a RMSE of 4.21 kJ mol−1 while the obtained correlations are comparable (Pearson's r = 0.72 and 0.64 for LIEsingle and MM/PBSAε=2 , respectively). This indicates that if agreement with experimental affinities in absolute terms is not of interest, both methods have a comparable predictive quality. We also found that longer equilibration may not necessarily lead to improved correlation with experiment and that longer simulations are required for MM/PBSA than for LIE due to the relatively slow convergence of the MM/PBSA polar term. In addition to the higher cost in computing this term when compared to the terms in the central LIE equation, this makes LIE more efficient. For the studied protein and ligands, calibration of an iterative LIE model (that uses weighted contributions from multiple simulations starting from different ligand–binding poses) did not lead to satisfactory computations when generating starting poses for MD simulations in unsupervised docking and clustering. During these simulations we found relatively low frequencies in ligand interactions with hot–spot protein residues, as analyzed using our Python library MDInteract. In other words, careful selection of proper MD starting poses can be crucial in generating iterative LIE models. Encouragingly, when combining results from simulations starting from the docked poses and from poses based on homologous crystal structure data (in the LIEcomb model), Boltzmann–like weighting of the multiple simulations was able to identify appropriate starting configurations. For our simulations with starting structures obtained in a supervised way, we found relatively small values for the difference ∆V ele in electrostatic interactions in the bound and unbound state. Apparently, ligand interactions with hot–spot residues can largely compensate for the loss in polar interactions with the solvent upon binding. As a result, LIE parameter β was fitted to values close to zero. By constraining it to zero we could obtain a model with reduced overfitting compared to models in which β was used as free parameter. 20

ACS Paragon Plus Environment

Page 20 of 46

Page 21 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

This was illustrated by a lower SDEP and higher q 2 for the former model in leave–one–out cross–validation. We found that the correlation between calculated and experimental ∆Gbind values could be maximized by including protein–ligand van der Waals interactions only (i.e., vdW either in a MM/PBSA model with Vcomplex in Equation 9 as only term, or in a LIE model

with α as only free parameter and which only includes results from the bound–complex simulations). On the other hand, our models could be made most efficient by only including ligand properties, e.g., in a SASA–only based model or in a LIE model that only includes van der Waals interaction energies involving the unbound ligand. The finding that predictive models could be obtained in terms of van der Waals interactions only for this specific protein is in line with the hydrophobic character of its binding site and with the uncertainty in determining the Gpolar MM/PBSA term.

Acknowledgement Dr. Berin Karaman (Biruni University) and Prof. Wolfgang Sippl (Martin–Luther–University of Halle–Wittenberg) are gratefully acknowledged for sharing their SIRT1–ligand complex structures with us. We thank The Netherlands Organization for Scientific Research (NWO, VIDI grant 723.012.105) for financial support, and EAR gratefully acknowledges financial support from Indonesia Endowment Fund for Education (LPDP), Ministry of Finance, Republic of Indonesia.

Supporting Information Available The Supporting Information is available free of charge on the ACS Publications website at DOI: Root–mean–square deviations (RMSDs) of ligand–atom positions between starting poses used in the three simulations per ligand in the LIEmult model and the starting pose of the LIEsingle simulations (Table S1), ∆V vdw and ∆V ele values in kJ mol−1 per ligand as used in 21

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the LIEsingle model (Table S2) (PDF) Ligand force field parameters: GROMACS include topology (.itp) files per ligand (using ligand IDs from Table 1 for file nomenclature) and a separate atom type file (attype.itp) to specify van der Waals parameters used in the GROMACS simulations (ZIP) This material is available free of charge via the Internet at http://pubs.acs.org/.

References (1) Murcko, M. A. Computational Methods to Predict Binding Free Energy in LigandReceptor Complexes. J. Med. Chem. 1995, 38, 4953–4967. (2) Zwanzig, R. W. High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420–1426. (3) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300–313. (4) Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications. Nat. Rev. Drug Discovery 2004, 3, 935. (5) Shoichet, B. K.; McGovern, S. L.; Wei, B.; Irwin, J. J. Lead Discovery Using Molecular Docking. Curr. Opin. Chem. Biol. 2002, 6, 439–446. (6) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA Methods to Estimate LigandBinding Affinities. Expert Opin. Drug Discovery 2015, 10, 449–461. (7) Amaro, R. E.; Baron, R.; McCammon, J. A. An Improved Relaxed Complex Scheme for Receptor Flexibility in Computer-Aided Drug Design. J. Comput.-Aided Mol. Des. 2008, 22, 693–705.

22

ACS Paragon Plus Environment

Page 22 of 46

Page 23 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(8) de Ruiter, A.; Oostenbrink, C. Free Energy Calculations of Protein–Ligand Interactions. Curr. Opin. Chem. Biol. 2011, 15, 547–552. (9) ˚ Aqvist, J.; Medina, C.; Samuelsson, J.-E. A New Method for Predicting Binding Affinity in Computer-Aided Drug Design. Protein Eng., Des. Sel. 1994, 7, 385–391. (10) Srinivasan, J.; Cheatham, T. E.; Cieplak, P.; Kollman, P. A.; Case, D. A. Continuum Solvent Studies of the Stability of DNA, RNA, and Phosphoramidate- DNA Helices. J. Am. Chem. Soc. 1998, 120, 9401–9409. Aqvist, J.; Hansson, T. On the Validity of Electrostatic Linear Response in Polar (11) ˚ Solvents. J. Phys. Chem. 1996, 100, 9512–9521. (12) Hansson, T.; Marelius, J.; ˚ Aqvist, J. Ligand Binding Affinity Prediction by Linear Interaction Energy Methods. J. Comput.-Aided Mol. Des. 1998, 12, 27–35. (13) Wang, W.; Wang, J.; Kollman, P. A. What Determines the van der Waals Coefficient β in the LIE (Linear Interaction Energy) Method to Estimate Binding Free Energies Using Molecular Dynamics Simulations? Proteins: Struct., Funct., Bioinf. 1999, 34, 395–402. (14) Alml¨of, M.; Carlsson, J.; ˚ Aqvist, J. Improving the Accuracy of the Linear Interaction Energy Method for Solvation Free Energies. J. Chem. Theory Comput. 2007, 3, 2162– 2175. (15) Guti´errez-de Ter´an, H.; ˚ Aqvist, J. Computational drug discovery and design; Springer, 2012; pp 305–323. (16) Valiente, P. A.; Gil L, A.; Batista, P. R.; Caffarena, E. R.; Pons, T.; Pascutti, P. G. New Parameterization Approaches of the LIE Method to Improve Free Energy Calculations of PlmII-Inhibitors Complexes. J. Comput. Chem. 2010, 31, 2723–2734.

23

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(17) Carlson, H. A.; Jorgensen, W. L. An Extended Linear Response Method for Determining Free Energies of Hydration. J. Phys. Chem. 1995, 99, 10667–10673. (18) Wall, I. D.; Leach, A. R.; Salt, D. W.; Ford, M. G.; Essex, J. W. Binding Constants of Neuraminidase Inhibitors: an Investigation of the Linear Interaction Energy Method. J. Med. Chem. 1999, 42, 5142–5152. (19) Wang, C.; Greene, D.; Xiao, L.; Qi, R.; Luo, R. Recent Developments and Applications of the MMPBSA Method. Front. Mol. Biosci. 2018, 4, 87. ˇ a, N.; Cheatham, T. E.; Ryj´aek, F.; Lankaˇs, F.; Van Meervelt, L.; Hobza, P.; (20) Spakov´ ˇ Sponer, J. Molecular Dynamics Simulations and Thermodynamics Analysis of DNADrug Complexes. Minor Groove Binding Between 4, 6-Diamidino-2-Phenylindole and DNA Duplexes in Solution. J. Am. Chem. Soc. 2003, 125, 1759–1769. (21) Lepˇs´ık, M.; Kˇr´ıˇz, Z.; Havlas, Z. Efficiency of a Second-Generation HIV-1 Protease Inhibitor Studied by Molecular Dynamics and Absolute Binding Free Energy Calculations. Proteins: Struct., Funct., Bioinf. 2004, 57, 279–293. (22) Gilson, M. K.; Zhou, H.-X. Calculation of Protein-Ligand Binding Affinities. Annu. Rev. Biophys. Biomol. Struct. 2007, 36, 21–42. (23) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E. Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Acc. Chem. Res. 2000, 33, 889–897. (24) Kumari, R.; Kumar, R.; Open Source Drug Discovery Consortium,; Lynn, A. g mmpbsa–A GROMACS Tool for High-Throughput MM-PBSA Calculations. J. Chem. Inf. Model. 2014, 54, 1951–1962.

24

ACS Paragon Plus Environment

Page 24 of 46

Page 25 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(25) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical Treatment of Solvation for Molecular Mechanics and Dynamics. J. Am. Chem. Soc. 1990, 112, 6127–6129. (26) Shirts, M. R.; Mobley, D. L.; Brown, S. P. Drug design: structure-and ligand-based approaches; Cambridge University Press New York, 2010; pp 61–86. (27) Genheden, S.; Ryde, U. Comparison of the Efficiency of the LIE and MM/GBSA Methods to Calculate Ligand-Binding Energies. J. Chem. Theory Comput. 2011, 7, 3768– 3778. (28) Feig, M.; Onufriev, A.; Lee, M. S.; Im, W.; Case, D. A.; Brooks III, C. L. Performance Comparison of Generalized Born and Poisson Methods in the Calculation of Electrostatic Solvation Energies for Protein Structures. J. Comput. Chem. 2004, 25, 265–284. (29) Guarente, L. Sirtuins in Aging and Disease. Cold Spring Harbor Symp. Quant. Biol. 2007; pp 483–488. (30) Villalba, J. M.; Alca´ın, F. J. Sirtuin Activators and Inhibitors. BioFactors 2012, 38, 349–359. (31) Karaman, B.; Sippl, W. Docking and Binding Free Energy Calculations of Sirtuin Inhibitors. Eur. J. Med. Chem. 2015, 93, 584–598. (32) Stjernschantz, E.; Oostenbrink, C. Improved Ligand-Protein Binding Affinity Predictions Using Multiple Binding Modes. Biophys. J. 2010, 98, 2682–2691. (33) Peri´c-Hassler, L.; Stjernschantz, E.; Oostenbrink, C.; Geerke, D. P. CYP 2D6 Binding Affinity Predictions Using Multiple Ligand and Protein Conformations. Int. J. Mol. Sci. 2013, 14, 24514–24530.

25

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(34) Capoferri, L.; Verkade-Vreeker, M. C.; Buitenhuis, D.; Commandeur, J. N. M.; Pastor, M.; Vermeulen, N. P. E.; Geerke, D. P. Linear Interaction Energy Based Prediction of Cytochrome P450 1A2 Binding Affinities with Reliability Estimation. PLoS One 2015, 10, e0142232. (35) van Dijk, M.; Ter Laak, A. M.; Wichard, J. D.; Capoferri, L.; Vermeulen, N. P. E.; Geerke, D. P. Comprehensive and Automated Linear Interaction Energy Based BindingAffinity Prediction for Multifarious Cytochrome P450 Aromatase Inhibitors. J. Chem. Inf. Model. 2017, 57, 2294–2308. (36) Vosmeer, C. R.; Pool, R.; van Stee, M.; Peri´c-Hassler, L.; Vermeulen, N. P. E.; Geerke, D. P. Towards Automated Binding Affinity Prediction Using an Iterative Linear Interaction Energy Approach. Int. J. Mol. Sci. 2014, 15, 798–816. (37) Disch, J. S.; Evindar, G.; Chiu, C. H.; Blum, C. A.; Dai, H.; Jin, L.; Schuman, E.; Lind, K. E.; Belyanskaya, S. L.; Deng, J.; Coppo, F.; Aquilani, L.; Graybill, T. L.; Cuozzo, J. W.; Lavu, S.; Mao, C.; Vlasuk, G. P.; Perni, R. B. Discovery of Thieno [3, 2-d] Pyrimidine-6-Carboxamides as Potent Inhibitors of SIRT1, SIRT2, and SIRT3. J. Med. Chem. 2013, 56, 3666–3679. (38) Yung-Chi, C.; Prusoff, W. H. Relationship Between the Inhibition Constant (KI ) and the Concentration of Inhibitor Which Causes 50 Per Cent Inhibition (I50) of an Enzymatic Reaction. Biochem. Pharmacol. 1973, 22, 3099–3108. (39) Lazareno, S.; Birdsall, N. Estimation of Competitive Antagonist Affinity from Functional Inhibition Curves Using the Gaddum, Schild and Cheng-Pruso´ıf Equations. Br. J. Pharmacol. 1993, 109, 1110–1119. (40) Andika, A.; Erlina, L.; Azminah, A.; Yanuar, A. Molecular Dynamics Simulation of SIRT1 Inhibitor from Indonesian Herbal Database. J. Young Pharm. 2018, 10, 3.

26

ACS Paragon Plus Environment

Page 26 of 46

Page 27 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(41) Zhao, X.; Allison, D.; Condon, B.; Zhang, F.; Gheyi, T.; Zhang, A.; Ashok, S.; Russell, M.; MacEwan, I.; Qian, Y.; Jamison, J. A.; Luz, J. G. The 2.5 ˚ A Crystal Structure of the SIRT1 Catalytic Domain Bound to Nicotinamide Adenine Dinucleotide (NAD+) and an Indole (EX527 Analogue) Reveals a Novel Mechanism of Histone Deacetylase Inhibition. J. Med. Chem. 2013, 56, 963–969. (42) Molecular Operating Environment (MOE) Version 2015.10, Chemical Computing Group Inc., Canada. (43) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696–3713. (44) Korb, O.; St¨ utzle, T.; Exner, T. E. An Ant Colony Optimization Approach to Flexible Protein–Ligand Docking. Swarm Intell. 2007, 1, 115–134. (45) Korb, O.; Stutzle, T.; Exner, T. E. Empirical Scoring Functions for Advanced ProteinLigand Docking with PLANTS. J. Chem. Inf. Model. 2009, 49, 84–96. (46) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. (47) Jakalian, A.; Bush, B. L.; Jack, D. B.; Bayly, C. I. Fast, Efficient Generation of HighQuality Atomic Charges. AM1-BCC Model: I. Method. J. Comput. Chem. 2000, 21, 132–146. (48) Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations. J. Mol. Graphics Modell. 2006, 25, 247–260. (49) Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham III, T. E.; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, S.; 27

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Monard, G.; Needham, P.; Nguyen, H.; Nguyen, H. T.; Omelyan, I.; Onufriev, A.; Roe, D. R.; Roitberg, A.; Salomon-Ferrer, R.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; York, D. M.; Kollman, P. A. AMBER 2015 ; University of California, San Francisco, 2015. (50) da Silva, A. W. S.; Vranken, W. F. ACPYPE-Antechamber Python Parser Interface. BMC Res. Notes 2012, 5, 367. (51) Jorgensen, W. L.; Madura, J. D. Quantum and Statistical Mechanical Studies of Liquids. 25. Solvation and Conformation of Methanol in Water. J. Am. Chem. Soc. 1983, 105, 1407–1413. (52) Hockney, R. W.; Eastwood, J. W. Computer Simulation Using Particles; CRC Press, 1988. (53) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N log (N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. (54) Feenstra, K. A.; Hess, B.; Berendsen, H. J. Improving Efficiency of Large Time-Scale Molecular Dynamics Simulations of Hydrogen-Rich Systems. J. Comput. Chem. 1999, 20, 786–798. (55) Hess, B.; Bekker, H.; Berendsen, H. J.; Fraaije, J. G. LINCS: a Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463–1472. (56) Berendsen, H. J.; Postma, J.; van Gunsteren, W. F.; DiNola, A.; Haak, J. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684–3690. (57) Pronk, S.; P´all, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: 28

ACS Paragon Plus Environment

Page 28 of 46

Page 29 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

a High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845–854. (58) Capoferri, L.; van Dijk, M.; Rustenburg, A. S.; Wassenaar, T. A.; Kooi, D. P.; Rifai, E. A.; Vermeulen, N. P. E.; Geerke, D. P. eTOX ALLIES: an Automated PipeLine for Linear Interaction Energy-Based Simulations. J. Cheminf. 2017, 9, 58. (59) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Courna´ Scikit-learn: Machine Learning in peau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. (60) Van Lipzig, M. M.; Ter Laak, A. M.; Jongejan, A.; Vermeulen, N. P. E.; Wamelink, M.; Geerke, D. P.; Meerman, J. H. Prediction of Ligand Binding Affinity and Orientation of Xenoestrogens to the Estrogen Receptor by Molecular Dynamics Simulations and the Linear Interaction Energy Method. J. Med. Chem. 2004, 47, 1018–1030. (61) Rifai, E. A.; van Dijk, M.; Vermeulen, N. P. E.; Geerke, D. P. Binding Free Energy Predictions of Farnesoid X Receptor (FXR) Agonists Using a Linear Interaction Energy (LIE) Approach with Reliability Estimation: Application to the D3R Grand Challenge 2. J. Comput.-Aided Mol. Des. 2018, 32, 239–249. (62) Hritz, J.; de Ruiter, A.; Oostenbrink, C. Impact of Plasticity and Flexibility on Docking Results for Cytochrome P450 2D6: a Combined Approach of Molecular Dynamics and Ligand Docking. J. Med. Chem. 2008, 51, 7469–7477. (63) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; McCammon, J. A. Electrostatics of Nanosystems: Application to Microtubules and the Ribosome. Proc. Natl. Acad. Sci. 2001, 98, 10037–10041. (64) Homeyer, N.; Gohlke, H. Free Energy Calculations by the Molecular Mechanics PoissonBoltzmann Surface Area Method. Mol. Inf. 2012, 31, 114–122. 29

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(65) Honig, B.; Nicholls, A. Classical Electrostatics in Biology and Chemistry. Science 1995, 268, 1144–1149. (66) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2010, 51, 69–82. (67) Bondi, A. Van der Waals Volumes and Radii. J. Phys. Chem. 1964, 68, 441–451. (68) Sitkoff, D.; Sharp, K. A.; Honig, B. Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models. J. Phys. Chem. 1994, 98, 1978–1988. (69) Weis, A.; Katebzadeh, K.; S¨oderhjelm, P.; Nilsson, I.; Ryde, U. Ligand Affinities Predicted with the MM/PBSA Method: Dependence on the Simulation Method and the Force Field. J. Med. Chem. 2006, 49, 6596–6606. (70) Aldeghi, M.; Bodkin, M. J.; Knapp, S.; Biggin, P. C. Statistical Analysis on the Performance of Molecular Mechanics Poisson–Boltzmann Surface Area Versus Absolute Binding Free Energy Calculations: Bromodomains as a Case Study. J. Chem. Inf. Model. 2017, 57, 2203–2221. (71) Genheden, S.; Kuhn, O.; Mikulskis, P.; Hoffmann, D.; Ryde, U. The Normal-Mode Entropy in the MM/GBSA Method: Effect of System Truncation, Buffer Region, and Dielectric Constant. J. Chem. Inf. Model. 2012, 52, 2079–2088. (72) Uciechowska, U.; Schemies, J.; Scharfe, M.; Lawson, M.; Wichapong, K.; Jung, M.; Sippl, W. Binding Free Energy Calculations and Biological Testing of Novel Thiobarbiturates as Inhibitors of the Human NAD+ Dependent Histone Deacetylase Sirt2. MedChemComm 2012, 3, 167–173. (73) McKinney, W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference. 2010; pp 51–56. 30

ACS Paragon Plus Environment

Page 30 of 46

Page 31 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(74) McGibbon, R. T.; Beauchamp, K. A.; Harrigan, M. P.; Klein, C.; Swails, J. M.; Hern´andez, C. X.; Schwantes, C. R.; Wang, L.-P.; Lane, T. J.; Pande, V. S. MDTraj: a Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528–1532. (75) Luirink, R. A.; Dekker, S. J.; Capoferri, L.; Janssen, L. F.; Kuiper, C. L.; Ari, M. E.; Vermeulen, N. P. E.; Vos, J. C.; Commandeur, J. N. M.; Geerke, D. P. A Combined Computational and Experimental Study on Selective Flucloxacillin Hydroxylation by Cytochrome P450 BM3 Variants. J. Inorg. Biochem. 2018, 184, 115–122. (76) McGaughey, G. B.; Gagn´e, M.; Rapp´e, A. K. π-Stacking Interactions Alive and Well in Proteins. J. Biol. Chem. 1998, 273, 15458–15463. (77) Hubbard, R. E.; Kamran Haider, M. Encyclopedia of Life Sciences (ELS); Wiley Online Library, 2010; pp 1–7. (78) Barlow, D. J.; Thornton, J. Ion-Pairs in Proteins. J. Mol. Biol. 1983, 168, 867–885. (79) Gallivan, J. P.; Dougherty, D. A. Cation-π Interactions in Structural Biology. Proc. Natl. Acad. Sci. 1999, 96, 9459–9464. (80) Auffinger, P.; Hays, F. A.; Westhof, E.; Ho, P. S. Halogen Bonds in Biological Molecules. Proc. Natl. Acad. Sci. 2004, 101, 16789–16794. (81) Waskom, M.; Botvinnik, O.; O’Kane, D.; Hobson, P.; Ostblom, J.; Lukauskas, S.; Gemperline, D. C.; Augspurger, T.; Halchenko, Y.; Cole, J. B.; Warmenhoven, J.; de Ruiter, J.; Pye, C.; Hoyer, S.; Vanderplas, J.; Villalba, S.; Kunter, G.; Quintero, E.; Bachant, P.; Martin, M.; Meyer, K.; Miles, A.; Ram, Y.; Brunner, T.; Yarkoni, T.; Williams, M. L.; Evans, C.; Fitzgerald, C.; Brian,; Qalieh, A. Seaborn: v0.9.0 (July 2018). 2018; https://doi.org/10.5281/zenodo.1313201.

31

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(82) To evaluate correlation coefficients, we used the online tool that is available at https://www.socscistatistics.com. (83) Brown, S. P.; Muchmore, S. W. Large-Scale Application of High-Throughput Molecular Mechanics with Poisson-Boltzmann Surface Area for Routine Physics-Based Scoring of Protein-Ligand Complexes. J. Med. Chem. 2009, 52, 3159–3165. (84) Yang, T.; Wu, J. C.; Yan, C.; Wang, Y.; Luo, R.; Gonzales, M. B.; Dalby, K. N.; Ren, P. Virtual Screening Using Molecular Simulations. Proteins: Struct., Funct., Bioinf. 2011, 79, 1940–1951. (85) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the Molecular Mechanics/Poisson Boltzmann Surface Area and Molecular Mechanics/Generalized Born Surface Area Methods. II. The Accuracy of Ranking Poses Generated from Docking. J. Comput. Chem. 2011, 32, 866–877. (86) Oehme, D. P.; Brownlee, R. T.; Wilson, D. J. Effect of Atomic Charge, Solvation, Entropy, and Ligand Protonation State on MM-PB (GB) SA Binding Energies of HIV Protease. J. Comput. Chem. 2012, 33, 2566–2580. (87) ˚ Aqvist, J.; Luzhkov, V. B.; Brandsdal, B. O. Ligand Binding Affinities from MD Simulations. Acc. Chem. Res. 2002, 35, 358–365. (88) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. Extremely Precise Free Energy Calculations of Amino Acid Side Chain Analogs: Comparison of Common Molecular Mechanics Force Fields for Proteins. J. Chem. Phys. 2003, 119, 5740–5761. (89) Li, L.; Li, C.; Zhang, Z.; Alexov, E. On the Dielectric Constant of Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its Implementation in DelPhi. J. Chem. Theory Comput. 2013, 9, 2126–2136.

32

ACS Paragon Plus Environment

Page 32 of 46

Page 33 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(90) Wang, E.; Sun, H.; Wang, J.; Wang, Z.; Liu, H.; Zhang, J. Z.; Hou, T. End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem. Rev. 2019, DOI: 10.1021/acs.chemrev.9b00055. (91) Sanders, B. D.; Jackson, B.; Marmorstein, R. Structural Basis for Sirtuin Function: What We Know and What We Don’t. Biochim. Biophys. Acta, Proteins Proteomics 2010, 1804, 1604–1616. (92) Vosmeer, C. R.; Kooi, D. P.; Capoferri, L.; Terpstra, M. M.; Vermeulen, N. P. E.; Geerke, D. P. Improving the Iterative Linear Interaction Energy Approach Using Automated Recognition of Configurational Transitions. J. Mol. Model. 2016, 22, 31. (93) Genheden, S.; Ryde, U. How to Obtain Statistically Converged MM/GBSA Results. J. Comput. Chem. 2010, 31, 837–846. (94) Zhou, Y.; Feig, M.; Wei, G.-W. Highly Accurate Biomolecular Electrostatics in Continuum Dielectric Environments. J. Comput. Chem. 2008, 29, 87–97. (95) Vasanthanathan, P.; Olsen, L.; Jørgensen, F. S.; Vermeulen, N. P. E.; Oostenbrink, C. Computational Prediction of Binding Affinity for CYP1A2-Ligand Complexes Using Empirical Free Energy Calculations. Drug Metab. Dispos. 2010, 38, 1347–1354. (96) He, X.; Man, V. H.; Ji, B.; Xie, X.-Q.; Wang, J. Calculate Protein–ligand Binding Affinities with the Extended Linear Interaction Energy Method: Application on the Cathepsin S Set in the D3R Grand Challenge 3. J. Comput.-Aided Mol. Des. 2019, 33, 105–117. (97) Case, D. A.; Cheatham III, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz Jr, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668–1688.

33

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(98) Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. An Overview of the Amber Biomolecular Simulation Package. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2013, 3, 198–210.

34

ACS Paragon Plus Environment

Page 34 of 46

Page 35 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 1: Compound ID, molecular structure, and experimental values for the binding free energy (∆Gbind ) of the SIRT1 inhibitors used in the current study, with ∆Gbind values derived from experimental data obtained by Disch et al. 37 Note that the compound numbering (IDs) as listed below is the same as used by Disch et al. 37 ∆Gbind ID

Structure

∆Gbind ID

(kJ mol−1 )

Structure

(kJ mol−1 )

11a

–49.4

27

–32.6

11b

–43.0

28

–46.8

11c

–50.3

29

–44.2

11d

–45.0

30

–41.6

35

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 46

15a

–46.5

31

–49.9

16a

–35.1

32

–38.7

18

–46.9

33

–37.1

19

–49.2

34

–41.8

20

–41.8

35

–43.6

21

–29.5

36

–38.2

36

ACS Paragon Plus Environment

Page 37 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

23

–29.3

37

–43.4

24

–35.1

38

–38.1

25

–43.5

39

–40.6

26

–38.1

37

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 46

Table 2: Model parameters α and β of the LIE models for SIRT1, and their respective root–mean–square error (RMSE), standard deviation of error in prediction (SDEP, based on leave–one–out cross–validation), and correlation metrics represented by r2 , q 2 , Pearson’s r (rP ), and Spearman’s ρ (ρS ) for the dataset of 27 compounds. RMSEs and SDEPs are given in kJ mol−1 . α

β

RMSE

SDEP

r2

q2

rP

ρS

LIEsingle

0.40

–0.04

4.21

4.66

0.52

0.35

0.72

0.72

LIEγ=−10.24

0.30

–0.05

4.00

4.54

0.52

0.38

0.72

0.73

LIEmult

0.42

–0.19

6.16

6.59

0.16

–0.30

0.40

0.43

LIEcomb

0.40

–0.05

4.30

4.88

0.49

0.29

0.70

0.70

LIEBoltzmann4

0.40

–0.02

4.21

4.67

0.49

0.35

0.70

0.69

LIEf irst1ns

0.41

–0.04

3.96

4.39

0.54

0.42

0.74

0.70

LIEf irst5ns

0.41

–0.02

4.41

4.91

0.46

0.28

0.68

0.68

LIEf irst10ns

0.40

–0.04

4.27

4.73

0.51

0.33

0.71

0.72

LIEf irst15ns

0.40

–0.04

4.27

4.71

0.51

0.34

0.71

0.71

LIElast1ns

0.41

0.03

4.58

5.01

0.48

0.25

0.69

0.65

LIElast5ns

0.40

–0.03

4.28

4.75

0.52

0.32

0.72

0.69

LIElast10ns

0.40

–0.05

4.41

4.83

0.50

0.30

0.70

0.70

LIElast15ns

0.40

–0.05

4.30

4.74

0.52

0.33

0.72

0.72

LIEequil

0.41

–0.02

4.50

5.30

0.45

0.16

0.67

0.66

LIEβ=0

0.41



4.24

4.40

0.51

0.42

0.71

0.67

LIEα=0



–1.67

31.3

32.1

0.05

–29.9

0.23

0.18

vdwbound

0.19



3.79

3.94

0.58

0.54

0.76

0.76

vdwunbound

0.35



4.39

4.56

0.49

0.38

0.70

0.67

38

ACS Paragon Plus Environment

Page 39 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 3: Performance of the MM/PBSA models for SIRT1 in terms of their correlation metrics for the dataset, represented by r2 , q 2 , Pearson’s r (rP ), and Spearman’s ρ (ρS ) for the dataset of 27 compounds. r2

rP

ρS

MM/PBSAε=2

0.41

0.64

0.61

MM/PBSAε=1

0.39

0.63

0.60

MM/PBSAε=4

0.22

0.47

0.45

MM/PBSAε=8

0.11

0.33

0.31

MM/PBSAf irst1ns

0.28

0.53

0.46

MM/PBSAf irst5ns

0.41

0.64

0.64

MM/PBSAf irst10ns

0.42

0.65

0.63

MM/PBSAf irst15ns

0.41

0.64

0.63

MM/PBSAlast1ns

0.36

0.60

0.56

MM/PBSAlast5ns

0.35

0.59

0.56

MM/PBSAlast10ns

0.36

0.60

0.58

MM/PBSAlast15ns

0.39

0.62

0.60

MM/PBSAele

0.19

0.43

0.08

MM/PBSAvdW

0.60

0.77

0.76

MM/PBSApolar

0.36

–0.60

–0.57

MM/PBSASASA

0.57

0.76

0.75

MM/PBSAnoele

0.04

0.20

0.14

MM/PBSAnopolar

0.49

0.70

0.71

39

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1: Scatter plots for experimental (exp) and calculated (calc) free energies ∆Gbind of ligand–SIRT1 binding. Calculated values are obtained using the LIEsingle (left panel) or MM/PBSAε=2 model (right panel). The solid blue lines represent linear fits to the data, while the distributions of data are represented by kernel density plots.

40

ACS Paragon Plus Environment

Page 40 of 46

Page 41 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 2: Ligand–residue interaction profiles for MD simulations used in the LIEsingle and MM/PBSA models (upper panel) and in the LIEmult model (lower panel). Interaction counts are presented for individual SIRT1 residues (x–axis labels) and averaged over duplicated 20– ns runs per ligand (y–axis labels in upper panel) or per combination of ligand and starting pose used for MD (y–axis labels in lower panel).

41

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3: Residues of SIRT1 (Phe33, Ile107, Asp108) that show hot–spot interactions with the ligands during the simulations used e.g., in the LIEsingle model. All ligands included in the model are depicted in stick representation using different colors (in their starting structure used for MD), while the interacting residues are labeled and shown in ball–and– stick representation.

42

ACS Paragon Plus Environment

Page 42 of 46

Page 43 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 4: Boltzmann weights Wi for the simulations (x–axis) used per ligand (y–axis) in the combined LIEcomb model. Starting poses 0 and 1–3 were starting poses for the simulations used in the LIEsingle and LIEmult model, respectively, and the intensity of the color indicates the size of Wi per simulation.

43

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5: Comparison of ligand starting poses for the MD simulations that have the highest weights Wi in the combined LIEcomb model (solid cyan) with the LIEsingle pose (transparent green).

44

ACS Paragon Plus Environment

Page 44 of 46

Page 45 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 6: Time series of different ligand–environment interaction energies and solvation free energy terms used in the LIE and/or MM/PBSA models, as obtained in one of the MD simulations of ligand 11a either bound to SIRT1 or free in water (red: ∆Gpolar , orange:

ele ele ele ∆Gnonpolar , light green: Vlig−surr f ree , green: Vlig−surr bound , dark green: Vcomplex , light

vdw

vdw vdw blue: Vlig−surr f ree , blue: Vlig−surr bound , dark blue: Vcomplex ). Plotted values are averages over 10 consecutive data points.

45

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Graphical TOC Entry

46

ACS Paragon Plus Environment

Page 46 of 46