Estimation of Drug-Target Residence Times by τ-Random

May 16, 2018 - Estimation of Drug-Target Residence Times by τ-Random Acceleration Molecular Dynamics Simulations. Daria B. Kokh*† , Marta Amaral‡...
1 downloads 0 Views 6MB Size
Subscriber access provided by READING UNIV

Biomolecular Systems

Estimation of drug-target residence times by # random acceleration molecular dynamics simulations Daria B. Kokh, Marta Amaral, Joerg Bomke, Ulrich Grädler, Djordje Musil, Hans-Peter Buchstaller, Matthias K. Dreyer, Matthias Frech, Maryse Lowinski, François Vallée, Marc Bianciotto, Alexey Rak, and Rebecca C Wade J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00230 • Publication Date (Web): 16 May 2018 Downloaded from http://pubs.acs.org on May 17, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Estimation of drug-target residence times by -random acceleration molecular dynamics simulations Daria B. Kokh1*, Marta Amaral2,3, Joerg Bomke4, Ulrich Grädler2, Djordje Musil2, Hans-Peter Buchstaller5, Matthias K. Dreyer6, Matthias Frech2, Maryse Lowinski7, Francois Vallee7, Marc Bianciotto7, Alexey Rak7, Rebecca C. Wade1,8,9*. 1Molecular

and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Ger-

many 2Molecular 3

Interactions and Biophysics, Merck KGaA, Darmstadt,

Instituto de Biologia Experimental e Tecnológica, Oeiras, Portugal

4Molecular

Pharmacology, Merck KGaA, Darmstadt, Germany

5Medicinal

Chemistry, Merck KGaA, Darmstadt, Germany

6Sanofi-Aventis 7Sanofi 8Center

Deutschland GmbH, R&D Integrated Drug Discovery, Frankfurt am Main, Germany

R&D, Integrated Drug Discovery, Vitry-sur-Seine, France for Molecular Biology (ZMBH), DKFZ-ZMBH Alliance, Heidelberg University, Heidelberg,

Germany 9Interdisciplinary

Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany.

KEYWORDS: computer-aided drug design, drug-receptor residence time, binding kinetics, molecular dynamics simulation

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 29

ABSTRACT Drug-target residence time (), one of the main determinants of drug efficacy, remains highly challenging to predict computationally and, therefore, is usually not considered in the early stages of drug design. Here, we present an efficient computational method, -random acceleration molecular dynamics (RAMD), for the ranking of drug candidates by their residence time and obtaining insights into ligandtarget dissociation mechanisms. We assessed RAMD on a dataset of 70 diverse drug-like ligands of the N-terminal domain of HSP90α, a pharmaceutically important target with a highly flexible binding site, obtaining computed relative residence times with an accuracy of about 2.3 for 78% of the compounds and less than 2.0 within congeneric series. Analysis of dissociation trajectories reveals features that affect ligand unbinding rates, including transient polar interactions and steric hindrance. These results suggest that RAMD will be widely applicable as a computationally efficient aid to improving drug residence times during lead optimization.

2

ACS Paragon Plus Environment

Page 3 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

INTRODUCTION

Over the past few years, much progress has been made in the development of methods for in silico prediction of drug binding affinity. However, the binding kinetics, which underlie the affinity, have only recently been appreciated as important factors for drug efficacy and safety1,2. It has been demonstrated that the rates of drug binding to and unbinding from the target are the main features, along with pharmacokinetic properties, that determine a drug’s biological activity profile3. Computation of the dissociation rate constant, koff, on which the residence time is dependent (=1/koff), is generally more challenging than computation of the binding affinity. Unlike the binding affinity that can be estimated using a two-state endpoint approach, computation of the dissociation rate requires extensive sampling of the transition state that generally results from more than one pathway in the proteinligand configurational space. Given that the residence time of pharmaceutically interesting compounds (minutes to hours) is far beyond the current time accessible to conventional molecular dynamics (MD) simulations (μs), computation of unbinding rates becomes tremendously challenging in the framework of conventional MD alone.4 To date, unbinding rates have been estimated from conventional MD simulations only for molecules with fast dissociation rates (typically with koff >103 s-1).5,6,7,8 This fact has motivated the development of methods for the optimization of MD sampling algorithms by selection of reaction-relevant regions of protein-ligand configurational space (e.g. by Weighted Ensemble Sampling9,10,11,12 or the Adaptive Multilevel Splitting Method13) or by adding biasing potentials along selected collective variables, CV, (e.g. using metadynamics14,15,16,17,18). The latter method has been applied to simulate the unbinding of benzamidine from trypsin19 , inhibitors from several kinases15,16 and HIV-1 protease17, and antagonists from adenosine A2A receptor18, showing agreement with experimental data to within an order of magnitude of koff in most cases. However, the selection of appropriate CVs and the long MD simulations required mean that these methods are difficult to apply in real-life drug design processes. Instead of computing absolute values of koff, it has been proposed that relative residence times can be estimated for a set of compounds directly from the simulation time required for ligand dissociation in enhanced MD simulations20. Specifically, scaling of the potential energy with a partially restrained target was employed to correctly rank several glucokinase activators and HSP90 inhibitors 21,22. Later, the relative egress times in metadynamics runs were shown to provide a reasonably good ranking of the order of magnitude of the relative koff values of 10 inhibitors of cyclin-dependent kinase 8 23. However, both these 3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 29

methods require parametrization (either selection of a scaling parameter or appropriate CVs), which is not always unambiguous and may notably affect simulation results. In the present study, we developed a new approach, called -random acceleration molecular dynamics (RAMD), to the computation of the relative target residence times of drug-like compounds. It is based on the random acceleration molecular dynamics (RAMD)24 method, an enhanced sampling procedure that was originally developed for exploring ligand egress pathways from buried binding sites in proteins25,26. In the RAMD method, during a standard molecular dynamics simulation of the bound complex, a small additional randomly oriented force is applied to the compound to facilitate its unbinding. The direction of this force is reassigned randomly when the ligand’s movement in a defined time interval falls below a specified threshold distance. The application of the random force thus allows egress of the ligand to be observed in short simulations of several nanoseconds duration. Unlike the methods mentioned above, RAMD requires neither prior knowledge of the dissociation pathway nor extensive parameter fitting. The only parameter to be set by the user is the magnitude of the random force, which mainly affects the simulation time required and, as demonstrated in the present study, when chosen within a reasonable range, does not generally affect the computed relative residence times. Here, we evaluate the application of RAMD to a diverse set of inhibitors of an important cancer target, the human N-terminal domain of heat shock protein 90α (N-HSP90). HSP90 is a chaperone protein that assists the folding of many types of protein, in particular those involved in cell growth. One way to disrupt the HSP90 chaperone activity is by blocking the ATPase function of N-HSP90 by small molecule inhibitors, which leads to degradation of the client proteins and diminished tumor growth27. N-HSP90 is a challenging target for simulations due to its high flexibility. The ATP binding site undergoes considerable conformational changes during the ATPase cycle. Human NHSP90 has been observed in several conformations (Figure 1a) that can be classified as loop- (loop-in and loop-out) and helix-type. Both loop conformations have been observed in crystal structures of human N-HSP90 co-crystallized with different small inhibitors bound to the ATP-binding pocket, as well as in the unbound protein28. In contrast, the helix conformation, with a complete α-helix3, has been observed only in holo-structures and only when the bound compound occupies a hydrophobic transient sub-pocket between α-helix3 and the beta-strands, in addition to the ATP binding site (see inset in Figure 1a), and thereby stabilizes α-helix3 (compounds bound to helical and loop-in conformations will be referred to hereafter as helix-binders and loop-binders, respectively).

4

ACS Paragon Plus Environment

Page 5 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1. Crystal structures and binding kinetics of the 70 N-HSP90-inhibitor complexes studied. (a) The two main conformations of the N-HSP90 binding site, helix-type (left; PDB ID: 5J9X) and loop-type (right; PDB: 5J2X); the α-helix3 region is shown in red; the surface of the ATP binding pocket and the hydrophobic sub-pocket are colored in yellow and orange, respectively; red and black circles indicate substituents in the solvent-exposed region and in the transient sub-pocket, respectively. (b) Measured KD (left) and kon (right) values plotted against koff. Crosses denote loop-binders, the other compounds are helix-binders. (c-m) Binding poses of compounds representing the eleven different ligand scaffolds considered. Crystal structures are shown in cartoon representation with the ligands and key side-chains in stick representation, water molecules as red spheres (numbered as referred to in the text); the binding pocket is colored as in plots (a,b). Ligands are colored by atom-type and additionally shown in 2D in the insets (compounds (c) 13 (PDB:5J9X), (d) 36 (PDB:5L06), (e) 21 (PDB:5LS1), (f) 60 (PDB:5OD7), (g) 67 (PDB:5LR1), (h) 17 (PDB:5LRZ), (i) 19 (PDB:2YKI), (j) 6 (PDB: 6F1N), (k) 69 (PDB:5LO5), (l) 68 (PDB:5EL5), and (m) 70 (PDB:2YKJ)).

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 29

Here, we assess the performance of RAMD on 70 N-HSP90 ATP-pocket binding compounds that have a range of different scaffolds (Figure 1, SI Table 1, SI Fig. 1). For this purpose, we here measured the binding kinetic parameters for 29 compounds by surface plasmon resonance (SPR) and solved crystal structures of complexes with N-HSP90 for 20 compounds. We demonstrate that, in addition to providing estimates of target residence times, RAMD gives insights into the ligand dissociation pathways and the mechanisms of unbinding.

EXPERIMENTAL METHODS Chemistry. Information on the synthesis of chemical compounds is provided in patents and publications listed in SI Table 1 and analytical data are provided in SI Table 3. Compound 1 (AUY922) was purchased from MedChem Express (Sollentuna, Sweden), and compounds 2 (VER49009) and 3 (VER50589) were purchased from Cayman Chemicals (Hamburg, Germany). Crystallization and structure determination. Compounds 4, 6, 7, 10, 14, 35, 36, 38, 40-41, 62, 68, and 69: A hexa-histidine tagged N-terminal fragment of HSP90α (residues 9-236) (NP_005339) was expressed and purified by the Instituto de Biologia Experimental e Tecnológica (Lisbon, Portugal), according to previously described protocols28. The crystallization conditions were essentially the same as those described in Refs. 28,29 Datasets were collected at the SLS synchrotron and processed with the XDS software package30. The structures were solved by molecular replacement with Phaser and refined either with CNX30 or BUSTER31. Model building was performed with Coot32, with inhibitors and water sites fitted into the initial |Fo|–|Fc| map. Compounds 17-22, and 67. A hexa-histidine tagged N-terminal fragment of HSP90α (residues 18-223) (NP_005339) was expressed, purified and crystallized as described in Supporting Material. Datasets were collected in-house on a Rigaku HF-007 rotating anode generator and a MAR CCD detector, and at the synchrotron. Diffraction data were processed with either XDS30 or MOSFLM33. The structures were solved by molecular replacement using one set of coordinates of N-HSP90 (PDB ID: 1YER). The structures were refined using either CNX30, REFMAC534 or AUTOBUSTER31. Ligands were placed manually, and the structural models were manually rebuilt using either TURBO-FRODO35 or COOT32. Final validation checks were performed using MOLPROBITY31. Data set statistics for all crystal structures are given in SI Table 2. The crystallographic coordinates have been deposited in the Protein Data Bank under the accession codes 5LQ9, 5LR1, 5LR7, 5LRZ, 5LS1,

6

ACS Paragon Plus Environment

Page 7 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

5T21, 5NYI, 6F1N, 6ELO, 6ELN, 6ELP, 2YKI, 6EYA, 5LO6, 6EY8, 6EY9, 6EYB, 6EI5, 6EL5, and 5LO5. Surface Plasmon Resonance (SPR). Measurements were performed on a Biacore 4000 instrument (GE Healthcare) as previously described28. Briefly, recombinant N-HSP90 with 17-desmethoxy-17-N,N-dimethylaminoethylamino-geldanamycin (17-DMAG, Merck Millipore) was immobilized on a Biacore CM5 chip at 25°C at a flow rate of 10 µL/min using amine coupling at pH 4.50 according to Biacore’s standard protocol. HBS-N (10 mM Hepes pH 7.40, 0.15 M NaCl) served as the running buffer during immobilization and all SPR binding kinetics measurements were performed in 20 mM HEPES pH 7.50, 150 mM NaCl, 0.05% Tween 20, 1 mM DTT, 0.1 mM EDTA, 2% DMSO. Data sets were processed and analyzed using the Biacore 4000 Evaluation software, version 1.1. Solvent corrected and double-referenced association and dissociation phase data were fitted to a simple 1:1 interaction model with mass transport limitations.

COMPUTATIONAL METHODS Structure preparation for simulations. The crystal structures of 37 complexes were used for the simulations (see SI Table 1). For the remaining protein-ligand complexes simulated, the structures were modelled from the crystal structures with PDB identifiers: 2VCI, 5J9X, 5OD7, and 5OCI for resorcinol loopbinder, resorcinol helix-binder, quinazoline and imidazole compounds, respectively. Ligands from the corresponding PDB files were used as templates for building new compounds by substitution with appropriate chemical groups. Ligand protonation at pH 7.5 was assigned using the Epik tool36,37. Additionally, the protonation states of several compounds, whose residence times were systematically underestimated in simulations (7-imidazopyridine, 7-azaindole, and 2-aminopyridine compounds shown in Figure 1 h, i, and m, respectively) were checked using Jaguar-pKa38 (see Supplementaty Methods), giving values that agreed with Epik results (see SI Table 4). The ligand parameters were generated using the Antechamber program39. RESP40,41 atomic partial charges were fitted using electrostatic potentials from ab initio calculations performed at the HF level with a 6-31G*(1d) basis set using GAMESS42. To account for the interactions of aryl halogens in polar surroundings, we modified the partial charges of two compounds, 5 and 17, that contain a Br atom, on the basis of the extra-charge model43. As the NAMD44 program does not support virtual sites, we employed

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 29

the partial negative charge of the Br atom obtained from the extra-charge model and distributed the positive charge of the virtual site over the aromatic carbon atoms. The protein (residues 16-223) was protonated assuming standard protonation states at pH 7.5 using Amber tools. The GAFF force field45 was employed for the ligands and the Amber14 force field46 for the protein. The crystallographic water molecules located within 3 Å of the ligand were included in the system. The tleap Amber tool47 was used for building a rectangular simulation box with a margin of 10 Å around the protein filled with TIP3P 48 water molecules. Some bulk water molecules were replaced by chlorine or sodium ions to retain system neutrality and a salt concentration of about 150 mM. RAMD protocol. For each compound, the main steps of the RAMD protocol are as follows: (i) preparation of a minimized and equilibrated structure; (ii) generation of starting bound-state replicas; (iii) simulation of dissociation trajectories for each starting replica; (iv) statistical analysis to provide a final residence time value and indicate the quality of the bound and transition state sampling. (i) The system was first energy minimized, heated and equilibrated using the AMBER14 software49. Minimization was done with 500 steps of steepest descent followed by 1000 steps of conjugate gradient minimization. Then the system was gradually heated to 300 K in 1 ns using harmonic restraints on nonhydrogen atoms with a force constant of 50 kcalmol-1Å-2. Equilibration was carried out by decreasing the restraints in 5 steps (each of 1 ns with force constants of 50, 10, 5, 2, 0.5 kcalmol-1Å-2) and then running 10 ns without restraints; a constant temperature of 300 K was maintained using a Langevin thermostat and a constant pressure of 1 atm using a Berendsen barostat. For all simulations, a cut-off of 10 Å for nonbonded Coulombic and Lennard-Jones interactions and periodic boundary conditions with a Particle Mesh Ewald treatment of long-range Coulombic interactions were used. A 2 fs time step was employed with bonds to hydrogen atoms constrained using the SHAKE algorithm50. (ii) The atomic coordinates of the equilibrated system generated with AMBER were used as input for the generation of starting replicas in NAMD format. The system was gradually heated over a period of 0.6 ns in steps of 10 K without restraints. Then equilibration was performed in the NPT ensemble using a Langevin thermostat and Nosé-Hoover Langevin pressure control to maintain the system at 1 atm and 300 K. For each compound, we carried out either 2 or 4 series of simulations. Two snapshots (starting replicas) were extracted from each trajectory with a stride of 5 ns and their atomic coordinates and velocities were used as the starting points for RAMD simulations. (iii) The RAMD simulation procedure22,51, implemented as a Tcl wrapper around the NAMD software44, was modified to take the force magnitude rather than acceleration as an input parameter to use new functions available from version 2.10 of NAMD onwards. A random force with a magnitude of 14 kcalmol8

ACS Paragon Plus Environment

Page 9 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Journal of Chemical Theory and Computation

Å-1 was used for simulations. The direction of the force was chosen randomly initially and after each

check point (i.e. after 100 fs MD) either retained (if the ligand center of mass moved by at least 0.025 Å) or changed randomly otherwise. Simulations were stopped when the center of mass of the ligand had moved further than 30 Å from its original position. Coordinates were saved at 1 ps intervals. (iv) Initially, a set of 10 dissociation trajectories were generated for each starting replica. The residence time, comp, was defined as the simulation time required for ligand dissociation in 50% of the trajectories. A bootstrapping procedure (200 sets, each of the sets containing 80% of the trajectories chosen randomly) was employed to compute the residence time distribution for each set and to estimate the interim residence time, tr, as the mean of the distribution and its standard deviation SDr (tr distributions for all compounds are shown in SI Figure 2). The final value of comp was computed as an average over all simulated replicas as comp =replicas with a standard deviation, SDreplicas. The uncertainty of comp is defined as the greater of two standard deviation values: SDcomp=max(, SDreplicas). If SDr exceeded 50%, the number of simulation trajectories for this particular replica was increased by 5 repeatedly until this criterion was satisfied. If SDreplicas exceeded 50%, the number of replicas used for the simulations was increased up to a maximum of 8 replicas. Thus, a total of 40-200 RAMD trajectories was generated for each compound studied. Validation of the RAMD method. For the validation of the method, we considered the linear correlation between computed and measured residence time on a logarithmic scale for compounds i=1,2…N: Y=aX+b (where Y=log(comp) and X=log(expt)). Any compound with a Cook’s distance52 greater that its mean value multiplied by four over all sets was considered to be an outlier. The mean absolute error, MAE, of the linear regression model was computed as (see also SI Figure 3):

𝑀𝐴𝐸 = 100%

1/𝑁

|

− 10

|

10

where Yifitted =a log(i expt)+b. The final value of the MAE was obtained using a bootstrapping procedure (100 groups; 80% of compounds in each group). Since the correlation between computed and measured residence times was analyzed on a logarithmic scale, the MAE value does not directly represent the accuracy of the method in predicting expt. Therefore, we introduced a Mean Prediction Uncertainty (MPU) factor that provides the uncertainty of the relative values of the residence time obtained from computations assuming that Yifitted yelds a correct scaling of calculated to experimental data; MPU is defined as: 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

𝑀𝑃𝑈 = 1 𝑁

𝑚𝑎𝑥

10 

,

Page 10 of 29

 10

where Xicomp = (log(icomp) -b)/a. In the analysis of congeneric series, a linear regression model for the dependence of Y=aX+b , where Y=log(comp)- log(refcomp) and X=log(expt)- log(refexpt), was employed. The MAE and the MPU were computed as given above with Yifitted=a(log(iexpt)- log(refexpt)) and Xicomp=(log(icomp)- log(refcomp))/a.

RESULTS AND DISCUSSION The 70 inhibitors analyzed bind to two conformations of N-HSP90 and display a wide range of binding kinetic parameters.

The 70 compounds studied are listed in Supplementary Table 1. For 20

compounds, the crystal structure of the protein-ligand complex was determined in the present study, see SI Table 2. These structures, along with the structures of an additional 17 complexes available in the PDB/RCSB, were used for the computations. The crystal structures show 9 loop-binders and 28 helixbinders (as classified by the structure of the α-helix3 segment, residues 112-122, see Figure 1a). A common feature in the ATP-binding pocket of all helix- and loop-type complexes is one direct and one watermediated H-bond between the ligand and D93 (water #1 in Figure 1c-m). Additionally, water #1 is coordinated by T184 and two or three other water molecules (denoted as #2, #3, and #4 in Figure 1c-m) form a network connecting D93 and the NH or OH moiety of the bound ligand. The compounds can be roughly classified into 11 groups according to the chemical moiety that forms H-bonds with D93, as illustrated in Figure 1 c-m. These groups are (with the number of compounds given in brackets): resorcinol (27), hydroxyindazole (19), benzamide (3), aminoquinazoline (9), aminopyrrolopyrimidine (2), 7-imidazopyridine (1), 7-azaindole (2), aminothienopyridine (1), 6-hydroxyindole (1), adenine (1), and 2-aminopyridine (1). In the helix-type structures, a hydrophobic ligand moiety partially or completely occupies the transient hydrophobic pocket adjacent to α-helix3, as indicated in Figure 1a (and in 2D protein-ligand contact maps in SI Figure 1). This part of the ligand, as well as solvent-exposed ligand substitutions (indicated by black and red circles, respectively, in Fig 1a), define the differences within each group of compounds. The binding rate constants measured by SPR are given in Figure 1b and SI Table 5. The equilibrium dissociation constant, KD, and the dissociation rate constant, koff, vary over the dataset by about 6 and 4 orders of magnitude, respectively. Importantly, the association rate constant, kon, also varies notably within most of the ligand scaffold types, indicating that both the bound state and the transition state affect koff 10

ACS Paragon Plus Environment

Page 11 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

values across the set of compounds. Although in the dataset, both the binding affinities and the kinetic rates of loop- and helix-binders each cover approximately the same range of values, the slowest binding compounds (kon< 105 M-1s-1) are all helix-binders. This is in accord with the results of our previous study28 where, in a set of resorcinol-type compounds, all helix-binders were characterized by smaller KD values and lower kon and koff values than the loop-binders, which was associated with the protein structural changes required for formation of the helix-type complexes. Computed relative residence times, comp, correlate with measured residence times. comp values computed for all compounds are plotted against expt (expt=1/koff) obtained from SPR experiments on the logarithmic scale in Figure 2a. Most of the compounds demonstrate a good correlation between computed and measured residence times. However, comp is systematically underestimated for the 10 compounds of the amino-quinazoline and amino-pyrrolopyrimidine classes (58-67), which are shown by green rectangles and triangles, respectively, in Figure 2a, as well as for compound 70, which does not retain its crystallographic binding pose in MD equilibration simulations (SI Figure 4). Linear regression analysis omitting these compounds gives R2=0.66. Further, compounds, 11, 17, 30 and 69, can be clearly distinguished as outliers (using Cook’s distance, see Methods). Possible reasons for the simulation inaccuracy for these compounds are discussed below. Removal of these outliers gives an R2 value of the linear regression of the remaining 55 compounds (i.e. 78 % of the original set studied) of 0.86 with a mean absolute error, MAE, of 36% (see details of statistical analysis in Methods). Assuming that the linear regression analysis provides the correct rescaling of the experimental to computed residence time, the mean prediction uncertainty, MPU, of the  is about 2.3-fold (individual values are given in Table 6).

Sampling of both bound and transition states is essential for accurate prediction of the residence times. The uncertainty of the computed residence time, comp, depends on the sampling accuracy of both the transition and the bound state. The transition state is explored by simulation of multiple dissociation trajectories and its sampling efficiency can be quantitatively characterized by the standard deviation, SDr, of the residence time tr for a particular starting replica. The SDr value generally decreases as the number of simulated trajectories is increased, reaching SDr 10-2 s-1) from the binding pocket generally requires less than 1 ns of simulation under the random force used in the study (force magnitude of 14 kcalmol-1 Å-1). Fast dissociating compounds of the quinazoline series need even less than 0.1 ns for dissociation. Such rapid egress could lead to a reduced sensitivity of the method to different dissociation mechanisms. On the other hand, slower dissociation is more sensitive to variations in the free energy landscape and thus requires more extensive sampling. To explore the sensitivity of the method to the random force magnitude, we repeated simulations for 30 compounds with koff > 10-2 s-1 using a smaller force of 13 kcalmol-1 Å-1. This prolonged the dissociation 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 29

time by several folds for most compounds (SI Figure 10) except for the amino-quinazoline compounds with koff > 0.1 s-1 (additional simulations showed that the dissociation time for these compounds starts to increase only for force magnitudes less than 11 kcalmol-1Å-1). Importantly, the decrease of the force magnitude did not affect the outliers or notably change the accuracy of the simulations, which suggests that the accuracy of the computed residence time is mainly dependent on the quality of the MD simulations rather than on the magnitude of the random force. Importantly, we note that the quality of results of the simulations started with and without crystal structures (indicated in Figure 2a by filled and open symbols, respectively) does not differ significantly. Proper sampling of the bound state is critical even if a crystal structure of the complex is available. As discussed above, large variations of the computed residence time usually arise from long time-scale protein motions and are independent of whether a crystal structure is available or not. Finally, it is also noteworthy that computed residence times for loop- and helix-binders fit similarly well to the linear regression line. Analysis of dissociation trajectories reveals different factors affecting the residence time. RAMD simulations sample multiple dissociation pathways that can involve various ligand and protein conformations, different protein-ligand contacts, and sometimes follow different egress routes. Analysis of these trajectories may give insights into the determinants of binding kinetics. Here, by way of illustration, we discuss three benzamide helix-binders, compounds 20, 21 and 22, shown in Figure 4a, b. Compound 20 differs from compound 21 by the introduction of a bulky fragment that occupies the hydrophobic subpocket, whereas compound 22 differs from compound 21 only in the polarity of its solvent-exposed fragment (cyclohexanol is substituted by cyclohexanone). Although the residence times of these compounds are not significantly different, their binding rates, kon, and affinity vary by over two orders of magnitude (Figure 4c). In particular, compound 20, has a smaller affinity and a smaller kon than compound 21, i.e. weaker binding with a higher transition barrier. These two effects lead to a relatively small change in expt, which is reflected in the computed comp (Figure 4 d). The simulations reveal several egress routes for compound 20 showing exit from both the side of the ATP binding site and the side of the transient sub-pocket (Figure 4 f), which requires essential conformational changes. In contrast, compounds 21 and 22 can leave the binding pocket without notable effect on the protein structure (the distribution of RMSD of the binding site residues relative to the bound state in dissociation trajectories is illustrated in Figure 4e). This suggests that the sampling of energetically unfavorable conformations, which enable a bulky compound to leave a tight pocket, leads to a higher energy barrier to unbinding. In agreement with this, we observed an inverse correlation of kon with the number of heavy atoms in the hydroxy-indazole compounds of group b with different buried fragments (R2=0.68), 16

ACS Paragon Plus Environment

Page 17 of 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

whereas there is a very weak correlation (R2=0.16) for the group of compounds with various exposed fragments (SI Figure 11 e, f). Finally, in general for all helix-binders, koff correlates well with the number of heavy atoms (SI Figure 11 d).

Figure 4. Analysis of simulated RAMD trajectories for three benzamide compounds. (a) Chemical structures and measured koff values of the three benzamide compounds. (b) Positions of compounds 20 and 21 in the binding pocket showing occupation of the sub-pocket under α-helix3 by compound 20, which has the longest residence time of these three compounds; the compounds are shown in ball- and stick-representation with carbon atoms colored in cyan, the protein surface is colored by the electrostatic potential. (c) Variation in log(kon) values versus variation log(1/KD) with respect to compound 21. (d) Computed log(comp) against measured log(expt) upon fragment substitution in compound 21. (e) Histogram showing RMSD of the protein binding site during dissociation trajectories for benzamide compounds. (f) Dissociation routes of compound 20 observed in RAMD simulations shown by blue lines indicating the ligand center-of-mass trajectories. Three representative dissociation trajectories are indicated by the positions of the oxygen atom in the benzamide fragment (magenta spheres). Insets: ligand poses in two snapshots from the corresponding dissociation trajectories.

Comparison of compounds 21 and 22 suggests another mechanism of modulation of the transition state. The introduction of cyclohexanone in compound 22 instead of cyclohexanol in compound 21 stabilizes both the bound state and the transition state according to SPR data (both kon and 1/KD are larger by about an order of magnitude for compound 22 than compound 21, Figure 4c). A possible reason is the interaction of compound 22 with several charged residues on the pocket boundary, such as K58 and D54 (Figure 4f). Desolvation of a ligand and a protein binding site upon binding is often considered as an important factor affecting ligand binding rate 54,55,29. We did not observe any significant correlation between kinetic rates and the ligand solvation energy computed using 3D-RISM (SI Table 6) in any of the groups of compounds studied (R2