Predicting Residence Time And Drug Unbinding Pathway Through

Pharmaceutical Modeling

Predicting Residence Time And Drug Unbinding Pathway Through Scaled Molecular Dynamics Doris Alexandra Schuetz, Mattia Bernetti, Martina Bertazzo, Djordje Musil, Hans-Michael Eggenweiler, Maurizio Recanatini, Matteo Masetti, Gerhard F. Ecker, and Andrea Cavalli J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00614 • Publication Date (Web): 30 Nov 2018 Downloaded from http://pubs.acs.org on December 1, 2018

Journal of Chemical Information and Modeling

Predicting Residence Time and Drug Unbinding Pathway through Scaled Molecular Dynamics

Authors: Doris A. Schuetz†#, Mattia Bernetti‡#, Martina Bertazzo‡,§, Djordje Musil∥, Hans-Michael Eggenweiler⊥, Maurizio Recanatini‡, Matteo Masetti‡*, Gerhard F. Ecker†, and Andrea Cavalli‡,§*

† University of Vienna, Department of Pharmaceutical Chemistry, UZA 2, Althanstrasse 14, 1090 Vienna, Austria

‡ Department of Pharmacy and Biotechnology, Alma Mater Studiorum – Università di Bologna, via Belmeloro 6, I-40126 Bologna, Italy

§ Computational Sciences, Istituto Italiano di Tecnologia, via Morego 30, 16163 Genova, Italy

∥ Discovery Technologies, Merck KGaA, Frankfurter Straße 250, 64293 Darmstadt, Germany

⊥ Medicinal Chemistry, Merck KGaA, 64293 Darmstadt, Germany

# These authors contributed equally.

* Co-corresponding authors.

ACS Paragon Plus Environment

Abstract

Computational approaches currently assist medicinal chemistry through the entire drug discovery pipeline. However, while several computational tools and strategies are available to predict binding affinity, predicting the drug-target binding kinetics is still a matter of ongoing research. Here, we challenge scaled Molecular Dynamics simulations to assess the off-rates for a series of structurally diverse inhibitors of the heat shock protein 90 (Hsp90) covering three orders of magnitude in their experimental residence times. The derived computational predictions are in overall good agreement with experimental data. Aside from the estimation of exit times, unbinding pathways were assessed through dimensionality reduction techniques. The data analysis framework proposed in this work could lead to better understanding of the mechanistic aspects related to the observed kinetic behavior.

Introduction

During the last decade, binding and unbinding kinetics have attracted increasing interest in the context of drug design.1–7 The traditional optimization strategies, mostly based on the thermodynamic signature of compounds,8,9 had to make room for kinetic considerations, which have been shown to be potentially as important as affinity. This is especially true for the residence time, which is a measure of how long a drug stays in contact with its target. It has been shown that the residence time can translate better to in vivo efficacy than affinity,6 and it should be considered as a key parameter at the early stages of drug discovery. Furthermore, the kinetic behavior of a drug towards its target can influence selectivity,10 drug safety,11 and PK-PD translation,12 making the evaluation of on- and off-rates of paramount importance for drug discovery programs. Taking into account binding and unbinding kinetics, however, requires an expensive setup and time-consuming testing of several newly synthesized compounds in complex kinetic assays.13 All these issues call for fast and reliable methods to predict binding kinetics through in silico approaches. Indeed, while several computational strategies exist that are routinely employed to assist the drug discovery and design process from a thermodynamic standpoint,9,14 established tools for effectively integrating kinetic information into this pipeline are currently missing.
Molecular Dynamics (MD) simulation is the method of choice when it comes to investigating dynamic processes at a fully atomistic level.10,15,16 However, in the context of protein-ligand binding or unbinding, reaching an optimal tradeoff between efficiency and accuracy represents a challenging task.17–19 While protein-ligand binding events can nowadays be achieved through MD-based methods either relying on specialized hardware20–23 or smarter sampling strategies,15,24–27 obtaining quantitative estimates of unbinding rates is currently out of reach for the majority of pharmaceutically relevant complexes. To circumvent this difficulty, multiscale simulations and
biased-MD approaches aimed at accelerating the sampling of rare events (e.g., unbinding) have been introduced. In this context, multiscale modeling typically exploits a dual resolution scheme, in which the computationally cheaper Brownian dynamics is employed at high protein-ligand separation, whereas a fully atomistic description is switched on in proximity of the binding site.28 Notably, the SEEKR method introduced by Amaro and coworkers takes advantage of both multiscale modeling and a highly parallelizable milestoning approach to further improve the sampling efficiency within the atomistic regime.29,30 Concerning biased-MD approaches, most of them rely on the introduction of external biasing forces acting on selected degrees of freedom (or collective variables). This is the case for umbrella sampling and metadynamics, among many others.31,32 Although biased methods are primarily suited to achieve a thermodynamic characterization of the investigated process, the metadynamics formalism has recently also been extended to recover kinetic information from the biased dynamics.33 The methodology, named infrequent metadynamics, has been successfully applied to the well-known trypsin-benzamidine system,34 to characterize the unbinding kinetics and mechanistic steps of the anticancer drug dasatinib from c-Src kinase,35 and to predict the off-rates for a series of congeneric inhibitors from the p38 MAP kinase.36 Alternatively, one can accelerate all the degrees of freedom at once through high temperature dynamics or smoothed potential methods (tempering methods).
In these cases, the canonical distribution of states is either preserved using a suitable simulation setup (parallel tempering)37 or recovered in a following step through reweighting procedures.38 In between these classes of biased approaches lies the RAMD method, in which forces are randomly assigned to the ligand, encouraging its egress from the binding site without the specification of collective variables.39 Notably, Kokh et al. very recently reported on the use of this method (τRAMD) to rank diverse compounds according to their dissociation time.40 Rather than biasing the dynamics
through external forces or taking advantage of multiscale representations, another class of methods is currently emerging to predict the drug-target residence time. These are based on adaptive schemes whereby the dynamics is guided towards the most relevant regions of phase space to efficiently sample the rare event. Among these approaches, we mention the adaptive multilevel splitting,41 as well as the weighted ensemble algorithm.42 In particular, the latter has been successfully applied to a series of pharmaceutically relevant systems of increasing complexity.43–46 Among the previously introduced tempering methods, scaled MD (sMD) is a promising approach that achieves a speed-up in the observation of rare events by smoothing the potential energy surface (PES) through the introduction of a scaling factor λ ranging from 0 to 1.13,47,48 sMD has been shown to successfully rank congeneric series of drug-like compounds according to their residence time, holding great potential for drug prioritization based on kinetic arguments.13,49 Very recently, the methodology has also been challenged to predict the unbinding kinetics for a series of hDAAO inhibitors in a prospective fashion, with no a priori knowledge of the experimental residence times.50 Despite the efficiency of sMD and its ability to predict relative off-rates of protein-ligand complexes accurately, extracting relevant information on the dissociation mechanism from repeated sMD runs is far from trivial. The reason is the high-dimensional configuration space in which unbinding events take place, as it comprises at least as many degrees of freedom as the Cartesian coordinates of the ligand’s heavy atoms (if protein conformational changes are neglected). Moreover, owing to the smoothed PES, in sMD the unbinding events occur as if the simulations were performed at high effective temperature.
Thus, the ligand could follow non-physical unbinding routes that might be far away from the minimum free energy surface, making the dissociation pathway rather difficult to interpret.
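The link between a smoothed PES and a high effective temperature can be illustrated with a few lines of Python. This is a minimal sketch that assumes the smoothing amounts to multiplying the whole potential energy by λ, so that the configurational Boltzmann weights match those of the unscaled potential at T/λ. The barrier height and the function names are hypothetical and serve only as an illustration.

```python
import math

K_B = 0.008314462618  # Boltzmann constant in kJ mol^-1 K^-1

def boltzmann_weight(delta_u, temperature, scale=1.0):
    """Relative weight of a state lying delta_u (kJ/mol) above a reference
    when the potential energy is multiplied by `scale`."""
    return math.exp(-scale * delta_u / (K_B * temperature))

def effective_temperature(temperature, scale):
    """Temperature at which the unscaled potential gives the same
    configurational weights as the scaled potential (assumption: uniform scaling)."""
    return temperature / scale

lam = 0.45      # scaling factor (the value adopted in the Methods of this work)
t_sim = 300.0   # simulation temperature in K
barrier = 60.0  # hypothetical barrier height in kJ/mol, for illustration only

t_eff = effective_temperature(t_sim, lam)
scaled_weight = boltzmann_weight(barrier, t_sim, scale=lam)
hot_weight = boltzmann_weight(barrier, t_eff)

print(f"effective temperature: {t_eff:.1f} K")
print(f"weight with scaled potential at {t_sim:.0f} K: {scaled_weight:.3e}")
print(f"weight with unscaled potential at T_eff:   {hot_weight:.3e}")
```

Under this assumption the two weights coincide, which is why trajectories on the smoothed surface can wander far from the pathways that dominate at the nominal temperature.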

Here, we used sMD to describe the kinetic behavior of several drug-like molecules in a reliable and efficient way. In particular, in the present study we challenge the methodology in two respects: i) ranking of structurally diverse drug-like ligands based on computed residence times, and ii) catching peculiar features of the unbinding process through subsequent analysis of the produced trajectories. Concerning the first aspect, we chose the heat shock protein 90 (Hsp90) as a test case (see Figure 1A). In particular, we predict the ranking of unbinding rates for a series of structurally diverse inhibitors that have been addressed in various studies as potential anti-cancer agents.51–54 The validity of the obtained prioritization is confirmed through comparison against experimental kinetic data. As for the second aspect, this study illustrates how relevant information about a drug’s dissociation process can potentially be retrieved taking advantage of dimensionality reduction techniques and cluster analysis. We show that the complex configurational space explored by ligands during dissociation can be successfully mapped into a low dimensional space through a non-linear multidimensional scaling technique.55 Additionally, the similarity of unbinding pathways is assessed directly on the low-dimensional space using the Fréchet distance and well-established clustering methods.56 Taken together, our results demonstrate how protein-ligand dissociation studies through sMD can provide valuable quantitative and qualitative insights. These can be ultimately exploited to drive kinetic-based drug design.

Figure 1. A) The N-terminal domain of Hsp90 in complex with compound 2. The ligand is embedded inside the ATP binding pocket, whose hydrophobic and hydrophilic regions are herein highlighted in orange and blue, respectively. B) Focus on the bound states. Representative structures of the three scaffolds (compounds 2, 4 and 7, from top to bottom) are shown (only polar hydrogens are displayed). The hydrogen bond interaction with the key residue Asp93 is maintained by all of the three compounds. Moreover, all ligands accommodate the same regions of the binding pocket.

Methods

Preparation of the systems and simulation setups

The crystal structures of human Hsp90α NTD in complex with compounds 2 (PDB: 5OD7), 3 (PDB: 5NYH), 4 (PDB: 5LNY), 6 (PDB: 5ODX) and 7 (PDB: 6HHR) were used as starting coordinates for the simulations.57 Compound 1 was modeled from compound 2 (PDB: 5OD7). Compound 5 was built from compound 4 (PDB: 5LNY) by manually exchanging one carbon atom for an oxygen atom. All the protein preparations and setups were done using the software BIKI 1.3.5.58 Compounds 1 and 3-7 were assumed to be neutral, while compound 2 was in a protonation state +1. The protein-inhibitor complexes were placed inside a box of 7.58 nm length. Between 9,731 and 13,918 TIP3P59 water molecules were added for solvation. For systems 1 and 3-7, water molecules were replaced by 7 sodium ions, and for system 2, 6 sodium ions were added, to preserve electroneutrality of the simulation system. The Amber99SB-ILDN60 and GAFF61,62 force fields were used for the protein and ligands, respectively. RESP charges63 for ligands were obtained from the ab initio optimized geometry computed at the HF/6-31G* level of theory with NWChem.64 The steepest descent method was used to minimize the initial coordinates in a 5,000-step minimization run. Three equilibration steps in the NVT ensemble were carried out for 100 ps, starting at a temperature of 100 K, increasing during the three steps up to 300 K (coupling period: 0.1 ps). The first equilibration step was performed employing a 1 fs integration step, while a 2 fs integration step was used for the remaining stages. Positional restraints (an isotropic force constant of 1,000 kJ mol⁻¹ nm⁻²) were applied to the heavy atoms of the protein backbone. The final equilibration step of 1 ns was performed in the NPT ensemble at 300 K and 1 atm. The pressure coupling was performed using the Parrinello-Rahman barostat65,66 (target pressure 1.013 bar, period 2 ps, compressibility 4.57 × 10⁻⁵ bar⁻¹).
An integration step of 1 fs was set and the resulting simulation box
length was used for the following production runs of different duration. All simulations were performed with Gromacs 4.6.1,67–69 using the velocity Verlet algorithm70 for integrating the equations of motion. The neighbor list cut-off within the Verlet cut-off scheme was set to 1.1 nm. Periodic boundary conditions (pbc) were applied and the Smooth Particle Mesh Ewald method71 was used to calculate Coulombic interactions with a grid size of 0.16 nm. The cut-off for long range Coulomb interactions was set to 1.1 nm and short range van der Waals (vdW) interactions were similarly treated with a cut-off of 1.1 nm. The LINCS algorithm (Linear Constraint Solver)72 was used to constrain all protein and ligand bonds, while the SETTLE algorithm73 was employed for water molecules. All boxes contained around 35,000 atoms (depending on ligand size). Scaled MD48 was carried out performing 25 repeated runs per ligand employing a scaling factor (λ coefficient) of 0.45. Harmonic positional restraints were applied to the backbone of the protein, as the protein fold had to be preserved throughout the simulation. Restraints of 50 kJ mol⁻¹ nm⁻² were applied to the heavy atoms of the backbone, while the residues composing the binding site, within 5 Å of the ligand, were not restrained at all. An entire amino acid was excluded from the restraints whenever at least one of its atoms was found within a 5 Å radius of any atom of the ligand. In this way, while the overall fold of the protein was preserved, the binding site of the protein was allowed to freely rearrange upon ligand unbinding. We note that, because of the peculiar shape of the solvent exposed binding site, the same set of residues was excluded from the restraints for all the compounds investigated. The position output frequency was set to 10 ps, with a precision of 0.001 nm.
A Python script running within VMD was used to stop the simulations after the ligand had left the protein.74 The ligand was considered unbound when the distance between the protein center of mass (COM) and the ligand COM reached 30 Å. A total
number of 175 simulations were performed for a cumulative simulation time of almost 27 μs (see Table S2). From the recorded exit times, the average unbinding times (corresponding to the computational residence time, τcalc) were determined for each ligand, along with standard deviations, standard errors, and bootstrapped standard errors. 1,000-fold bootstrapping was applied to assess the robustness of the estimation procedure as reported by Mollica et al.13 (see Figure S1 for distribution of single runs). Bootstrapping was done using R Studio.75 The correlation of normalized computational residence times with the normalized experimental residence times, τexp, was also performed as described by Mollica et al.13

Post-processing of unbinding trajectories and data analysis

The analysis strategy employed in this work consisted of three main steps: i) a clean-up stage in which sMD trajectories were scanned to obtain a productive sequence of states and reduce the noise introduced by the high effective temperature, ii) construction of a low-D embedded space from the cleaned trajectories, and iii) similarity analysis of unbinding pathways projected on the previously built low-D space. In particular, the low-D space was constructed to identify relevant features of unbinding events taking place in the high-dimensional configurational space of the simulated systems. The working hypothesis was that, although sMD trajectories might be far away from minimum free energy paths, recurring events can reveal useful reaction coordinates to describe the protein-ligand binding process.55,76 Indeed, while the relative population of transition pathways (unbinding events) can change with temperature,77 we assume that, in the limit of an adequately large ensemble of sMD trajectories, the minimum free energy pathway can still be sampled with a relatively high probability.
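The bootstrapped standard error of the mean exit time mentioned in the simulation protocol can be sketched as follows. The original analysis was carried out in R; the Python version below is only an illustrative equivalent, and the 25 exit times, the function name `bootstrap_mean_se` and the fixed random seed are hypothetical.

```python
import random
import statistics

def bootstrap_mean_se(samples, n_resamples=1000, seed=0):
    """Resample `samples` with replacement n_resamples times and return
    (mean of the resampled means, their standard deviation)."""
    rng = random.Random(seed)
    n = len(samples)
    means = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.fmean(means), statistics.stdev(means)

# Hypothetical exit times (ns) of 25 sMD replicas for a single ligand.
exit_times_ns = [
    41.2, 55.8, 37.5, 62.1, 48.9, 33.4, 71.6, 44.0, 58.3, 39.7,
    52.5, 46.8, 66.2, 35.9, 49.4, 60.7, 42.6, 54.1, 38.8, 68.5,
    45.3, 57.0, 40.1, 63.9, 50.6,
]

tau_calc = statistics.fmean(exit_times_ns)              # computational residence time
std_err = statistics.stdev(exit_times_ns) / len(exit_times_ns) ** 0.5
boot_mean, boot_se = bootstrap_mean_se(exit_times_ns)   # 1,000-fold bootstrap

print(f"tau_calc = {tau_calc:.1f} ns, SE = {std_err:.2f} ns, bootstrap SE = {boot_se:.2f} ns")
```

When the bootstrapped and the analytical standard errors are close, the mean exit time is not dominated by a few outlying replicas.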

The low-D space was obtained through the Isomap method78 as implemented in the Python library Pysomap.79 Isomap is a well-established non-linear dimensionality reduction technique. Unlike metric and non-metric multidimensional scaling (MDS) methods,80 which attempt to preserve the Euclidean distance of the high dimensionality (high-D) space, Isomap approximates the manifold using the geodesic distance. In this way, the approach is superior to MDS as it allows one to reproduce not only pairwise distances, but also the inherent topology of the high-D space. This is particularly relevant if one wishes to devise a reaction coordinate from MD trajectories.81 Indeed, configurations close in distance in the high-D space are not necessarily close along a reactive pathway.55 In order to estimate the geodesic distance, the shortest path between pairs of points in a k-nearest neighbor graph is computed. In particular, Pysomap implements Floyd’s algorithm to create the network. Herein, the distance between the ligands’ heavy atom coordinates (after optimal alignment on the protein alpha carbons), computed over all pairs of trajectory frames, was used as a metric. The PLUMED software was employed for this aim.82 It is well known that the ability of Isomap to preserve the geodesic distance of the manifold depends on the sampling density of data and on the definition of neighboring points.83 In this work, the high-dimensional data were embedded into a 3D space using 10 nearest neighbor points (k = 10). This value was obtained by a trial and error procedure, comparing the features of the unbinding pathways projected in the low-D space (see below). Concerning the issue of data sampling, we note that even though a high density of points in the high-D space is required for accuracy, the method becomes extremely impractical in case of big data like those generated by typical MD runs.
To face this problem, landmark variants of Isomap have been developed, in which only a fraction of the entire data set is employed to build the embedding, rather than exploiting all of the available points. This fraction can be chosen either randomly or based on geometric
assumptions.83 Here, we use a landmark-like approach, in which the relevant points in the high-D space are selected along productive unbinding trajectories. Thus, only those states describing a gradual advancement from the initial bound pose to fully solvated ligands were retained. This choice has the advantage of focusing the construction of the embedding on the portions of the manifold that carry the most relevant mechanistic information about the investigated events. Moreover, the unbinding trajectories obtained through sMD were extremely noisy. As such, this strategy also represents an effective procedure to filter out all the detours, dead-ends, and loops in the configurational space that are typical for non-equilibrium dynamics.84 In the clean-up stage of the data analysis, which precedes the dimensionality reduction, the trajectories from the bound to the unbound state of the ligand (running frame j_m; m = 1, 2, …, M) are scanned in order to save all the frames i_n whose distance in terms of root mean-squared deviation (RMSD), after alignment on the protein’s alpha carbons, is greater than or equal to a given threshold (δ) compared to the latest frame saved (i_N). To achieve this, the current structure (frame j_m) is compared to all the N previously saved frames (i_n; n = 1, 2, …, N). If the previous condition is satisfied, the j_m-th configuration is stored as an additional frame in the cleaned-up trajectory (see Figure 2A). On the other hand, if the RMSD between the running frame and one of the previously stored structures is lower than the threshold (i.e. RMSD(i_n, j_m) < δ), the i_n-th frame is replaced by the current configuration, and all the subsequent points of the cleaned-up trajectory (n+1, n+2, …, N) are discarded (see Figure 2A). In this way, only productive transitions are preserved, and all loops and/or dead ends along the unbinding trajectory are filtered out (see Figure 2B).
Notably, this aspect becomes particularly relevant in light of the possibility to exploit the extracted pathway as a putative reaction coordinate, where continuous advancement along the considered mechanism becomes a crucial requisite. In this work, the RMSD was computed
between all heavy atoms of ligands, and we used an RMSD threshold of 2.0 Å for compounds 1 and 7 and 2.5 Å for compounds 2-5. This choice reflected the different size of the investigated compounds (see Table 1), as we observed that a higher RMSD threshold was required to better describe the unbinding pathways of molecules carrying a greater number of degrees of freedom. Finally, for every ligand, the low-D space was built using the whole ensemble of frames generated from each cleaned-up trajectory (Figure 2C).
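The clean-up stage described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: frames are assumed to be already aligned on the protein alpha carbons, each frame is a list of ligand heavy-atom coordinates, and when several stored frames fall within the threshold the earliest one is taken (an assumption, since the text does not specify this case). The toy trajectory and the names `rmsd` and `clean_trajectory` are hypothetical.

```python
import math

def rmsd(frame_a, frame_b):
    """RMSD between two frames given as lists of (x, y, z) tuples.
    No fitting is done: frames are assumed pre-aligned on the protein."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))

def clean_trajectory(frames, threshold):
    """Keep only productive frames, removing loops and dead ends."""
    saved = []
    for frame in frames:
        # Index of the earliest stored frame closer than the threshold, if any.
        hit = next(
            (n for n, kept in enumerate(saved) if rmsd(kept, frame) < threshold),
            None,
        )
        if hit is None:
            saved.append(frame)      # new region of space: store the frame
        else:
            saved[hit] = frame       # revisited region: replace stored frame
            del saved[hit + 1:]      # and discard everything saved after it
    return saved

# Toy one-atom "ligand" (coordinates in Å): it leaves along x, makes a detour
# along y, comes back close to an earlier position, then exits.
trajectory = [
    [(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(3.0, 3.0, 0.0)], [(3.0, 6.0, 0.0)],
    [(3.5, 0.0, 0.0)], [(7.0, 0.0, 0.0)], [(12.0, 0.0, 0.0)],
]
cleaned = clean_trajectory(trajectory, threshold=2.0)
print([frame[0] for frame in cleaned])
```

In this toy case the two detour frames along y are dropped once the ligand returns to within 2.0 Å of an earlier stored position, leaving a monotonic path from the bound to the unbound state.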

Figure 2. A) Schematic diagram of the clean-up stage of data analysis. B) Pictorial representation of frame removal from a hypothetical trajectory which folds back on itself. C) The main steps involved in the analysis of unbinding trajectories. After cleaning up the trajectories, the pairwise distance calculated in the configurational space of the ligand (previously aligned on the protein’s
frame) is computed using the whole ensemble of preserved frames (matrix D in the scheme) and embedded in the low-D space through Isomap. Then, each unbinding pathway corresponding to the previously obtained cleaned-up trajectories is projected in the low-D space. Finally, for each pair of unbinding pathways, the Fréchet distance is computed and stored in a symmetric matrix (F in the scheme) that can be further exploited for subsequent cluster analysis.
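The geodesic distance that Isomap substitutes for the Euclidean one can be illustrated with a small, self-contained sketch: a k-nearest neighbor graph is built from a pairwise distance matrix and shortest paths are obtained with Floyd's algorithm. This is not the Pysomap code; the toy points on a semicircle stand in for the matrix D of ligand distances, the function names are hypothetical, and the final embedding step (classical MDS on the geodesic matrix) is omitted.

```python
import math

def knn_graph(dist, k):
    """Symmetric k-nearest neighbor graph: edge weights are the original
    distances, missing edges are set to infinity."""
    n = len(dist)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = dist[i][j]
    return graph

def floyd_warshall(graph):
    """All-pairs shortest paths (Floyd's algorithm), O(n^3)."""
    n = len(graph)
    geo = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = geo[i][m] + geo[m][j]
                if via < geo[i][j]:
                    geo[i][j] = via
    return geo

# Toy data: 21 points on a unit semicircle (a curved 1D manifold in 2D).
points = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
dist = [[math.dist(p, q) for q in points] for p in points]

geo = floyd_warshall(knn_graph(dist, k=2))
euclid_ends = dist[0][20]   # straight line between the two ends of the arc
geo_ends = geo[0][20]       # path length following the neighbor graph
print(f"Euclidean: {euclid_ends:.3f}, geodesic: {geo_ends:.3f}")
```

The two ends of the arc are 2.0 apart in a straight line, whereas the graph distance approaches the arc length π, which is the kind of topological information a reactive pathway requires.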

The pairwise similarity of unbinding pathways projected on the low-D space built by Pysomap was then measured using the Fréchet distance, as implemented in the Similarity Measures library of R.85 In particular, the discrete Fréchet distance between the paths Sk and Sl was computed as follows:56

$$F(S_k, S_l) = \inf_{\alpha,\beta} \, \max_{t \in [0,1]} \, d\big(S_k(\alpha(t)),\, S_l(\beta(t))\big) \qquad (1)$$

In Equation 1, the discrete Fréchet distance is defined as the infimum, over all the parameterizations α and β of Sk and Sl, of the maximum over t of the distance d between the two curves. The distance d is the Euclidean distance between points of the parametric curves Sk(α) and Sl(β) evaluated in the low-D space. Notably, the two parametric curves do not need to have the same length, and this makes this measure particularly suited to compare distinct unbinding trajectories. Thus, using the discrete Fréchet distance as a dissimilarity metric, all of the unbinding pathways were further analyzed via well-established cluster analysis methods. Two clustering strategies, as implemented in R version 3.3.2, were employed depending on the aim of the analysis.75 For example, the agglomerative hierarchical clustering algorithm with complete linkage method was mainly exploited for visualization purposes. In agglomerative
hierarchical clustering, similar objects (pathways in this specific case) are linked together to form growing clusters according to a bottom-up approach, where the order in which the clusters are merged is a function of their similarities. The hierarchical organization of the dissimilarity matrix can therefore be displayed as a tree diagram (dendrogram) which provides a simple visual interpretation of the dataset structure. Conversely, a less subjective classification of pathways into similar groups was performed through the k-means clustering method.86,87 With this method, each object is assigned to one of the k clusters defined by centroids, where the initial number of clusters k is a user-specified parameter. To determine the optimal number of clusters, we adopted the average silhouette method,88 as implemented in the Factoextra library of R, where the silhouette value is a measure of how similar an object is to its own cluster compared to the others. Thus, the optimal number of clusters k is defined as the one that maximizes the average silhouette over a range of possible values of k. Each cluster represents a set of similar unbinding trajectories, while the representative pathway of each cluster is defined as the closest to the centroid.
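The discrete Fréchet distance used as the dissimilarity measure, and the symmetric matrix F that feeds the cluster analysis, can be sketched as follows. The analysis in this work relied on an R library; the Python function below is an illustrative dynamic-programming version of the standard discrete definition, and the three short pathways are hypothetical points in a 3D embedded space.

```python
import math

def discrete_frechet(path_a, path_b):
    """Discrete Fréchet distance between two point sequences, computed by
    dynamic programming over all monotonic couplings of their points."""
    n, m = len(path_a), len(path_b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_a[i], path_b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]

# Hypothetical pathways in the low-D space; they need not have the same length.
path_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
path_2 = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
path_3 = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 0.0, 5.0)]
paths = [path_1, path_2, path_3]

# Symmetric dissimilarity matrix, ready for hierarchical or k-means-like clustering.
F = [[discrete_frechet(p, q) for q in paths] for p in paths]
for row in F:
    print(["%.3f" % value for value in row])
```

Pathways 1 and 2 run side by side and end up close in F, while pathway 3 leaves in a different direction and is far from both, so a clustering step on F would separate it.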

Results and Discussion

Compound ranking based on the computed residence times

We applied sMD to perform unbinding simulations for a series of drug-like inhibitors complexed to the N-terminal domain (NTD) of Hsp90.57 Here, Hsp90 was employed as a test case, while the seven investigated inhibitors were chosen because of the availability of experimental SPR measurements and because for five of them the crystal structure of the complex was available (compounds 2, 3, 4, 6, and 7). Concerning the remaining compounds (compounds 1 and 5), the bound states could be easily modeled in silico, because of their minor structural changes with respect to existing X-ray structures. While a simple replacement of a carbon atom by an oxygen was required to build
compound 5 from compound 4, cleavage of a bulky substituent was needed to model compound 1 from 2. The latter procedure could be safely performed as the cleaved group does not establish major interactions with the protein in the corresponding crystal structure (see Figure 1A). Thus, because the main interactions of the quinazoline scaffold are preserved upon molecular cleavage, we can safely assume that this replacement should not affect the native pose of compound 1. All the small molecules comprised in our dataset, presented in Table 1, bind to the ATP binding site of the target protein Hsp90 (Figure 1A and 1B). As outlined, compounds 1 and 2 display a quinazoline scaffold, compounds 3 to 6 possess an indazole-like structure, and compound 7 is a pyrazole-containing, resorcinol-like inhibitor. Both affinity (equilibrium dissociation constant, KD) and kinetic (association and dissociation rates, kon and koff respectively) properties are shown in Table 1. The corresponding residence times (determined as the inverse of the koff) range from a few seconds (compounds 1 and 7) up to thousands of seconds (compounds 2, 4 and 6). This makes the dataset very diverse, both in terms of chemical structure and range of residence times. Moreover, as Figure 3 shows, differences in koff in our set of molecules cannot be entirely ascribed to differences in affinity. From this standpoint, our dataset can be considered as properly tailored to study relevant changes in unbinding kinetics.

Table 1. Structures of the seven compounds examined. Three scaffolds have been considered containing the quinazoline, indazole and resorcinol cores.

[Scaffold and Substitution columns: chemical structure drawings not reproduced here; the only recoverable text entry is Substitution = H for compound 1.]

cmp   KD (nM)    kon (M^-1 s^-1)   koff (s^-1)    Experimental residence time τexp (s)
1     2152.00    2.56·10^5         5.52·10^-1     1.81
2     13.00      5.28·10^4         6.80·10^-4     1470.59
3     18.73      1.30·10^5         2.43·10^-3     412.20
4     42.00      2.14·10^4         9.10·10^-4     1098.90
5     485.00     2.72·10^4         1.32·10^-2     75.76
6     195.30     1.65·10^3         2.94·10^-4     3402.52
7     98.00      1.13·10^6         1.11·10^-1     9.01

Experimental kinetic parameters (kon and koff) and binding affinities (KD) are reported for each compound.57 SPR measurements for compound 7 were performed on a Biacore 4000 instrument from GE Healthcare (see the Supplementary Information, SI). All experimental data are shown as the mean. The experimental residence time, calculated as $\tau_{\mathrm{exp}} = 1/k_{\mathrm{off}}$, is shown in seconds.
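As a quick consistency check on Table 1, the residence time can be recomputed from koff. The sketch below uses only the tabulated kon and koff values; the helper names are hypothetical, and the relation KD = koff/kon assumes simple 1:1 binding, so the kinetically derived KD need not match the tabulated KD exactly.

```python
# Kinetic data transcribed from Table 1: compound -> (kon in M^-1 s^-1, koff in s^-1).
KINETICS = {
    1: (2.56e5, 5.52e-1),
    2: (5.28e4, 6.80e-4),
    3: (1.30e5, 2.43e-3),
    4: (2.14e4, 9.10e-4),
    5: (2.72e4, 1.32e-2),
    6: (1.65e3, 2.94e-4),
    7: (1.13e6, 1.11e-1),
}


def residence_time(koff):
    """Residence time in seconds, tau = 1 / koff."""
    return 1.0 / koff


def kinetic_kd_nM(kon, koff):
    """Kinetically derived KD = koff / kon (1:1 binding assumed), converted from M to nM."""
    return koff / kon * 1e9


for cmp_id, (kon, koff) in KINETICS.items():
    tau = residence_time(koff)
    kd = kinetic_kd_nM(kon, koff)
    print(f"compound {cmp_id}: tau_exp = {tau:.2f} s, koff/kon = {kd:.1f} nM")
```

Small deviations from the tabulated τexp and KD values are expected, because the table reports rounded rate constants.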

Figure 3. 2D kinetic map of the association rate constants (kon) versus dissociation rate constants (koff) with iso-affinity lines for the seven compounds employed in this study. Individual ligand affinities are shown in color-scale.

Even though the dataset contains different scaffolds, thus displaying chemical diversity, the structures show good overlay and comparable orientation in their binding modes. When embedded in the ATP site, similar interactions are maintained by all considered compounds, most importantly the Asp93 H-bond that is one of the key interactions (Figure 1B). Moreover, as the compounds tend to occupy similar regions of the binding site, those possessing bulkier substituent groups are able to reach the deep hydrophobic portion of the pocket, comprising residues Met98, Leu103, Leu107, Phe138, Tyr139, Val150 and Trp162. As depicted in Figure 1, one moiety of the ligand is facing towards the pocket entrance and is therefore solvent exposed. This is a general observation for co-crystallized Hsp90 inhibitors, as reported in previous works on this target.57 Indeed, as the ATP binding site of Hsp90 is a relatively shallow pocket, most ligands, and particularly those presenting larger substituents, protrude into the bulk of the solvent.

The initial states for our simulations were the protein-ligand complexes of the compounds listed in Table 1. For each protein-ligand complex, 25 sMD runs were carried out employing a scaling factor λ of 0.45. We note that this rather aggressive scaling factor was chosen as the best compromise between accuracy and speed. Indeed, while λ values closer to 1 better preserve the dynamics at 300 K, much longer simulations are required to witness unbinding events. Thus, the choice of a scaling factor of 0.45 was dictated by the need to observe repeated unbinding events within a submicrosecond timescale for all the investigated compounds. In order to preserve the overall secondary and tertiary structure of the protein, which might be compromised by the scaling factor,13,38 weak positional restraints were applied to all the backbone heavy atoms, with the exception of specific residues comprising the binding pocket (see Methods for details about the simulation setup). The computational residence times τcalc were estimated as the average of the exit times recorded for each compound (see Figure S1 for distributions of exit times). The results obtained are displayed in Table 2.
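The averaging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the exit times are synthetic (drawn from an exponential distribution purely for demonstration), the function name and the number of bootstrap resamples are hypothetical, and the standard error is assumed to be σ/√n.

```python
import math
import random
import statistics


def residence_time_stats(exit_times_ns, n_boot=2000, seed=0):
    """Return (mean, sigma, sigma_e, sigma_bs) for a list of exit times in ns."""
    n = len(exit_times_ns)
    mean = statistics.fmean(exit_times_ns)        # tau_calc: average exit time
    sigma = statistics.stdev(exit_times_ns)       # sample standard deviation
    sigma_e = sigma / math.sqrt(n)                # standard error (assumed sigma / sqrt(n))
    rng = random.Random(seed)
    # Bootstrap: resample the runs with replacement and collect the resampled means.
    boot_means = [
        statistics.fmean(rng.choices(exit_times_ns, k=n))
        for _ in range(n_boot)
    ]
    sigma_bs = statistics.stdev(boot_means)       # bootstrapped standard error
    return mean, sigma, sigma_e, sigma_bs


# Synthetic exit times (ns) for 25 runs; NOT data from the paper.
demo_rng = random.Random(42)
exit_times = [demo_rng.expovariate(1 / 150.0) for _ in range(25)]

mean, sigma, sigma_e, sigma_bs = residence_time_stats(exit_times)
print(f"tau_calc = {mean:.1f} ns, sigma = {sigma:.1f}, "
      f"sigma_e = {sigma_e:.1f}, sigma_BS = {sigma_bs:.1f}")
```

For exit-time distributions with a long tail, σ can be comparable to the mean, which is why a standard error or bootstrap estimate is the more informative uncertainty on τcalc.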

Table 2. Experimental and computed residence times (τexp and τcalc, respectively) for each of the seven compounds. Experimental residence times are reported in seconds, while computed residence times are reported in nanoseconds.

                               Computed quantities
Compound   τexp ± σ (s)        τcalc (ns)   ± σ        ± σe      ± σBS
1          1.81 ± 0.18         8.06         ± 9.68     ± 1.94    ± 1.77
2          1470.59             291.11       ± 259.04   ± 52.88   ± 49.96
3          412.2 ± 13.43       169.81       ± 150.50   ± 30.72   ± 29.20
4          1098.9 ± 320.64     209.72       ± 115.53   ± 23.58   ± 22.53
5          75.76 ± 3.91        141.38       ± 133.01   ± 27.15   ± 25.86
6          3402.25 ± 177.06    171.48       ± 138.89   ± 28.35   ± 26.61
7          9.01                77.30        ± 44.88    ± 9.16    ± 8.69

σ stands for the standard deviation and σe for the standard error, derived as $\sigma_e = \sigma/\sqrt{n}$, where n equals the number of observations/measurements; σBS is the bootstrapped standard error.

First, we analyzed the differences in predictions for structurally similar compounds that showed different kinetic behavior. Thus, we focused separately on the quinazoline analogs 1 and 2, and on the indazole compounds 3-6. As for the former, compound 2 includes a large substituent comprising a piperazine sulfonamide, which is charged at physiological pH. The significant dissimilarity with respect to compound 1, in which the entire substituent is absent, is reflected in a pronounced difference in residence times. In fact, as shown in Table 1 and Figure 3, the exit rates for compound 2 are significantly lower than those of compound 1. As mentioned before, in the bound state, the bulky group in compound 2 protrudes into the solvent and is not involved in relevant interactions with the protein (see Figure 1A and Figure 4). Nevertheless, it is reasonable to think that the variety of functional elements in the substituent can be responsible for temporary interactions with protein residues, which might accelerate binding and/or slow down unbinding. Our simulations were able to distinguish the kinetic profiles of compounds 1 and 2 very well, identifying a marked difference of about 280 ns between their computationally derived exit times (Table 2).

Concerning the indazole-containing compounds, namely 3 to 6, the differences in terms of structure were less dramatic. However, contrary to the quinazoline analogs, for which the bulky moiety was plunged into the solvent, here the varying substituents lie buried within the deep pocket of the binding site, which is made up exclusively of hydrophobic residues (Figure 4). Therefore, the presence of these substituent groups is expected to have a great influence on the process of ligand unbinding, as additional interactions need to be disrupted to allow for the ligands' dissociation from the binding site. Compounds 3 and 4 differ in ring size, with a pyrrolidine in the former and a piperidine in the latter. In contrast, both 5 and 6 possess an oxygen atom that increases the polarity of the moiety: it is part of the ring in compound 5, resulting in a morpholine, while it is present as a methoxy group attached to the piperidine in compound 6. Based on the experimental residence times reported in Table 1, compound 5 unbinds more than 300 s faster than 3, which in turn shows an almost 700 s shorter residence time than compound 4. Compound 6 is the slowest unbinder, requiring over 2000 s more than compound 4 to dissociate. The separations obtained from our simulations for compounds 3, 4 and 5 were distributed accordingly, reproducing the ranking resulting from the SPR experiments. Indeed, as shown in Table 2, compound 3 was characterized by an average unbinding time that was about 30 ns longer than the one obtained for compound 5. In the computation, compound 4 was about 40 ns slower than compound 3 to dissociate from the protein.

Figure 4. The hydrophobic pocket. The focus is on the binding mode of compound 2: while maintaining the driving interaction with Asp93 (lower side of the picture, colors consistent with Figure 1B), the isoindoline moiety of the ligand is fully embedded inside a hydrophobic cavity comprising residues Met98, Leu103, Leu107, Phe138, Tyr139, Val150, Trp162 (in orange, stick representation); conversely, the bulky substituent on the quinazoline scaffold protrudes into the solvent, without engaging in relevant interactions with the protein.

Despite the reliable reproduction of the experimental data for these drug-like chemotypes, it was not possible to rank compound 6 correctly. Within this series, compound 6 is undoubtedly the most structurally diverse with respect to the other three compounds. While it shares the piperidine moiety with compound 4, it also includes a methoxy group attached to the ring, which places a polar oxygen in a region that is relatively close to the one occupied by the in-ring oxygen in compound 5. From a structure-kinetics relationship standpoint, the presence of this methoxy substituent was predicted to accelerate dissociation of the ligand from the protein, similarly to compound 5 with its in-ring oxygen, in contrast to experimental data. The polar oxygen in the morpholine ring (compound 5), however, seems more exposed and therefore unfavorable in the hydrophobic pocket, whereas the rather buried oxygen of the methoxy substituent is more shielded, exposing the hydrophobic methyl group, which adapts to the hydrophobic pocket. In other words, in our simulations, 6 was predicted as an in-between situation of compounds 4 and 5. While this discrepancy may be ascribed to limitations of the force field in effectively discriminating such subtleties, it has to be kept in mind that this behavior might have been amplified by the choice of a low scaling factor (0.45).

Furthermore, we examined structurally dissimilar compounds displaying comparable kinetic behavior. The smallest compounds in our dataset, compounds 1 and 7, possess significantly diverse structures and scaffolds. Compound 7 lacks the substituent reaching into the deep hydrophobic pocket of the binding site, which can be found in all other compounds (Figure 1B). Despite the overall structural difference, these compounds displayed residence times on the same timescale in the SPR experiments. Accordingly, the two species exhibited the lowest computed residence times. The more structurally complex and diverse compounds 2 and 4 both required more, but still comparable, time to leave the protein binding site. Such behavior, expressed by experimental residence times on the order of 1000 s, was also confirmed by our computations, resulting in average unbinding times of more than 200 ns. Moreover, despite the profound structural differences, they were ranked correctly by the sMD runs, with compound 2 being predicted to take more time to leave the binding pocket.

By gathering the data from the entire set of compounds (1 to 7, Table 1), we evaluated the degree of correlation between simulations and experiments. To this aim, a linear regression procedure was applied, in which the effect of employing a scaling factor in the simulations was taken into account (see Figure 5).
Including all of the compounds in the dataset (Figure 5A) resulted in a Spearman ρ of 0.89, with a significant p-value (P < 0.05). Pearson's product-moment correlation resulted in an R of 0.73 and R2 = 0.53.
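The two correlation measures can be recomputed from the tabulated residence times. The sketch below is an illustration under stated assumptions rather than the authors' script: it takes τexp from Table 1 and τcalc from Table 2, applies the normalization given in the Figure 5 caption, and computes the Spearman coefficient as the Pearson correlation of the ranks (valid here because there are no ties). All function names are hypothetical.

```python
import math

LAMBDA = 0.45  # scaling factor used in the sMD runs

# compound -> (tau_exp in s, tau_calc in ns), transcribed from Tables 1 and 2.
TAU = {
    1: (1.81, 8.06),
    2: (1470.59, 291.11),
    3: (412.20, 169.81),
    4: (1098.90, 209.72),
    5: (75.76, 141.38),
    6: (3402.52, 171.48),
    7: (9.01, 77.30),
}


def ranks(values):
    """Ranks starting at 1; assumes no ties (true for this dataset)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlations(compounds):
    """Return (Spearman rho, Pearson R) for the selected compounds."""
    exp = [TAU[c][0] for c in compounds]
    calc = [TAU[c][1] for c in compounds]
    x = [(t / max(exp)) ** LAMBDA for t in exp]   # normalized experimental values
    y = [t / max(calc) for t in calc]             # normalized computed values
    rho = pearson(ranks(exp), ranks(calc))        # Spearman = Pearson on ranks
    return rho, pearson(x, y)


rho_all, r_all = correlations([1, 2, 3, 4, 5, 6, 7])
print(f"all compounds: Spearman rho = {rho_all:.2f}, Pearson R = {r_all:.2f}")
```

Calling `correlations([1, 2, 3, 4, 5, 7])` gives the corresponding values with compound 6 left out.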

Figure 5. Normalized calculated residence times plotted against normalized experimental residence times. Experimental values were normalized according to $(\tau_{\mathrm{exp},i}/\tau_{\mathrm{exp,max}})^{\lambda}$, in which the scaling factor $\lambda$ equals 0.45. Computational residence times were normalized according to $\tau_{\mathrm{calc},i}/\tau_{\mathrm{calc,max}}$. Error bars refer to the standard deviations σ reported in Table 2. The trend line from simple linear regression is shown, along with the corresponding R2 values. A) Results obtained including all seven compounds. A Pearson correlation coefficient of R = 0.73 and R2 = 0.53 were obtained. B) After excluding the single outlier (compound 6), the ranking for the remaining six compounds was reproduced correctly (Spearman rank correlation = 1). Accordingly, R2 increased to 0.89.
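One way to rationalize the exponent λ in the experimental normalization is the following worked equation. It is a sketch under two simplifying assumptions that are introduced here and not stated in the text: unbinding follows Arrhenius-like kinetics with an effective barrier $\Delta G^{\ddagger}$ that is scaled uniformly by λ, and the prefactor $\tau_0$ is the same for all compounds and unaffected by the scaling.

```latex
\begin{gather}
\tau = \tau_0 \exp\!\left(\frac{\Delta G^{\ddagger}}{k_B T}\right),
\qquad
\tau^{(\lambda)} = \tau_0 \exp\!\left(\frac{\lambda\,\Delta G^{\ddagger}}{k_B T}\right)
                 = \tau_0 \left(\frac{\tau}{\tau_0}\right)^{\lambda}, \\
\frac{\tau^{(\lambda)}_{i}}{\tau^{(\lambda)}_{\max}}
  = \left(\frac{\tau_{i}}{\tau_{\max}}\right)^{\lambda}.
\end{gather}
```

Under these assumptions, the normalized scaled residence time depends only on the ratio of unscaled residence times raised to the power λ. For example, for compound 1, $(1.81/3402.52)^{0.45} \approx 0.034$.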

It is clear from the plot that the result obtained for compound 6 significantly affected the correlation. Therefore, we examined the results for this compound more thoroughly. Inspection of the studentized residuals gave |t| > 3 for this compound (t = -3.75), identifying it as an outlier. Thus, the linear regression procedure was repeated excluding this compound. The Spearman rank correlation gave a ρ of 1, which indicates a correct reproduction of the ranking, with a significant p-value (P < 0.01). Pearson's product-moment correlation resulted in an R of 0.94 and R2 = 0.89, likewise with a significant p-value (P < 0.01). Thus, except for the outlier compound 6, we can conclude that the prioritization obtained from our simulations was in good agreement with the kinetic experimental data (Figure 5B). Most importantly, even with a relatively aggressive scaling factor of 0.45, it was possible to clearly discern very fast binders from slow binders, further pointing to this approach as a suitable tool to prioritize compounds for subsequent medicinal chemistry campaigns.
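The outlier test can be illustrated with a short calculation. The sketch below assumes that "studentized residuals" refers to externally studentized (leave-one-out) residuals of the simple regression of normalized τcalc on normalized τexp; this choice, like the function name, is an assumption made here for illustration. Input values are transcribed from Tables 1 and 2.

```python
import math

LAMBDA = 0.45  # scaling factor used in the sMD runs

# compound -> (tau_exp in s, tau_calc in ns), transcribed from Tables 1 and 2.
TAU = {
    1: (1.81, 8.06),
    2: (1470.59, 291.11),
    3: (412.20, 169.81),
    4: (1098.90, 209.72),
    5: (75.76, 141.38),
    6: (3402.52, 171.48),
    7: (9.01, 77.30),
}


def studentized_residuals(x, y):
    """Externally studentized residuals for the simple regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    out = []
    for a, e in zip(x, resid):
        h = 1.0 / n + (a - mx) ** 2 / sxx              # leverage of this point
        s2_del = (sse - e ** 2 / (1.0 - h)) / (n - 3)  # residual variance without it
        out.append(e / math.sqrt(s2_del * (1.0 - h)))
    return out


compounds = sorted(TAU)
exp = [TAU[c][0] for c in compounds]
calc = [TAU[c][1] for c in compounds]
x = [(t / max(exp)) ** LAMBDA for t in exp]   # normalized experimental values
y = [t / max(calc) for t in calc]             # normalized computed values

t_values = dict(zip(compounds, studentized_residuals(x, y)))
for c in compounds:
    flag = "  <- |t| > 3" if abs(t_values[c]) > 3 else ""
    print(f"compound {c}: t = {t_values[c]:.2f}{flag}")
```

With these inputs, only compound 6 exceeds the |t| > 3 threshold.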

Similarity analysis of unbinding pathways. We further investigated the sMD trajectories in order to retrieve relevant mechanistic details about the unbinding process and to depict structure-kinetics relationships. In drug discovery, a major goal is identifying novel drug-like molecules that are able to bind and modulate the activity of a pharmacological target of interest. From this perspective, understanding which pathways can be followed to access the target binding site, and which molecular features are involved, can be particularly important. Notably, multiple and complex routes can be involved in (un)binding processes. On the one hand, the complexity varies as a function of the pocket shape, location, and nature of the residues lining the way to the binding site. On the other hand, additional complications are introduced by both the degrees of freedom of the ligand and the variety of its functional groups, which can interact with the target protein along the binding pathway. Therefore, recognizing and classifying the possible routes becomes a nontrivial task.

Herein, to face this challenge, we applied a strategy based on dimensionality reduction and cluster analysis. Specifically, we tracked the Cartesian coordinates of the ligands' heavy atoms along the unbinding trajectories and used these points to build a low-dimensional (low-D, here 3D) embedding. To this end, we took advantage of Isomap, a non-linear dimensionality reduction technique. Once the trajectories were projected in the low-D space, we evaluated their similarity by calculating the Fréchet distance, which we employed as the metric for subsequent clustering. We applied this scheme to all the considered compounds (Table 1), with the exception of compound 6, which, as discussed above, behaved as an outlier. Both non-linear multidimensional scaling and the Fréchet distance have been extensively used to analyze MD trajectories.89,90 However, to the best of our knowledge, this is the first time they have been used in combination and applied to identify path similarities for the unbinding of drug-like molecules.

In Figure 6, we provide an illustration of the strategy devised, using the data obtained from compound 1. First, the relevant configurations visited by the ligand during unbinding in all of the performed trajectories were used as input for the dimensionality reduction procedure (see Figure 2 for a graphical illustration of the procedure). As a result, a low-D space was obtained, made up of the projection of the initial points in this new space (Figure 6A, grey points). Individual trajectories could then be followed in this new space with reduced dimensionality (Figure 6A, colored lines and points). Subsequently, the mutual distances between all of the trajectories available for the ligand (25 in the present case) were evaluated and stored in a square matrix, shown as a heatmap in Figure 6B.
A hierarchical cluster analysis (see Methods) was then performed on the obtained matrix in order to identify similar pathways (Figure 6B, dendrogram on top and on the side of the heatmap). For instance, according to Figure 6B, paths 3 and 16 turned out to be very similar for compound 1. This similarity could be visualized and confirmed in the low-D space (Figure 6A, pathways in orange scale). Similar reasoning holds for paths 14 and 18 (Figure 6A, blue scale). As a further confirmation, we inspected the configurations sampled by these trajectories in the high-D space. In the left-hand panels of Figure 6C and 6D, the center of mass (COM) position during the unbinding event in the high-D space was tracked for pathway pairs 14 and 18, and 3 and 16, respectively (colors are consistent with those used in Figure 6A). The right-hand panels of Figure 6C and 6D show instead sample configurations (I and II) visited along the process. Figure 6C clearly shows that the configurations (I and II) for each of the two trajectories (14 and 18, light blue and dark blue, respectively) overlay almost perfectly. We can observe a similar pattern for pathways 3 and 16, as shown in Figure 6D. Therefore, a low Fréchet distance proved to be a suitable criterion for identifying high similarity within a set of unbinding trajectories. Moreover, by comparing Figures 6C and 6D, it can be seen that the ligand is found in remarkably different orientations (blue scale and orange scale configurations, respectively). Indeed, pair 14 and 18 and pair 3 and 16 belong to different clusters, as indicated by the dendrograms (Figure 6B). Importantly, even though the COM trace in Cartesian space is similar for pathways 3, 16 and 14, 18 (left-hand panels in Figures 6C and 6D), these two clusters of trajectories are highly separated in the low-D space, reflecting their mechanistic dissimilarity. Indeed, while pathways 14 and 18 depict a scenario where the quinazoline ring unbinds first, followed by the rest of the molecule, the opposite situation is portrayed by pathways 3 and 16.
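The trajectory comparison step described above can be sketched as follows. The text does not state which variant of the Fréchet distance was used; this example assumes the discrete Fréchet distance computed by dynamic programming, and it operates on toy 3D paths standing in for trajectories already projected into the low-D space (the Isomap step is not shown). Function names and coordinates are hypothetical.

```python
import math


def discrete_frechet(path_a, path_b):
    """Discrete Fréchet distance between two paths given as lists of points."""
    n, m = len(path_a), len(path_b)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance up to (i, j)
    for i in range(n):
        for j in range(m):
            d = math.dist(path_a[i], path_b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


def frechet_matrix(paths):
    """Symmetric matrix of pairwise distances, usable as input for clustering."""
    k = len(paths)
    mat = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            mat[i][j] = mat[j][i] = discrete_frechet(paths[i], paths[j])
    return mat


# Three toy trajectories in a 3D embedded space (hypothetical coordinates).
paths = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(0, 0.1, 0), (1, 0.1, 0), (2, 0.1, 0)],   # nearly identical to the first
    [(0, 0, 0), (0, 1, 0), (0, 2, 0)],         # leaves in a different direction
]

for row in frechet_matrix(paths):
    print([round(v, 2) for v in row])
```

Because the distance respects the ordering of points along each path, two trajectories that visit the same region in a different sequence are still scored as dissimilar, which is the property exploited when separating pathways with similar COM traces.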

Figure 6. Dimensionality reduction and dendrogram interpretation of the unbinding trajectories. A) Ligand configurations (grey points) in the low-D space. Projections of sample trajectories (pathways 3 and 16 in orange and yellow; pathways 14 and 18 in light blue and dark blue) are highlighted as colored lines and points. B) Heatmap of the Fréchet distances between the 25 unbinding trajectories (labels go from 1 to 25). The color scale indicates high similarity in white (low values) and low similarity in blue (higher values). The result of the hierarchical cluster analysis on this matrix is represented on top and on the side of the heatmap in the form of a dendrogram. We note that the unbinding pathways are reordered in the Fréchet matrix so as to reflect their relative similarity in the dendrogram. C) Left panel: the ligand COM coordinates are tracked along the unbinding route in the high-D space from the initial bound pose (black dot) to the bulk solvent. Two sample pathways, 14 and 18, are represented. Right panel: sample configurations visited during the unbinding process. Colors are consistent with panel A. D) Same scheme as in panel C, but for trajectories 3 and 16. MD simulation data of compound 1 were used to produce the figure.

Monitoring and comparing individual configurations visited in different runs would not be manageable when handling large sets of trajectories. With our procedure, the similarities and differences were instead relatively straightforward to determine in the low-D projection. Interestingly, through the analysis of the low-D space for all the considered ligands, we were also able to quickly identify significantly different exit routes. While in most of the trajectories the ligand reaches the bulk solvent through the solvent-exposed entrance of the binding site, a passage under α-helix3 was occasionally followed (Figure 7A). In fact, while the vast majority of configurations are concentrated in the same volume (Figure 7A, grey points), a region separated from the rest, corresponding to the mentioned alternative path (Figure 7A and 7B, red line and points), could be observed. The heatmaps of the Fréchet distances for compounds 2-5 and 7 are reported in Figure S2.

Figure 7. The significantly different exit route visited by compound 5. A) In the low-D space, most of the data (grey points) are concentrated in the same region, corresponding to the ligand leaving the binding site at the solvent-exposed side of the pocket (sample pathways in green scale). A less populated agglomeration of points was evident, highlighted in red by the corresponding trajectory. B) The ligand COM coordinates are tracked along the unbinding route from the initial bound pose (black dot) to the bulk in the high-D space. In red, the unusual passage under α-helix3 is highlighted, while the green points represent the most conventional unbinding pathways.

Potential implications for drug design. The different average unbinding times recorded (Table 2) for the considered ligands were the result of protein-ligand complex disruptions that took place in different ranges of simulation times. Taken together with the pathway classifications that can be achieved through the proposed strategy, another interesting aspect to investigate was a potential correlation between unbinding routes and residence times. It is indeed reasonable to think that, as long as the kinetic model provided by sMD is able to correctly rank the computed residence times for the investigated compounds, at least part of this information should reflect mechanistic aspects of the unbinding pathways. We therefore examined our trajectories to identify relevant mechanistic and structural features of ligand unbinding. Even though those features could only provide some general indications due to the relatively aggressive scaling factor, it is nonetheless interesting to gather all the potentially relevant information for kinetic-based drug design.

In Table S3, we report the results of k-means clustering on the unbinding pathways together with the exit times averaged over all the members of each cluster (tav). As the table shows, while no clear-cut correlation between geometric features and temporal behavior was found, a certain partition of pathways consistent with similar timescales could be inferred in some cases. This is particularly interesting for compound 1, for which the most populated cluster (cluster #1) is associated with the fastest unbinding trajectories (tav = 5.16 ns). Notably, the previously discussed unbinding pathways 14 and 18 (see Figure 6) are found within this cluster. In contrast, pathways 3 and 16 are grouped in cluster #2, which corresponds to the longest average exit time recorded for this compound (tav = 12.17 ns). The trajectory analysis suggests that pathways characterized by an early exposure of the quinazoline ring to the solvent, with a more pronounced insertion of the dihydro-isoindole in the hydrophobic pocket (cluster #1), dissociate intrinsically faster than those for which the quinazoline ring leaves later (cluster #2). Such an observation implies that, in order to increase the residence time of this compound, one might reduce the relative population of trajectories represented by cluster #1 in favor of those belonging to cluster #2. This can be obtained either by stabilizing the interactions of the quinazoline with proper substituents or by increasing the lipophilicity of the dihydro-isoindole inside the deep hydrophobic pocket.
Both options would lead to a stabilization of the metastable states encountered along the cluster #1 pathway and, ultimately, slow down the entire unbinding process. A similar reasoning applies to compound 2, where the most populated cluster (cluster #3) accounts for the slowest unbinding pathways (tav = 490.63 ns). If we compare the representative pathway of this cluster (run 20, also associated with the longest individual exit time of 931.93 ns) with the one belonging to the fastest unbinding trajectories (cluster #2, representative member: run 25), it is possible to notice that similar configurations can be identified along the egress process. In particular, as reported in Figure 8A, both pathways show a ligand configuration in which the dihydro-isoindole is deeply buried inside the hydrophobic pocket, while the polar moiety of the molecule reaches towards the solvent-exposed part of the binding site. While a salt bridge between compound 2 and Asp54 can be established in the representative member of cluster #3, this interaction was never found in cluster #2. We therefore hypothesize that this specific ionic interaction leads to higher residence times. Its preferential formation could be intentionally exploited in the optimization phase of drug design programs.

While gathering kinetic information from a series of unbinding trajectories of the same ligand is certainly interesting, in the context of a drug design process it is often more relevant to compare the kinetic behavior of structurally related compounds. The dataset employed in this study allows a direct comparison of two structurally related drug-like molecules, compounds 4 and 5, where the indazole scaffold is decorated with a piperidine substituent in the former and a morpholine group in the latter. This substitution is associated with a one order of magnitude higher affinity in favor of compound 4 and a longer residence time (Table 1). By comparing the representative pathways of the most populated clusters for the two compounds (cluster #2, run 23 and cluster #3, run 22 for 4 and 5, respectively), it is possible to notice that in both cases the ligand protrudes further into the hydrophobic pocket (Figure 8B) before leaving the binding site.
However, this configuration would correspond to more favorable contacts inside the hydrophobic pocket only for compound 4, which possesses a more hydrophobic ring than compound 5. This gives rise to the longer residence time determined for compound 4. Again, the analysis suggests that more hydrophobic, and possibly bulkier, substituents might lead to a further decrease of koff for this molecular scaffold.
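The per-cluster average exit time (tav) used in this discussion is a simple grouped mean. The sketch below shows the bookkeeping on made-up cluster labels and exit times; none of the numbers come from Table S3, and the function name is hypothetical.

```python
from collections import defaultdict


def cluster_average_exit_times(labels, exit_times_ns):
    """Map each cluster label to (population, mean exit time in ns)."""
    grouped = defaultdict(list)
    for label, t in zip(labels, exit_times_ns):
        grouped[label].append(t)
    return {lab: (len(ts), sum(ts) / len(ts)) for lab, ts in grouped.items()}


# Hypothetical cluster assignments and exit times for eight runs; NOT data from Table S3.
labels = [1, 1, 2, 1, 2, 3, 1, 2]
exit_times = [4.0, 6.0, 11.0, 5.0, 13.0, 20.0, 5.0, 12.0]

summary = cluster_average_exit_times(labels, exit_times)
# List clusters from fastest to slowest average exit time.
for lab in sorted(summary, key=lambda k: summary[k][1]):
    size, t_av = summary[lab]
    print(f"cluster #{lab}: {size} runs, t_av = {t_av:.2f} ns")
```

Reporting the cluster population next to tav matters here, since with 25 runs per compound some clusters contain only a handful of trajectories.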

Figure 8. A) Similar configurations extracted from representative pathways of the most populated cluster (cluster #3, run 20, cyan sticks) and the cluster associated to the fastest average exit time (cluster #2, run 25, pink sticks) for compound 2. The additional salt bridge established by compound 2 and Asp54 is only observed in the representative pathway of cluster #3. B) Comparison of configurations extracted from the representative pathways of the most populated clusters for compound 4 and 5 (cyan and pink sticks, respectively).

We wish to underline that the preceding discussion is partly based on a speculative interpretation of unbinding pathways. Indeed, further simulations, possibly at a higher level of computation, would be required to fully address the aspects highlighted by our trajectory analysis and to detect statistically significant differences in the mean exit times between distinct clusters. Employing a more conservative λ, and possibly relying on more statistics of unbinding events, would provide a better suited set of data to investigate the presence of correlations between geometric features of trajectories and unbinding timescales. However, this would translate into larger requirements in terms of computational resources, and most likely would imply focusing only on single ligands, rather than a set of compounds, diminishing the drug discovery-oriented appeal of the proposed methodology.

Comparison with related methods. The sMD method has been widely employed in both retrospective13,49 and prospective studies,50 and can nowadays be considered a well-established methodology for compound ranking based on computed residence times. However, while this work was already at an advanced stage of development, a similar approach based on RAMD was presented (τRAMD) and applied to the same pharmaceutical target investigated in this study.40 It is therefore interesting to compare the main features of the two methodologies, which undoubtedly share some conceptual overlap. Indeed, while in sMD the effective temperature of the simulated system is enhanced by rescaling the potential energy function provided by the force field, in τRAMD an additional force with random orientation is applied to the ligand. In this way, the egress of compounds from their binding site is facilitated, and the computed residence time, defined as the simulation time required for ligand dissociation in half of the runs, can be obtained at affordable timescales.
Thus, both methods rely on a tunable parameter (the scaling factor for sMD and the magnitude of the force for τRAMD) whose choice is critical in determining the accuracy-to-efficiency ratio of compound ranking. Both the scaling factor and the magnitude of the force must be chosen so that enough unbinding events are observed to ensure acceptable statistics for all the compounds in the dataset at an affordable computational cost. The number of unbinding events is typically set to 25 in the case of sMD, whereas in τRAMD a more statistically oriented procedure is employed to reduce the uncertainty of the results. In the work by Kokh et al., a total of 40 to 200 RAMD simulations per compound was employed (40). That being said, the dissociation time for τRAMD was on the order of 1-10 ns (with a random force magnitude of 14 kcal mol⁻¹ Å⁻¹), whereas in our case up to almost 1 μs of simulation was required to observe a single unbinding event for compound 2 at the relatively aggressive scaling factor of 0.45. Notably, this difference in computational efficiency allowed Kokh et al. to investigate a dataset one order of magnitude larger than ours. A side effect of using a scaled potential energy surface is that in sMD all the degrees of freedom are accelerated, so the secondary structure of the protein must be preserved with additional restraints. At the same time, enough conformational freedom must be left to the binding site so as not to interfere with the accelerated dynamics of the ligand. While a standard recipe to satisfy this requirement is available (13), it is important to note that τRAMD is free of this potential source of complication. We therefore conclude that both methods are well equipped to provide an efficient and reliable estimate of relative residence times for drug-target complexes. Further investigation, possibly involving more difficult test cases, will ultimately be required to assess the relative merits of these important tools for drug discovery and development.
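
The two ways of condensing replica unbinding times into one residence-time estimate can be illustrated with a minimal Python sketch. It is not taken from either method's implementation: the function names, the use of `None` for a replica that never dissociated, and the choice of the arithmetic mean for the sMD-style estimate are assumptions made here for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def half_dissociation_time(times_ns: Sequence[Optional[float]]) -> float:
    """Simulation time by which half of the replicas have dissociated.

    Mirrors the definition quoted in the text for tauRAMD. `None` marks a
    replica that never dissociated within its run (an assumption of this
    sketch); such replicas count as slower than every observed event.
    """
    n_needed = (len(times_ns) + 1) // 2  # replicas that make up "half of the runs"
    observed = sorted(t for t in times_ns if t is not None)
    if not times_ns or len(observed) < n_needed:
        raise ValueError("fewer than half of the replicas dissociated")
    return observed[n_needed - 1]


def mean_unbinding_time(times_ns: Sequence[float]) -> float:
    """Average over a fixed set of unbinding events (e.g. 25 replicas).

    Using the arithmetic mean here is an assumption of this sketch.
    """
    return mean(times_ns)
```

With made-up replica times `[4.0, 1.0, None, 3.0, 2.0]`, the half-dissociation time is the third-shortest observed time, since three of five replicas constitute "half of the runs" in this sketch.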

Conclusion and Future Perspective


The importance of taking into account the kinetic behavior of drugs, in addition to their affinity, has become a widely accepted concept in drug discovery. In spite of this, measuring kinetic parameters in the early stages of the drug discovery process has proven to be difficult. While in silico methods are increasingly employed to assist the whole drug discovery process, no well-established computational methods are yet available to routinely predict drugs’ off rates. Here, we investigated the capability of scaled molecular dynamics (sMD) simulations to predict the residence time and the unbinding routes for a series of Hsp90 inhibitors. We challenged the methodology by examining very diverse structures of lead-like molecules, as might be the case in a real drug discovery scenario. Notably, the experimental residence times of the examined compounds covered a range of three orders of magnitude, spanning from less than two seconds up to almost one hour. To the best of our knowledge, this is the dataset with the largest range in residence time examined with scaled MD simulations to date. First, we showed that structurally similar compounds could be ranked reliably, even though one inhibitor turned out to be an outlier of our kinetic model. Structurally highly similar compounds displaying distinct kinetic differences proved difficult to differentiate in single runs. However, the statistical significance and robustness of the observed events were assessed by bootstrapping analysis, and it was shown that even compounds differing by very subtle changes in chemical structure could be ranked in good agreement with experimental evidence. Moreover, the correct kinetic behavior could be reproduced despite large structural changes. Then, we devised a fully automated data analysis approach to detect similar features in the ensemble of unbinding trajectories. Despite the aggressive scaling factor employed in this work, the methodology seems to be promising.
By integrating this information with the observed unbinding times, recurring dynamical features implicated in determining characteristic residence times could ultimately be identified. Finally, the information obtained can be further exploited to guide more accurate sampling strategies, which purposely aim at identifying minimum free energy paths and the associated potentials of mean force. Indeed, the low-D embedding can be directly used as a collective variable space to feed biased-MD simulations. This would allow a more reliable characterization of the mechanism of the process, a quantification of the energy barriers involved, and an accurate identification of the relevant metastable states encountered during unbinding. This latter aspect of the data analysis will be further investigated in future studies.
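
As an illustration of the kind of bootstrapping analysis mentioned in these conclusions, the following Python sketch resamples replica unbinding times with replacement. The resampling scheme, the function names and the default number of resamples are assumptions of this sketch; the details of the analysis actually performed are not restated in this section.

```python
import random
from statistics import mean


def bootstrap_mean(times, n_resamples=1000, seed=0):
    """Return (sample mean, bootstrap standard error of the mean)."""
    rng = random.Random(seed)
    # Each resample draws len(times) values with replacement.
    means = [mean(rng.choices(times, k=len(times))) for _ in range(n_resamples)]
    grand = mean(means)
    variance = sum((m - grand) ** 2 for m in means) / (n_resamples - 1)
    return mean(times), variance ** 0.5


def fraction_a_faster(times_a, times_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which ligand A unbinds faster, on average, than B.

    Values close to 1 (or 0) suggest that the ranking of the two ligands
    is robust to the particular set of replicas that was simulated.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        mean_a = mean(rng.choices(times_a, k=len(times_a)))
        mean_b = mean(rng.choices(times_b, k=len(times_b)))
        if mean_a < mean_b:
            hits += 1
    return hits / n_resamples
```

Fixing the seed keeps the estimate reproducible between runs, which helps when comparing closely related compounds.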


ASSOCIATED CONTENT

The Supporting Information is available free of charge on the ACS Publications website at DOI: xxx

Data collection and refinement statistics of the X-ray structure (PDF)

Molecular formula strings and some data (CSV)

Accession Codes: The atomic coordinates have been deposited in the Protein Data Bank (PDB code 6HHR for compound 7). Authors will release the atomic coordinates and experimental data upon article publication.

Author information

Co-Corresponding authors

Matteo Masetti E-mail: [email protected]

Andrea Cavalli E-mail: [email protected]

Present Author Addresses:


D.A.S.: IRIC - Institut de Recherche en Immunologie et en Cancérologie, Université de Montréal, 2950 Chemin de Polytechnique, Marcelle-Coutu Pavilion, Montréal, QC H3T 1J4, Canada

Author Contributions

D.A.S. and M.Bernetti performed the simulations. D.A.S., M.M., M.Bernetti, and M.Bertazzo analyzed the data. D.M. solved the crystal structure (PDB 6HHR). H.-M.E. developed the Hsp90 inhibitors described in this article. M.M., G.E., and A.C. supervised the studies. D.A.S., M.Bernetti, M.M., and M.R. wrote the manuscript.

Notes

Andrea Cavalli is a co-founder of BiKi Technologies, a startup company that develops methods based on molecular dynamics and related approaches for investigating protein-ligand (un)binding.

Acknowledgement

This work was supported by the EU/EFPIA Innovative Medicines Initiative (IMI) Joint Undertaking, K4DD (grant no. 1115366). This paper reflects only the authors’ views and neither the IMI nor the European Commission is liable for any use that may be made of the information contained herein. The computational results presented have been achieved in part using the Vienna Scientific Cluster (VSC). D.A.S. thanks Sergio Decherchi for technical support with BiKi Netics, Riccardo Martini for help with Gimp, and Prof. Stefan Boresch for MD discussions and useful suggestions. Furthermore, she thanks Prof. Thierry Langer for providing the HP cluster “Hydra” for many of the simulations presented here. We wish to thank Dario Gioia for useful discussions regarding the analysis of unbinding pathways and for careful reading of the manuscript. Paolo Cinelli is also acknowledged for technical support.

References

(1) Schuetz, D. A.; de Witte, W. E. A.; Wong, Y. C.; Knasmueller, B.; Richter, L.; Kokh, D. B.; Sadiq, S. K.; Bosma, R.; Nederpelt, I.; Heitman, L. H.; Segala, E.; Amaral, M.; Guo, D.; Andres, D.; Georgi, V.; Stoddart, L. A.; Hill, S.; Cooke, R. M.; De Graaf, C.; Leurs, R.; Frech, M.; Wade, R. C.; de Lange, E. C. M.; IJzerman, A. P.; Müller-Fahrnow, A.; Ecker, G. F. Kinetics for Drug Discovery: An Industry-Driven Effort to Target Drug Residence Time. Drug Discov. Today 2017, 22, 896–911.
(2) Spagnuolo, L. A.; Eltschkner, S.; Yu, W.; Daryaee, F.; Davoodi, S.; Knudson, S. E.; Allen, E. K. H.; Merino, J.; Pschibul, A.; Moree, B.; Thivalapill, N.; Truglio, J. J.; Salafsky, J.; Slayden, R. A.; Kisker, C.; Tonge, P. J. Evaluating the Contribution of Transition-State Destabilization to Changes in the Residence Time of Triazole-Based InhA Inhibitors. J. Am. Chem. Soc. 2017, 139, 3417–3429.
(3) Lu, H.; Tonge, P. J. Drug-Target Residence Time: Critical Information for Lead Optimization. Curr. Opin. Chem. Biol. 2010, 14, 467–474.
(4) Copeland, R. A. The Drug-Target Residence Time Model: A 10-Year Retrospective. Nat. Rev. Drug Discov. 2016, 15, 87–95.
(5) Bernetti, M.; Cavalli, A.; Mollica, L. Protein–Ligand (Un)Binding Kinetics as a New Paradigm for Drug Discovery at the Crossroad between Experiments and Modelling. MedChemComm 2017, 8, 534–550.
(6) Pan, A. C.; Borhani, D. W.; Dror, R. O.; Shaw, D. E. Molecular Determinants of Drug-Receptor Binding Kinetics. Drug Discov. Today 2013, 18, 667–673.
(7) Schoop, A.; Dey, F. On-Rate Based Optimization of Structure-Kinetic Relationship-Surfing the Kinetic Map. Drug Discov. Today. Technol. 2015, 17, 9–15.
(8) Copeland, R. A.; Pompliano, D. L.; Meek, T. D. Drug–Target Residence Time and Its Implications for Lead Optimization. Nat. Rev. Drug Discov. 2006, 5, 730–739.
(9) Klebe, G. Applying Thermodynamic Profiling in Lead Finding and Optimization. Nat. Rev. Drug Discov. 2015, 14, nrd4486.
(10) Kruse, A. C.; Hu, J.; Pan, A. C.; Arlow, D. H.; Rosenbaum, D. M.; Rosemond, E.; Green, H. F.; Liu, T.; Chae, P. S.; Dror, R. O. Structure and Dynamics of the M3 Muscarinic Acetylcholine Receptor. Nature 2012, 482, 552–556.
(11) Sykes, D. A.; Moore, H.; Stott, L.; Holliday, N.; Javitch, J. A.; Lane, J. R.; Charlton, S. J. Extrapyramidal Side Effects of Antipsychotics Are Linked to Their Association Kinetics at Dopamine D2 Receptors. Nat. Commun. 2017, 8, 763.
(12) Walkup, G. K.; You, Z.; Ross, P. L.; Allen, E. K. H.; Daryaee, F.; Hale, M. R.; O’Donnell, J.; Ehmann, D. E.; Schuck, V. J. A.; Buurman, E. T.; Choy, A. L.; Hajec, L.; Murphy-Benenato, K.; Marone, V.; Patey, S. A.; Grosser, L. A.; Johnstone, M.; Walker, S. G.; Tonge, P. J.; Fisher, S. L. Translating Slow-Binding Inhibition Kinetics into Cellular and in Vivo Effects. Nat. Chem. Biol. 2015, 11, 416–423.
(13) Mollica, L.; Decherchi, S.; Zia, S. R.; Gaspari, R.; Cavalli, A.; Rocchia, W. Kinetics of Protein-Ligand Unbinding via Smoothed Potential Molecular Dynamics Simulations. Sci. Rep. 2015, 5, 11539.
(14) Klebe, G. The Use of Thermodynamic and Kinetic Data in Drug Discovery: Decisive Insight or Increasing the Puzzlement? ChemMedChem 2015, 10, 229–231.
(15) Gioia, D.; Bertazzo, M.; Recanatini, M.; Cavalli, A. Dynamic Docking: A Paradigm Shift in Computational Drug Discovery. Molecules 2017, 22, 1–21.
(16) Faulon, J.-L.; Bender, A. Handbook of Chemoinformatics Algorithms; CRC Press, 2010.
(17) De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59, 4035–4061.
(18) Swinney, D. C.; Beavis, P.; Chuang, K.-T.; Zheng, Y.; Lee, I.; Gee, P.; Deval, J.; Rotstein, D. M.; Dioszegi, M.; Ravendran, P.; Zhang, J.; Sankuratri, S.; Kondru, R.; Vauquelin, G. A Study of the Molecular Mechanism of Binding Kinetics and Long Residence Times of Human CCR5 Receptor Small Molecule Allosteric Ligands. Br. J. Pharmacol. 2014, 171, 3364–3375.
(19) Bull, H. G.; Garcia-Calvo, M.; Andersson, S.; Baginsky, W. F.; Chan, H. K.; Ellsworth, D. E.; Miller, R. R.; Stearns, R. A.; Bakshi, R. K.; Rasmusson, G. H.; Tolman, R. L.; Myers, R. W.; Kozarich, J. W.; Harris, G. S. Mechanism-Based Inhibition of Human Steroid 5α-Reductase by Finasteride: Enzyme-Catalyzed Formation of NADP−Dihydrofinasteride, a Potent Bisubstrate Analog Inhibitor. J. Am. Chem. Soc. 1996, 118, 2359–2365.
(20) Shaw, D. E.; Chao, J. C.; Eastwood, M. P.; Gagliardo, J.; Grossman, J. P.; Ho, C. R.; Ierardi, D. J.; Kolossváry, I.; Klepeis, J. L.; Layman, T.; McLeavey, C.; Deneroff, M. M.; Moraes, M. A.; Mueller, R.; Priest, E. C.; Shan, Y.; Spengler, J.; Theobald, M.; Towles, B.; Wang, S. C.; Dror, R. O.; Kuskin, J. S.; Larson, R. H.; Salmon, J. K.; Young, C.; Batson, B.; Bowers, K. J. Anton, a Special-Purpose Machine for Molecular Dynamics Simulation. Commun. ACM 2008, 51, 91–97.
(21) Ferruz, N.; Harvey, M. J.; Mestres, J.; De Fabritiis, G. Insights from Fragment Hit Binding Assays by Molecular Simulations. J. Chem. Inf. Model. 2015, 55, 2200–2205.
(22) Doerr, S.; De Fabritiis, G. On-the-Fly Learning and Sampling of Ligand Binding by High-Throughput Molecular Simulations. J. Chem. Theory Comput. 2014, 10, 2064–2069.
(23) Martínez-Rosell, G.; Giorgino, T.; Harvey, M. J.; de Fabritiis, G. Drug Discovery and Molecular Dynamics: Methods, Applications and Perspective Beyond the Second Timescale. Curr. Top. Med. Chem. 2017, 17, 2617–2625.
(24) Spitaleri, A.; Decherchi, S.; Cavalli, A.; Rocchia, W. Fast Dynamic Docking Guided by Adaptive Electrostatic Bias: The MD-Binding Approach. J. Chem. Theory Comput. 2018, 14, 1727–1736.
(25) Martinez-Rosell, G.; Harvey, M. J.; De Fabritiis, G. Molecular-Simulation-Driven Fragment Screening for the Discovery of New CXCL12 Inhibitors. J. Chem. Inf. Model. 2018, 58, 683–691.
(26) Sabbadin, D.; Moro, S. Supervised Molecular Dynamics (SuMD) as a Helpful Tool to Depict GPCR-Ligand Recognition Pathway in a Nanosecond Time Scale. J. Chem. Inf. Model. 2014, 54, 372–376.
(27) Bertazzo, M.; Bernetti, M.; Recanatini, M.; Masetti, M.; Cavalli, A. Fully Flexible Docking via Reaction-Coordinate-Independent Molecular Dynamics Simulations. J. Chem. Inf. Model. 2018, 58, 490–500.
(28) Amaro, R. E.; Mulholland, A. J. Multiscale Methods in Drug Design Bridge Chemical and Biological Complexity in the Search for Cures. Nat. Rev. Chem. 2018, 2, 148.
(29) Votapka, L. W.; Jagger, B. R.; Heyneman, A. L.; Amaro, R. E. SEEKR: Simulation Enabled Estimation of Kinetic Rates, a Computational Tool to Estimate Molecular Kinetics and Its Application to Trypsin–Benzamidine Binding. J. Phys. Chem. B 2017, 121, 3597–3606.
(30) Jagger, B. R.; Lee, C. T.; Amaro, R. E. Quantitative Ranking of Ligand Binding Kinetics with a Multiscale Milestoning Simulation Approach. J. Phys. Chem. Lett. 2018, 9, 4941–4948.
(31) Yang, L.; Liu, C.-W.; Shao, Q.; Zhang, J.; Gao, Y. Q. From Thermodynamics to Kinetics: Enhanced Sampling of Rare Events. Acc. Chem. Res. 2015, 48, 947–955.
(32) Bruce, N. J.; Ganotra, G. K.; Kokh, D. B.; Sadiq, S. K.; Wade, R. C. New Approaches for Computing Ligand–Receptor Binding Kinetics. Curr. Opin. Struct. Biol. 2018, 49, 1–10.
(33) Tiwary, P.; Parrinello, M. From Metadynamics to Dynamics. Phys. Rev. Lett. 2013, 111, 230602.
(34) Tiwary, P.; Limongelli, V.; Salvalaglio, M.; Parrinello, M. Kinetics of Protein–Ligand Unbinding: Predicting Pathways, Rates, and Rate-Limiting Steps. Proc. Natl. Acad. Sci. 2015, 112, E386–E391.
(35) Tiwary, P.; Mondal, J.; Berne, B. J. How and When Does an Anticancer Drug Leave Its Binding Site? Sci. Adv. 2017, 3, e1700014.
(36) Casasnovas, R.; Limongelli, V.; Tiwary, P.; Carloni, P.; Parrinello, M. Unbinding Kinetics of a P38 MAP Kinase Type II Inhibitor from Metadynamics Simulations. J. Am. Chem. Soc. 2017, 139, 4780–4788.
(37) Hansmann, U. H. E. Parallel Tempering Algorithm for Conformational Studies of Biological Molecules. Chem. Phys. Lett. 1997, 281, 140–150.
(38) Sinko, W.; Miao, Y.; de Oliveira, C. A. F.; McCammon, J. A. Population Based Reweighting of Scaled Molecular Dynamics. J. Phys. Chem. B 2013, 117, 12759–12768.
(39) Lüdemann, S. K.; Lounnas, V.; Wade, R. C. How Do Substrates Enter and Products Exit the Buried Active Site of Cytochrome P450cam? 1. Random Expulsion Molecular Dynamics Investigation of Ligand Access Channels and Mechanisms. J. Mol. Biol. 2000, 303, 797–811.
(40) Kokh, D. B.; Amaral, M.; Bomke, J.; Grädler, U.; Musil, D.; Buchstaller, H.-P.; Dreyer, M. K.; Frech, M.; Lowinski, M.; Vallee, F.; Bianciotto, M.; Rak, A.; Wade, R. C. Estimation of Drug-Target Residence Times by τ-Random Acceleration Molecular Dynamics Simulations. J. Chem. Theory Comput. 2018, 14, 3859–3869.
(41) Teo, I.; Mayne, C. G.; Schulten, K.; Lelièvre, T. Adaptive Multilevel Splitting Method for Molecular Dynamics Calculation of Benzamidine-Trypsin Dissociation Time. J. Chem. Theory Comput. 2016, 12, 2983–2989.
(42) Dickson, A.; Brooks, C. L. WExplore: Hierarchical Exploration of High-Dimensional Spaces Using the Weighted Ensemble Algorithm. J. Phys. Chem. B 2014, 118, 3532–3542.
(43) Dickson, A.; Lotz, S. D. Ligand Release Pathways Obtained with WExplore: Residence Times and Mechanisms. J. Phys. Chem. B 2016, 120, 5377–5385.
(44) Dickson, A.; Lotz, S. D. Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore. Biophys. J. 2017, 112, 620–629.
(45) Lotz, S. D.; Dickson, A. Unbiased Molecular Dynamics of 11 Min Timescale Drug Unbinding Reveals Transition State Stabilizing Interactions. J. Am. Chem. Soc. 2018, 140, 618–628.
(46) Dixon, T.; Lotz, S. D.; Dickson, A. Predicting Ligand Binding Affinity Using On- and off-Rates for the SAMPL6 SAMPLing Challenge. J. Comput. Aided. Mol. Des. 2018.
(47) Frank, A. T.; Andricioaei, I. Reaction Coordinate-Free Approach to Recovering Kinetics from Potential-Scaled Simulations: Application of Kramers’ Rate Theory. J. Phys. Chem. B 2016, 120, 8600–8605.
(48) Tsujishita, H.; Moriguchi, I.; Hirono, S. Potential-Scaled Molecular Dynamics and Potential Annealing: Effective Conformational Search Techniques for Biomolecules. J. Phys. Chem. 1993, 97, 4416–4420.
(49) Mollica, L.; Theret, I.; Antoine, M.; Perron-Sierra, F.; Charton, Y.; Fourquez, J.-M.; Wierzbicki, M.; Boutin, J. A.; Ferry, G.; Decherchi, S. Molecular Dynamics Simulations and Kinetic Measurements to Estimate and Predict Protein–Ligand Residence Times. J. Med. Chem. 2016, 59, 7167–7176.
(50) Bernetti, M.; Rosini, E.; Mollica, L.; Masetti, M.; Pollegioni, L.; Recanatini, M.; Cavalli, A. Binding Residence Time through Scaled Molecular Dynamics: A Prospective Application to HDAAO Inhibitors. J. Chem. Inf. Model. 2018.
(51) Schopf, F. H.; Biebl, M. M.; Buchner, J. The HSP90 Chaperone Machinery. Nat. Rev. Mol. Cell Biol. 2017, 18, 345–360.
(52) Prodromou, C. Regulatory Mechanisms of Hsp90. Biochem. Mol. Biol. J. 2017, 3, 2.
(53) Stebbins, C. E.; Russo, A. A.; Schneider, C.; Rosen, N.; Hartl, F. U.; Pavletich, N. P. Crystal Structure of an Hsp90–Geldanamycin Complex: Targeting of a Protein Chaperone by an Antitumor Agent. Cell 1997, 89, 239–250.
(54) Schuetz, D. A.; Seidel, T.; Garon, A.; Martini, R.; Körbel, M.; Ecker, G. F.; Langer, T. GRAIL: GRids of PhArmacophore Interaction FieLds. J. Chem. Theory Comput. 2018, 14, 4958–4970.
(55) Rohrdanz, M. A.; Zheng, W.; Clementi, C. Discovering Mountain Passes via Torchlight: Methods for the Definition of Reaction Coordinates and Pathways in Complex Macromolecular Reactions. Annu. Rev. Phys. Chem. 2013, 64, 295–316.
(56) Alt, H.; Godau, M. Computing the Fréchet Distance between Two Polygonal Curves. Int. J. Comput. Geom. Appl. 1995, 05, 75–91.
(57) Schuetz, D. A.; Richter, L.; Amaral, M.; Grandits, M.; Grädler, U.; Musil, D.; Buchstaller, H.-P.; Eggenweiler, H.-M.; Frech, M.; Ecker, G. F. Ligand Desolvation Steers On-Rate and Impacts Drug Residence Time of Heat Shock Protein 90 (Hsp90) Inhibitors. J. Med. Chem. 2018, 61, 4397–4411.
(58) Decherchi, S.; Bottegoni, G.; Spitaleri, A.; Rocchia, W.; Cavalli, A. BiKi Life Sciences: A New Suite for Molecular Dynamics and Related Methods in Drug Discovery. J. Chem. Inf. Model. 2018, 58, 219–224.
(59) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935.
(60) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved Side-Chain Torsion Potentials for the Amber Ff99SB Protein Force Field. Proteins 2010, 78, 1950–1958.
(61) Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations. J. Mol. Graph. Model. 2006, 25, 247–260.
(62) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174.
(63) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Kollman, P. A. Application of RESP Charges to Calculate Conformational Energies, Hydrogen Bond Energies, and Free Energies of Solvation. J. Am. Chem. Soc. 1993, 115, 9620–9631.
(64) Gaussian 03. Gaussian, Inc.: Wallingford CT 2004.
(65) Nosé, S.; Klein, M. L. Constant Pressure Molecular Dynamics for Molecular Systems. Mol. Phys. 1983, 50, 1055–1076.
(66) Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182–7190.
(67) Berendsen, H. J. C.; van der Spoel, D.; van Drunen, R. GROMACS: A Message-Passing Parallel Molecular Dynamics Implementation. Comput. Phys. Commun. 1995, 91, 43–56.
(68) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435–447.
(69) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. C. GROMACS: Fast, Flexible, and Free. J. Comput. Chem. 2005, 26, 1701–1718.
(70) Swope, W. C.; Andersen, H. C.; Berens, P. H.; Wilson, K. R. A Computer Simulation Method for the Calculation of Equilibrium Constants for the Formation of Physical Clusters of Molecules: Application to Small Water Clusters. J. Chem. Phys. 1982, 76, 637–649.
(71) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577–8593.
(72) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463–1472.
(73) Miyamoto, S.; Kollman, P. A. Settle: An Analytical Version of the SHAKE and RATTLE Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952–962.
(74) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14, 33–38.
(75) RStudio – Open Source and Enterprise-Ready Professional Software for R. RStudio, Inc.: Boston, MA 2018.
(76) Li, W.; Ma, A. Recent Developments in Methods for Identifying Reaction Coordinates. Mol. Simul. 2014, 40, 784–793.
(77) Hartmann, C.; Banisch, R.; Sarich, M.; Badowski, T.; Schütte, C. Characterization of Rare Events in Molecular Dynamics. Entropy 2013, 16, 350–376.
(78) Tenenbaum, J. B.; de Silva, V.; Langford, J. C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323.
(79) Python Library for Isometric Feature Mapping (Isomap) https://web.vscht.cz/~spiwokv/pysomap/index.html (accessed May 23, 2018).
(80) Rajan, A.; Freddolino, P. L.; Schulten, K. Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding. PLoS One 2010, 5, e9890.
(81) Spiwok, V.; Králová, B. Metadynamics in the Conformational Space Nonlinearly Dimensionally Reduced by Isomap. J. Chem. Phys. 2011, 135, 224504.
(82) Bonomi, M.; Branduardi, D.; Bussi, G.; Camilloni, C.; Provasi, D.; Raiteri, P.; Donadio, D.; Marinelli, F.; Pietrucci, F.; Broglia, R. A.; Parrinello, M. PLUMED: A Portable Plugin for Free-Energy Calculations with Molecular Dynamics. Comput. Phys. Commun. 2009, 180, 1961–1972.
(83) Das, P.; Moll, M.; Stamati, H.; Kavraki, L. E.; Clementi, C. Low-Dimensional, Free-Energy Landscapes of Protein-Folding Reactions by Nonlinear Dimensionality Reduction. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 9885–9890.
(84) Banisch, R.; Vanden-Eijnden, E. Direct Generation of Loop-Erased Transition Paths in Non-Equilibrium Reactions. Faraday Discuss. 2017, 195, 443–468.
(85) R: The R Project for Statistical Computing. May 30, 2018.
(86) Lloyd, S. Least Squares Quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
(87) MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability; University of California Press: Berkeley, CA, USA, 1967; Vol. 1, pp 281–297.
(88) Kaufman, L.; Rousseeuw, P. J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons, 2009; Vol. 344.
(89) Seyler, S. L.; Kumar, A.; Thorpe, M. F.; Beckstein, O. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput. Biol. 2015, 11, e1004568.
(90) Rydzewski, J.; Nowak, W. Machine Learning Based Dimensionality Reduction Facilitates Ligand Diffusion Paths Assessment: A Case of Cytochrome P450cam. J. Chem. Theory Comput. 2016, 12, 2110–2120.




Figure 1. A) The N-terminal domain of Hsp90 in complex with compound 2. The ligand is embedded inside the ATP binding pocket, whose hydrophobic and hydrophilic regions are highlighted in orange and blue, respectively. B) Focus on the bound states. Representative structures of the three scaffolds (compounds 2, 4 and 7, from top to bottom) are shown (only polar hydrogens are displayed). The hydrogen bond interaction with the key residue Asp93 is maintained by all three compounds. Moreover, all ligands occupy the same regions of the binding pocket.


Figure 2. A) Schematic diagram of the clean-up stage of data analysis. B) Pictorial representation of frame removal from a hypothetical trajectory that folds back onto itself. C) The main steps involved in the analysis of unbinding trajectories. After cleaning up the trajectories, the pairwise distance in the configurational space of the ligand (previously aligned on the protein’s frame) is computed using the whole ensemble of preserved frames (matrix D in the scheme) and embedded in the low-D space through Isomap. Then, each unbinding pathway corresponding to the previously obtained cleaned-up trajectories is projected in the low-D space. Finally, for each pair of unbinding pathways, the Fréchet distance is computed and stored in a symmetric matrix (F in the scheme) that can be further exploited for subsequent cluster analysis.
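
The last step of the workflow in panel C, filling a symmetric matrix F with pairwise Fréchet distances between projected pathways, can be sketched in Python as follows. This sketch uses the discrete Fréchet distance computed by dynamic programming, which is a swapped-in approximation of the continuous distance between polygonal curves; the function names and the representation of a pathway as a list of low-D points are assumptions made for illustration.

```python
import math


def discrete_frechet(path_p, path_q):
    """Discrete Fréchet distance between two pathways (lists of points)."""
    n, m = len(path_p), len(path_q)
    if n == 0 or m == 0:
        raise ValueError("pathways must contain at least one point")
    # ca[i][j]: coupling distance between the prefixes P[0..i] and Q[0..j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_p[i], path_q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along P, along Q, or along both, whichever is cheapest.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def frechet_matrix(paths):
    """Symmetric matrix of pairwise distances (the role of F in the scheme)."""
    k = len(paths)
    f = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            f[a][b] = f[b][a] = discrete_frechet(paths[a], paths[b])
    return f
```

Because the discrete variant only couples the sampled points, it is an upper bound on the continuous distance and depends on how densely each pathway is sampled.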


Figure 3. 2D kinetic map of the association rate constants (kon) versus dissociation rate constants (koff) with iso-affinity lines for the seven compounds employed in this study. Individual ligand affinities are shown on a color scale.


Figure 4. The hydrophobic pocket. Focus on the binding mode of compound 2: while maintaining the driving interaction with Asp93 (lower side of the picture, colors consistent with Figure 1B), the isoindoline moiety of the ligand is fully embedded inside a hydrophobic cavity comprising residues Met98, Leu103, Leu107, Phe138, Tyr139, Val150, and Trp162 (in orange, stick representation); conversely, the bulky substituent on the quinazoline scaffold protrudes into the solvent, without engaging in relevant interactions with the protein.


Figure 5. Normalized calculated residence times plotted against normalized experimental residence times. Experimental values were normalized according to $(\tau_i^{\mathrm{exp}}/\tau_{\max}^{\mathrm{exp}})^{\lambda}$, in which the scaling factor λ equals 0.45. Computational residence times were normalized according to $\tau_i^{\mathrm{calc}}/\tau_{\max}^{\mathrm{calc}}$. Error bars refer to the standard deviations σ reported in Table 2. The trend line from simple linear regression is shown, along with the corresponding R² values. A) Results obtained including all seven compounds. A Pearson correlation coefficient of R = 0.73 (R² = 0.53) was obtained. B) After excluding the single outlier (compound 6), the ranking of the remaining six compounds was reproduced correctly (Spearman rank correlation = 1). Accordingly, R² increased to 0.89.
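
The normalization and the two correlation measures named in this caption can be sketched in Python as follows. The function names are illustrative assumptions and no data from this study are reproduced; any residence times passed to these functions are supplied by the user.

```python
import math


def normalize_experimental(taus, lam):
    """(tau_i / tau_max) ** lambda, as applied to the experimental values."""
    tau_max = max(taus)
    return [(t / tau_max) ** lam for t in taus]


def normalize_computed(taus):
    """tau_i / tau_max, as applied to the computed values."""
    tau_max = max(taus)
    return [t / tau_max for t in taus]


def pearson(x, y):
    """Pearson correlation coefficient R of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Raising only the experimental ratio to the power λ compresses the three-orders-of-magnitude experimental range so that it becomes comparable with times obtained on the scaled potential.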


Figure 6. Dimensionality reduction and dendrogram interpretation of the unbinding trajectories. A) Ligand configurations (grey points) in the low-D space. Projections of sample trajectories (pathways 3 and 16 in orange and yellow; pathways 14 and 18 in light blue and dark blue) are highlighted as colored lines and points. B) Heatmap of the Fréchet distances between the 25 unbinding trajectories (labeled 1 to 25). The color scale indicates high similarity in white (low values) and low similarity in blue (high values). The result of the hierarchical cluster analysis on this matrix is represented on top and on the side of the heatmap in the form of a dendrogram. We note that the unbinding pathways are reordered in the Fréchet matrix so as to reflect their relative similarity in the dendrogram. C) Left panel: the ligand COM coordinates are tracked along the unbinding route in the high-D space, from the initial bound pose (black dot) to the bulk solvent. Two sample pathways, 14 and 18, are represented. Right panel: sample configurations visited during the unbinding process. Colors are consistent with panel A. D) Same scheme as panel C, but for trajectories 3 and 16. MD simulation data for compound 1 were used to produce the figure.
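A pairwise Fréchet matrix followed by agglomerative clustering, as described in panel B, can be sketched as below. Several choices here are assumptions rather than details given in this caption: the discrete variant of the Fréchet distance, Euclidean distances between points, and average linkage. The function names are hypothetical, and each path is simply a list of coordinate tuples (for example ligand COM positions).

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet distance between two paths given as lists of points."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


def frechet_matrix(paths):
    """Symmetric matrix of pairwise discrete Frechet distances."""
    k = len(paths)
    mat = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            mat[a][b] = mat[b][a] = discrete_frechet(paths[a], paths[b])
    return mat


def average_linkage(mat):
    """Naive agglomerative clustering; returns the merges in order."""
    clusters = [[i] for i in range(len(mat))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(mat[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

The order of the merges corresponds to reading a dendrogram from the leaves upward: the most similar pathways are joined first, at the smallest distance. The naive clustering loop is written for clarity and is adequate for a few dozen trajectories.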


Figure 7. The significantly different exit route visited by compound 5. A) In the low-D space, most of the data (grey points) are concentrated in the same region, corresponding to the ligand leaving the binding site at the solvent-exposed side of the pocket (sample pathways in green scale). A less populated cluster of points is also evident, highlighted in red by the corresponding trajectory. B) The ligand COM coordinates are tracked along the unbinding route in the high-D space, from the initial bound pose (black dot) to the bulk solvent. The unusual passage under α-helix3 is highlighted in red, while the green points represent the more conventional unbinding pathways.


Figure 8. A) Similar configurations extracted from representative pathways of the most populated cluster (cluster #3, run 20, cyan sticks) and of the cluster associated with the fastest average exit time (cluster #2, run 25, pink sticks) for compound 2. The additional salt bridge established between compound 2 and Asp54 is observed only in the representative pathway of cluster #3. B) Comparison of configurations extracted from the representative pathways of the most populated clusters for compounds 4 and 5 (cyan and pink sticks, respectively).


Table of Contents graphic
