Fully Flexible Docking via Reaction-Coordinate-Independent

Jan 29, 2018 - (5, 6) For efficiency, most docking programs neglect target rearrangement upon binding. At best, they account for it through approximat...
1 downloads 9 Views 2MB Size
Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Article

Fully Flexible Docking Via Reaction-CoordinateIndependent Molecular Dynamics Simulations Martina Bertazzo, Mattia Bernetti, Maurizio Recanatini, Matteo Masetti, and Andrea Cavalli J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00674 • Publication Date (Web): 29 Jan 2018 Downloaded from http://pubs.acs.org on January 30, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Fully Flexible Docking Via Reaction-CoordinateIndependent Molecular Dynamics Simulations Martina Bertazzo,†,‡ Mattia Bernetti,†,‡ Maurizio Recanatini,† Matteo Masetti,*,† and Andrea Cavalli*,†,‡ †

Department of Pharmacy and Biotechnology, Alma Mater Studiorum – Università di Bologna,

Via Belmeloro 6, 40126, Bologna, Italy ‡

CompuNet, Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genova, Italy

ABSTRACT. Predicting the geometry of protein-ligand binding complexes is of primary importance for structure-based drug discovery. Molecular dynamics (MD) is emerging as a reliable computational tool for use in conjunction with, or an alternative to, docking methods. However, simulating the protein-ligand binding process often requires very expensive simulations. This drastically limits the practical application of MD-based approaches. Here, we propose a general framework to accelerate the generation of putative protein-ligand binding modes using potential-scaled MD simulations. The proposed dynamical protocol has been applied to two pharmaceutically relevant systems (GSK-3β and the N-terminal domain of HSP90α). Our approach is fully independent of any predefined reaction coordinate (or collective

1

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 42

variable). It identified the correct binding mode of several ligands and can thus save valuable computational time in dynamic docking simulations.

1. INTRODUCTION. Accurately predicting the geometry of protein-ligand complexes is a challenging task in structure-based drug design. It is also essential in order to accelerate the identification and development of new drug candidates.1 Molecular docking has become the method of choice to address this problem because it is highly efficient and relatively straightforward to use.2,3 Notwithstanding many successful reports,4 common docking programs typically suffer from reliability issues, especially when protein conformational changes play a major role during the association process.5,6 For efficiency, most docking programs neglect target rearrangement upon binding. At best, they account for it through approximations.7 The use of scoring functions to evaluate binding-mode energetics is another well-known source of inaccuracies for this class of methods.8 These functions are commonly used as surrogates of binding-free-energy estimators, but their effectiveness is hampered by problematic contributions, such as solvation/desolvation effects and entropy changes upon binding.9,10 To reliably predict protein-ligand binding geometries and related thermodynamic quantities, it is necessary to capture the full conformational flexibility of both binding partners and explicit solvent models. In this respect, molecular dynamics (MD) simulations can provide a valuable alternative to conventional docking calculations. This is because a complete description of the protein-ligand binding process can, in principle, be obtained at different degrees of accuracy.11

2

ACS Paragon Plus Environment

Page 3 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In recent years, there has been substantial progress in hardware and software, with a few studies demonstrating that it is becoming feasible to observe spontaneous protein-ligand associations via plain MD.12-15 Nevertheless, achieving proper statistics of these rare events generally requires a significant number of simulations on the microsecond timescale. As such, this strategy requires high-level computational facilities and is only occasionally affordable.13 Over the years, a number of theoretical approaches, usually referred to as “enhanced sampling methods”, have been developed to accelerate the investigation of rare events on affordable timescales.14 These methods can be classified into two broad categories: those that require the definition of reaction coordinates (or collective variables (CVs)), and those that do not. The former category includes umbrella sampling,15 steered MD,16 and metadynamics.17 The latter category includes temperature/Hamiltonian replica exchange MD, accelerated MD,18 and potential-scaled MD (sMD).19,20 The main advantage of CV-biasing is that the sampling acceleration is guided towards the event one wishes to observe.13 However, a major drawback is that choosing proper CVs is often far from trivial. This is because they are case-dependent and generally difficult to devise a priori. This is especially true for protein-ligand (un)binding,21 where complex routes require multiple CVs or even path-based descriptions.22,23 Remarkably, researchers recently reported a supervised MD protocol, which was specifically designed for efficient protein-ligand binding studies.24,25 This methodology does not require any external forces to dynamically dock ligands into proteins. However, a priori knowledge of a reaction coordinate is needed to allow the system to evolve towards the final binding pose. Here, we propose a general framework (Figure 1) for dynamic docking based on the potentialscaled MD method.20 Notably, sMD was recently used to rank the off-rates for congeneric series 3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 42

of compounds.26,27 However, to the best of our knowledge, this is the first application of sMD for dynamic docking simulations.28 As such, our approach is fully independent of any reaction coordinate or CV. It is unsupervised, fully automated, and can easily be implemented in several MD-based codes. The protocol was validated against its ability to reproduce the experimental binding modes of two pharmaceutically relevant systems, glycogen synthase kinase 3β (GSK-3β) and the N-terminal domain of heat shock protein 90-α (HSP90α). Specifically, five ligands were considered for each system. Our procedure reproduced the experimental binding modes for all the cases examined. Moreover, we show how the method can be used prospectively by exploiting a sufficient number of repeated simulations and taking advantage of appropriate data analysis.

2. RESULTS AND DISCUSSION 2.1. Dynamic docking workflow. Our framework for dynamic docking is essentially based on performing repeated sMD simulations of protein-ligand complexes started from the unbound states. In sMD, the sampling is enhanced by modifying the potential energy function of the force field by a scaling factor (λ) between 0 (all the interactions are switched off) and 1 (plain MD). By properly choosing λ, it is possible to efficiently cross energy barriers, thus accelerating the transitions between distinct configurational states of the system. This strategy is equivalent to performing MD simulations at high temperature with small time-steps. Accordingly, simulating protein-ligand binding processes under scaled potential energy conditions means that all the degrees of freedom of the system are accelerated in a nonspecific way. Thus, without proper precautions, the potential energy scaling can become highly detrimental, leading to either protein unfolding or an unnecessary increase of the accessible configurational space of unbound ligands 4

ACS Paragon Plus Environment

Page 5 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(or both). These effects can be minimized by choosing a conservative value of λ, say close to 1. However, reaching an optimal tradeoff by tuning λ can be time-consuming and case-dependent. Therefore, we decided to pursue a more generalizable strategy. We used an intermediate scaling factor equal to 0.5 (but other values can be chosen without loss of generality). The choice of this value resulted as the best tradeoff between accuracy and the possibility of observing binding events in the hundreds of nanoseconds timescale. Then, we added two key ingredients to make the approach effective: an elastic network model (ENM) and a cylindrical volumetric confinement. First, the ENM was introduced to preserve the overall protein-folding during sMD simulations.29,30 Specifically, we adopted an isotropic ENM, which acted on all the Cα atoms except those belonging to the binding site, which was kept fully flexible (see Methods for further details). Notably, the ENM was parameterized through an iterative optimization strategy so as to satisfy the average Cα pair distances measured by hundreds of nanoseconds of plain MD simulations of the protein in the unbound state. Through a rather inexpensive iterative procedure, the force constants of the network can be easily optimized to match the natural fluctuations (i.e. Root Mean Square Fluctuations, RMSF) experienced by the apo protein (Figure 2). Moreover, we verified that the direction of the widest collective motions was also preserved in sMD simulations to some extent. In particular, the Root Mean Square Inner Product (RMSIP) between the first 10 eigenvectors was used to compare the essential subspace of protein motions with and without scaling factor (see Methods).31 In our simulations, these eigenvectors turned out to be sufficient to describe a significant fraction of the total variance, and the obtained RMSIP (larger than 0.7 in both cases) demonstrated a fair preservation of global motions. 5

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 42

As a second ingredient, we improved the efficiency of sampling by using flat-bottomed cylindrical restraints to confine the configurational space accessible to the unbound ligand.32 Provided there is a priori knowledge of the binding pocket, the setup of the cylinder is fully automated by exploiting functionalities of the NanoShaper33 and PLUMED34 software (see Figure 3A and Methods). This part of the framework is depicted in Figure 1 as “System Setup”. The “Production” phase involves a series of repeated sMD simulations, where the ligand is initially placed 25 Å away from the binding site in a random orientation. Notably, the biological targets are fully solvated with explicit water molecules. Here, for each protein-ligand complex, we collected statistics for 10 sMD production runs of 100 ns each. However, other setups can be envisioned depending on the available computational resources. To avoid any conformational bias at the protein, all the sMD simulations were performed starting from the apo structure. The final phase (“Data Analysis” in Figure 1) involved identifying the most probable binding mode by considering all the 10 runs per complex. From a retrospective standpoint, the Root Mean Square Deviation (RMSD) of the ligand’s heavy atoms against the crystal structure can be safely used to validate the protocol (Figure 4A). However, to evaluate the usefulness of our framework in prospective runs, we also ran a cluster analysis of all the protein-ligand metastable states sampled along the trajectories (see Methods). The metastable states were defined in a purely geometric manner (Figure 3B), and were subsequently ranked according to the corresponding cluster population. The most probable binding mode was defined as the center of the most populated cluster. We note that other approaches exist that allow determining bound states through simulations starting from fully solvated ligands. Spontaneous binding by means of brute-force, plain MD is 6

ACS Paragon Plus Environment

Page 7 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

increasingly becoming a viable strategy. However, a regime of tens of microseconds is typically necessary to observe few events.35 In our protocol, the computational effort is significantly reduced. Notwithstanding the contained requirements of our procedure, cheaper alternatives, such as supervised-MD,24 have been recently proposed and can be found in the literature. Finally, free energy based methods, such as metadynamics,17 can also be employed. Nevertheless, despite the appealing potential in achieving a detailed characterization of the energetics associated to the binding process, both the requirements in terms of setup and computational time are extremely case-specific and not easy to estimate in advance. 2.2. Test case 1: GSK-3β. As a first test case, we employed GSK-3β, a well-known pharmaceutical target. This proline-directed serine/threonine kinase plays a major role in several physiological processes, including glycogen metabolism and microtubule stability.36 From a pharmacological standpoint, GSK-3β is mostly involved in neurodegenerative conditions, such as Alzheimer’s disease.37 GSK-3β is a two-domain enzyme: an N-terminal lobe (N-lobe), mostly comprising β-sheets, and a large C-terminal lobe (C-lobe), mostly comprising α-helices. The ATP binding pocket is buried deep within the two lobes.38,39 We focused on five different reference crystal structures of human GSK-3β in complex with selective inhibitors characterized by different binding affinities (Table 1). In particular, we selected four crystal structures of GSK3β in complex with pyrazine-derived inhibitors40 (PDB code: 4ACD, 4ACC, 4ACG, 4ACD) and one structure in complex with a thiazole-derived inhibitor41 (PDB code: 1Q5K). As shown in Figure 4B, a 100% success rate was obtained in four out of the five complexes investigated. This means that, in these four cases, all of the sMD runs correctly identified the experimental binding mode within 100 ns of sampling. For 4ACG, the success rate was reduced 7

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 42

to 40%. We then performed cluster analysis to identify the metastable states observed in the aggregate trajectory (Figure 5A). This showed that the center of the most populated cluster for 4ACG (cluster #1, population: 25.7%) was only 2.64 Å from the reference crystal structure (in terms of RMSD) and, remarkably, all key interactions with the protein were recapitulated. Indeed, the RMSD was mostly due to an alternative conformation adopted by the molecule’s solvent-exposed piperazine moiety (Figure 5A), which does not establish key interactions. For comparison, in Figure 5A we also report the cluster analysis results obtained for one of the bestperforming complexes, while the others are reported in the SI. 2.3. Test case 2: HSP90α. Our second test case was provided by the N-terminal domain of HSP90α, a molecular chaperone with important roles in maintaining the functional stability of several client proteins.42 HSP90α is also involved in many complex pathologies such as cancer, Alzheimer’s disease, and Parkinson’s disease,43,44 highlighting its relevance as a pharmaceutical target.45 The N-terminal domain of HSP90α bears an ATP binding site shared by most of the known inhibitors.42 There is a considerable conformational flexibility in this site, which is characterized by an antiparallel β-sheet at the bottom and by three helices as its walls, one of which (α3) is located at the entrance of the pocket. This helix has been reported to adopt distinct and ligand-specific conformations, including partly misfolded states in the middle portion (residues 103 to 107). In particular, two main conformations are adopted by this region in the apo protein (“in” and “out” states, see Figure S1).46,47 As crystal structures of these limiting conformations were available in the PDB (codes 1YES and 1YER, respectively),46 we approached this more challenging system by considering both. The reference structures of HSP90α in complex with selective inhibitors (Table 1) included: three crystal structures in 8

ACS Paragon Plus Environment

Page 9 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

complex with thiadiazole-derived inhibitors (PDB code: 2YI7, 2YI5, 2YI0),48 one with a isoxazole-derived inhibitor (PDB code: 2UWD),49 and one with a pyrazole-derived inhibitor (PDB code: 2BSM)50.

Figure 4C shows the results for the five ligands starting from the “in” or “out” states. The results are less clear-cut than the previous test case. Notably, none of the individual proteinligand complexes displayed a 100% success rate. The best performing complex reproduced the reference structure in 55% of the cases (2YI7), while a less satisfying success rate (30%) was found with 2BSM. This performance most likely reflects a greater complexity related to the inherent flexibility of the binding site. Despite this, the experimental binding mode was reproduced at least twice in all the investigated complexes. Moreover, despite the inherent flexibility of the α3-helix was accentuated by the employed scaling factor (λ = 0.5), the conformation adopted in the corresponding crystal complex was visited at least once for each system, except for 2BSM. Remarkably, the protocol is robust enough to provide rather consistent results independently from the particular apo conformation chosen as the initial condition. Additionally, most of the unsuccessful runs never reached a well-defined metastable state, as the “incorrect binding modes” turned out to be transient states, hence only marginally affecting the correct identification of the experimental geometry among the most populated clusters. This concept is exemplified in Figure 5B, where we show the results of the metastable states cluster analysis for the best and worst performing complexes investigated here (i.e. 2YI7 and 2BSM, respectively). In the former, the most populated cluster (cluster #1, population: 44.6%) shows a binding mode with an RMSD of 1.69 Å from the crystallographic structure. Despite the lower 9

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 42

success rate, the binding mode identified by our protocol was still found among the top ranked clusters in the case of 2BSM (cluster #1, 0.91 Å, 30.6%). Most importantly, this cluster was unambiguously separated from the second-best cluster in terms of relative population (incorrect binding mode; cluster #2, 6.80 Å, 13.2%). Hence, we can safely conclude that, as long as the correct geometry is reproduced at least once in repeated sMD runs, our framework has a very good potential to successfully identify it among the most populated clusters.

3. CONCLUSION

In summary, we present an innovative dynamic docking tool based on sMD, which combines full flexibility with a fairly low computational effort. Our approach’s key ingredients are a reaction-coordinate-independent enhanced sampling method, a tunable ENM, and a cylindrical volumetric confinement. The ligand is automatically positioned 25 Å from the binding pocket at the edge of the cylinder in a random configuration, and the target proteins are in the apo conformations. The methodology is also fully independent of the biological target’s conformation. For all the investigated test cases, we successfully identified the experimental binding modes as the most populated clusters. This approach is conceptually simple, flexible, straightforward to implement in several MD-based platforms, and can be fully automated. More challenging scenarios, including shape, location and accessibility to the binding site, as well as the number of degrees of freedom associated to ligands, can be faced when extending the procedure to different complexes. In addition, wider libraries of compounds can be considered for a specific biomolecular target. These aspects will be investigated in follow-ups of the present 10

ACS Paragon Plus Environment

Page 11 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

work. In light of these considerations, we believe that our dynamic docking approach holds great potential for structure-based drug discovery, where it is crucial to achieve an appropriate tradeoff between accuracy and speed.

4. METHODS Essentially, the overall protocol here proposed for dynamic docking via sMD consists of the following phases (see Figure 1): i)

System setup: starting from the apo structure of a specific protein, an initial 100 ns of plain MD simulations is performed in order to calculate the average matrix of pair Cα distances required to implement the ENM. Simultaneously, flat-bottomed cylindrical restraints are defined to limit the configurational space accessible to the unbound ligand.

ii)

Production: a series of multiple sMD simulations with a scaling factor of λ = 0.5 are performed, starting from an apo structure of the protein, and with the ligand initially placed 25 Å from the binding site in a random orientation.

iii)

Data Analysis: identification of the most probable binding mode based on either a reference structure (retrospective analysis) or cluster analysis (prospective analysis).

Hereafter, we report the methodological aspects required to fulfill all the steps of the pipeline depicted in Figure 1. Moreover, specific details regarding the two investigated systems are also reported. All of the original codes and scripts produced to carry out the above steps will be available upon request.

11

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 42

4.1. Setup and calibration of the elastic network model. As previously mentioned, a substantial issue when applying sMD to biological systems it is that it is difficult to predict whether the overall protein folding is maintained in the long run. Indeed, using a scaling factor of λ = 0.5, large fluctuations in the protein structure are typically noticed, which are not observed for the same system in plain MD simulations. Thus, in order to prevent protein unfolding and to guarantee fluctuations consistent with plain MD simulations, we took advantage of an ENM that was applied to the whole backbone Cα atoms with exception of those residues that composed the binding pocket. As a result, the overall protein structure was preserved while the binding pocket was allowed to sample structural rearrangements required for ligand binding. In particular, the ligand-binding pockets were identified on the initial PDB structures with NanoShaper,33 as implemented in BikiLifeScience Suite.51 NanoShaper defines a pocket as the difference between two solvent excluded surface enclosed volumes generated with different probe radii. For HSP90 (PDB code: 1YER and 1YES),46 the two radii used to compute the ATP-binding pocket were 1.4 Å and 4 Å, respectively, while for GSK3β (PDB code: 1H8F)52 radii of 1.8 Å and 4 Å were employed. Our ENM was implemented as a set of elastic springs (or harmonic potentials) of uniform and isotropic force constant  acting between pairs of Cα atoms within a cut-off distance  . This led to a potential energy  ∗  for the scaled and restrained system that can be expressed as:  ∗  =  +  

(1)

where   represents the harmonic potential of the ENM: 

  = ∑   −   

(2)

12

ACS Paragon Plus Environment

Page 13 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In Eq. 2,  is the displacement between the ith and jth Cα from their equilibrium-distance  . The g_distMat53 tool for GROMACS was used to calculate the average equilibrium-distance matrix between Cα atoms from a 100 ns-long plain MD simulations. An in-house tcl script running under VMD was used to generate and to visualize the ENM basing on the Cα equilibrium-distance matrix and the parameter values ( ,  ) specified by the user. The resulting effect is shown on the protein structures reported in Figure 2A. The ENM was validated through 10 ns long sMD simulations. To this end, we compared the Cα-RMSF calculated from trial sMD simulations and plain MD, which was acting as a reference. The ENM was considered satisfying when similar RMSF profiles could be obtained. Sets of parameters providing reasonable overlap were:  = 1.0 ! and  = 150 # !$% & !& for GSK3β, and  = 1.0 ! and  = 100 # !$% & !& for HSP90α. A more thorough validation of the dynamics of the systems was performed by comparing the essential conformational sub-space. To quantify the difference, we computed the RMSIP between the first 10 eigenvectors obtained from each simulation as a measure of the overlap of the configurational space explored by the two sets of simulations: '()* =

 +

+ - . ∑+ / ∑/, ,

(3)

where ,- e ,. are the ith and jth eigenvectors obtained from Principal Component Analysis (PCA) to be compared, respectively, and D is the number of eigenvectors considered (in our case D = 10). RMSIP ranges from 0 to 1: the value is 1 if sampled subspaces are identical, and 0 if the sampled subspaces are completely orthogonal.

13

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 42

4.2. Definition of the cylindrical volumetric confinement. To improve sampling and save computational resources, we confined the exploration of the configurational space to those regions that are more relevant for the binding process. To do so, we applied flat-bottomed restraints to discourage the ligand from moving outside the volume of a cylinder centered on the binding site. The axis of the cylinder runs along the norm of the binding pocket, with one extremity extending within the pocket and the other one extending in the bulk, far enough to allow the ligand to be entirely solvated (25 Å from the center of mass, COM, of the binding pocket). The cylinder is constructed as follows. First, the norm of the binding pocket on the crystal structures is defined as the axis connecting the COM of the atoms composing the binding site and the COM of those atoms comprised in the entrance region of the pocket. The software NanoShaper33 was employed to unambiguously identify the two groups of atoms. Then, in order to avoid large fluctuations of the cylinder position during the sMD simulations, we defined the cylinder axis basing on the more stable portions of the protein as revealed from trial sMD simulations (Cα-RMSF < 1.5 Å). To this aim, we selected two groups amongst such atoms whose COMs lied along the previously defined norm (p1 and p2 in Figure 3A). The whole procedure was carried out through an in house python script. The shape of the cylinder is defined by two flat-bottomed restraints with a force constant of 300 kJ mol-1 nm-2 each. In particular, the distance between the projection of the ligand COM on the axis and the point p1 defines the length of the cylinder (d║ in Figure 3A), while the distance between the ligand COM and its projection on the axis define the radius of the cylinder (d┴ in Figure 3A). As previously mentioned, d║ was set to 25 Å from the COM of binding pocket. Conversely, the radius of the cylinder (d┴) was 14

ACS Paragon Plus Environment

Page 15 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

chosen as a trade-off between computational efficiency and the ability of not affecting the binding event. Specifically, on the one hand we wanted to avoid excluding any regions of the protein surrounding the binding site entrance that might interact with the ligand and contribute significantly to the binding process (larger values for d┴). On the other hand, we wanted to maintain the volume accessible to sampling as much limited as possible (smaller values for d┴). We tested different values for d┴ through trial simulations, which finally led us to determine an optimal radius of 8.1 Å for both systems. These flat-bottomed restraints were implemented exploiting the MATHEVAL functionalities provided by the PLUMED 2.1 software.34 4.3. Retrospective and prospective analysis. In order to assess whether ligand binding took place within the 100 ns of sMD simulations, we monitored the RMSD of ligand heavy atoms with respect to the crystal structure after optimal alignment on protein Cα atoms. The crystal binding pose was considered reproduced when a RMSD of 2.8 Å or lower was observed and maintained for at least 10 ns (Figure 4A). A metastable state was considered reached when a RMSD greater than 2.8 Å was maintained for at least 10 ns. The previous approach can be exploited only when the crystal structure of the bound complex is available. Thus, we devised a procedure to identify relevant binding poses even in the absence of experimental data. First, we filtered the trajectory of each sMD run in order to consider only relevant metastable states, that is relatively stable ligand configurations preserved for at least 10 ns. From a geometric standpoint, a metastable state was defined in terms of the position of the ligand COM with respect to the axis of the cylinder (d║ and d┴). The ligand position was considered stable when the average values of d║ and d┴ varied at most of 0.5 Å, and 90% of the total fluctuations fell below 2.0 Å (Figure 3B). For each protein-ligand system, the first 10 ns of 15

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 42

each run representing metastable states were joined together in a concatenated trajectory, on which the RMSD matrix was computed and stored. The RMSD matrix was then processed by means of a hierarchical clustering algorithm,54 using the average-linkage method as implemented in R version 3.3.2.55 The hierarchical clustering algorithm (bottom-up) combines existing clusters, creating a hierarchical structure that reflects the order in which the clusters are merged. For our clustering procedure, we employed a cut-off value of 2.0 Å. To discard those conformations that were not frequently sampled during the simulations, we discarded those clusters which population fell below 2% of the total number of structures retained for the analysis. Through this procedure, we were able to find major clusters that corresponded to the main ligand binding poses identified during the sMD simulation. 4.4. MD simulations. Ligands were parameterized according to the general AMBER force field (GAFF)56,57 for organic molecules using the Antechamber and parmchk tools implemented in Ambertools.58 The Gaussian 0359 package was employed for geometrical optimization and calculation of the electrostatic potential of each ligand, applying the 6-31G* basis set at the Hartree-Fock level of theory. Partial charges were then calculated using the RESP method60 as implemented in Antechamber. Preparation of each protein structure was performed through the “Protein Preparation Wizard” module of the Maestro interface (version 9.3.5) in the Schrödinger suite.61 The protein preparation process involved addition of hydrogen atoms, deletion of all crystal waters, and capping of protein termini. Using an in house script written in tcl for VMD,62 each ligand was randomly placed at the base of the cylinder (25 Å from the binding pocket COM) defined taking advantage of the PLUMED 2.1 software.34

16

ACS Paragon Plus Environment

Page 17 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

The Amber ff99SB-ILDN force field63 was employed to model the proteins. Complexes were solvated using TIP3P64 water molecules in a cubic box extending at least 12 Å beyond the complex surface and counterions were added for system neutralization. Each system was energy minimized for 5000 steps of steepest descent followed by gradual heating up to 300 K in a NVT equilibration, with steps of 100 K over 300 ps. Then, each system was equilibrated in the NPT ensemble at 300 K and 1 atm for 1 ns using a Parinello-Rahman barostat.65 A cutoff of 10 Å and the particle-mesh Ewald (PME) method66 were used for computing short-range interactions and long-range interactions, respectively. Constant temperature conditions were provided by using the v-rescale thermostat.67 A time step of 2 fs was employed and all bonds to hydrogen atoms were constrained using the LINCS algorithm.68 Production runs were carried out in the NVT ensemble at 300 K. Plain MD simulations were performed using version 4.6.5 of the GROMACS69 software, while sMD runs were carried out using an in-house version of GROMACS 4.6.7. On a hybrid CPU/GPU workstation, comprising one quad-core processor (Intel Xeon L5520 @ 2.27 GHz) and one NVIDIA GeForce GTX 980, the performances were about 14 ns/day and 11 ns/day for HSP90 (about 70,000 atoms/complex) and GSK3β (about 95,000 atoms/complex), respectively. Table 1. Structures and activities of the 5 known inhibitors for GSK3β and HSP90α.

Biological Activity Protein

PDB

Structure

01 (nm)

02 (nm)

17

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 42

1Q5K

38

4ACH

20

4ACC

20

4ACG

4.9

4ACD

0.22

GSK3β

2BSM

78

2YI5

39

HSP90α

18

ACS Paragon Plus Environment

Page 19 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

2YI0

7.5

2UWD

5

2YI7

4.8

19

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 42

Figure 1. Schematic representation of the dynamic docking workflow.

Figure 2. Pictorial representation of the ENM applied to GSK3β (A) and to HSP90α (B). Comparison of the corresponding Cα-RMSF profile calculated between plain MD simulations (green) and sMD with ENM (red) is reported in the plots at the bottom of the Figure. 20

ACS Paragon Plus Environment

Page 21 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 3. A) Definition of the cylindrical volumetric confinement. The axis of the cylinder runs along the norm of the binding pocket and is defined as the line passing through p1 and p2. These points represent the COMs of groups of atoms belonging to stable portions of the protein. The 21

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 42

length of the cylinder (d║) is defined as the distance between the projection of the ligand COM on the axis and the COM of the binding pocket. The radius of the cylinder (d┴) is the distance between ligand COM and its projection on the axis. B) Identification of metastable states from the time series of d║ and d┴ (example taken from run 7 of 2YI0, “in” conformational state of the protein). A metastable state corresponds to any stable ligand configuration preserved for at least 10 ns. In this example, configurations sampled from 76 to 86 ns are collected for further cluster analysis (see Methods).

Figure 4. A) Time evolution of a ligand’s RMSD from the cognate crystal structure in a typical productive run (in this example: run5 of 2YI5, “out” state). The crystal binding pose is considered reproduced when a RMSD of 2.8 Å or less was observed and maintained for at least 22

ACS Paragon Plus Environment

Page 23 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

10 ns. B) Retrospective performance of the 10 independent runs performed on GSK-3β. Productive runs are shown in green, while simulations ending in metastable states other than the experimental binding mode are reported in orange. Red bars refer to runs where no stable state (either correct or incorrect) was reached within 100 ns of simulation. C) Retrospective performance of the 10 independent runs performed on HSP90α. For each run, two initial conformations of the protein were considered (see Text and SI).

23

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 42

Figure 5. Validation of our protocol for its ability to identify the experimental binding modes based on cluster population. The clusters of metastable states for the best and the worst performing complexes of GSK-3β (4ACH and 4ACG, respectively) and HSP90α (2YI7 and 2BSM) are presented in the pie charts, with color-code based on the RMSD between the center of the cluster with respect to the corresponding crystal structure. For each complexes reported, the representative configuration from the three top-ranked clusters (indicated by #1, #2 and #3, 24

ACS Paragon Plus Environment

Page 25 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

with detail about population and RMSD) is shown, superimposed to the experimental binding mode (white) and color-coded according to the corresponding cluster.

ASSOCIATED CONTENT Supporting Information. Details of the simulation setup and the analysis performed. The following files are available free of charge. AUTHOR INFORMATION Corresponding Author *[email protected] *[email protected] ACKNOWLEDGMENT We acknowledge the CINECA award under the ISCRA initiative for the availability of highperformance computing resources, and Dario Gioia for helpful discussions. CONFLICT OF INTEREST Andrea Cavalli is co-founder of BiKi Technologies s.r.l., a startup company that develops methods based on molecular dynamics and related approaches for investigating protein-ligand (un)binding.

REFERENCES (1)

Sliwoski, G.; Kothiwale, S.; Meiler, J.; Lowe Jr., E. W. Computational Methods in Drug 25

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 42

Discovery. Pharmacol. Rev. 2014, 66 (1), 334–395. (2)

Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004, 47 (7), 1739–1749.

(3)

Verdonk, M. L.; Cole, J. C.; Hartshorn, M. J.; Murray, C. W.; Taylor, R. D. Improved Protein-Ligand Docking Using GOLD. Proteins Struct. Funct. Genet. 2003, 52 (4), 609– 623.

(4)

Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery. Curr. Comput. Aided. Drug Des. 2011, 7 (2), 146–157.

(5)

Teague, S. J. Implications of Protein Flexibility for Drug Discovery. Nat. Rev. Drug Discov. 2003, 2 (7), 527–541.

(6)

Davis, A. M.; Teague, S. J. Hydrogen Bonding, Hydrophobic Interactions, and Failure of the Rigid Receptor Hypothesis. Angew. Chemie - Int. Ed. 1999, 38 (6), 736–749.

(7)

Buonfiglio, R.; Recanatini, M.; Masetti, M. Protein Flexibility in Drug Discovery: From Theory to Computation. ChemMedChem 2015, 10 (7), 1141–1148.

(8)

Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications. Nat. Rev. Drug Discov. 2004, 3 26

ACS Paragon Plus Environment

Page 27 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(11), 935–949. (9)

Huang, S.-Y.; Zou, X. Inclusion of Solvation and Entropy in the Knowledge-Based Scoring Function for Protein-Ligand Interactions. J. Chem. Inf. Model. 2010, 50 (2), 262– 273.

(10)

Di Martino, G. P.; Masetti, M.; Ceccarini, L.; Cavalli, A.; Recanatini, M. An Automated Docking Protocol for hERG Channel Blockers. J. Chem. Inf. Model. 2013, 53 (1), 159– 175.

(11)

Durrant, J. D.; McCammon, J. A. Molecular Dynamics Simulations and Drug Discovery. BMC Biol. 2011, 9 (1), 71.

(12)

Buch, I.; Giorgino, T.; De Fabritiis, G. Complete Reconstruction of an Enzyme-Inhibitor Binding Process by Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. 2011, 108 (25), 10184–10189.

(13)

De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016, 59 (9), 4035–4061.

(14)

Rocchia, W.; Masetti, M.; Cavalli, A. Chapter 11 Enhanced Sampling Methods in Drug Design. In Physico-Chemical and Computational Approaches to Drug Discovery; The Royal Society of Chemistry, 2012; pp 273–301.

(15)

Torrie, G. M.; Valleau, J. P. Nonphysical Sampling Distributions in Monte Carlo FreeEnergy Estimation: Umbrella Sampling. J. Comput. Phys. 1977, 23 (2), 187–199. 27

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(16)

Page 28 of 42

Grubmuller, H.; Heymann, B.; Tavan, P. Ligand Binding: Molecular Mechanics Calculation of the Streptavidin-Biotin Rupture Force. Science (80-. ). 1996, 271 (5251), 997–999.

(17)

Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562.

(18)

Hamelberg, D.; Mongan, J.; McCammon, J. A. Accelerated Molecular Dynamics: A Promising and Efficient Simulation Method for Biomolecules. J. Chem. Phys. 2004, 120 (24), 11919–11929.

(19)

Mark, A. E.; van Gunsteren, W. F.; Berendsen, H. J. C. Calculation of Relative Free Energy via Indirect Pathways. J. Chem. Phys. 1991, 94 (5), 3808–3816.

(20)

Sinko, W.; Miao, Y.; de Oliveira, C. A. F.; McCammon, J. A. Population Based Reweighting of Scaled Molecular Dynamics. J. Phys. Chem. B 2013, 117, 12759−12768.

(21)

Deganutti, G.; Moro, S. Estimation of Kinetic and Thermodynamic Ligand-Binding Parameters Using Computational Strategies. Futur. Med. Chem. 2017, 9 (5), 507–523.

(22)

Llabrés, S.; Juàrez-Jiménez, J.; Masetti, M.; Leiva, R.; Vàzquez, S.; Gazzarrini, S.; Moroni, A.; Cavalli, A.; Luque, F. J. Mechanism of the Pseudoirreversible Binding of Amantadine to the M2 Proton Channel. J. Am. Chem. Soc. 2016, 138 (47), 15345–15358.

(23)

Favia, A. D.; Masetti, M.; Recanatini, M.; Cavalli, A. Substrate Binding Process and Mechanistic Functioning of Type 1 11β-Hydroxysteroid Dehydrogenase from Enhanced 28

ACS Paragon Plus Environment

Page 29 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Sampling Methods. PLoS One 2011, 6 (9), 1–14. (24)

Sabbadin, D.; Moro, S. Supervised Molecular Dynamics (SuMD) as a Helpful Tool to Depict GPCR-Ligand Recognition Pathway in a Nanosecond Time Scale. J. Chem. Inf. Model. 2014, 54 (2), 372–376.

(25)

Cuzzolin, A.; Sturlese, M.; Deganutti, G.; Salmaso, V.; Sabbadin, D.; Ciancetta, A.; Moro, S. Deciphering the Complexity of Ligand-Protein Recognition Pathways Using Supervised Molecular Dynamics (SuMD) Simulations. J. Chem. Inf. Model. 2016, 56 (4), 687–705.

(26)

Mollica, L.; Decherchi, S.; Zia, S. R.; Gaspari, R.; Cavalli, A.; Rocchia, W. Kinetics of Protein-Ligand Unbinding via Smoothed Potential Molecular Dynamics Simulations. Sci. Rep. 2015, 5, 11539.

(27)

Mollica, L.; Theret, I.; Antoine, M.; Perron-Sierra, F.; Charton, Y.; Fourquez, J. M.; Wierzbicki, M.; Boutin, J. A.; Ferry, G.; Decherchi, S.; Bottegoni, G.; Ducrot, P.; Cavalli, A. Molecular Dynamics Simulations and Kinetic Measurements to Estimate and Predict Protein-Ligand Residence Times. J. Med. Chem. 2016, 59 (15), 7167–7176.

(28)

Gioia, D.; Bertazzo, M.; Recanatini, M.; Cavalli, A. Dynamic Docking : A Paradigm Shift in Computational Drug Discovery. Molecules 2017, 22 (11), 1–21.

(29)

Lezon, T. R.; Shrivastava, I. H.; Yang, Z.; Bahar, I. Elastic Network Models For Biomolecular Dynamics: Theory and Application to Membrane Proteins and Viruses. Handb. Biol. Networks 2009, 129–158. 29

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(30)

Page 30 of 42

Kim, M. H.; Kim, M. K. Elastic Network Model for Protein Structural Dynamics. JSM Enzym. Protein Sci. 2014, 1 (1), 1001.

(31)

Fuglebakk, E.; Tiwari, S. P.; Reuter, N. Comparing the Intrinsic Dynamics of Multiple Protein Structures Using Elastic Network Models. Biochim. Biophys. Acta - Gen. Subj. 2015, 1850 (5), 911–922.

(32)

Allen, T. W.; Andersen, O. S.; Roux, B. Energetics of Ion Conduction through the Gramicidin Channel. Proc. Natl. Acad. Sci. 2004, 101 (1), 117–122.

(33)

Decherchi, S.; Rocchia, W. A General and Robust Ray-Casting-Based Algorithm for Triangulating Surfaces at the Nanoscale. PLoS One 2013, 8 (4).

(34)

Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun. 2014, 185 (2), 604–613.

(35)

Dror, R. O.; Pan, A. C.; Arlow, D. H.; Borhani, D. W.; Maragakis, P.; Shan, Y.; Xu, H.; Shaw, D. E. Pathway and Mechanism of Drug Binding to G-Protein-Coupled Receptors. Proc. Natl. Acad. Sci. 2011, 108 (32), 13118–13123.

(36)

Doble, B.; Woodgett, J. R. Author Manuscript / Manuscrit D â€TM Auteur GSK-3 : Tricks of the Trade for a Multi-Tasking Kinase. J. Cell Sci. 2003, 116 (7), 1175–1186.

(37)

Hooper, C.; Killick, R.; Lovestone, S. The GSK3 Hypothesis of Alzheimer’s Disease. J. Neurochem. 2008, 104 (6), 1433–1439.

(38)

Mazanetz, M. P.; Withers, I. M.; Laughton, C. A.; Fischer, P. M. Exploiting Glycogen 30

ACS Paragon Plus Environment

Page 31 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Synthase Kinase 3β Flexibility in Molecular Recognition. Biochem. Soc. Trans. 2008, 36 (1), 55–58. (39)

Mazanetz, M. P.; Laughton, C. A.; Fischer, P. M. Investigation of the Flexibility of Protein Kinases Implicated in the Pathology of Alzheimer’s Disease. Molecules 2014, 19 (7), 9134–9159.

(40)

Berg, S.; Bergh, M.; Hellberg, S.; Hogdin, K.; Lo-Alfredsson, Y.; Soderman, P.; von Berg, S.; Weigelt, T.; Ormo, M.; Xue, Y.; Tucker, J.; Neelissen, J.; Jerning, E.; Nilsson, Y.; Bhat, R. Discovery of Novel Potent and Highly Selective Glycogen Synthase Kinase3beta (GSK3beta) Inhibitors for Alzheimer’s Disease: Design, Synthesis, and Characterization of Pyrazines. J. Med. Chem. 2012, 55 (21), 9107–9119.

(41)

Bhat, R.; Xue, Y.; Berg, S.; Hellberg, S.; Ormö, M.; Nilsson, Y.; Radesäter, A. C.; Jerning, E.; Markgren, P. O.; Borgegård, T.; Nylöf, M.; Giménez-Cassina, A.; Hernández, F.; Lucas, J. J.; Díaz-Nido, J.; Avila, J. Structural Insights and Biological Effects of Glycogen Synthase Kinase 3-Specific Inhibitor AR-A014418. J. Biol. Chem. 2003, 278 (46), 45937–45945.

(42)

Pearl, L. H.; Prodromou, C. Structure and Mechanism of the Hsp90 Molecular Chaperone Machinery. Annu. Rev. Biochem. 2006, 75 (1), 271–294.

(43)

Luo, W.; Rodina, A.; Chiosis, G. Heat Shock Protein 90: Translation from Cancer to Alzheimer’s Disease Treatment? BMC Neurosci. 2008, 9 (Suppl 2), S7.

(44)

Shonhai, A.; Maier, A. G.; Przyborski, J. M.; Blatch, G. L. Intracellular Protozoan 31

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 42

Parasites of Humans: The Role of Molecular Chaperones in Development and Pathogenesis. Protein Pept. Lett. 2011, 18 (2), 143–157. (45)

Solit, D. B.; Chiosis, G. Development and Application of Hsp90 Inhibitors. Drug Discov. Today 2008, 13 (1–2), 38–43.

(46)

Stebbins, C. E.; Russo, A. A.; Schneider, C.; Rosen, N.; Hartl, F. U.; Pavletich, N. P. Crystal Structure of an Hsp90–Geldanamycin Complex: Targeting of a Protein Chaperone by an Antitumor Agent. Cell 1997, 89 (2), 239–250.

(47)

Kokh, D. B.; Czodrowski, P.; Rippmann, F.; Wade, R. C. Perturbation Approaches for Exploring Protein Binding Site Flexibility to Predict Transient Binding Pockets. J. Chem. Theory Comput. 2016, 12 (8), 4100–4113.

(48)

Sharp, S. Y.; Roe, S. M.; Kazlauskas, E.; Čikotiene, I.; Workman, P.; Matulis, D.; Prodromou, C. Co-Crystalization and In Vitro Biological Characterization of 5-Aryl-4-(5Substituted-2-4-Dihydroxyphenyl)-1,2,3-Thiadiazole Hsp90 Inhibitors. PLoS One 2012, 7 (9), 5–12.

(49)

Sharp, S. Y.; Boxall, K.; Rowlands, M.; Prodromou, C.; Roe, S. M.; Maloney, A.; Powers, M.; Clarke, P. A.; Box, G.; Sanderson, S.; Patterson, L.; Matthews, T. P.; Cheung, K. M. J.; Ball, K.; Hayes, A.; Raynaud, F.; Marais, R.; Pearl, L.; Eccles, S.; Aherne, W.; McDonald, E.; Workman, P. In Vitro Biological Characterization of a Novel, Synthetic Diaryl Pyrazole Resorcinol Class of Heat Shock Protein 90 Inhibitors. Cancer Res. 2007, 67 (5), 2206–2216. 32

ACS Paragon Plus Environment

Page 33 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(50)

Dymock, B. W.; Barril, X.; Brough, P. A.; Cansfield, J. E.; Massey, A.; McDonald, E.; Hubbard, R. E.; Surgenor, A.; Roughley, S. D.; Webb, P.; Workman, P.; Wright, L.; Drysdale, M. J. Novel, Potent Small-Molecule Inhibitors of the Molecular Chaperone Hsp90 Discovered through Structure-Based Design. J. Med. Chem. 2005, 48 (13), 4212– 4215.

(51)

Decherchi, S.; Bottegoni, G.; Spitaleri, A.; Rocchia, W.; Cavalli, A. BiKi Life Sciences: a New Suite for Molecular Dynamics and Related Methods in Drug Discovery. J. Chem. Inf. Model. 2018, Just Accepted Manuscript. DOI: 10.1021/acs.jcim.7b00680

(52)

Dajani, R.; Fraser, E.; Roe, S. M.; Young, N.; Good, V.; Dale, T. C.; Pearl, L. H. Crystal Structure of Glycogen Synthase Kinase 3: Structural Basis for Phosphate-Primed Substrate Specificity and Autoinhibition. Cell 2001, 105, 721–732.

(53)

Kumar, R. g_distMat https://github.com/rjdkmr/g_distMat.

(54)

Shao, J.; Tanner, S. W.; Thompson, N.; Cheatham, T. E. Clustering Molecular Dynamics Trajectories : 1 . Characterizing the Performance of Different Clustering Algorithms. 2007, 2312–2334.

(55)

R Core Team. No Title. R Foundation for Statistical Computing, Vienna, Austria 2013.

(56)

Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations. J. Mol. Graph. Model. 2006, 25, 247– 260.

33

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(57)

Page 34 of 42

Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174.

(58)

Case, D. A.; Iii, T. E. C.; Darden, T. O. M.; Gohlke, H.; Jr, K. M. M.; Onufriev, A.; Simmerling, C.; Woods, R. J. The Amber Biomolecular Simulations Programs. J. Comput. Chem. 2005, 26 (16), 1668–1688.

(59)

Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A.; Vreven, T. K.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A. S.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. J.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. No Title. Gaussian 03, Revis. A.1, Gaussian, Inc., Pittsburgh PA 2003.

(60)

Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Kollman, P. A. RESP Charges. 1993, No. 7, 9620–9631. 34

ACS Paragon Plus Environment

Page 35 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(61)

Schrödinger Release 2012. No Title. Schrödinger, LLC, New York, NY, (2012).

(62)

Humphrey, W.; Dalke, A.; Schulten, K. No Title. J. Mol. Graph. Model. 1996, 14, 33–38.

(63)

Lindorff-larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved Side-Chain Torsion Potentials for the. Proteins Struct. Funct. Genet. 2010, 1950–1958.

(64)

Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D. Comparison of Simple Potential Functions for Simulating Liquid Water Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926–935.

(65)

Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals : A New Molecular Dynamics Method Polymorphic Transitions in Single Crystals : A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52 (12), 7182.

(66)

Essmann, U.; Perera, L.; Darden, M. L. B.; Lee, H.; Pedersen, L. G.; Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103 (19), 8577– 8593.

(67)

Bussi, G.; Donadio, D.; Parrinello, M.; Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126 (1), 14101.

(68)

Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS : A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18 (12), 1463– 35

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 42

1472. (69)

Van der Spoel, D.; Lindhal, E.; Hess, B.; Team, and the G. development. No Title. GROMACS User Man. version 4.6.5 2013.

36

ACS Paragon Plus Environment

Page 37 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 1. Schematic representation of the dynamic docking workflow. 488x206mm (96 x 96 DPI)

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. Pictorial representation of the ENM applied to GSK3β (A) and to HSP90α (B). Comparison of the corresponding Cα-RMSF profile calculated between plain MD simulations (green) and sMD with ENM (red) is reported in the plots at the bottom of the Figure. 572x299mm (96 x 96 DPI)

ACS Paragon Plus Environment

Page 38 of 42

Page 39 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 3. A) Definition of the cylindric volumetric confinement. The axis of the cylinder runs along the norm of the binding pocket and is defined as the line passing through p1 and p2. These points represent the COMs of groups of atoms belonging to stable portions of the protein. The length of the cylinder (d║) is defined as the distance between the projection of the ligand COM on the axis and the COM of the binding pocket. The radius of the cylinder (d┴) is the distance between ligand COM and its projection on the axis. B) Identification of metastable states from the time series of d║ and d┴ (example taken from run 7 of 2YI0, “in” conformational state of the protein). A metastable state corresponds to any stable ligand configuration preserved for at least 10 ns. In this example, configurations sampled from 76 to 86 ns are collected for further cluster analysis (see Methods). 306x564mm (96 x 96 DPI)

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4. A) Time evolution of a ligand’s RMSD from the cognate crystal structure in a typical productive run (in this example: run5 of 2YI5, “out” state). The crystal binding pose is considered reproduced when a RMSD of 2.8 Å or less was observed and maintained for at least 10 ns. B) Retrospective performance of the 10 independent runs performed on GSK-3β. Productive runs are shown in green, while simulations ending in metastable states other than the experimental binding mode are reported in orange. Red bars refer to runs where no stable state (either correct or incorrect) was reached within 100 ns of simulation. C) Retrospective performance of the 10 independent runs performed on HSP90α. For each run, two initial conformations of the protein were considered (see Text and SI). 150x90mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 40 of 42

Page 41 of 42 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 5. Validation of our protocol for its ability to identify the experimental binding modes based on cluster population. The clusters of metastable states for the best and the worst performing complexes of GSK-3β (4ACH and 4ACG, respectively) and HSP90α (2YI7 and 2BSM) are presented in the pie charts, with colorcode based on the RMSD between the center of the cluster with respect to the corresponding crystal structure. For each complexes reported, the representative configuration from the three top-ranked clusters (indicated by #1, #2 and #3, with detail about population and RMSD) is shown, superimposed to the experimental binding mode (white) and color-coded according to the corresponding cluster. 177x162mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

79x45mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 42 of 42