Accurate Prediction of Complex Structure and ... - ACS Publications

May 8, 2017 - the center of mass (COM) of the ligand in the X-ray structure is ∼3.4 Е. ... the pocket and the bulk region and call this method the ...
0 downloads 0 Views 1MB Size
Subscriber access provided by CORNELL UNIVERSITY LIBRARY

Article

Accurate Prediction of Complex Structure and Affinity for a Flexible Protein Receptor and its Inhibitor Gert-Jan Bekker, Narutoshi Kamiya, Mitsugu Araki, Ikuo Fukuda, Yasushi Okuno, and Haruki Nakamura J. Chem. Theory Comput., Just Accepted Manuscript • Publication Date (Web): 08 May 2017 Downloaded from http://pubs.acs.org on May 11, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Accurate Prediction of Complex Structure and Affinity for a Flexible Protein Receptor and its Inhibitor

Gert-Jan Bekker,† Narutoshi Kamiya,*, †, ¶, § Mitsugu Araki, ¶, ‡ Ikuo Fukuda,† Yasushi Okuno, ¶, ‡



and Haruki Nakamura†

Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871,

Japan ¶

Advanced Institute for Computational Science, RIKEN, 7-1-26 Minatojima Minami-machi,

Chuo-ku, Kobe, Hyogo 650-0047, Japan §

Graduate School of Simulation Studies, University of Hyogo, 7-1-28 Minatojima Minami-

machi, Chuo-ku, Kobe, Hyogo 650-0047, Japan ‡Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ABSTRACT

In order to predict the accurate binding configuration as well as the binding affinity for a flexible protein receptor and its inhibitor drug, enhanced sampling with multicanonical molecular dynamics (McMD) simulation and thermodynamic integration (TI) were combined as a general drug docking method. CDK2, cyclin-dependent kinase 2, is involved in the cell cycle regulation. Malfunctions in CDK2 can cause tumorigenesis, and thus it is a potential drug target. Here, we performed a long McMD simulation for docking the inhibitor CS3 to CDK2 starting from the unbound structure. Subsequently, a potential binding/unbinding pathway was given from the multicanonical ensemble, and the binding free energy was readily computed by TI along the pathway. Using this combination, the correct binding configuration of CS3 to CDK2 was obtained, and its affinity coincided well with the experimental value.

2

ACS Paragon Plus Environment

Page 2 of 37

Page 3 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

INTRODUCTION Molecular docking between a protein receptor and small chemical compounds is one of the key tools in virtual screening from ligand databases. So far, many “flexible docking” methods have been developed, where only the ligand molecules are treated as flexible, leaving the protein receptors essentially “rigid”, such as DOCK,1 AutoDock,2 Flex,3 GOLD,4 and myPresto/sievgene.5 They try to make a rapid search to find correct docking poses for millions of ligand candidates by reducing computational costs. Since the screening result strongly depends on the structure of the proteins, the “ensemble docking” method has been developed. It is a common strategy to use Molecular Dynamics (MD) to generate an ensemble of receptor conformations,6 after which rigid body docking is applied to generate docking poses. Alternatively, some tools use MD or molecular mechanics for intermediary or post-processing operations.7 The docking poses are finally ranked by some empirical or physicochemical scores using shape complementarity and electrostatics. While these approaches can take protein flexibility into account, the conformational space searched by such ensemble docking methods is limited. In addition, the scores lack accuracy due to an inclusion of many approximations and empirical parameters, because they are designed for rapid screening. Thus, the quantitatively correct affinities between the protein and ligands are so difficult to be obtained that researchers have to be satisfied by the qualitative, relative affinities. Therefore, a method that can predict both the accurate protein-ligand complex and its quantitatively correct affinity is needed, where the protein receptor is treated as flexible in addition to using flexible ligands. Exhaustively sampling the protein and ligand structures by conventional MD simulations to obtain the docking configuration and binding free energy may be one solution to the above problem, but it is impractical, even nowadays. The reason is that binding of a

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ligand to a receptor generally takes in the order of micro to milliseconds. In order to properly sample a natively bound structure, multiple binding-unbinding events are required, and so the total simulation time becomes in the order of seconds to minutes. These timescales are far outside the realm of even the fastest MD optimized supercomputer in existence, i.e. the ANTON system,8 which can sample in the low-millisecond timescale. As such, exhaustively searching the conformational space by brute force is not a viable solution. The rate-limiting process is conformational trapping of a ligand at many local energy minima. In order to overcome this trapping, several biasing methods, which add weighted probabilities, have been designed to improve sampling efficiency. This has been realized by generalized ensemble and biasing methods, such as Multicanonical MD (McMD),9 Replica Exchange MD10 (REMD), Filling Potential,11 Meta-dynamics,12 Adaptive Umbrella Sampling13 (AUS), Accelerated Molecular Dynamics14 (AMD) and Adaptive Biasing Force15 (ABF). In the McMD method, a bias is applied to the system to enable uniform sampling over a specific energy range, facilitating a random walk within the energy space, where each energetic state is equally probable to be sampled. This bias enables sampling at higher energy regions, similar to a high temperature simulation, and at lower energy regions, similar to a low temperature simulation, resulting in exploring a wide conformational space without conformational trapping at local minima. A merit of the McMD method is that the physically accepted ensemble, i.e. the canonical ensemble, can be obtained by the reweighting procedure.9 The McMD method can continuously switch between the higher and lower temperature regions defined by the bias, and it has been widely applied for enhanced conformational sampling16 and docking17 simulations. The McMD method first forms the multicanonical ensemble, which covers a wide temperature range. Then, the conformational ensemble at 300 K is obtained by the reweighing method, followed by calculating the

4

ACS Paragon Plus Environment

Page 4 of 37

Page 5 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Potential of Mean Force (PMF) from the probability distribution of the sampled structures. Thus, the Free Energy Landscape (FEL) is displayed as the PMF distribution by one or more principal components16 or any other arbitrary axes. REMD is another popular enhanced sampling method,10 and has also been applied to find native docking configurations.18 For both the McMD method and the REMD method (as well as its derivates REST19 and REST220), the canonical ensemble at 300 K can be reobtained. With REST and REST2, only part of the system is accelerated, while the other part remains in the canonical ensemble. Nakajima et al.21 also developed a two component McMD version for accelerating the solute. Finally, AMD is also one of the enhanced sampling methods, but reweighing to the canonical ensemble is not trivial, although a new form of reweighing was recently devised.22 Besides the above mentioned sampling the docking configurations, it is also important to calculate the binding affinity for drug development. Various methods using MD exist to calculate the binding free energy. A popular technique is to use MM/PBSA23 which can calculate the binding free energy at a reasonable computational cost, however the absolute values often differ significantly from the experimental value.24 More expensive methods are alchemical based methods such as Free Energy Perturbation, which can be used to calculate the binding free energy.25 Alternatively, one can use path sampling methods. The Umbrella Sampling (US) method26 is an equilibrium simulation technique that is widely used to calculate the binding free energy between a ligand and a protein and has seen some promising results, however it can be difficult to find the appropriate reaction coordinate despite its expensive computation costs. The Thermodynamic Integration (TI) method27 is another one to compute the binding free energy, where the average force is calculated by the force field along a binding or unbinding path. Integrating these forces produces the binding free energy. In the Smooth Reaction Path Generation (SRPG) method,28 bound ligand molecules are first

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

dissociated by the Filling Potential method11 in vacuo to assume and generate an appropriate smooth unbinding pathway. Then, TI along the path in aqueous solvent is performed to estimate the binding free energy. The SRPG method however, requires an experimental complex structure or a reliable model with a correctly predicted bound ligand configuration. Both US and TI require a definition of the reaction coordinate and initial structures spread over this reaction coordinate, preferably a single structure for every window. On the contrary, McMD and REMD are more general without requiring an effective reaction coordinate. Due to the strict requirements of a spatial reaction coordinate of the former methods, the choice of using the latter methods, which don’t require an explicit reaction coordinate, seems better for ab-initio docking, where the binding configuration is not yet known. Thus, one would also expect that McMD or REMD would provide the binding free energy correctly,17 since the simulations are not restrained to a reaction coordinate. However, both the McMD and REMD simulations give the correct canonical ensemble, which essentially causes lower ligand densities at positions far from the protein receptor. Previously, we performed flexible docking simulations between an SH3 domain and a proline-rich peptide in vacuo,29 between lysozyme and one of its inhibitors tri-NAG in explicit solvent,17 and between the N-terminal receptor domain of neural restrictive silencer factor and the paired amphipathic helix domain of mSin3 in explicit solvent.30 Although we executed simulations that successfully sampled near the native complex structure by using the powerful McMD method, we were unable to obtain accurate values for the binding free energies due to insufficient sampling in the unbound region. In order to obtain an accurate binding free energy, such less probable structures are also required. Since they can only be sampled by very long simulations or by employing a bias in an additional dimension via e.g. an umbrella potential,31 they would require much

6

ACS Paragon Plus Environment

Page 6 of 37

Page 7 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

higher computing costs. To overcome this problem, we here propose a combination of the McMD simulation with TI to accurately predict both the binding complex and the affinity. The essential weakness of the TI or SRPG methods requiring the correct bound complex structure28 must also be overcome by providing a reliable complex model from the McMD simulation. In this study, we have conducted flexible docking of an aminopyrazole inhibitor (CS3) to the flexible cyclin-dependent kinase 2 (CDK2), which is involved in regulating cell cycle control. The malfunctions of CDK2 are thought to cause tumorigenesis,32 and so its inhibitor is a highly potential drug candidate. Our goal is to reproduce the native complex structure and to calculate their binding affinity in a general ab-initio manner. Although the complex crystal structure of CDK2 with CS3 (PDB ID 4EK5)33 was deposited at the Protein Data Bank (PDB)34 and their affinity data from Community Structure-Activity Resource (CSAR)33 are available, they are only used here as the experimental references. The position of the ligand in a cylindrical region, which covers both the ligand binding pocket and the bulk water, is sampled by the McMD simulation starting from the unbound ligand structure, where the atoms of the protein inside the cylinder and those surrounding it, were treated as free without any restraints. By combining the McMD simulations and TI, both the bound complex of CDK2 with CS3 and their affinity were reproduced correctly. This combined scheme could become a method used for a general flexible protein-ligand system, providing accurate prediction of both the bound complex and the binding free energy, starting from the unbound structure.

MATERIALS & METHODS

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The CDK2 structure in complex with an aminopyrazole inhibitor with the PDB ID 4EK5 (resolution 1.6 Å) was used. This complex has a known binding affinity of -8.31 kcal/mol obtained by ITC experiments which was deposited to the CSAR database.33 The simulations were performed using a modified version of Gromacs 201635 and the system was prepared using Gromacs’ built-in tools, resulting in a system size of 72.2 x 63.0 x 62.5 Å3. Additional details regarding the system’s preparation and the simulation details are written in S1. Our strategy to obtain the binding free energy consists of the following four steps: 1) Define an axis λ as a basis for the space restraints on the ligand. 2) Perform McMD to sample complex structures in the restricted space along λ and to predict the native binding configuration. 3) Generate a reaction path ξ from the McMD data in order to obtain a smooth and realistic path, which can be effectively sampled by TI. 4) Calculate the binding free energy using TI. (1) Estimation of a space restraining cylinder using a naive method In rigid body docking, the search area is pre-defined by the user and limited to inside the pocket. When docking using MD however, we want to simulate binding and unbinding events, i.e. the simulation is not limited to only the pocket, but the bulk should be included as well. However, the bulk is assumed to be infinite in size. This means that the ligand can spend a long time in the bulk due to becoming adrift. This decreases the number of binding events and thus the number of docking experiments the simulation can do in a limited amount of time. To increase the efficiency of the simulation, i.e. to maximize the number of binding

8

ACS Paragon Plus Environment

Page 8 of 37

Page 9 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

events, the search space of the ligand is assumed to be limited in a finite region that connects a bulk region to the binding-pocket region. Each region can be represented as a sphere like the cyan areas in Figure 1, where these two spherical regions can be connected by a cylinder. To determine the direction of the cylinder, we employed a method that only requires the initial protein structure, the ligand molecular structure and the rough location of the binding pocket. To estimate the center of the binding pocket, we selected all the residues whose side chains have atoms within 8 Å from the ligand in the complex structure and calculated the geometric center as the pocket center using their heavy atoms. Here, the distance between the pocket center and the Center Of Mass (COM) of the ligand in the X-ray structure is ~ 3.4 Å. Here, we determined the direction of λ in the following manner by simply searching the optimal vector that connects the pocket and the bulk region and call this method the naive  R, m = 1 … m } method, as shown in Figure S4. First, spherical surface grid points {h are generated at an increasing radius R at constant intervals ∆R (e.g. 1 Å) starting from the center of a given pocket. Here, mmax is determined so that the density of the surface grid  R,  = 1 …  } for points is approximately 1 Å-2. Second, among the grid points {h

R which is the furthest from the all the heavy each radius R, we defined the point h atoms of the receptor. Among these points we only select those such that the distance is

between Rg + 2 Å and Rg + d Å. Here Rg is the radius of gyration of the ligand molecule and d is a value representing a buffer distance between the receptor and the ligand in the bulk. R and the given Finally, by using linear least squares fitting of the selected points h center of the pocket, a straight line connecting the pocket and the bulk is generated, which forms the basis for our space restraints. Finally, the ligand was positioned at the end of the

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

vector λ. After preparing the vector λ and placing the ligand at the end of the vector, the system was further prepared as describe in S1. (2) McMD simulations for ligand docking Then, the McMD simulations were performed for the ligand docking to predict a binding configuration. The initial configuration for McMD was obtained by our naive method and is shown in Figure 1. The ligand was restrained to stay within a cylinder defined by the direction calculated from the naive method. The cylinder was defined with a radius of 5 Å and λ spanned from -2.5 Å to 24 Å, where 24 Å corresponds to the length estimated by the naive method and 0 Å to the center of the pocket, as supplied to the naive algorithm. During the McMD simulation, in case the COM of the ligand leaves the boundary of the cylinder in the orthogonal direction to the axis, a harmonic potential with a force constant of 20 kcal / (mol Å2) was used to pull it back towards the cylinder. Similarly, in case the ligand leaves the confines of the cylinder along its axis, a force constant of 1000 kcal / (mol Å2) was used. This cylinder thus restrains the COM of the ligand in order to reduce the degrees of freedom, while connecting the bound and unbound regions together to enable multiple binding-unbinding experiments. The protein atoms beyond 14.5 Å from the axis of the cylinder were restrained as described in S1. The McMD simulations consist of training runs to iteratively estimate the density of states using eq. (4), described in S2, where the McMD method is explained in greater detail. This is followed by a successive run (i.e. the productive run) to sample structure trajectories. The potential energy (E) and the COM of the ligand at each step were stored for the production run, which lasted in total 6.144 x 109 steps (12.288 µs). In addition, the structures were stored every 20 ps. The structures sampled during the McMD simulations were reweighted and the canonical ensemble at a given temperature T, Pc(E, T), was obtained. 10

ACS Paragon Plus Environment

Page 10 of 37

Page 11 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Next, the FEL along λ and PC1/PC2 from the PCA (Principal Component Analysis) were calculated. We used a strategy which involves picking a representative structure at the predicted global minimum from the FEL along λ, to predict the native binding configuration. After calculating the FEL on λ at 300 K, all the structures near the global minimum were selected. These structures were then clustered based on their relative RMSD (Root Mean Square Deviation) within 0.1 Å. The cluster with the largest contribution to the PMF (which would correspond to the most stable one) was taken, after which the structure that was the closest to the average structure in this cluster was chosen as the representative structure. This is followed by picking structures to make a new path ξ, as described below.

(3) Generation of the improved reaction path ξ The initial structures for TI should be similar enough so that the conformational ensemble sampled in a window overlaps with the ensembles sampled in its neighboring windows. However, simply selecting configurations in the McMD ensemble based on the proximity of their COM can result in very abrupt changes in the ligand’s configuration between neighboring windows. In order to prevent this, first the configurations from these unsuccessful docking attempts need to be detected and excluded. Ligands inside deep pockets are generally unable to rotate about their own COM and change their orientation with respect to the native bound configuration. However, it is possible to filter the ensemble by orientation with respect to the predicted native configuration among those that deviate too much. In the case of CDK2-CS3, three layers can be identified naively from Figure 3A: 1) bound layer (λ < 7.5 Å) 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

2) intermediary layer (7.5 < λ < 12.5 Å) 3) unbound layer (λ > 12.5 Å) In each of the above layers, different criteria should be used due to changes in accessible volume, where in the bound region the lack of available space prevents the ligand from rotating, while in the unbound region the ligand is completely free. Because the axis of the cylinder λ is a rough estimate of the binding/unbinding direction, a smooth and realistic path for binding/unbinding should be generated in order to conduct TI. First, along the λ axis, representative configurations {qn, n=45} are picked in 0.5 Å windows starting from the predicted native bound complex, which was obtained using the method described above. For each window, all configurations with their λ value inside the window were selected and their RMSDs with respect to the selected configuration of the preceding window (qk-1) was calculated. For the structure with the smallest RMSD value, RMSDL was calculated, which functions as a measure for the relative orientation with respect to the predicted native bound configuration by calculating the RMSD after centering both configurations. In case the RMSDL value is smaller than a cutoff value, the configuration is accepted and the COM of the configuration is stored. The cutoff value depends on the layer, where in the bound layer the cutoff was set to 2 Å, in the intermediary layer the cutoff value was interpolated between 2 Å and the maximum RMSD value detected in the intermediary layer (8.4 Å), while in the unbound layer all configurations were accepted. The resulting representative ensemble {qn} has been selected so that the configurations are accessible from the predicted native bound configuration, as if the latter configuration were to dissociate from the pocket, similar to an ensemble produced by pulling simulations such as SMD. The COM values for windows with no configuration assigned were estimated using cubic interpolation, after which a path was traced through the points {qn} and the resulting path was smoothed 12

ACS Paragon Plus Environment

Page 12 of 37

Page 13 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

using a Savitzky–Golay filter.36 This produces a new path, which is called as ξ hereafter and represents the binding/unbinding path of CS3 to CDK2 predicted from the McMD simulations, which functions as our reaction path during TI. Whereas λ can only function as a guideline to differentiate between the bound and unbound regions, ξ is a path connecting physically accepted structures from the McMD ensemble at 300 K in the bound region to those in the bulk region. Thus, this should be an appropriate path, traveling near along the transition state without frictions between receptor and ligand. (4) Binding free energy calculation by TI along ξ Finally, in order to perform TI along ξ, the path was partitioned into 0.1 Å interval windows, where the center of each window was positioned on the path and separated by 0.1 Å. For each window, a structure was taken in a similar manner as described above, where configurations of neighboring windows should be similar along ξ. In the case of the intermediary and unbound regions, neighboring bins often share the same starting structure, due to a limited amount of structures available in the McMD ensemble at 300 K that are similar enough between neighboring bins. Energy minimization was first performed for each trajectory, after which they were equilibrated for 2 ns and subsequently sampled for 1 ns in the canonical ensemble at 300 K. For each window, three independent simulations were performed, where the velocities were randomized according to a different random seed for each simulation following a Maxwell distribution. The COM of the ligand was restrained using a harmonic potential to the window center using a strong force constant of 400 kcal / (mol Å2). The PMF was calculated by integrating the forces along ξ using the following equations:28a

13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

         ∑   "#

〈 〉 = =

+

Page 14 of 37

(1)

∑  "#

〉 · (  $% &' (  = ) 〈 (

(2)

,

where   is the force on the COM of the ligand as calculated by the force field, r is a position

on ξ to which the ligand’s COM is restrained to and V+ is the restrained energy to keep the ligand along the path from 0 to ξ calculated as 1 V+ = /+  − + 1 2

(3)

Here, the square distance between the COM r and the anchor point + is computed with the

weight /+ = 400 kcal / (mol Å2). The integration along the path in eq. (2) was approximated

by taking a sum of 〈〉 · 4 at each small bin, the width of which was 0.1 Å, along the path

ξ. Note that the effect of the restraint is compensated between the denominator and numerator in the last formula of eq. (1). We then have 5G

∞  − G 7 8

= $% &' ξ → ∞

(4)

In order to compare the calculated binding free energy with the experimentally determined one, the simulation has to be corrected to take the sampled volume into account. The standard binding free energy can be determined as follows: ∆; ln A

14

$= B $

ACS Paragon Plus Environment

(5)

Page 15 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

where $= is the probability of the bound form and $ is probability of the unbound form.

Here, $= and PU are defined as follows:

$= =  C

7

)  C   DEFG

$ = H,  C

(6)

∞

(7)

Here, G(r), G(r0), H, and G(r∞) are the PMF at position r, the PMF at the global free energy minimum r0, the standard concentration at 1M (1661 Å3) and the PMF at a reference position in the bulk r∞, respectively28a. The PB term in eq. (6) can be approximated using the following equation: $= ≈  C

7

J  C+,ν,K 4(4L4M

(8)

DEFG

where L and ζ are axes perpendicular to ξ, and ∆ξ∆L∆ζ is a small area part of the site. Finally we have ∆; ln N

1 J  C+,ν,K 4(4L4M O H,

(9)

DEFG

Thus, the binding free energy is obtained by estimating G

∞ −

G

7

from eq. (4) and by

estimating ∑DEFG  C+,ν,K 4(4L4M from the binding probabilities at the binding site obtained

by the McMD simulation. This can be done by partitioning the region around the binding site into small cubes with a volume of 0.1 Å3 each, forming a 3D grid along the axes ξ, L and ζ.

Then, by calculating the PMF contribution ;(, ν, M of each cube, the relative sampling

probabilities over the binding region can be calculated. Finally, the volume sampled by the simulation can be calculated by summing the weighted volume of each cube.

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The current simulations were made on our in-house HPC system (3x HP s6500 with Xeon E5-2660v2) consisting of 24 GPGPUs (Nvidia Tesla K20) to accelerate the non-bonded calculations by the zero-dipole summation method37, and it required approximately two weeks to finish all the McMD simulations including the training and production runs. The final TI computations required another 2 days to be completed.

RESULTS Docking of CS3 to CDK2 by McMD The probability histogram of McMD obtained from the productive run and the reweighted canonical distributions at 300 K, 500 K and 700 K are plotted in Figure S5. Although the energy distribution of the production run has a slight incline towards the lower energy regions, the McMD simulation performed wide conformational sampling and no significant trapping was observed. The production run successfully sampled both the bound and unbound regions as indicated by Figure S6. In Figure S6A, the Solvent Accessible Surface Area (SASA) of CS3 at the various configurations along λ is plotted with the RMSD values of the ligand configurations against the crystal structure of CS3 at 300 K, 500 K and 700 K, respectively. Similarly, in Figure S6B, for each temperature, 1000 random samples from the trajectories were selected and the COM of the ligand was plotted. In the figure, at 300 K the samples mainly occupy the bound region, while at 500 K the ligand also ventures deep into the bulk. At 700 K, the samples are more uniformly distributed along the search space. By using McMD to randomly walk over the energy space, it becomes possible to efficiently sample both the bound and unbound states and perform multiple docking

16

ACS Paragon Plus Environment

Page 16 of 37

Page 17 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

experiments using a far lower timescale than that would be required by canonical brute force simulations. To accurately determine the PMF, especially in the bulk region, we have stored the COM of the ligand for each step during a significantly extended production run, 6.144 x 109 steps. The PMF along λ in Figure 3A, shows a clear global minimum (a) around λ = 3.3 Å, where the native bound structure is located (whose value is indicated by a red circle). Namely, we successfully predicted the native bound structure by using free energy along the λ axis. Furthermore, 96% of the predicted configurations along λ within 0.5 Å from the predicted global minimum had an RMSD value of less than 2 Å with respect to the X-ray structure. Other local minima were found around (b) 1.9 Å, (c) 0.7 Å and (d) 8.0 Å. Figure 3B shows representative structures at these minima. The structures at the global minimum and at the local minima were as follows. First, the near-native configuration depicted in blue (Figure S7B, which coincides with the blue structure in Figure 3B) was very close to the native structure (Figure S7A) in CPK colors (whose carbon atoms are colored green), and it has an RMSD of 0.5 Å against the crystal structure. CS3 was hydrogen bonded to CDK2 primarily via CDK2’s backbone in this near native configuration as well as in the X-ray structure. In the case of the X-ray structure, one hydrogen bond is made between the sidechain of Gln85 and CS3’s amine group. This hydrogen bond however was not present in the predicted structure, but was replaced with a hydrogen bond to the backbone oxygen of His84. Since CS3’s amine group is on the surface, the hydrogen bond in the X-ray structure may be unstable in explicit water due to many other hydrogen bonding partners being available to the amine group. Second, the structure with a λ value of 1.9 Å in cyan in Figure 3B had a significantly different configuration compared to the native structure and has a high RMSD value of 5.8 Å 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

against the crystal structure. Another configuration with a λ value of 0.7 Å in red with a RMSD value of 5.6 Å is very similar to the cyan configuration. These configurations are perpendicular to the axis of the cylinder inside the pocket, and it would require the ligand to rotate from its native configuration, forcing it to break the stable hydrogen bonds as shown in Figure S7B. Alternatively, the ligand might directly bind as this configuration without first assuming the native bound configuration as an intermediary configuration. Figure S8 shows the superposition between the red configuration and the X-ray structure for the agonist ATP, which are very similar configurations and are described in further detail in the DISCUSSION section. Lastly, the structure with a λ value of 8.0 Å in orange shown in Figure 3B was an intermediary structure between the bound and the unbound forms near the mouth of the pocket with an RMSD of 5.3 Å. The cyclopropyl group partially entered the pocket, while the carboxamide group was still outside the pocket, pointing into the bulk space. Noticeably, the carbonyl group was upwards in this configuration, versus downwards in the native bound complex, indicating that for successful binding from this configuration, the bond about the secondary amine group would have to rotate in order to assume the correct binding configuration. In addition, PCA of the configurations at 300 K was performed using the distance matrix for the atoms listed in Table S9, and the resulting FEL is shown in Figure S10A. Here, X denotes the correct binding configuration observed in the crystal structure, which is located close to the free energy minimum. In Figure S10B, typical structures picked from the FEL are shown. The representative structure at (a) is very close to the native structure, while the representative structures at (b) and (c) are similar to the ATP-like binding configuration.

18

ACS Paragon Plus Environment

Page 18 of 37

Page 19 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Binding affinity of CS3 to CDK2 by TI After obtaining the path ξ from the McMD simulation data as shown in Figure 4A, intermediary structures along the path were selected as the initial starting structures for the TI sampling simulations, as shown in Figure S11A. After sampling the forces and calculating the average forces in each bin according to eq. (1), TI was performed as in eq. (2) to obtain the binding free energy ∆; ln R" ∑DEFG  C+,ν,K 4(4L4MU = −4.82. T

Then, from eq. (9) with the above TI value, the binding free energy, ∆; 17.0 Å, the entropy reaches its peak and stabilizes, indicating that

CS3’s interaction with CDK2 is very limited and indistinguishable from CS3’s interaction with the surrounding bulk. This can also be seen in the TI plot (Figure 5), where the PMF barely changes from this point onwards. Finally, at ( > 21.0 Å the ligand has fully dissociated from the receptor.

Position restraints in this work (described in S1) were primarily used for two reasons: preventing the protein from unfolding and to prevent the protein from dissociating away from 25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the cylinder. In case distance restraints were to be used, this would not prevent the protein from dissociating away from the cylinder. By using position restraints for the distant areas of the protein, the pocket and the surrounding areas can be treated as flexible, while the protein remains largely stable and doesn’t dissociate away from the cylinder. This strategy however introduces some difficulties. For example, in Figure S8 the CDK2-CS3 complex is colored in light-red/magenta, while the CDK2-ATP is colored light-blue/cyan. The loop region, which has a darker tint, is shown to differ between the two structures. This interaction in combination with the position restraints, limit the movement of the sheet structure and thus make ATP’s sub-pocket more rigid. Similarly, in case the pocket were to be near a large hinge movement region, this movement could be restrained, thus preventing from properly simulating the protein’s dynamics and possibly preventing the ligand from binding to the receptor. In this situation, one could use distance restraints for the protein instead of position restraints. To prevent the protein from dissociating away from the cylinder, the position of the cylinder based on the relative position and orientation of the protein needs to be updated, or translational and rotational restraints need to be applied on the COM of the protein.

Using McMD we were able to predict the correct binding configuration of the ligand, however, from the PMF of λ in Figure 3A, the sub-stable configurations were discovered around λ = 1.9 Å and λ = 0.7 Å. Interestingly, these configurations are similar to that of CDK2’s agonist, ATP. Figure S8 shows the superposition between the structure at λ = 0.7 Å (red/magenta) and the X-ray structure for the agonist ATP (blue/cyan, PDB ID=1FIN39). The predicted configuration binds in a similar manner to ATP, but this configuration was not found by the X-ray experiment. Considering that ATP binds in a similar manner to CDK2 as this docking configuration, this could potentially also be a valid binding mode of CS3 to

26

ACS Paragon Plus Environment

Page 26 of 37

Page 27 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

CDK2, even though the X-ray experiment only seems to have found the primary binding site. However, when the overlapping volume was measured and the Jaccard coefficient was calculated, a ratio of only 0.27 was detected. Here, this coefficient is defined as the volume of the intersection between the CS3 configuration and the ATP configuration divided by their union, calculating the overlap ratio between the two molecules. This low overlap ratio is primarily caused by CS3 being translated away from the sheet region with respect to ATP, thus decreasing the overlap. ATP’s larger volume also decreases the Jaccard coefficient. Noticeable though is that the sheet region in the right side is folded differently to make room for ATP. A major difference between the fold of the ATP bound CDK2 and the CS3 bound CDK2, is that in the case of CS3 bound CDK2 the loop region from Phe149 – Thr165 is folded in such a way that it interacts with the sheet structure near the pocket, while in the ATP bound CDK2 the loop is interacting with a region around Val123, forming a sheet structure distant from the pocket. Finally, from our canonical simulations, we found that configuration (c) is not stable at 400 K as well as the similar configuration (b). Furthermore, even the canonical simulation at 300 K showed some fluctuations for configuration (c). Therefore these ATP-like configurations are not as stable as the natively bound configuration, while the 1D FEL obtained from the McMD simulation suggests that these minima are not very far from the near-native configuration. Although the position of the predicted configuration differs somewhat from ATP’s configuration based on the volume overlap, our powerful McMD simulation suggested a secondary binding site for CS3.

CONCLUSION We executed McMD simulations in order to dock the CS3 inhibitor into CDK2 starting from the unbound structure, followed by TI to calculate the affinity. The most stable 27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

binding configuration was found to coincide with the native X-ray structure, as indicated by the FEL and the RMSD values. From the ensemble of the configurations obtained by McMD docking, we created a new path, ξ, along which we performed TI. Then, the binding free energy was accurately predicted in accordance with the experimental result. Although the current procedure requires considerable computational time even using a GPGPU system, it could provide highly reliable information to optimize chemical compounds for drug development.

Figure 1. Space restraints and initial structure for McMD simulation. The center of mass (COM) of the ligand is restrained to stay inside the cylinder, which is defined based on the axis λ ranging from -2.5 Å to +24 Å as a black dotted line. The zero point is identified by a black sphere. The ligand in the X-ray structure is colored in CPK with the carbon atoms colored green and is denoted by “X-ray” and hereon after referred to as the native configuration or structure, while the protein is rendered as cartoon in white. Also shown is the position of the ligand before the McMD simulation denoted by “McMD initial configuration”. Here, the bound and unbound regions are indicated schematically as cyan circles. All of the images of

28

ACS Paragon Plus Environment

Page 28 of 37

Page 29 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

molecular structures were rendered using our new program Molmil,40 which is a high quality WebGL based molecular viewers available on many platforms.

Figure 2. Schematic representation of the generation of the path ξ for TI by binning along λ. Starting from the representative configuration q0 at the PMF’s free energy minimum and using λ as a guiding parameter, representative configurations along λ are picked from the filtered ensemble at 0.5 Å intervals, where the minimum RMSD configuration in each bin k is picked with respect to the picked configuration from the preceding bin k-1. A path is traced through the COMs of these picked configurations, where after this path is smoothed, a new path ξ is obtained.

Figure 3. (A) PMF of the ligand by projection of the COMs obtained from the reweighted McMD ensemble at 300 K onto λ. The PMF was normalized so that the global free energy minimum was set to zero. The λ value was stored at each step and was used to generate the plot, with a bin-size of 0.1 Å. The positions of each minima are shown in the figure as a-d. (B)

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Binding configurations for structures at λ values of (a) 3.3 Å, (b) 1.9 Å, (c) 0.7 Å and (d) 8.0 Å in blue, cyan, red and orange respectively. The native structure is shown in CPK color with the carbon atoms colored green as a reference. Also shown is the axis λ as a black dotted line. Notable are the cyan and red configurations, which are perpendicular to the axis of the cylinder inside the pocket and bind in a similar orientation as CDK2’s agonist ATP.

Figure 4. (A) Smoothed path ξ in red estimated from the McMD calculations along which the configurations were sampled using TI. Also shown are the axis λ as a black dotted line and the predicted bound configuration in blue. (B) PMF from the reweighted McMD ensemble at 300 K by projection of the COMs onto ξ. The PMF was normalized so that the global free energy minimum was set to zero. Also indicated is the range used to calculate the volumetric correction in eq. (9) as dotted lines with a double headed arrow.

Figure 5. PMF along ξ obtained by TI simulations. The average PMF value over the final 1 Å corresponds to 13.67 kcal/mol, with a standard deviation of ±0.005 kcal/mol and the error, which was calculated using 1000 bootstraps corresponds to ±0.02 kcal/mol. The PMF was normalized so that the global free energy minimum was set to zero. The zero value of ξ corresponds

30

ACS Paragon Plus Environment

Page 30 of 37

Page 31 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

to the predicted binding configuration, i.e. (a) in Figure 3B.

31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ASSOCIATED CONTENT

Supporting Information. Supporting information S1, S2, Figures S3-S8, S10-S14 with each corresponding figure legend and Table S9 (SupportingInformation.pdf).

AUTHOR INFORMATION Corresponding Author *E-mail: [email protected].

Notes The authors declare no competing financial interest.

ACKNOWLEDGMENTS This work was supported by Grant-in-Aid for Scientific Research on Innovative Areas “Transcription cycle” (24118008) to G.-J.B. and H.N. from Ministry of Education, Culture, Sports, Science and Technology-Japan (MEXT) Grand-in-Aid for Scientific Research C (JP16K07331) from the Japan Society for the Promotion of Science (JSPS) to N.K., and Grant-in-Aid for Exploratory Research (JP16K14711) to H. N. from JSPS. H.N. and I.F. were supported by the “Development of core technologies for innovative drug development based upon IT” from Japan Agency for Medical Research and development (AMED). It was performed in part under the Cooperative Research Program of the Institute for Protein Research, Osaka University, CR-16-05 to N.K. This research was done in activities of the K supercomputer-based drug discovery project by Biogrid pharma consortium (KBDD), as well as MEXT as "Priority Issue on Post-K computer" (Building Innovative Drug Discovery

32

ACS Paragon Plus Environment

Page 32 of 37

Page 33 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Infrastructure Through Functional Control of Biomolecular Systems) and FOCUS Establishing Supercomputing Center of Excellence to N.K., M.A., and Y.O. Computational resources of a PC cluster at Cybermedia Center, Osaka University were partly provided by the HPCI Research Project (hp150025, hp150146, hp150272, hp160010 and hp160213).

REFERENCES 1. Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E., A Geometric Approach to Macromolecule-Ligand Interactions. J. Mol. Biol. 1982, 161 (2), 269-288. 2. Goodsell, D. S.; Olson, A. J., Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins 1990, 8 (3), 195-202. 3. Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G., A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol. 1996, 261 (3), 470-489. 4. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R., Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267 (3), 727-748. 5. Fukunishi, Y.; Mikami, Y.; Nakamura, H., Similarities among receptor pockets and among compounds: Analysis and application to in silico ligand screening. J. Mol. Graphics Model. 2005, 24 (1), 34-45. 6. (a) Craig, I. R.; Essex, J. W.; Spiegel, K., Ensemble Docking into Multiple Crystallographically Derived Protein Structures: An Evaluation Based on the Statistical Analysis of Enrichments. J. Chem. Inf. Model. 2010, 50 (4), 511-524; (b) Wada, M.; Kanamori, E.; Nakamura, H.; Fukunishi, Y., Selection of In Silico Drug Screening Results for G-Protein-Coupled Receptors by Using Universal Active Probes. J. Chem. Inf. Model. 2011, 51 (9), 2398-2407. 7. Sherman, W.; Day, T.; Jacobson, M. P.; Friesner, R. A.; Farid, R., Novel procedure for modeling ligand/receptor induced fit effects. J Med Chem 2006, 49 (2), 534-553. 8. Shaw, D. E.; Deneroff, M. M.; Dror, R. O.; Kuskin, J. S.; Larson, R. H.; Salmon, J. K.; Young, C.; Batson, B.; Bowers, K. J.; Chao, J. C.; Eastwood, M. P.; Gagliardo, J.; Grossman, J. P.; Ho, C. R.; Ierardi, D. J.; Kolossvary, I.; Klepeis, J. L.; Layman, T.; Mcleavey, C.; Moraes, M. A.; Mueller, R.; Priest, E. C.; Shan, Y. B.; Spengler, J.; Theobald, M.; Towles, B.; Wang, S. C., Anton, a special-purpose machine for molecular dynamics simulation. Commun. Acm. 2008, 51 (7), 91-97. 9. Nakajima, N.; Nakamura, H.; Kidera, A., Multicanonical ensemble generated by molecular dynamics simulation for enhanced conformational sampling of peptides. J. Phys. Chem. B 1997, 101 (5), 817-824. 10. Sugita, Y.; Okamoto, Y., Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999, 314 (1-2), 141-151. 11. Fukunishi, Y.; Mikami, Y.; Nakamura, H., The filling potential method: A method for estimating the free energy surface for protein-ligand docking. J. Phys. Chem. B 2003, 107 (47), 13201-13210. 12. Laio, A.; Parrinello, M., Escaping free-energy minima. Proc. Natl. Acad. Sci. USA 2002, 99 (20), 12562-12566.

33

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

13. (a) Bartels, C.; Karplus, M., Multidimensional adaptive umbrella sampling: Applications to main chain and side chain peptide conformations. J. Comput. Chem. 1997, 18 (12), 1450-1462; (b) Higo, J.; Dasgupta, B.; Mashimo, T.; Kasahara, K.; Fukunishi, Y.; Nakamura, H., Virtual-system-coupled adaptive umbrella sampling to compute free-energy landscape for flexible molecular docking. J. Comput. Chem. 2015, 36 (20), 1489-1501. 14. Hamelberg, D.; Mongan, J.; McCammon, J. A., Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004, 120 (24), 11919-11929. 15. Darve, E.; Rodriguez-Gomez, D.; Pohorille, A., Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys. 2008, 128 (14). 16. Kamiya, N.; Higo, J.; Nakamura, H., Conformational transition states of a beta-hairpin peptide between the ordered and disordered conformations in explicit water. Protein Sci. 2002, 11 (10), 2297-2307. 17. Kamiya, N.; Yonezawa, Y.; Nakamura, H.; Higo, J., Protein-inhibitor flexible docking by a multicanonical sampling: Native complex structure with the lowest free energy and a free-energy barrier distinguishing the native complex from the others. Proteins 2008, 70 (1), 41-53. 18. Kokubo, H.; Tanaka, T.; Okamoto, Y., Ab Initio Prediction of Protein-Ligand Binding Structures by Replica-Exchange Umbrella Sampling Simulations. J. Comput. Chem. 2011, 32 (13), 2810-2821. 19. Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J., Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc. Natl. Acad. Sci. USA 2005, 102 (39), 13749-13754. 20. Wang, L. L.; Friesner, R. A.; Berne, B. J., Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (REST2). J. Phys. Chem. B 2011, 115 (30), 9431-9438. 21. Nakajima, N., A selectively enhanced multicanonical molecular dynamics method for conformational sampling of peptides in realistic water molecules. Chem. Phys. Lett. 1998, 288 (2-4), 319-326. 22. Miao, Y. L.; Sinko, W.; Pierce, L.; Bucher, D.; Walker, R. C.; McCammon, J. A., Improved Reweighting of Accelerated Molecular Dynamics Simulations for Free Energy Calculation. J. Chem. Theory. Comput. 2014, 10 (7), 2677-2689. 23. (a) Massova, I.; Kollman, P. A., Computational alanine scanning to probe proteinprotein interactions: A novel approach to evaluate binding free energies. J. Am. Chem. Soc. 1999, 121 (36), 8133-8143; (b) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S. H.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc. Chem. Res. 2000, 33 (12), 889897. 24. (a) Zhou, R. H., Free energy landscape of protein folding in water: Explicit vs. implicit solvent. Proteins 2003, 53 (2), 148-161; (b) Singh, N.; Warshel, A., Absolute binding free energy calculations: On the accuracy of computational scoring of protein-ligand interactions. Proteins 2010, 78 (7), 1705-1723. 25. (a) Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A., Predicting absolute ligand binding free energies to a simple model site. J. Mol. Biol. 2007, 371 (4), 1118-1134; (b) Fujitani, H.; Tanida, Y.; Matsuura, A., Massively parallel computation of absolute binding free energy with well-equilibrated states. Phys. Rev. E 2009, 79 (2); (c) Wang, L.; Wu, Y. J.; Deng, Y. Q.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.;

34

ACS Paragon Plus Environment

Page 34 of 37

Page 35 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R., Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J Am Chem Soc 2015, 137 (7), 2695-2703. 26. Torrie, G. M.; Valleau, J. P., Non-Physical Sampling Distributions in Monte-Carlo Free-Energy Estimation - Umbrella Sampling. J. Comput. Phys. 1977, 23 (2), 187-199. 27. Frenkel, D. S., B., Thermodynamic Integration. In Understanding molecular simulation: From algorithms to applications, Second Edition ed.; 2002; pp 168-172. 28. (a) Fukunishi, Y.; Mitomo, D.; Nakamura, H., Protein-Ligand Binding Free Energy Calculation by the Smooth Reaction Path Generation (SRPG) Method. J Chem Inf Model. 2009, 49 (8), 1944-1951; (b) Nguyen, H.; Tran, T.; Fukunishi, Y.; Higo, J.; Nakamura, H.; Le, L., Computational Study of Drug Binding Affinity to Influenza A Neuraminidase Using Smooth Reaction Path Generation (SRPG) Method. J. Chem. Inf. Model. 2015, 55 (9), 19361943. 29. Nakajima, N.; Higo, J.; Kidera, A.; Nakamura, H., Flexible docking of a ligand peptide to a receptor protein by multicanonical molecular dynamics simulation. Chem. Phys. Lett. 1997, 278 (4-6), 297-301. 30. Higo, J.; Nishimura, Y.; Nakamura, H., A Free-Energy Landscape for Coupled Folding and Binding of an Intrinsically Disordered Protein in Explicit Solvent from Detailed All-Atom Computations. J. Am. Chem. Soc. 2011, 133 (27), 10448-10458. 31. Dickson, A.; Ahlstrom, L. S.; Brooks III, C. L., Coupled Folding and Binding with 2D Window-Exchange Umbrella Sampling. J. Comput. Chem. 2016, 37 (6), 587-594. 32. (a) Malumbres, M.; Barbacid, M., Cell cycle, CDKs and cancer: a changing paradigm. Nat. Rev. Cancer 2009, 9 (3), 153-166; (b) Asghar, U.; Witkiewicz, A. K.; Turner, N. C.; Knudsen, E. S., The history and future of targeting cyclin-dependent kinases in cancer therapy. Nat. Rev. Drug Discov. 2015, 14 (2), 130-146. 33. Dunbar, J. B., Jr.; Smith, R. D.; Damm-Ganamet, K. L.; Ahmed, A.; Esposito, E. X.; Delproposto, J.; Chinnaswamy, K.; Kang, Y. N.; Kubish, G.; Gestwicki, J. E.; Stuckey, J. A.; Carlson, H. A., CSAR data set release 2012: ligands, affinities, complexes, and docking decoys. J. Chem. Inf. Model. 2013, 53 (8), 1842-1852. 34. Berman, H.; Henrick, K.; Nakamura, H.; Markley, J. L., The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic. Acids Res. 2007, 35, D301-D303. 35. (a) Pronk, S.; Pall, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E., GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 29 (7), 845-854; (b) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E., GROMACS: High performance molecular simulations through multilevel parallelism from laptops to supercomputers. SoftwareX 2015, 1-2, 19-25. 36. Savitzky, A.; Golay, M. J. E., Smoothing + Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36 (8), 1627-1639. 37. Fukuda, I.; Yonezawa, Y.; Nakamura, H., Molecular dynamics scheme for precise estimation of electrostatic interaction via zero-dipole summation principle. J. Chem. Phys. 2011, 134 (16), 164107. 38. Araki, M.; Kamiya, N.; Sato, M.; Nakatsui, M.; Hirokawa, T.; Okuno, Y., The Effect of Conformational Flexibility on Binding Free Energy Estimation between Kinases and Their Inhibitors. J. Chem. Inf. Model 2016, 56 (12), 2445-2456.

35

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

39. Jeffrey, P. D.; Ruso, A. A.; Polyak, K.; Gibbs, E.; Hurwitz, J.; Massague, J.; Pavletich, N. P., Mechanism of Cdk Activation Revealed by the Structure of a Cyclina-Cdk2 Complex. Nature 1995, 376 (6538), 313-320. 40. Bekker, G. J.; Nakamura, H.; Kinjo, A. R., Molmil: a molecular viewer for the PDB and beyond. J. Cheminform. 2016, 8.

36

ACS Paragon Plus Environment

Page 36 of 37

Page 37 of 37

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Graphical Abstract

37

ACS Paragon Plus Environment