Binding Mode and Induced Fit Predictions for Prospective

Mar 14, 2016 - Computer-aided drug design plays an important role in medicinal chemistry to ... binding modes for a range of systems using PELE (Prote...
1 downloads 0 Views 4MB Size
Subscriber access provided by UNIV OF CALIFORNIA SAN DIEGO LIBRARIES

Article

Binding mode and induced fit predictions for prospective computational drug design Christoph Grebner, Jessica Iegre, Johan Ulander, Karl Edman, Anders Hogner, and Christian Tyrchan J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00744 • Publication Date (Web): 14 Mar 2016 Downloaded from http://pubs.acs.org on March 15, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Binding Mode and Induced Fit Predictions for Prospective Computational Drug Design

Christoph Grebner*1, Jessica Iegre2, Johan Ulander1, Karl Edman3, Anders Hogner1, Christian Tyrchan*2 1

CVMD Innovative Medicine, 2RIA Innovative Medicine, 3Discovery Science, AstraZeneca

R&D 43283 Mölndal, Sweden; *[email protected], [email protected]

Abstract Computer-aided drug design plays an important role in medicinal chemistry to obtain insights into molecular mechanisms and to prioritize design strategies. Although significant improvement has been made in structure based design, it still remains a key challenge to accurately model and predict induced fit mechanisms. Most of the current available techniques either do not provide sufficient protein conformational sampling or are too computationally demanding to fit an industrial setting. The current study presents a systematic and exhaustive investigation of predicting binding modes for a range of systems using PELE (Protein Energy Landscape Exploration), an efficient and fast protein-ligand sampling algorithm. The systems analyzed (cytochrome P, kinase, protease, nuclear hormone receptor) exhibit different complexities of ligand induced fit mechanisms and protein dynamics. The results are compared with results from classical molecular dynamics simulations and (induced fit) docking. This study shows that ligand induced sidechain rearrangements and smaller to medium backbone movements are captured well in PELE. Large secondary structure rearrangements, however, remain challenging for all employed techniques. Relevant binding modes (ligand heavy atom RMSD < 1.0 Å) can be obtained by the PELE method within a few hours of simulation, positioning PELE as a tool applicable for rapid drug design cycles.

1 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 67

1. Introduction Computer Aided Drug Design (CADD) or specifically Structural Based Drug Design (SBDD) allows for atomic level analysis of how ligands interact with their biological target.1 Most importantly it provides a way to map biological activity data to distinct structural features that can be used for prospective design.2,

3

This requires a combination of computational

modelling and experimental information from various biophysical methods, e.g. X-ray crystallography and NMR.4-6 However, most of the well-characterized experimental methods do not provide insights into all aspects of the dynamic nature of protein-ligand complexes. Further, it is not always possible to obtain relevant experimental structural information in which case homology models might represent an alternative, although the information from such models needs to be treated with great scrutiny.7-9 In a drug design context, once the structural information is available, medicinal and computational chemists often use molecular docking at any stage of the process to analyze design hypothesis and select compounds for synthesis.10 Although computationally cheap and therefore applicable when thousands or even millions of compounds have to be screened virtually, docking algorithms have to simplify the binding process greatly.10 Flexibility of the ligand is typically treated using conformational ensembles whereas protein flexibility is not taken into account. Thus all components remain in the same conformation throughout the docking process. Once the molecules have been docked, a fit optimization of the ligand is performed. This approach relies on ligand conformational generation and sampling as a preparation step, potential knowledge of the bioactive conformation of the ligand and the assumption that neither the binding site nor the binding mode is changed.11 In reality, small sidechain or even backbone movements on binding of different ligands into the same protein are common12 and various strategies are available to address the dynamics of protein-ligand binding.1,

13-16

A

computationally cheap method to account for protein flexibility is Induced Fit Docking (IFD).17, 18

However, only residues around the ligand are considered to be flexible and the energetic

quantification of protein-ligand binding is calculated using simplified scoring functions.1, 15-18

2 ACS Paragon Plus Environment

Page 3 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Some recent improvements to flexible protein-ligand docking include GalaxyDock,19 DOLINA,20 or CDOCKER.21 GalaxyDock19 combines global optimization of preselected sidechains with flexible ligand docking and can be seen as an extension of LigDockCSA.22 While the method is an improvement over previous flexible docking approaches, the preselection of flexible sidechains can be a critical part of the performance.19 DOLINA selects the flexible sidechains as part of the binding site and treats the flexibility by combinatorial rearrangements of the sidechains. Ligand flexibility is, however, not directly taken into account. Instead, it relies on a pre-generated set of ligand conformations.20 In contrast, protein and ligand flexibility is handled simultaneously in the CDOCKER method21 which represents an advantage over the other approaches. One common denominator for all of the outlined approaches is that the protein flexibility is restricted to sidechains and does not include backbone movement. Applying more rigorous statistical mechanical treatment of the flexibility is computationally considerably more expensive. Approaches such as Monte Carlo (MC) or Molecular dynamic simulations (MD) both simulate the movements of atoms or molecules while allowing for interactions between the particles.23 As such they should account for flexibility and describing the dynamics of protein-ligand binding events.1,

24-26

In principle both MD and MC can be

used to calculate free energies of binding.27-30 The accuracy of such simulations hinges on two main parameters; adequate sampling of relevant conformations and accuracy of force fields of all components. A key word here is “adequate”. The sampling required is a function of system size and the precise details of the force field.31, 32 Typically, the degrees of freedom needed to estimate the free energy difference for a process are unknown a priori. Due to the many degrees of freedom for atomistic force fields, brute force simulations can never be expected to fully map the free energy changes for biologically relevant processes.31, 32 The amount of sampling required should reduce drastically if the relative free energies between similar processes are compared. This is utilized in free energy perturbation theory33 or various thermodynamic integration schemes.34 Nevertheless system size, sampling

3 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 67

accuracy, relevant time scales for larger movements and the applied molecular mechanics force-fields remain challenging.1, 31, 32 The focus of this work will be the “sampling problem”. In short the goal is to sample relevant conformations in an effective manner. For a process involving rearrangement of sidechains one needs ways of artificially constrain the system to avoid sampling irrelevant states. Additionally, one has to avoid being trapped in local minima. Several attempts to modify conventional molecular dynamics have been made in order to address these problems, among

others

replica

exchange

dynamics,35

steered

molecular

dynamic,36,

37

metadynamics,38, 39 transition path sampling40 or modified potential landscape methods.41, 42 All these methods have been used extensively and have both strengths and drawbacks.32, 43, 44

For this study, we wanted to explore a different approach; the Protein Energy Landscape

Exploration (PELE) method.45,

46

PELE is a Monte Carlo based sampling technique that

combines protein and ligand perturbations with protein structure prediction algorithms, allowing for an efficient and accurate sampling of a subset of the conformational space of protein-ligand complexes.47-49 The PELE results are then compared with results from classical explicit solvent MD, conventional docking, and IFD. In this context, we are interested in assessing the performance and capabilities of the different methods in describing protein-ligand dynamics and especially in identifying induced fit pockets which is of precious use for compound optimization. Currently, benchmark sets for divers protein-ligand complexes with different complexity of the conformational changes are missing. To address this issue, several systems were selected using public high resolution Xray structures where the quality of the electron density at the active site and for ligand was further checked manually. The systems were taken as a benchmark set with the aim to study binding mode and induced fit predictions with the above mentioned different methods. The test set includes a cytochrome P system, several kinases, a protease and a nuclear hormone receptor which is also discussed in the context of non-linear structure-activity relationships.50 Beside investigating the performance for locating the binding site of a protein without using 4 ACS Paragon Plus Environment

Page 5 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

prior knowledge (free ligand binding site exploration), the capability of capturing conformational changes, ranging from sidechain movements up to larger backbone and secondary structure movements upon ligand binding (induced fit), are explored and summarized as well. Furthermore, the applicability of PELE within drug design cycles is discussed which includes both the accuracy of the predictions and the required simulation time.

5 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

2. Methods and Computational Details 2.1. Protein Preparation Protein structures were prepared using the Protein Preparation Wizard51 of Maestro13 adding hydrogen atoms, checking the protonation states of sidechains and ligands, and optimizing the hydrogen-bonding network. If necessary, missing loops or sidechains have been added during the preparation using Prime. The resulting structures were checked by visual inspection. For consistency across the different systems, all solvent molecules were removed in the preparation step. Furthermore, the main purpose of the study is to investigate ligandinduced fit mechanisms for future application in prospective drug design (i.e. with unknown ligand binding modes). In this context, placement of water molecules is not known a priori and it is not accurately predictable where such position might be.52-54 2.2. Unbiased free exploration using PELE PELE (Protein Energy Landscape Exploration) is a Monte Carlo based technique combined with protein structure prediction algorithms.45, 46 It consists of three main steps: 1) ligand and protein perturbation, 2) sidechain sampling, and 3) minimization with a Metropolis acceptance criteria. Ligand perturbation is based on a combination of rotation, translation, and sampling of a rotamer library. Protein dynamics are sampled by displacement of the protein following normal modes from an elastic network model.55, 56 The sidechain sampling steps are based on a sidechain prediction algorithm using steric filtering as well as clustering45 and include all sidechains close to the ligand within a user defined distance of 6 Å around the heavy atoms of the ligand. The last step involves a local optimization of specific regions including, at least, all residues local to the atoms involved in the perturbation and the sidechain prediction step. The resulting structure is accepted or rejected by applying a Metropolis criterion. In the current simulations, an OPLS (Optimized Potentials for Liquid Simulations) all-atom force field with an implicit surface-generalized Born (SBG) continuum solvent model was used.45, 46 6 ACS Paragon Plus Environment

Page 6 of 67

Page 7 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In a parallel PELE execution with n processors, one processor is collecting and managing the data and communication, while n-1 processors perform independent Monte Carlo sampling leading to n-1 trajectories. The sampling of specific events like ligand entry can be enhanced by employing the spawning algorithm of PELE. A user-defined metric (e.g. the distance between ligand and binding site) is measured. The leading trajectory with the best value for this metric (e.g. lowest distance to binding site) updates the trajectory with the worst value (in this case farthest away from the binding site) if it exceeds a metric cutoff value. This leads to a collective exploration of the defined metric without a predefined direction. Protein backbone perturbation is based on the ANM method55, 56 and includes a maximum Calpha displacement of 1.5 Å. The moving direction was updated every 8 steps picking a random mode from the 6 lowest normal modes mixed with 40% of the remaining 5 modes. The modes are mixed by first choosing the main mode randomly from the 6 used normal modes. The moving direction is then the normalized vector obtained from a linear combination using formula (1) with a = 0.6, N = 6, and main equals the index of the chosen main mode. The mixing scheme was used for all protein perturbation steps.       = ∑  ∗  ; =  



 = 

 ≠ 

(1)

Ligand perturbation included equally distributed translational steps of 2.0 and 4.0 Å and rotational steps of 0.02, 0.20, and 0.45 radians. 50 trials combined with 20 sterical trials were used to generate different random orientations and conformations of the ligand. Acceptance of trials is based on steric and energetic selection criteria. The perturbation direction was updated every 3 steps. For low normalized SASA (Solvent Accessible Surface Area) values (between 0.1 - 0.3), smaller translational steps of 1.0 Å were taken and the ligand perturbation direction was updated at each step. The number of trials was increased to 200 and the number of steric trials decreased to 5. For the free exploration, 156 processors for 24h were used.

7 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 67

2.3. Binding site refinement simulation using PELE For the refinement simulations, the random ligand perturbation direction was updated every step. Small translational steps (0.05 and 0.25 Å equally distributed) were combined with rotational steps of 0.3 radians in 67% of the steps and 0.05 radians in the remaining steps. The rotational changes are correlated to 50 trials together with a large number of steric trials (10000) in 67% of the steps and 10 trials with 50 steric trials were used for the remaining steps. To investigate the best procedure for the binding site refinement, the refinement simulations were initialized from 1) the structures with the lowest binding energy, 2) the structures with the lowest SASA value, 3) the structures from the most populated clusters, or 4) a combination of the cases described before. For refinement runs, 48 processors for 12h were employed.

2.4. Analysis of PELE simulations Analysis of the simulation data was done using VMD57 and custom created scripts. RMSD values of the ligand heavy atom with respect to the corresponding X-ray structure were calculated using the “RMSD Trajectory tool” included in VMD. For the trajectories showing the

lowest

LRMSD

values,

the

entry

pathways,

protein

movements,

sidechain

rearrangements etc. were investigated by visual inspection. In all benchmark cases, knowledge about the correct binding pose as well as the X-ray structure was already available. To get insights into the applicability for prospective drug design, the potential of different metrics in predicting the binding site/pose were investigated. The metrics included the lowest binding energy, the lowest SASA values and the highest populated ligand clusters. 10 frames coming from different trajectories and showing the lowest value for the specific metric were taken and compared to the X-ray structure. Furthermore, the four highest populated ligand clusters of the sampled phase space were overlaid to the experimentally observed structure. Cluster analysis was performed using the

8 ACS Paragon Plus Environment

Page 9 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Clustering Tool available as a VMD plugin58 where the ligand position was clustered due to its heavy atom RMSD value with a cutoff of 1.5 Å. The clustering tool was also used to explore if PELE was able to reproduce specific protein conformational changes and if the change is correlated to a ligand entry. Therefore, the most populated clusters regarding the ligand RMSD were calculated, however, using a larger cutoff of 2.5 Å as the general ligand position rather than the exact ligand conformation was of interest. The 5 largest clusters were investigated to see if the correct binding pose of the ligand and the conformation change of the protein can be observed. Using the respective crystal structures as a reference, the RMSD values of the residues of interest were calculated and the frame with the lowest RMSD was inspected visually. 2.5. Docking Rigid XP (extra precision) and induced fit docking was performed using the Glide package from Schrodinger.59-61 Rigid dockings were performed using extra precision (XP), ligand sampling was set to flexible and the planarity of conjugated pi groups was enhanced. For conformer generation, an energy window of 3.5 kcal/mol was used and the enhanced sampling option was selected. For induced fit docking the standard protocol with default settings was employed. 2.6. Molecular dynamic simulation Molecular dynamics (MD) simulations were performed with NAMD.62 The system was prepared and generated using the MOE program.14 The protein-ligand complex was prepared and parametrized with the LigX GUI of MOE using the Amber10:EHT force field. The complex was solvated with explicit water molecules using a margin to the complex of 7 Å and NaCl was added as salt. The system was minimized with a gradient RMS of 0.1 kcal/mol/A2 before starting the simulation. The simulation included a heating protocol (0.5 ns with rising temperature from 0 K to 300 K) 9 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

followed by an equilibration step for 1 ns at 300 K. The final production run was performed for 60 ns employing periodic boundary conditions. The simulations were performed using 48 CPUs. The total simulations time was 115 hours. Analysis of the MD simulations was performed using VMD. 2.7. Benchmark systems The benchmark set included six different classes showing different complexity upon ligand binding to test the ability of PELE to reproduce tasks of increasing difficulty. The systems selected for the benchmark validation were chosen to have a X-ray resolution below 3 Å and the crystallized proteins should have human sequences. Furthermore, the quality and resolution of the electron density at the active site and the co-crystallized ligand was checked manually. An overview of the classes can be found in Table 1. The different tasks varied from prediction of the correct binding pose of relatively similar ligands such as the case of the Estrogen Receptor β to entry pathways exploration like for the system CYP2B6. With increasing complexity, tasks including sidechain movement (HNE) and small backbone rearrangements (Syk Kinase) were also performed. The last examples of the benchmark - i.e Abl and Src kinases - were picked to test the ability of PELE to reproduce large conformational changes of loops. A classification of small and large movements is given in

10 ACS Paragon Plus Environment

Page 10 of 67

Page 11 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 2.

Table 1 Overview of benchmark systems and description of challenges for the protein-ligand sampling algorithms

11 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

System

PDB-id

X-ray

Number of

resolution amino acids

Challenges for protein-ligand sampling Investigation of different possible

CYP2B6

3IBD63

2.0 Å

465

entry pathways and prediction of deeply buried binding pose

Estrogen 1U3Q

2.5 Å

receptor β (ER-

Similar ligands with different 235

1U3S64

2.5 Å

3Q76

1.9 Å

3Q7765

2.0 Å

4FL166

1.8 Å

4RX867,

1.6 Å

4I0S68

1.9 Å

binding pose

β) Human neutrophile

218

Sidechain rearrangement

273

Small loop rearrangement

elastase (HNE)

Syk kinase

2HYY69, 2.4 Å Abl-kinase

Large loop rearrangement and 264

2F4J70

1.9 Å

1Y5771,

1.9 Å

c-Scr kinase

folding/refolding Large loop rearrangement and 263

1YOM72

3.0 Å

folding/refolding

12 ACS Paragon Plus Environment

Page 12 of 67

Page 13 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 2 Classficiation of the backbone rearrangements Small backbone movement Secondary structure

No change

Large backbone movement Refolding

Number of amino acids < 10

> 15 (16-23)

Cα-RMSD

< 5Å (observed: 2.0 Å– 4.8 Å) > 8 Å (observed: 8.5 – 18.4 Å)

Maximal Cα-deviation

< 9 Å (observed: 3 Å – 8.5Å)

> 13 (observed: 13.4 Å – 32 Å)

13 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

3. Results 3.1. Overall results The benchmark set was selected to include six different protein systems showing different complexity upon ligand binding (see Table 1) including very simple cases (without induced fit mechanism) up to complex loop refolding. For all of these systems, PELE free-ligand binding site explorations, docking, IFD, and MD simulations were performed (see Section Methods and Computational Details). Docking and IFD were selected as being the general tool for virtual screening allowing for quick screening of a large number of molecules. MD simulations have been reported to be able to capture ligand binding mechanism on a very long time scale24, 25, 73, 74 and the method was therefore included to set a baseline of what is possible with MD using the same resources as for the PELE simulations. Clearly, the docking protocols include the information about the binding site while MD simulations were started from the same configuration as PELE. All PELE free-ligand binding site exploration results were additionally refined in further simulations. When applicable, two cases were examined: Re-docking (ligand and protein from the same protein-ligand complex) and cross-docking (ligand and protein from different protein-ligand complexes or apo protein). In total, 16 test setups were investigated (see Table 3), representing more than 3 months overall simulation wall time.

In 15 cases out of the 16 test systems, the PELE free binding site explorations were able to place the ligand in the putative binding site of the corresponding protein structure (for HNE and Syk, the apo structure was used). Furthermore, in 12 of these 15 cases the ligand binding mode with respect to the ligand heavy atom root mean square deviation (LRMSD) and protein conformation was correctly reproduced. In those 12 systems, at least one of the following criteria: Binding Energy, SASA or most populated cluster, correlated to the structure having the lowest LRMSD with respect to the crystal structure of the complex. Refinement simulations of the free-exploration runs improved the results in 13 cases and remained 14 ACS Paragon Plus Environment

Page 14 of 67

Page 15 of 67

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

comparable to the original results in the other three cases. The Human Neutrophile Elastase (HNE), Spleen Tyrosine Kinase (Syk), BCR-Abl, and c-Src were used as systems to test the ability of PELE to reproduce protein conformational changes starting from apo- or different co-crystallized ligand structures. In total, eleven PELE simulations were performed on these systems. Small loop movement such as the one observed in the Syk systems as well as sidechain movement like in HNE, were reproduced during the free-exploration runs. Large movements such as loop refolding (Abl and Src systems) were not correctly reproduced and remain challenging. For comparison to standard CADD tools, docking studies and MD simulations were performed as well (see Table 4). Using Rigid XP docking and IFD from Glide,59-61 exploiting the knowledge about the experimentally observed binding site, the ligands were docked in the correct binding mode for nine systems. In the remaining cases, the ligands were docked in the putative binding sites, but with different binding poses. For the MD simulations, only one system came close to the experimentally observed binding mode (c-Src, 1Y57, redocking). For all other systems, the correct binding site and binding mode was not captured in the 60 ns MD simulations. This is indeed not surprising as the simulation time of 60 ns is clearly too short to capture binding events which require significant longer time scales up to micro- or milliseconds.25, 74, 75 As MD simulations within this time frame, however, can already provide information about flexible loops or sidechains, the results of MD can be seen as a complement to the results of PELE rather than a direct comparison and alternative.

Detailed results for each benchmark system and method are presented in the following sections.

15 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 67

Table 3 Overview of the results of the free-exploration and refinement PELE simulations. (*)

Analysis of the correlation of three metrics to the lowest found LRMSD structure to test

PELE’s ability to predict the correct binding mode. “Yes” indicates that the lowest LRMSD structure is included in (a) 10 lowest Binding Energy conformations; (b) 10 lowest SASA value conformation; (c) five most populated cluster. (**) PELE refinements of the freeexploration runs used the following starting structures: (1) Most populated ligand conformation clusters; (2) Lowest Binding Energy structures; (3) Lowest SASA-value conformation; (4) Lowest Binding Energy and SASA-value structures (green: more trajectories with lower LRMSD, yellow: same as before, red: higher LRMSD than before). PELE simulation

Protein

PDB code

CYP2B6

3IBD

Yes

Yes (