WScore: A Flexible and Accurate Treatment of Explicit Water

Apr 7, 2016 - We have developed a new methodology for protein–ligand docking and scoring, WScore, incorporating a flexible description of explicit w...
1 downloads 5 Views 2MB Size
Subscriber access provided by LAURENTIAN UNIV

Article

WScore: A flexible and accurate treatment of explicit water molecules in ligand-receptor docking Robert B. Murphy, Matthew P. Repasky, Jeremy R. Greenwood, Ivan TubertBrohman, Steven Jerome, Ramakrishna Annabhimoju, Nicholas A Boyles, Christopher D Schmitz, Robert Abel, Ramy Farid, and Richard A Friesner J. Med. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jmedchem.6b00131 • Publication Date (Web): 07 Apr 2016 Downloaded from http://pubs.acs.org on April 9, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Medicinal Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

WScore: A flexible and accurate treatment of explicit water molecules in ligand-receptor docking

Robert B. Murphy,† Matthew P. Repasky,† Jeremy R. Greenwood,§ Ivan Tubert-Brohman,§ Steven Jerome,§ Ramakrishna Annabhimoju,# Nicholas A. Boyles,† Christopher D. Schmitz,† Robert Abel, § Ramy Farid,§ Richard A. Friesner||*



Schrödinger, Inc., 101 SW Main Street, Portland Oregon 97204, USA

§

Schrödinger, Inc., 120 West 45th Street, New York, New York 10036, USA

#

Schrödinger, Plot No. 573, Road No. 1, Jubilee Hills, Hyderabad-5000096, Telangana, India

||

Department of Chemistry, Columbia University, New York, New York 10036, USA

ABSTRACT: We have developed a new methodology for protein-ligand docking and scoring, WScore, incorporating a flexible description of explicit water molecules. The locations and thermodynamics of the waters are derived from a WaterMap molecular dynamics simulation. The water structure is employed to provide an atomic level description of ligand and protein desolvation. WScore also contains a detailed model for localized ligand and protein strain energy, and integrates an MM-GBSA scoring component with these terms to assess delocalized strain of the complex. Ensemble docking is used to take into account induced fit effects on the receptor conformation, and protein reorganization free energies are assigned via fitting to experimental data. The performance of the method is evaluated for pose prediction, rank ordering of self-docked complexes, and enrichment in virtual screening, using a large data set of

1

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

PDB complexes, and compared with the Glide SP and Glide XP models; significant improvements are obtained.

2

ACS Paragon Plus Environment

Page 2 of 85

Page 3 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

I. Introduction Rigid receptor docking is widely used in drug discovery projects in both lead discovery and lead optimization applications. The restriction of optimization to the ligand degrees of freedom makes the problem computationally tractable; docking times per typical ligand are on the order of seconds to minutes for widely used docking programs, such as Glide,1-3 GOLD,4, 5 FRED,6 FlexX,7 and Surflex.8 A number of scoring functions for binding affinity prediction have been developed,1-4, 6, 7, 9-11 which are extensively used in both academia and industry. Pose prediction has become increasingly accurate,12 while hit rates in virtual screening campaigns reported in recent work range from 2.5% (in a study13 where the top scoring 0.1% of a ~200,000 compound database was assayed without any human intervention) to ~5-10% (using an inhibition cutoff of 10µM) when docking calculations were further triaged by filtering based on visual inspection.1416

The enrichment in the first above-cited study as compared to random compound selection was

estimated to be 34, based on a careful comparison of the docking results with extensively analyzed experimental screening data. The hit rate of 5-10% for filtered docking represents progress as compared to earlier work, where hit rates of 1-2% for this type of protocol were not uncommon.17 A review of the performance of virtual screening in prospective applications in drug discovery over the past decade can be found in ref. 18. Despite these successes, such methods have clear limitations as an approach for robust modeling of protein-ligand binding. Firstly, induced fit effects in the receptor are common, and are important in enabling the binding of new classes of compounds as well as in accommodating larger appendages to a known active ligand than would be inferred from the crystal structure. Secondly, while it is possible to include explicit water molecules (e.g. from a crystal structure) in a docking calculation, the treatment of individual water molecules is generally inflexible; they

3

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

are either present or not, and are not allowed to move in response to binding of different ligands, as often happens routinely in the actual binding event.19, 20 The role of the structure and thermodynamics of active site water molecules in molecular recognition has become widely appreciated over the past decade. Methods such as WaterMap21, 22

, as well as alternatives including SZMAP23 and SPAM24, are capable of identifying highly

probable sites of water occupation and estimating free energies of water displacement, which can also be determined more rigorously using free energy perturbation approaches25, 26. The role of individual waters, and water networks, in impacting ligand binding, has been elucidated in model systems19, 20 and for a wide range of pharmaceutically interesting ligand-receptor complexes24, 25, 27-30

. The role of water structure in controlling binding kinetics has also been recently

investigated31, 32. Flexible integration of the high occupancy explicit water sites into a docking protocol enables these effects to be captured in a fast empirical scoring function, requiring only a small number of molecular dynamics simulations to initially generate the relevant structural and thermodynamic information. Finally, current empirical scoring functions contain crude approximations to the actual physics, leading to particular problems in modeling key phenomena such as desolvation and strain energy. These limitations result in both false negatives (for example due to failure of many ligands to fit into a specific crystal structure) and false positives (for example due to failure to penalize poor ligand conformations, or displacement of a key water in the receptor without making a compensating hydrogen bond). In the present paper, we describe a new docking methodology, which we have named WScore, aimed at addressing the challenges outlined above. Receptor flexibility is incorporated into the

4

ACS Paragon Plus Environment

Page 4 of 85

Page 5 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

model via ensemble docking. This approach has been explored previously, but as noted for example by Shoichet and coworkers, enrichments can be degraded by the use of an ensemble due to an increased number of false positives33 unless the differing free energies of the receptor conformations are taken into account. We show that conformational free energy assignment can be systematically accomplished by using experimental binding data for the crystal structure complexes, and that such an approach enables a physically realistic scoring function to properly accommodate many different ligand chemotypes. Realistic modeling of solvation is accomplished by the integration of explicit waters into the docking protocol, using the WaterMap21, 22, 34 program to determine quasilocalized water positions and thermodynamics. The WScore docking algorithm treats these waters flexibly, allowing them to be displaced into bulk solution, moved to a different location (e.g. a bridging position between two protein groups, or a protein and ligand group), or be trapped by hydrophobic groups of the ligand. Desolvation effects are assessed by determining whether water molecules are available to solvate key polar and charged groups on both ligand and protein. We define desolvation, here and in what follows, as the creation of an environment around a polar or charged protein or ligand group which is unable to satisfy the complementary polar interactions of the group in question. While hydrogen bonds will most often be used to satisfy such interactions, it should be noted that there are cases where alternative favorable interactions such as aromatic C-H interactions, or pi-cation interactions, can effectively avoid desolvation as well. Wscore considers all of these possibilities in a balanced fashion. Strain energy is perhaps the most complex of the physical phenomena noted above. WScore treats localized ligand strain by penalizing ligand conformations with bad torsion angles, close atom-atom contacts, or that break an important internal hydrogen bond, and similarly identifies

5

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

particularly unfavorable protein-ligand interactions, for example between two donors or two acceptors. Delocalized strain, which can involve accretion of many small terms, is modeled via the use of an MM-GBSA calculation35, 36 of the total energy of the protein-ligand complex, as compared to the protein and ligand in solution. Careful integration of the MM-GBSA results with the other WScore terms leads to a consensus-like scoring approach in which ligands receive highly favorable scores only if they do well in both empirical scoring and MM-GBSA calculations. The integration protocol incorporates a novel method for calibrating the MMGBSA values based on a ligand’s molecular weight, charge, and number of rotatable bonds, which is necessary when trying to compare diverse ligands as opposed to congeneric series. We have initially developed and tested WScore using a set of 542 binding sites within 506 protein-ligand complexes, associated with 22 receptors, from the Protein Data Bank (PDB)37. For this initial publication, we have chosen to work exclusively with ligands derived from PDB structures, so as to be able to quantitatively calibrate the effects of cross docking into the ensemble of receptor structures. This requires knowledge of the correct binding mode of the ligand in the receptor, which the availability of the PDB structure provides. The paper is organized as follows. In Section II, we write down the functional form of the scoring function, and outline the nature of the various individual terms. In section III, we review the Glide XP methodology, emphasizing the key features that provide a useful starting point for WScore development, and discuss augmentations to the molecular recognition terms in Glide XP that favor binding. In Section IV, we discuss the new desolvation and strain energy models outlined above. These unfavorable terms provide qualitatively enhanced performance in rejecting false positives as compared to both the Glide SP and XP scoring functions.

6

ACS Paragon Plus Environment

Page 6 of 85

Page 7 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

In section V, we describe the process of assigning protein reorganization free energies to each receptor conformation in the PDB data set introduced above. We then discuss the construction of small ensembles (2-6 receptor conformations for each receptor) for docking the full set of PDB ligands, along with a standard set of random decoy ligands,3 thus enabling evaluation of enrichment metrics for each receptor. Finally, we develop an approximate model for estimating the entropy of the protein-ligand complex, based on the distribution of docking scores across the ensemble. This model shows significant ability to assist in the discrimination of active compounds from decoys. In section VI we discuss preparation of the PDB-derived complexes used in this study and in section VII we present results for both self-docking of the PDB complexes and for enrichments obtained from ensemble docking. These results are compared with Glide SP and XP results, to evaluate whether the improved physics introduced in the WScore model is capable of improving docking accuracy, score prediction, and the ability to reject false positives. We evaluate enrichment results for a test set of PDB complexes not included in the ensemble optimization process to investigate the performance of our optimized ensembles for new ligands. Finally, we consider the implications of the results in the Section VIII, Discussion, and summarize our results in Section IX, the Conclusion. II. Overview of the WScore scoring function The total predicted binding free energy in WScore can be written as:  =  ∗ +   +  +  +  +   +   + (1)

! + " # 7

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

With the exception of the last term, Scomplex, these terms can unambiguously be classified as either favoring, or opposing, ligand binding. The terms favoring binding include c*Evdw (the van der Waals energy of the docked complex, multiplied by a scaling coefficient c which is identical to the optimized value used in the Glide XP scoring function), Ehydrophobic, a hydrophobic atomatom interaction term (which can be viewed as heuristically modeling the displacement of unstable waters from the active site), EHB, a hydrogen bonding and salt bridge term (which differentiate these types of interactions on the basis of geometry and environment), and EMR, which contains other specialized types of protein-ligand interactions (e.g., pi-cation, and aromatic C-H---O interactions). We refer to these latter terms as “molecular recognition” (MR) terms, and label the associated energy accordingly in Eq. 1 above. Included in the MR terms are reward terms identifying unusually favorable electrostatic interactions arising from special features of the active site. The identification and scoring of hydrogen bonds, in terms of distances and angles, is identical that used in refs. 1 and 2. Pi-cation and aromatic-C-H---O interactions are discussed in more detail below and in the Supplementary Material. We use the term “unstable water”, here and in what follows, to refer to waters which have a net favorable free energy of transfer from the active site to bulk solution, as estimated by the implementation of inhomogenous solvation theory38 in the WaterMap program21, 22. Such a favorable free energy does not imply that the site is unoccupied; the usual chemical equilibrium must be computed, taking into account the 55M concentration of water in bulk solution. Creation of a void in the active site occurs only in a very unusual situation in which the receptor cavity is highly hydrophobic and constricted, as is discussed in detail in ref. 39. The terms opposing binding can be divided into four classes. The first, Estrain, provides a description of localized strain energy. Unusual conformations of the ligand or protein are

8

ACS Paragon Plus Environment

Page 8 of 85

Page 9 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

identified and assigned penalties in accordance with experimental and/or quantum chemical data. We also include in this class of terms a subset of localized electrostatic repulsions between protein and ligand acceptor groups. The second set of terms, Edesolvation, represents localized desolvation of polar or charged groups of the ligand or protein. The detailed description of the water structure of the active site, provided by WaterMap, enables an accurate assessment of the solvation of individual ligand and protein groups by discrete water molecules. When desolvation of such a group is detected, a suitable penalty term is applied. The third class of terms opposing binding, Edeloc, is designed to assess delocalized strain and desolvation effects. Delocalized strain effect can involve steric clashes between the protein and ligand (indicating that the ligand does not physically “fit” optimally into the active site), repulsive electrostatic effects (including many not handled by the localized electrostatic terms discussed above), and moderately unfavorable torsional energies. There can be multiple effects of this type distributed across the complex. Delocalized desolvation effects include interactions such as the second shell of water, which can contribute significantly to the solvation free energy of charged or highly polar groups. A molecular mechanics based, continuum solvation approach is the principal term used to model delocalized strain and desolvation effects. The fourth term opposing binding is the reorganization free energy of the protein, Ereorg (henceforth shortened to protein reorganization energy). This quantity is formally defined as the free energy required to alter the protein conformation from the apo state (by definition the lowest free energy conformation of the protein in solution) to the holo state induced by the cognate ligand associated with that particular conformation. The holo conformation (designating here the conformation of the protein generated by starting from the protein-ligand complex and removing the ligand) always costs free energy to create as compared to the apo state; hence, Ereorg always

9

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

opposes binding. In Section V, we discuss several approaches by which Ereorg can be estimated in WScore for each conformation employed in the ensemble. Finally, we introduce a highly approximate, but still useful, treatment of the entropy of the protein-ligand complex by using the score distribution obtained from docking into the conformational ensemble of receptors. We denote this term in Eq. 1 as Scomplex. The underlying idea is that when a ligand receives favorable scores in multiple receptors, it indicates that a greater portion of the phase space of the complex is energetically accessible to the ligand, and hence that the total entropy of the complex is higher. The implementation of this term is described in Section IV.

We conclude this section with a brief discussion of the approach taken to parameter optimization, and the potential dangers of overfitting. Optimization of term favoring binding has been performed by fitting parameters to reproduce the experimental binding affinities of active complexes. Ligand penalty terms (torsions, contact penalties) can be derived primarily from quantum chemical results and data in the Cambridge Crystallographic Database (CCD), although testing and fine tuning using docking results for known actives was helpful as well. Desolvation terms are designed primarily by examining high scoring decoy molecules and identifying structural features (e.g. burial of polar or charged groups) which are not characteristic of properly docked active compounds, and then estimating the size of the penalties based on the very small number of PDB cases that do exhibit a similar structure. Optimization of these terms primarily involves modification of geometrical parameters to avoid penalizing the properly docked actives. Finally, the MM-GBSA model was constructed primarily by surveying the active compounds

10

ACS Paragon Plus Environment

Page 10 of 85

Page 11 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

across the entire training set, and making the parameters as aggressive as possible while avoiding penalization of the actives. Given the large number of parameters in Wscore, it is important to consider the possibility that the model has been subject to overfitting. Very little optimization was aimed at improving pose prediction (which in any case is only marginally better than Glide SP), so this area is not a concern. Overfitting in enrichment was directly interrogated via the use of the test set, PDB2014; the result show some degradation, but typically due to issues with the coverage of the conformational ensemble, as opposed to a breakdown of the scoring function upon being exposed to new molecules. Rank ordering of actives is the aspect of the scoring function that is most likely to suffer from overfitting.. However, in practice one would always expect to see poorer correlations in realistic applications, due to a lack of a complete set of cognate crystal structures for each ligand. We intend to further explore the ability of Wscore to rank order compounds in a cross docking context in a future publication; for the moment, the results for this functionality should be viewed with caution. III. WScore terms favoring binding: review of the Glide XP model and discussion of new molecular recognition motifs incorporated in WScore A. Review of the Glide XP model for Terms Favoring Protein-Ligand Binding Based on a wide range of results that have emerged from both empirical scoring function development and all-atom molecular simulations, it is our belief that the primary driving force for the formation of the great majority of protein-ligand complexes can be identified as displacement from the receptor active site of quasilocalized water molecules that are thermodynamically unstable relative to bulk water. WScore has a number of terms that represent

11

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the effects of water displacement on binding affinity. For the great majority of ligands, the dominant term in the Wscore scoring function is the hydrophobic atom-atom pair term, which models the effect of displacing receptor water molecules near hydrophobic protein atoms, as well as removing hydrophobic regions of the ligand from water via hydrophobic contacts. This term is quite similar to the analogous function in the ChemScore9 scoring function. The pair function is summed over the interactions of hydrophobic atoms on the ligand interacting with hydrophobic atoms on the protein. The use of this function to represent water displacement is heuristic, but has proven effective in binding affinity prediction as demonstrated in refs. 1, 2, 9. In WScore, as in Glide XP2, the atom-atom pair term is augmented with a term that recognizes hydrophobic enclosure of water molecules; the displacement of such waters leads to enhanced gains in binding free energy. The sum of these two terms defines the WScore estimation of the contribution to binding affinity arising from the hydrophobic effect. The hydrophobic enclosure term is described in detail in ref. 2. Both the hydrophobic pair term, and the hydrophobic enclosure term, are identical, in functional form and numerical parametrization, to those used in the Glide XP scoring function. If a water molecule in the binding region forms a hydrogen bond with the protein, replacement of the hydrogen bond by the ligand is required to realize free energy gains from displacing the water into the bulk. The magnitude of the net gain in free energy for replacing a hydrogen bonded water with a suitable hydrogen bond to a ligand group can be estimated to be on the order of 0-2 kcal/mol, depending upon the type of hydrogen bond, the environment of the hydrogen bond, and the quality of the hydrogen bonding geometry. In WScore, we use the functional form and parameterization for the hydrogen bonding terms developed for Glide XP, described in detail in ref. 2, to calibrate the contribution to the binding free energy of various

12

ACS Paragon Plus Environment

Page 12 of 85

Page 13 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

types of protein-ligand hydrogen bonds. We also introduce a new term, discussed in section IV, to describe a type of strain energy that can reduce the total effective contribution of hydrogen bonding to binding when multiple hydrogen bonds are present. Hydrogen bonds and salt bridges in specific types of enclosed or recessed regions of the receptor, and, in the case of multiple interactions, arranged in specific patterns, will contribute more to free energy of binding than ordinary hydrogen bonds. Solvation of the protein polar or charged groups in such regions becomes problematic, often leading to poor entropy of the highly confined solvent, and replacing such solvent with a rigid ligand group, which does not suffer entropic penalty from confinement, leads to a substantial gain in free energy. Glide XP models these effects through special rewards assigned to specific types of structures (e.g. correlated hydrogen bonds, recessed salt bridges). WScore utilizes the Glide XP model for these effects, which is discussed in detail in ref. 2. An early discussion of the effects of hydrogen bond pattern and enclosure on solvation entropy and free energy, with an illustrative example, can be found in ref. 22. The individual magnitude of these terms typically ranges from 0.5-3.0 kcal/mol. A subset of receptors contain regions where the electrostatic potential is anomalously polarized (either positively or negatively); such regions serve to recognize the complementary charge on the ligand. Examples in the WScore training set include DPP4 (in which a pair of carboxylates recognize a positively charged nitrogen motif) and PTP1B (which recognizes ligands with a -1 or -2 charge via multiple positively charged side chains and backbone NH groups). Glide XP assigns rewards for specific motifs, and also more general electrostatic rewards for putting a complementary ligand group in regions of exceptionally favorable electrostatic potential. These terms have been optimized in WScore with regard to their geometrical details and magnitudes, by fitting to the much larger set of active complexes in the Wscore training set (including the

13

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

receptors noted above), but retain the same functional form and purpose as in Glide XP. The individual electrostatic reward terms typically range in size from 0.5-2.0 kcal/mol. Finally, empirical scoring functions have a noticeable bias towards ligands with higher molecular weight. To correct for this, the empirical term in Glide XP that rewards smaller molecular weight ligands is used in WScore. One change that has been made is to cap atomic masses such that all atoms with mass greater than chlorine are assigned the mass of chlorine. The reward has a value of 1 kcal/mol for molecular weight of 300 Da or less, and goes to zero at a molecular weight of 450 Da; in between, the reward value is linearly interpolated. B. Additions and major modification of Glide XP terms in WScore WScore incorporates a number of molecular recognition motifs that were not treated in Glide XP. These include pi-cation interactions and aromatic C-H---O interactions. In the Wscore model, a pi-cation reward requires multiple protein aromatic residues in an appropriate geometry surrounding a positively charged ligand group with at most one hydrogen atom (with more hydrogens, our assumption is that the pi-cation environment would not be preferable to being in solution). In a recent article reviewing the role of pi-cation interactions in molecular recognition,40 there is a discussion of the “aromatic box” motif, which is a key feature in a number of biologically important receptors enabling the recognition of suitable cations via interaction with multiple aromatic residues. Our model for assigning pi-cation rewards is an attempt to recognize a suitable ligand-aromatic box interaction based on geometrical criteria. Among the Wscore training set receptors, only FXa displays complexes which receive a pication reward. Other receptors will constitute a test set to investigate the performance of the model across a greater range of structures. The reward for a pi-cation interaction is set at 1.5

14

ACS Paragon Plus Environment

Page 14 of 85

Page 15 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

kcal/mol. A detailed description of the geometrical criteria required to assign a pi-cation reward is given in the Supplementary Material in Appendix 4. As judged by the Wscore training set, aromatic C-H---O interactions are ubiquitous in proteinligand complexes, typically involving an aromatic group of the ligand interacting with a backbone carbonyl or carboxylate oxygen atom of the protein. Fitting the magnitude of the C-H--O interaction to the training set native binding affinity data yields a very small value (which we approximate as zero), leading to the conclusion that such interactions do not contribute a favorable free energy term to binding, but they do enable the protein oxygen atom to avoid desolvation (which is why such interactions are often seen for example in the kinase hinge backbone region, as opposed to placement of a methyl group in a similar location). Thus, identification of an acceptable aromatic C-H---O interaction with regard to geometry is essential in applying the desolvation penalties discussed in Section IV. Specific geometrical criteria are given in Appendix 1 in the Supplementary Material. Two modifications of the hydrophobic contribution to the binding free energy are introduced in WScore. Firstly, the ChemScore-derived atom-atom hydrophobic pair function does not just detect close contacts; it has a tail with a range of 4-6 Å. Numerical optimization of the atomatom pair term using PDB complexes with known binding affinities (carried out in the original ChemScore paper5) shows that this tail is in many cases important in achieving good agreement with experiment. In our heuristic interpretation of the hydrophobic pair function as approximately representing the free energy gain from water displacement, the tail helps to detect the overall hydrophobicity of the environment of water molecules in a binding pocket, which in turn impacts how favorable it is to displace those waters. However, in an extreme situation, in which a ligand group (typically a ring system) is placed in what is essentially bulk solution, a

15

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

substantial tail contribution may still be present, but in this case the function is going to assign a free energy contribution to binding that will not be present experimentally, as displacement of bulk-like waters cannot contribute meaningfully to binding. When such an extreme situation is detected, the contribution of the highly solvent exposed ligand group to the hydrophobic pair term is set to zero. The second modification involves the use of the free energies of the displaced WaterMap waters to estimate the hydrophobic contribution to the free energy, and correct the atom-atom pair/enclosure empirical term described above. This method describes the displacement of waters from the active site directly, as opposed to using a heuristic approximation. However, the pair/enclosure approach has some advantages over the water displacement model. It incorporates calibration of burial of hydrophobic area of the ligand and protein, the term is already implicitly corrected for average strain and entropy loss, and treatment of displacements by polar atoms (e.g. those which form hydrogen bonds) is separated out into different terms. Therefore, the pair/enclosure term is the model that is used by default in WScore. We have, however, found one receptor in the WScore training set (PIM1) in which some of the ligands displace multiple unstable waters while making anomalously few hydrophobic contacts with the receptor. In this case, the empirical terms underestimate the hydrophobic free energy, and can be corrected by using the WaterMap displacement free energy. Therefore, we have implemented a WaterMap based correction term into WScore, which can be invoked at the option of the user. The correction algorithm is described in Appendix 2 of the Supplementary Material. In the results that follow, the model is utilized only in treating the PIM1 data set. IV. Penalty Terms for Localized Strain and Desolvation Energy

16

ACS Paragon Plus Environment

Page 16 of 85

Page 17 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

A. Localized Ligand Strain Energy Localized ligand strain energy is manifested in unusually high-energy features in docked ligand conformations. Only features forced on the ligand by the constraints of receptor binding are penalized; a feature that is also manifested in solution, for example due to steric hindrance within the ligand, does not impact binding affinity and is not penalized. The features described below have been validated via extensive surveys of both the PDB and the Cambridge Structural Database (CSD)41. If a feature, as described quantitatively in the scoring function, is observed in very few or no PDB or CSD complexes, and a physical explanation as to why the feature would have a specially unfavorable free energy is available, it becomes a reliable means of distinguishing active from inactive compounds. The numerical values of the penalties applied for various features are estimated in most cases from quantum chemical calculations of torsional energy profiles for various types of chemical groups, as carried out in for example the development of the OPLS3 force field42, and (where available) from the small number of active compounds manifesting the feature. We have focused on the following types of localized ligand strain features: (1) Steric clashes, i.e. inappropriately close contacts between ligand atoms. These penalty functions are based on atom-atom distances; the parameters vary with the types of atom pairs involved (including hydrogens) and penalize the contact if the distance is shorter than a specified threshold value, becoming larger as the distance diminishes from the threshold. Gross violations of van der Waals radii lead to very large penalties, as such extreme cases are typically caused by failure of the ligand to properly “fit” into the active site. Account is also taken of special features such as internal hydrogen bonds, including aromatic C-H---O interactions; if the geometry is

17

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

appropriate for a favorable interaction of this type, the heavy atom distance can be significantly shorter than is allowed otherwise. Fig. 1 displays results from cross docking the ABL1 ligand from 2v7a into the receptor from 1opk in which an unacceptable close ligand contact is observed. The ligand is quite tightly packed into the receptor cavity, and it would be very difficult to relieve the contact by small movements of the protein. (2) Torsional rotations that break aromatic conjugation. Conjugation effects imply for example that an amine or amide group connected to an aromatic ring would prefer to be in the plane of the ring as opposed to perpendicular to that plane, so as to enable the low energy resonance structure to be accessed. Determination of the optimal torsion angle becomes complicated when there are conflicts between steric clashes and optimizing the conjugation pathway. We have developed algorithms that use a ligand conformational search and a set of simple rules to estimate optimal torsion angles in “aromatic conjugation paths” in solution, and penalties are applied only when the solution angle enables significant conjugation energy to be captured, and the angle observed in the protein-ligand complex significantly reduces the conjugation energy. A threshold deviation from the optimal angle is defined, and if the angle in the docked complex exceeds this threshold, a penalty is applied, proportional to the amount of deviation. The importance of conjugation is confirmed by the WScore training set of PDB complexes, in which one hardly ever sees significant breaking of conjugation (other than when it is forced by steric clashes) in complexes of known active compounds. (3) Conformations that break unstrained internal (or intramolecular) hydrogen bonds. Of particular importance are intramolecular hydrogen bonds that form five and six membered rings,

18

ACS Paragon Plus Environment

Page 18 of 85

Page 19 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

as shown in the examples in Fig. 2 and Fig. 3. These structures are favorable on the assumption that there is no steric hindrance in the conformation making the intramolecular hydrogen bond. In solution, the intramolecular hydrogen bond may not be made with 100% probability, but it will exhibit a high occupancy fraction. If the protein conformation eliminates the hydrogen bond, this will cost a substantial amount of free energy, as evidenced by the fact that no such breaking of a qualifying internal hydrogen bond is seen in any of our 506 training set ligands. WScore identifies possible internal hydrogen bonds of this type, determines whether they can be made in solution without strain energy, and applies a suitable penalty if the internal hydrogen bond is not found in the docked conformation. A detailed description of this algorithm is provided in the Supplementary Material in Appendix 5. Fig. 2 displays a PDB complex in which a ligand with an intramolecular hydrogen bonded ring is shown in complex with its cognate receptor. It can be seen that the internal hydrogen bond is preserved. Many other examples of such internal hydrogen bond preservation can be found in various PDB ligand-receptor complexes. In contrast, Fig. 3 presents a random database molecule docked into the PKA 1ydr receptor, along with the preferred solution conformation. In an otherwise suitable pose, the ligand in this case has to break the internal hydrogen bond in order to interact effectively with the protein. Breaking the hydrogen bond will, as argued above, substantially reduce the magnitude of the protein-ligand binding affinity. B. Localized Protein Strain The protocols described in this paper entail docking a flexible ligand into a rigid protein crystal structure. Since the protein structure does not change upon docking of the ligand, the protein strain energy (protein reorganization energy) is invariant when the various ligands are docked

19

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

into the same structure. Presumably this strain energy is relatively low or a crystal structure with a ligand with significant binding affinity could not have been formed. It should be noted that an apo structure would have the lowest strain energy, although it might not be able to accommodate many ligand chemotypes in the binding site. In principle, one should be able to compute the relative strain energy going from one protein conformation to another via a rigorous statistical mechanical protocol such as free energy perturbation theory43, but these calculations are very difficult and expensive for drug-like ligands and complex protein conformational changes. As a practical alternative (although approximate) approach, we hypothesize that differential strain energy within a docking ensemble can be estimated using empirical data, provided that there are experimental binding affinities for subsets of ligands that dock into each diverse conformation of the protein. A straightforward example is that of p38α. While there are numerous variations in the conformation of the p38α receptor, the two most dramatic variants involve a very large movement of the activation loop. The lowest energy state is referred to as “DFG-in” while the alternative loop conformation is referred to as “DFG-out”. Fig. 4 presents visualizations of two complexes displaying the key differences between the DFG-in and DFG-out conformations. To illustrate the estimation of protein strain energy for this case, we assume that all DFG-in conformations have approximately the same protein strain energy (there actually are two classes of conformers identified by protein structure analysis for DFG-in, but we will not consider this for simplicity in the present analysis), and similarly all DFG-out conformations have approximately the same strain energies (but different from those of the DFG-in complexes). Fig. 5 plots the binding affinity calculated by WScore based on docking into the cognate crystal structure versus experiment for the various complexes in our training set. It can be seen that the

20

ACS Paragon Plus Environment

Page 20 of 85

Page 21 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

two types of complexes each lie on a more or less straight line (R2 values are 0.47 and 0.71 for the two classes of ligands), but the two lines are offset from each other by ~3 kcal/mol. While other physical rationales for this difference are possible, we interpret this 3 kcal/mol as the strain energy of the DFG-out conformation relative to the DFG-in conformation. For ABL1, a number of explicit solvent molecular dynamics simulations have been performed44-47 with the objective of observing the DFG-in to DFG-out transition and estimating the free energy difference; values for the latter between 1.5-4 kcal/mol have been obtained, in qualitative agreement with our estimated results for p38α. In the general case, there can be a significant number of differentiated protein conformations with low free energies, each of which can in principle have its own reorganization energy relative to the apo structure. The question of how to translate the available experimental data into a docking ensemble, with appropriate reorganization energies assigned to each conformation used in the ensemble, is a key issue that we have addressed in the development of WScore as a practical tool that can be applied to drug discovery projects, and is discussed in detail in Section V. C. Hydrogen Bond Strain Induced by Ligand Rings In a conventional empirical scoring function, the total hydrogen bond energy (we use this term in what follows to refer to the contribution of protein-ligand hydrogen bonding to the protein-ligand binding free energy) is a sum over the individual hydrogen bond terms. We have found that an additive functional form leads to substantial overprediction of binding energies for many complexes, particularly those with two or more neutral-neutral hydrogen bonds. We hypothesize that this non-additive effect arises because it is difficult to maintain multiple ideal hydrogen bond

21

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

geometries when the ligand is undergoing dynamical excursions through phase space. We have further found that the effective hydrogen bonding contributions to the free energy of binding are reduced when ligand rings are buried in the active site. A ligand ring is a bulky, rigid object, which requires adaptation of the protein and ligand rotatable bonds to avoid steric clashes. In a dynamical trajectory, these requirements increase the difficulty of maintaining optimal hydrogen bond geometries. Thus, the hydrogen bonding contribution to the free energy depends upon both the number and type of hydrogen bonds in the complex, and the number of ligand rings enclosed in the binding cavity. A detailed description of the complete hydrogen bond function is presented in the Supplementary Material in Section Appendix 3. D. Integration of WaterMap into Glide XP Scoring: A Better Description of Desolvation Effects and Ligand-Water Interactions D.1 Overview The WaterMap methodology21, 22, 34 converts a molecular dynamics simulation of solvent in a protein active site into a complex hydrogen bonding network composed of charged and polar protein groups interspersed with localized high occupancy water sites. In addition to predicting the location of high-occupancy water sites, the enthalpy and entropy of each water site relative to bulk water is computed with inhomogeneous solvation theory. Not all solvent regions in a given active site display such localized sites; highly solvent exposed polar groups are often surrounded by bulk-like water density. A WaterMap calculation must be performed for each receptor conformation prior to any WScore calculation. Integration of the WaterMap sites and thermodynamic quantities into the docking grid is carried out automatically by the WScore program. The detailed map of the water structure and displacement free energies enables a

22

ACS Paragon Plus Environment

Page 22 of 85

Page 23 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

correspondingly detailed model for desolvation of ligand and protein groups and hydrophobic trapping of active site waters to be developed and tested. Such a model is a key component of WScore and is the principal subject of this section. WaterMap calculations necessary for WScore were setup and executed automatically by the WScore program using WaterMap’s default settings. The binding region to be examined is specified by the location of the native ligand though the native ligand is not included in the WaterMap MD simulations. WaterMap solvates the protein in a box of explicit TIP4P water molecules that are equilibrated via a sequence of minimization and short MD simulations followed by 2 nanoseconds of production simulations with the OPLS3 force field. Harmonic restraints are applied to the protein structures, and the coordinates of the water solvating the active site are recorded every 0.3 ps. WaterMap then spatially clusters water molecules to identify hydration site positions. Enthalpy of water occupying each hydration site is computed as the difference between average nonbonded interactions of the water in the active site of the protein versus the nonbonded interactions of the water in the bulk solvent, and entropies are computed by integrating over the positional and angular correlation functions of the water in each hydration site. With these settings, generating a WaterMap for a protein-ligand complex using 8 cpus takes on average around 32 hours on 2.2 GHz AMD Opteron processors.

This calculation only needs to be performed once per protein-ligand complex.

We first consider desolvation of polar or charged protein groups due to displacement of one or more hydrogen bonded WaterMap waters by the ligand. If the ligand replaces the relevant hydrogen bonds between WaterMap water sites and the protein, no penalty is applied and the pairwise hydrogen bond term will make the interaction favor binding. If full replacement does not occur, the detailed criteria for desolvation depend upon the protein group and surrounding

23

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

water structure, as discussed further below. Sampling of water positions as necessary is used to account for reorganization of the water structure upon ligand binding. In performing this sampling, we move the water molecule to all possible locations on a grid, consistent with retaining its original hydrogen bond to the polar or charged group, while not overlapping with any protein atoms or other WaterMap waters. If no grid point satisfying these criteria can be found, a desolvation penalty is applied. Desolvation of a polar or charged ligand group is identified by the absence of any hydrogen bonds from that group to protein groups, intramolecular hydrogen bonds, or neighboring WaterMap waters. A polar or charged ligand group which is buried in the protein cavity (as opposed to occupying an open region with bulk-like solvent) without any such hydrogen bonds is classified as fully desolvated, and assigned a substantial penalty. Partial desolvation effects are treated implicitly by the MM-GBSA model incorporated into WScore, discussed further in subsection E.1 of this section below, and in more detail in Appendix 3 of the Supplementary Information. Individual WaterMap waters can also be elevated in free energy via unfavorable hydrophobic contacts with ligand groups. Such contacts are identified and penalties are assigned based on the overall environment of the water molecule. More subtle effects, involving reorganization of larger areas of the hydrogen bonded water network due to the presence of ligand hydrophobic groups, are not treated in WScore at present. A listing of all penalty terms in the above categories, as well as the values of the penalties, is given in Table S2 of the Supplementary Material. Brief descriptions of the various terms are provides in Sections D2-D4 and E below.

24

ACS Paragon Plus Environment

Page 24 of 85

Page 25 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

D.2 Use of WaterMap to Calculate Localized Desolvation of Protein Polar and Charged Groups As noted above, prior to any docking calculations, a WaterMap simulation is performed for each receptor conformation that is a member of the WScore ensemble. Each structure is then preprocessed to identify key solvating water molecules, the displacement of one or more of which can lead to a penalty term if the ligand does not replace the important hydrogen bond(s) being made by the water molecule. An important hydrogen bond in this context is one that is performing an essential solvation function for a protein polar or charged group. A key solvating water must occupy a localized site identified by WaterMap, and fall into one of these categories: (1) Water molecules in regions of hydrophobic enclosure that are hydrogen bonded to a protein backbone group (2) Bridging water making two or more hydrogen bonds to polar or charged protein groups (3) Water molecule hydrogen bonded to a charged side chain atom (4) A water molecule hydrogen bonded to a protein N-H group The rules for assigning penalties depend upon the protein group that is potentially being desolvated. Bridging waters and waters that are bound to a protein NH group that are displaced, but not replaced, always lead to a penalty. In contrast, the rules for carbonyl oxygens, such as backbone carbonyls and carboxylate oxygens, are more complex, because these atoms can make more than one hydrogen bond and may be within hydrogen bonding distance of multiple waters. Parameters for these more complex cases have been optimized based on avoiding penalties for known active compounds, while penalizing as many decoy ligands as possible.

25

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

As the WScore algorithm searches for the best ligand docking pose, any key solvating water that overlaps with the ligand is investigated. If the ligand replaces all of the important hydrogen bonds to protein groups made by the key solvating water, then no penalty is assessed. If the ligand fails to replace one or more of the important hydrogen bonds, a search is conducted in which shifted positions for the water (including locations where the water forms bridging hydrogen bonds between the ligand and protein) are evaluated. In order to avoid counting towards a penalty, one of these positions must retain the original hydrogen bonds made by the water, and the position is not allowed to overlap with other WaterMap waters. Active compounds routinely display successful water reorganizations of this type. In contrast, many random database ligands cannot reposition a displaced special water successfully. Examples of both situations are shown in Fig. 6. There are a number of cases for which it is more challenging to assess desolvation than those discussed above. These include backbone carbonyls in solvent exposed regions, and polar groups of side chains, particularly the TYR hydroxyl and amides from ASN or GLN. In order to desolvate solvent exposed backbone carbonyls, multiple waters must be displaced, while for neutral side chains, rotation of the side chain must be significantly restricted, and special rules are needed to define what constitutes a desolvation event. As for previous terms, the WScore models have been optimized to avoid penalizing known active complexes. D.3 Destabilization of Waters by Contacts With Ligand Hydrophobic Groups The displacement of unstable waters by the ligand is typically a favorable process. We hypothesize here, as in previous papers21, 22, that for most protein-ligand complexes, such displacement provides a high fraction of the driving force for binding. However, many if not

26

ACS Paragon Plus Environment

Page 26 of 85

Page 27 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

most ligands fail to displace all unstable waters in the binding site. If the ligand makes a hydrogen bond to such a water, this is unproblematic as it constitutes a replacement of a solvent hydrogen bond with one that is more or less equivalent. However, if the ligand contacts an unstable water molecule with a hydrophobic group, this can lead to further destabilization of the water in question, and thus require a free energy penalty in the scoring function. In WScore, we only consider such questions for localized waters as identified by WaterMap. A detailed examination of our training set of active PDB complexes provides guidance as to what sort of hydrophobic interactions (in terms of e.g. geometry, functional group, and number of contacts) leads to substantial loss of free energy. Specifically, we find that certain types of interactions appear frequently in the PDB training set, with no apparent impact on binding affinity, whereas other types of interactions appear very infrequently or not at all. For example, aromatic groups of the ligand will often be found in a geometry where making a C-H---O “hydrogen bond” is feasible (i.e. the distance and angle are reasonable for this interaction). Hence, such interactions clearly should not be penalized. In contrast, one never sees an unstable WaterMap water making multiple contacts with aliphatic ligand carbons in a complex with an active ligand. D.4 Desolvation of Ligand Polar and Charged Groups Ligand polar groups that play a significant role in protein-ligand binding for many complexes include carbonyls, amines, hydroxyls, and ring nitrogens and oxygens. These groups are considered fully desolvated if they fail to make any first shell hydrogen bonds with either a ligand group through intramolecular hydrogen bonds, protein residue, or WaterMap water.

27

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Complexes of known active compounds rarely display structures of this type, but they can often be observed in docking of decoys. A number of examples are shown in Fig. 7. Charged ligand groups also must pass the first shell hydrogen bond criterion described above to avoid a penalty, but such groups in addition have to either form a salt bridge with a complementary protein charged group or reside in a reasonably solvent exposed region. Solvent exposure is assessed by examining the second, as well as the first, solvation shell; charged ligands that fail to form a salt bridge, and have too few waters in this shell are assigned a penalty. E. Continuum Solvation Modeling: Delocalized Strain and Solvation Energy E.1 Distance-dependent dielectric and MM-GBSA modeling of delocalized strain and solvation energy As in the case of Glide SP and Glide XP, WScore uses a distance dependent dielectric (DDD) model, plus a molecular mechanics term, in evaluating the energy of docked poses in the initial sampling phase of rigid receptor docking. The DDD based protein-ligand interaction energy of active complexes displays a universal scaling behavior as a function of the molecular weight and flexibility of the ligand, increasing with both of these quantities. A threshold function can be developed from this behavior, and is used to penalize complexes whose interaction terms are insufficiently favorable, given their molecular weight and number of rotatable bonds in the ligand core. This provides a crude, first pass evaluation of delocalized strain energy, as unfavorable interactions will arise from electrostatic and/or steric clashes. An accumulation of such clashes indicates that the ligand pose fits poorly into the receptor. Parameters of the threshold functions are set conservatively to avoid penalizing active compounds under normal cross-docking conditions.

28

ACS Paragon Plus Environment

Page 28 of 85

Page 29 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

A more accurate calibration of the delocalized strain/solvation energy can be obtained by minimizing the predicted structure of the complex; however, this has to be done using a continuum solvation model superior to DDD (note that a continuum solvent approach is the only feasible one if the overall free energy of the complex in solvent is to be rapidly estimated). Therefore, we employ an MM-GBSA approach, using the OPLS342 force field and the VSGB 2.1 MM-GBSA parametrization,35, 48, 49 to calculate the difference in free energy between the complex and the separately solvated ligand and receptor. MM-GBSA models, and the VSGB approach in particular, have been shown to yield good results in many cases in rank ordering the binding affinities of congeneric series in the context of lead optimization.35, 36, 50 By minimizing the complex, we reduce the noise by eliminating, for example, avoidable (via minimization) minor steric clashes found in the docked structure. As in the case of the DDD based model discussed above, we define a threshold function (based on the molecular weight, net charge, and flexibility of the ligand, as well as the WScore value from docking), and apply a penalty if the actual MM-GBSA energy does not achieve this threshold value (with the size of the penalty dependent upon how far away the actual MM-GBSA energy is from the threshold value). Each receptor conformation is scored separately, using the MM-GBSA results obtained from minimization of the selected WScore pose in that receptor. The threshold value is defined by a global (target-independent) function that is linearly dependent upon the molecular weight, number of core rotatable bonds, and ligand WScore value, plus a constant term that depends upon the target. This constant value, which we have named GBSHIFT, primarily reflects the density of protein atoms surrounding the ligand. In an active site like the NNRTI site of HIV-1 RT, the allosteric pocket is tight (as it is completely induced by the ligand), and the atom density near the ligand is significantly larger than it is for a typical

29

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

active site. This leads to a GBSHIFT value that is more negative than average (that is the total MM-GBSA energy for a typical active ligand docked into HIV-1 RT is significantly more negative than for many other targets, e.g. CDK2). The value of GBSHIFT is obtained by empirical optimization using known actives and a set of random database molecules. More details concerning the MM-GBSA model are provided in the Supplementary Material (section Appendix 4). Empirical optimization of the GBSHIFT parameter for the models in the present paper is discussed below in Section V. E.2 Delocalized Desolvation effects in highly hydrophilic receptor active sites SiteMap51, 52 can be used to analyze the polarity of a receptor cavity. We have found that the most useful metric for this purpose is the ratio of the hydrophilic to hydrophobic score obtained from an analysis of SiteMap data. Most of the values for proteins in the PDB2008 dataset fall in a range near unity. However, a subset of receptors (DPP4, PTP1B, uPA, OppA) have a very large ratio greater than 3.5, primarily due to a very small hydrophobic score. Two of the receptors have cognate ligands that are charged (DPP4 and PTP1B) and are clearly optimized to bind positive (DPP4) or negative (PTP1B) groups. OppA is similarly optimized to bind positively charged peptides. UPA is a serine protease and unlike related proteins such as thrombin and FXa, it appears to require a positively charged group in the S1 pocket, at least for the ligands in the WScore training set. Thus the extreme value of the SiteMap polarity metric for these receptors correlates well with function and with the known active compounds in the PDB. An active site with substantially lower hydrophobic surface area than normal, and a propensity to bind charged residues, is going to lose a greater amount of free energy from water displacement by the ligand, particularly from second shell solvation effects. We postulate that the loss of

30

ACS Paragon Plus Environment

Page 30 of 85

Page 31 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

favorable Coulomb interactions due to solvation needs to be compensated by similarly favorable Coulomb interactions with the ligand. This hypothesis suggests penalizing ligands that make an inadequate screened Coulomb interaction with the receptor. Here the screened Coulomb interaction is the molecular mechanics Coulomb interaction modulated by the DDD term in the WScore docking model. Analysis of scores for the four targets listed above reveals that all of the known active compounds receive highly favorable scaled Coulomb scores, whereas many high scoring decoys do not. Penalizing these decoys, via the use of a cutoff function (which is the same for all receptors identified as highly hydrophilic), significantly improves enrichment without penalizing any of the known active compounds. The application of a penalty of this type, applied exclusively to highly hydrophilic receptors as defined above, represents an improvement over the common practice of including a substantial component of the Coulomb score in the overall docking free energy score. The latter approach leads to the assignment of very large binding affinities (often 3-4 orders of magnitude too large) to molecules that experimentally are only weakly active, for example a phosphate ion binding to PTP1B. This phenomenon is manifested in a number of widely used scoring functions such as Glide SP. In our approach, a strong Coulombic interaction is necessary, but not sufficient, for sub-micromolar binding in a highly hydrophilic site. These sites do contain small but important hydrophobic regions, in which the displacement of unstable water molecules plays a key role in enhancing binding. Furthermore, uniformly applying a Coulomb term to all receptors leads to a bias towards the formation of solvent exposed salt bridges, which generally contribute very little to binding affinity2. The present approach provides a balanced treatment, which allows for further optimization (if necessary) as more receptors in the highly hydrophilic class are studied. F. WScore treatment of protein and ligand entropic effects

31

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The predominant WScore treatment of ligand entropy is implicit; a better MM-GBSA score is required for highly flexible ligands than for rigid ligands if the ligand is to avoid a penalty. Efforts to directly fit a ligand entropy term to experimental binding affinities based on the number of rotatable bonds in the ligand yields a very small effect. Flexible ligands can (and often do) compensate for explicit entropy loss by more easily accommodating themselves to the protein environment, thus increasing favorable protein-ligand interactions and/or requiring a smaller adaptation in conformation from the protein. Further discussion of these compensation effects is provided in the Supplementary Material in Section Appendix 5. An approximate correction for the overall entropy of the protein-ligand complex is estimated by considering the distribution of WScore values across the ensemble of receptor conformations. For most cases, the final predicted binding affinity is estimated by averaging over the top two values of WScore obtained from the ensemble. For a subset of complex (large ring systems, exceptionally low energy docking or MM-GBSA scores), only the best WScore value is used. A minimum ensemble size of three conformations is required to use this approach. Further details are provided in the Supplementary Material in Appendix 5. V. Determination of Receptor Specific Parameters for WScore Models: Protein Reorganization Energies, Ensembles, and GBSHIFT For the most part, WScore is a global model; the same parameters are employed in evaluating binding affinities for all receptors. However, there are a few key parameters that are receptor specific: protein reorganization energies for each receptor conformation, and the GBSHIFT parameter, which is essential in setting MM-GBSA threshold values for a given receptor, as discussed in Section IV E. Determination of these parameters, followed by selection of an

32

ACS Paragon Plus Environment

Page 32 of 85

Page 33 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

ensemble for virtual screening, constitutes construction of a working WScore model for a particular target. We discuss two different approaches to this problem in what follows. In the first, we begin by classifying receptor conformations according to important structural differences, either in loop regions or in the rotamer state of a particularly important side chain (or side chains). Each class of receptor conformations is then assigned its own protein reorganization energy, by fitting this single parameter to achieve the best agreement with the experimental binding affinities of the ligands comprising the class. We have endeavored to minimize the number of classes for each receptor in order to reduce the danger of overfitting (details are discussed below). This approach enables the rank ordering capability of WScore, as assessed based on self-docking of each ligand into its cognate receptor conformation, to be approximately evaluated. Ensembles were then selected to maximize the number of actives capable of achieving scores close to their selfdocking values, while restricting the number of members of the ensemble to a value between two and six (with most cases falling in the 3-4 member range), and the GBSHIFT value was estimated by comparing MM-GBSA values for actives and decoys. The second approach is automated, and does not require any classification of receptors by conformation. Rather, the set of available (to the user) known active compounds, plus a set of 1000 random drug-like decoys, are docked into each receptor structure, and the receptor conformation reorganization energies and optimal ensemble are determined together, numerically, via simultaneously fitting the actives compounds to their experimental binding affinities and maximizing the enrichment obtained for the known actives as compared to the set of 1000 random decoys. An optimal ensemble is obtained by selecting the receptor conformation giving the best enrichment over all actives, then adding to this a second conformation yielding

33

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the best improvement in enrichment, and repeating the addition of a new conformation until either a user specified limit on the number of receptors is reached, or the enrichment ceases to improve as another receptor is added. The details of the enrichment metric can be adjusted by the user; here we use the BEDROC metric with α=160.9 which emphasizes very early enrichment, i.e. those expected to be the most tightly binding actives. We show below that this method yields results for a benchmark set of test cases that are very similar to those obtained by the receptor classification approach. The advantage of this approach is that it employs the available experimental information, does not require manual analysis and classification of receptor conformations, and can run without user intervention (although there are various places where an expert user can adjust the optimization protocol). We view the assignment of protein reorganization energies to various conformations of a protein via fitting to experiment as a major strength of WScore. These numbers are critical to understanding the free energy cost associated with targeting one receptor conformation as opposed to another when considering various possible ligand design strategies. In order to attain a high ligand binding affinity, protein-ligand interactions must compensate for any free energy sacrificed by the protein in order to achieve the binding conformation. Optimized protein reorganization energies allow docking scores obtained in different receptors to be fairly compared against each other. This capability is fundamental to the effective use of ensemble docking. We exploit the fact that the automated approach uses a large data set to optimize parameters by including optimization of the GBSHIFT parameter in the protocol. GBSHIFT is varied along with the conformational offsets, and the final value that is chosen considers both enrichment and preservation of scores for the known actives that agree with experiment. Optimization of

34

ACS Paragon Plus Environment

Page 34 of 85

Page 35 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

GBSHIFT in this fashion ensures that the MM-GBSA component of the scoring has maximal impact in eliminating false positives, while not penalizing active compounds. In the present protocol, we optimize a single value of GBSHIFT across the entire conformational ensemble. Note, however, that it is possible to optimize a separate GBSHIFT value for each receptor conformation, which makes some physical sense as details of the cavity shape can have a significant effect on the packing of the ligand and receptor. Experiments with this approach are currently in progress. VI. PDB Derived Data Set We have assembled protein-ligand complex data sets for a range of receptors from the PDB. The WScore optimization project began in 2008; the training set was constructed at that time focusing on receptors with at least five fully resolved (i.e., no missing atoms and unambiguous assignment of side chain conformations based on the electron density in the binding site) crystal structures, and consists of protein-ligand complexes associated with the receptors listed in Table 1, which were deposited in the PDB prior to this date. We shall refer to this data set as the PDB2008 data set in what follows. Complexes deposited after this date, for the same set of receptors, were reserved for use as a test set, which could be employed to assess the performance of the model after converging results based on the training set data. We refer to this data set as PDB2014. We have checked the self-docking of the PDB2014 compounds to make sure that there are no large, incorrect penalties (data not shown), but they were not used in building or optimizing the conformational ensemble. Hence, these compounds can be used as a test set to investigate performance of the ensemble for compounds outside of the training set. We expect in this case that a subset of these compounds will fit poorly in the ensemble and receive penalties; however, it is an interesting question as to how the enrichment for such a data set will compare

35

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

with that obtained using Glide SP and XP, which are lacking extensive penalty terms present in WScore. For each complex, the systems were first prepared using the standard Schrödinger Suite as described previously,53 and then examined visually to check for any residual errors. Accurate assessment of a scoring function is not possible unless the complex has an appropriate protonation states for all relevant ionizable groups, proper rotamers for GLN and ASN (which are sometimes incorrectly assigned in PDB files), etc. In some cases significant problems with PDB structures were uncovered and these complexes were discarded, but this was an infrequent occurrence. Finally, we retained only complexes for which experimental binding free energy is available, as the binding affinity is used not only in evaluating rank ordering capability, but also in classifying the compound with regard to the degree of difficulty of decoy comparison in enrichment studies. The number of complexes in the PDB2008 data set for each receptor is given in Table 1; the number of PDB2014 ligands is similarly presented in Table 7. A complete list of PDB2008 and PDB2014 complexes is presented in the Supplementary Material in Table S3. All Glide and SiteMap calculations were performed with the Schrödinger Suite 2015-4 release on a cluster of 2.2 GHz AMD Opteron 2427 processors. WScore calculations were performed using a prerelease version of WScore on the same cluster. Glide SP, Glide XP and WScore docking timings are dependent on the number of ligand rotatable bonds, but one can expect per ligand timings for ligands with up to 8 rotatable bonds of around 10-20 seconds/ligand, 3-5 minutes/ligand, and 8-10 minutes/ligand, respectively. WScore performs an MM-GBSA calculation only on compounds with WScores more negative then a threshold, thus the total

36

ACS Paragon Plus Environment

Page 36 of 85

Page 37 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

calculation time will be a function of the number of screened ligands that achieve at least the threshold which in our experience varies greatly by target. In the most recent version of WScore calculation times have been reduced by roughly a factor of three with similar performance in pose prediction, enrichment, and rank ordering. For compounds with 8 or fewer rotatable bonds, average timings per ligand are now 2-3 minutes as compared to the 8-10 minutes of the older code used to generate all data shown in this article. VII. Results A. Docking Accuracy Table 2 summarizes self-docking results from WScore, Glide SP, and Glide XP for the PDB2008 data set, showing the distribution of RMSDs across all 542 binding sites in 506 complexes of the data set. The average WScore RMSD is 0.83 Å, the median is 0.58 Å, and the largest RMSD is 6.62 Å, which compares favorably to the average, median and largest RMSDs from Glide SP of 0.98 Å, 0.57 Å and 9.39 Å. Glide SP failed to identify a docked pose in four complexes (1ykr, 1nvr, 1yhs, and 1stc) and those complexes are not included in these statistics. There is significant improvement from WScore as is reflected in the diminished number of outliers; for example, there are 19 structures with an RMSD worse than 2.5 Å in WScore docking, whereas Glide SP has 37 such cases and Glide XP has 34 (see Table 2 for more details). We further note that all three methods display improved statistical performance as compared with results obtained with Glide SP and other programs from various standard literature test sets10, 12, 53 ; the Glide SP results on the Astex data set, as reported in ref. 12. We attribute the general enhancement of performance, seen in Table 2, to extremely careful preparation of the crystal structures. Crystal structures contain a nontrivial number of errors of various types, many of which will cause a

37

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

physics based method with a “hard” scoring function (which describes Glide SP and XP as well as WScore) to make an erroneous pose prediction. Furthermore, assignment of proper protonation states of ligand and protein is often challenging. However, if sufficient effort is applied, our experience is that docking accuracy is substantially improved, and a high fraction of the worst outliers are typically eliminated, as is the case here.12, 53 B. Rank ordering from self-docking scores Our first requirement for WScore is that it yields consistently measurable rank ordering for the training set in self-docking calculations. There are fundamental limitations as to what we can expect in terms of the correlation coefficient, R2, for the various data sets in Table 1. Firstly, the experiments are conducted by multiple research groups, employ different experimental technologies, and are of varying quality. This likely leads to a variance in the experimental binding affinities of ~0.5-1.0 kcal/mol.54 Secondly, some of the data sets display a very narrow range of binding affinity, making manifestation of substantial correlation highly challenging for those receptors. One can work out the expected correlation coefficient for a highly accurate scoring function given the intrinsic experimental noise estimates given above, and the range of binding affinities in the data set. Nevertheless, extracting a signal for suitable receptor data sets should be possible if WScore is able to get the basic physics right. We used the first method described in Section V to assign protein reorganization energies; this involved manual clustering of receptors based on protein conformation (see below) and assigning a protein reorganization energy based on the experimental binding affinities of the ligands comprising the conformational cluster. Of the 22 receptors in the training set, we assign a single protein reorganization energy value to 14 of them. These receptors are fairly rigid and we did not

38

ACS Paragon Plus Environment

Page 38 of 85

Page 39 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

locate dramatically different conformations (e.g., different rotamers for side chains in the binding site or different loop conformations) with corresponding, systematic differences in binding free energy prediction. Note that a single protein reorganization energy parameter has no effect on the correlation coefficient. Of the remaining 8 receptors, two (p38α and Lck) were separated into classic DFG-in and DFGout conformations, each with its own protein reorganization energy; the DFG-in form of p38α further splits into two conformational classes. As we have discussed above, the difference between the DFG-in and DFG-out results for p38α, 3 kcal/mol, is in the range estimated for ABL1 from FEP simulations. The Lck difference, 4.8 kcal/mol, is a little bit larger than the upper end of the ABL1 range (4.0 kcal/mol), but is certainly a reasonable value to obtain. Two other cases, PPAR-γ and Hsp90, display a second distinct conformation in which a key hydrophobic side chain makes a significant movement to alter the topology of the binding pocket. As illustrated in Fig. 8 for the case of Hsp90, LEU107 shifts exposing a classic “cryptic” pocket, enabling binding of a different class of ligands. Two cases (ERK2 and HIV-1 RT) display individual structures in which there are extremely large loop movements, leading to a 3-5 kcal/mol increase in the protein reorganization energy. Two distinct loop movements can be identified in HIV-1 RT, which is unsurprising given the origin of the NNRTI site as an allosteric pocket. The ERK2 loop movement is interesting in that it shows a long loop moving towards a relatively small ligand in order to close up the pocket. The correlation of the loop movement with the large reorganization energy confirms the physical picture of reorganization, and the approach to modeling this phenomenon, that is presented here.

39

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Finally, ABL1 kinase has a highly flexible active site, with very substantial variation in the conformation of the activation loop and the P-loop as a function of the bound ligand. Our data set considers only DFG-in ABL1 conformations. We have analyzed the receptor conformations for the PDB2008 data set and classified them into three groups. One of these groups has a much larger protein reorganization energy than the other two. These two latter groups could in fact be assigned the same protein reorganization energy, even though they differ significantly in loop conformations. Allowing them to have slightly different protein reorganization energies (by 1 kcal/mol) modestly improves rank ordering. In Table 1 we summarize the number of different conformational classes for each receptor, the number of conformations assigned to each class, and the protein reorganization energy for each class of conformations. Tables 3 and 4 summarize the key data concerning rank ordering performance of WScore for the 22 receptors in the training set; we include correlations with molecular weight (MW) and with Glide SP and Glide XP for comparison. We optimize a single protein reorganization energy parameter for each receptor for Glide SP and XP to provide a fair comparison of the mean unsigned errors (MUEs) and RMS errors with WScore. The WScore results shown in Tables 3 and 4 are most notable for their consistency across a wide variety of receptors. The MUE is between 0.44 and 1.83 kcal/mol with a per-receptor average of 0.98 kcal/mol. The RMS errors, which are more sensitive to large outliers, follow a similar trend. The R2 values exhibit greater variation, but this is primarily due to the differing ranges of activity available for each receptor, as can be seen by comparing the actual R-squared value achieved with the estimated theoretical maximum value available given our estimates for the experimental and random model noise and the activity range of the data set (HIV-1 RT and OppA are the exceptions). Over the entire data set, WScore significantly outperforms Glide SP,

40

ACS Paragon Plus Environment

Page 40 of 85

Page 41 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

and displays considerably more consistency. Glide SP and XP work quite well for some of the receptors, but perform poorly for others. WScore is better able to handle a diverse group of receptors, which is not surprising given the increased sophistication of the scoring function. WScore also substantially outperforms molecular weight as a descriptor, a key metric when assessing whether a scoring function has any nontrivial ability to predict binding affinities. C. Enrichment Results in Retrospective Virtual Screening: Results for the PDB2008 and PDB2014 Data Sets The PDB2008 data set contains 9 kinase targets, 4 serine proteases, and 3 nuclear hormone receptors. The kinases and nuclear hormone receptors have distinct molecular recognition sites and are generally rather hydrophobic. The serine proteases range from quite hydrophobic (FXa) to quite hydrophilic (uPA). The remaining targets (HIV-1 RT, Hsp90, OppA, DPP4, PTP1B) are diverse, and range in polarity from extremely hydrophilic (DPP4, PTP1B, OppA) to extremely hydrophobic (HIV-1 RT). Thus, there is considerable diversity in the data set, but it cannot be viewed by any means as spanning all of receptor space. Future studies will investigate a large number of additional targets and ligands; for our initial work, we wanted to employ PDB derived actives so as to be able to reliably separate cross docking errors from scoring function errors. In evaluating enrichment, we consider only ligands with binding affinities that are better than 10 µM. This is a typical cutoff for experimental screening assays; furthermore, as the known active ligands become weaker, it is less clear how many database ligands should be competitive with them. Generally we would expect the experimental hit rate to constitute less than 0.5% of the database, which is 5 ligands in a 1000 compound random database, although one might see more active ligands if the receptor is particularly promiscuous. These estimates enable an approximate

41

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

calibration as to how both WScore and Glide SP are performing as compared to a theoretically perfect scoring function. Virtual screens are performed using the standard 1000 compound decoy set (http://www.schrodinger.com/Glide/Ligand-Decoys-Set) described in previous work3 though three compounds were eliminated from consideration in computing enrichments. These three were alternative ionization/tautomeric states of other decoy compounds. These molecules have been selected to represent a diverse set of drug-like molecules with various specified property distributions (molecular weight, number of donors and acceptors, etc.). Preliminary tests using alternative decoy sets indicate that similar results to those shown below are obtained. We carry out virtual screens for all receptors in the database for WScore, Glide SP and Glide XP. We initially select the receptor ensembles using method 1 of Section V; comparisons are made for a subset of receptors using method 2. The same ensembles are used for SP and XP docking with default docking settings. For Glide SP and Glide XP, the score of a given ligand is computed by selecting the best score from among all of the receptors in the ensemble. Table 5 presents BEDROC(α=160.9)55, BEDROC(α=20) and ROC(1%) enrichment results for 22 targets of the PDB2008 data set using WScore, Glide SP, and Glide XP. ROCS plots for each receptor are reported in the Supplementary Material (Figure S1). The results demonstrate that WScore represents a significant improvement on Glide SP and Glide XP in the ability to discriminate active compounds from random database molecules. There are some screens where the results from both methods are comparable; these are generally data sets dominated by very tight binding ligands (e.g. ABL1). For other screens, such as HIV-1 RT, even tightly binding ligands are difficult to separate out using Glide SP. Table 6 presents results for a subset of

42

ACS Paragon Plus Environment

Page 42 of 85

Page 43 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

receptors where there are enough weakly binding ligands (in the -7.0 to -9.0 kcal/mol range) to make it worthwhile to report these results separately. For these weakly binding ligands, large improvements using WScore are generally seen. Table 7 presents test set enrichment results spanning 17 targets where the complexes from the PDB2014 data set are used to supply the actives. The ensembles have not been optimized for these actives, so as expected, we find some ligands that do not fit well into any member of the ensemble and receive penalties. Some ensembles, such as that for ERK2, are a poor match for all of the PDB2014 ligands, and the results could clearly be improved substantially if additional receptors were incorporated into the ensemble. An extreme case is ABL1, where the enrichment rates for PDB2014 are near zero. This is attributable to a specific problem: the PDB2008 data set for ABL consists of all DFG-in structures (we neglected to make a separate data set for DFG-out structures), whereas the PDB2014 ligands are mostly DFG-out due to subsequent interest in Gleevec and related compounds. Wscore does not do well at docking DFG-out ligands into DFG-in structures. Despite this, WScore enrichments on average are superior to Glide SP or XP. In Fig. 9, we present the score distributions of actives and decoys for a selected subset of receptors. These results highlight the greatly enhanced ability of WScore to separate the vast majority of decoys from the active compounds. In particular, WScore has many fewer compounds in the intermediate score range, corresponding to weakly binding actives. Recovering a true active that scores in this range with Glide SP is therefore quite difficult. The superior ability of WScore to separate actives and decoys can primarily be attributed to the penalty terms in WScore described in sections IV. To illustrate this point, we report in Table 8

43

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the BEDROC and ROC(1%) values for WScore with the aforementioned terms turned off. Turning these terms off has little effect on the scores of the actives, but drastically changes the distribution of decoy scores in all receptors. Without penalty terms, WScore performance in virtual screening is similar to that observed for Glide SP Finally, in Table 9 we present results obtained with the automated model building algorithm described in Section V, applied to the PDB2008 data set. The results can be compared directly with those from the hand optimized ensembles discussed previously; this is done in the last several columns of Table 9. On average, the results of the automated approach and hand optimized approach are very similar; however, the automated approach is much easier to apply VIII. Discussion The self-docking pose prediction (RMSD) and energetic rank ordering results presented above demonstrate that the WScore model yields a robust description of native complexes, as long as an accurate structure and protonation state assignments of the receptor are available. The improvements as compared to Glide SP and XP are significant, and close to the limit of what one could expect for a fast empirical method for which the scoring of each ligand is predominantly based on a single structure ( the entropic term in Wscore does use data from multiple structures, but this is a very crude correction and would not be able to correct many types of flaws in the single structure estimates). This data illustrates what is possible with a physics based, rigid receptor docking approach when there is no issue concerning the adequacy of the ligand fit in the receptor binding site. The PDB2008 enrichment results examine the performance of WScore in the context of cross docking into an optimized ensemble of receptors. These calculations serve a number of

44

ACS Paragon Plus Environment

Page 44 of 85

Page 45 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

purposes. Firstly, they show that a relatively small ensemble, composed of 3-6 receptors, is capable of docking a large, diverse set of active compounds and achieving scores and poses that are reasonably close to self-docking, despite the presence of a dense web of penalty terms. While reliable rank ordering is challenging with such an ensemble, approximate classification of ligands as strongly active, weakly active, or inactive, is still feasible. The enrichment metrics are robust across the entire set of targets, and in many cases reflect qualitatively better rejection of false positives than Glide SP or XP. The improvements in ranking for the case of weakly binding actives, as presented in Table 6, are particularly striking. As shown in Table 8, the penalty terms are primarily responsible for the much higher false positive rejection rates, while having little impact on the scores of the actives. The results of docking the PDB2014 ligands into the ensembles created from the PDB2008 database illustrate the limitations of WScore when confronted with ligands requiring significant induced fit effects beyond those present in the ensemble (and “significant” could be a relatively small, but critical, movement of a side chain or a few backbone atoms). Ligands in this category are not going to be retrieved by WScore in a virtual screening exercise, and certainly cannot be rank ordered sensibly. In order to recover such ligands, additional receptor conformations must be added to the ensemble. We outline below several systematic approaches to ensemble generation and augmentation that could address this problem effectively in the context of a live drug discovery project. If none of the methods below are feasible, then Wscore can still be used; a reduced number of false positives can be expected, but the coverage of the active ligand chemical space will be restricted to compounds that bind to conformations close to the few that are available for docking.

45

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Some key practical questions concerning the use of WScore revolve around the question of how to best generate accurate and robust conformational ensembles. We first consider the case of constructing an ensemble for virtual screening. The automated optimization tool described in Section V will build a suitable WScore model if supplied with fully resolved structures, but what should be done if there are very few (perhaps even only one) crystal structure available? Adding relevant new structures requires using computational methods to generate the structures. One possible method involves the use of induced fit docking (IFD)56 into the available crystal structure or structures, using known active compounds from the literature or in house screening experiments. It is often the case that there are a number of such compounds that have been identified, even if there are very few crystals structures. In forthcoming work, we show that metadynamics methods can be used to greatly increase the reliability of IFD methods in predicting the structure of protein-ligand complexes when induced fit effects are important. The use of IFD generated structures in the docking ensemble still needs to be validated by retrospective and prospective virtual screening studies; these efforts are currently in progress. A second approach, which does not necessarily require additional experimentally validated active compounds, is to use free energy perturbation (FEP) methods to alchemically modify the ligand and generate new induced fit structures. The FEP process also produces a predicted binding free energy for the modified ligand, enabling this quantity to be supplied, in place of experimental binding affinities, to the automated ensemble optimization method described in Section V for the calculation of protein reorganization energies, GBSHIFT values, and ensemble selection. In this approach, a technology is required to extract an approximation to a crystal structure from the FEP molecular dynamics simulation. This is not a trivial problem, but we believe that progress can be made and are currently exploring various methods.

46

ACS Paragon Plus Environment

Page 46 of 85

Page 47 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Finally, there is the question of whether WScore can be effectively used for approximate rank ordering in a lead optimization context. Any such application requires a high quality ensemble, possibly in many cases superior to those employed here for virtual screening with regard to the number of receptor conformations, and coverage of the induced fit space of the series being optimized. We are in the process of performing experiments to obtain results for a wide variety of lead optimization data sets (such as the ones used in ref. 57 to evaluate FEP). IX. Conclusion The various WScore components (hydrogen bonding, hydrophobic contacts, localized desolvation and ligand strain, MM-GBSA to describe delocalized strain, and protein reorganization energies for each receptor conformation) provide a more sophisticated and flexible description of the protein-ligand complex, as compared to earlier models such as Glide SP. The physics appears to encompass the important terms affecting ligand binding affinities, and the parameters of the model correspond to what one would estimate physically for the various effects that are treated. More work remains to be done to optimize the details of the model, and there may be challenges that will be uncovered as new systems are investigated, and the approach is applied prospectively in virtual screens. The results reported here provide a solid foundation to initiate such applications, which will provide the most rigorous testing of the performance of the method. In addition to quantitative applications such as virtual screening, a key feature of WScore is the ability to decompose the binding free energy into readily understandable physical terms. We are developing an automated visualization system that will enable the user to focus in on specific structural features responsible for WScore terms identified as important in any protein-ligand

47

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

complex. The assignment of protein reorganization energies similarly provides parameters that can play an important role in deciding which receptor conformations to target in a ligand design effort. This paper demonstrates that ensemble docking can be highly effective in conjunction with careful calibration of protein reorganization energies for the various members of the ensemble. Most studies of methodology for predicting binding affinity focus on the scoring function being used, and treat the receptor structure (often only one is used) as a secondary issue. We believe that in fact, many scoring functions would yield significantly better signals if a suitable ensemble were properly employed. While WScore has been particularly adapted to this approach, the importance of the structural ensemble cannot be fully separated from the properties of the scoring function. In the future, our belief is that a great deal of progress can be made by pursuing this idea to its logical conclusion. The use of significantly larger ensembles, generated by for example by FEP calculations, yielding much more complete coverage of the energetically accessible receptor conformational space, coupled with a robust and accurate method for estimating protein reorganization energies, has the potential to dramatically expand the chemical space of tightly binding ligands, presenting qualitatively more options for structure based drug discovery projects. Notes: The authors declare the following competing financial interest(s): R.A.F. has a significant financial stake in Schrödinger, Inc., is a consultant to Schrödinger, Inc., and is on the Scientific Advisory Board of Schrödinger, Inc.

48

ACS Paragon Plus Environment

Page 48 of 85

Page 49 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Funding Sources Author Professor Friesner received funding from the Bill and Melinda Gates Foundation and the Department of Energy Grant DE-FG02-90ER14162. Neither were used to fund research in this work.

49

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 50 of 85

Supporting Information. Appendices describing the algorithm for recognition and scoring of aromatic CH to O interactions, hydrogen bonds, pi-cation interactions, and 1,6-intramolecular hydrogen bonds as well as the WaterMap correction term, and the MM-GBSA model for delocalized strain and solvation energy are provided. Also provided is a table showing the description and magnitude of WScore penalty and reward terms, select data to illustrate the rationale for the hydrogen bond strain term, lists of PDB IDs for complexes that comprise the PDB2008 and PDB2014 datasets, tables of enrichments with both AUC and early enrichment metrics for the PDB2008, weakly active PDB2008 and PDB2014 datasets, and ROC plots for enrichment of the PDB2008 set active ligands against the Schrödinger decoy ligand set. Corresponding Author Information: * Phone: 212-854-7606 E-mail: [email protected]

Abbreviations Used ABL1, Abelson tyrosine-protein kinase 1; AURKA, aurora kinase A; CDK2, cyclin-dependent kinase 2; CSD, Cambridge Structural Database Chk1, checkpoint kinase 1; DDD, distance dependent dielectric; DPP4, dipeptidyl peptidase IV; ERK2, extracellular signal-regulated kinase 2; ERRγ, estrogen-related receptor gamma; ER α , estrogen receptor alpha; ERβ, estrogen receptor beta; FEP, free-energy perturbation; FVIIa, Factor VIIa; FXa, Factor Xa; HIV-1 RT, human immunodeficiency virus type 1 reverse transcriptase; Hsp90, heat shock protein 90; Lck, lymphocyte-specific protein tyrosine kinase; MM-GBSA, molecular mechanics with generalized Born and surface area solvation NNRTI, non-nucleoside reverse transcriptase inhibitor OppA, periplasmic oligopeptide-binding protein; PDB, Protein Data Bank; PIM1, proto-oncogene

50

ACS Paragon Plus Environment

Page 51 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

serine/threonine-protein kinase Pim-1; PKA, protein kinase A; PPAR-γ, peroxisome proliferatoractivated receptor gamma; PTP1B, protein-tyrosine phosphatase 1B; ROCK1, Rho-associated protein kinase 1; ROCK2, Rho-associated protein kinase 2; p38α-in, p38-α mitogen-activated protein kinase (DFG-in form); p38α-out, p38-α mitogen-activated protein kinase (DFG-out form); uPA, urokinase type plasminogen activator

51

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References 1.

Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.;

Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: a new approach for rapid, accurate docking and scoring. 1. method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739-1749. 2.

Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren,

T. A.; Sanschagrin, P. C.; Mainz, D. T. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 2006, 49, 61776196. 3.

Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.;

Banks, J. L. Glide: a new approach for rapid, accurate docking and scoring. 2. enrichment factors in database screening. J. Med. Chem. 2004, 47, 1750-1759. 4.

Verdonk, M. L.; Cole, J. C.; Hartshorn, M. J.; Murray, C. W.; Taylor, R. D. Improved

protein-ligand docking using GOLD. Proteins 2003, 52, 609-623. 5.

Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. Development and validation

of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727-748. 6.

McGann, M. FRED pose prediction and virtual screening accuracy. J. Chem. Inf. Model.

2011, 51, 578-596. 7.

Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. A fast flexible docking method using an

incremental construction algorithm. J. Mol. Biol. 1996, 261, 470-489. 8.

Jain, A. N. Surflex: fully automatic flexible molecular docking using a molecular

similarity-based search engine. J. Med. Chem. 2003, 46, 499-511.

52

ACS Paragon Plus Environment

Page 52 of 85

Page 53 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

9.

Eldridge, M. D.; Murray, C. W.; Auton, T. R.; Paolini, G. V.; Mee, R. P. Empirical

scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided Mol. Des. 1997, 11, 425-445. 10.

Perola, E.; Walters, W. P.; Charifson, P. S. A detailed comparison of current docking and

scoring methods on systems of pharmaceutical relevance. Proteins 2004, 56, 235-249. 11.

Wang, R.; Lu, Y.; Wang, S. Comparative evaluation of 11 scoring functions for

molecular docking. J. Med. Chem. 2003, 46, 2287-2303. 12.

Repasky, M. P.; Murphy, R. B.; Banks, J. L.; Greenwood, J. R.; Tubert-Brohman, I.;

Bhat, S.; Friesner, R. A. Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide. J. Comput. Aided Mol. Des. 2012, 26, 787-799. 13.

Ferreira, R. S.; Simeonov, A.; Jadhav, A.; Eidam, O.; Mott, B. T.; Keiser, M. J.;

McKerrow, J. H.; Maloney, D. J.; Irwin, J. J.; Shoichet, B. K. Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors. J. Med. Chem. 2010, 53, 4891-4905. 14.

Kincaid, V. A.; London, N.; Wangkanont, K.; Wesener, D. A.; Marcus, S. A.; Heroux,

A.; Nedyalkova, L.; Talaat, A. M.; Forest, K. T.; Shoichet, B. K.; Kiessling, L. L. Virtual Screening for UDP-galactopyranose mutase ligands identifies a new class of antimycobacterial agents. ACS Chem. Biol. 2015, 10, 2209-2218. 15.

Mysinger, M. M.; Weiss, D. R.; Ziarek, J. J.; Gravel, S.; Doak, A. K.; Karpiak, J.;

Heveker, N.; Shoichet, B. K.; Volkman, B. F. Structure-based ligand discovery for the proteinprotein interface of chemokine receptor CXCR4. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 55175522.

53

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

16.

Pierce, A. C.; Jacobs, M.; Stuver-Moody, C. Docking study yields four novel inhibitors

of the protooncogene Pim-1 kinase. J. Med. Chem. 2008, 51, 1972-1975. 17.

Roughley, S.; Wright, L.; Brough, P.; Massey, A.; Hubbard, R. E. Hsp90 inhibitors and

drugs from fragment and virtual screening. Top. Curr. Chem. 2012, 317, 61-82. 18.

Sliwoski, G.; Kothiwale, S.; Meiler, J.; Lowe, E. W., Jr. Computational methods in drug

discovery. Pharmacol. Rev. 2014, 66, 334-395. 19.

Snyder, P. W.; Mecinovic, J.; Moustakas, D. T.; Thomas, S. W., 3rd; Harder, M.; Mack,

E. T.; Lockett, M. R.; Heroux, A.; Sherman, W.; Whitesides, G. M. Mechanism of the hydrophobic effect in the biomolecular recognition of arylsulfonamides by carbonic anhydrase. Proc. Natl. Acad. Sci. U.S.A. 2011, 108, 17889-17894. 20.

Breiten, B.; Lockett, M. R.; Sherman, W.; Fujita, S.; Al-Sayah, M.; Lange, H.; Bowers,

C. M.; Heroux, A.; Krilov, G.; Whitesides, G. M. Water networks contribute to enthalpy/entropy compensation in protein-ligand binding. J. Am. Chem. Soc. 2013, 135, 15579-15584. 21.

Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. Role of the active-site solvent

in the thermodynamics of factor Xa ligand binding. J. Am. Chem. Soc. 2008, 130, 2817-2831. 22.

Young, T.; Abel, R.; Kim, B.; Berne, B. J.; Friesner, R. A. Motifs for molecular

recognition exploiting hydrophobic enclosure in protein-ligand binding. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 808-813. 23.

Bayden, A. S.; Moustakas, D. T.; Joseph-McCarthy, D.; Lamb, M. L. Evaluating free

energies of binding and conservation of crystallographic waters using SZMAP. J. Chem. Inf. Model. 2015, 55, 1552-1565. 24.

Cui, G.; Swails, J. M.; Manas, E. S. SPAM: A simple approach for profiling bound water

molecules. J. Chem. Theory Comput. 2013, 9, 5539-5549.

54

ACS Paragon Plus Environment

Page 54 of 85

Page 55 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

25.

Bodnarchuk, M. S.; Viner, R.; Michel, J.; Essex, J. W. Strategies to calculate water

binding free energies in protein-ligand complexes. J. Chem. Inf. Model. 2014, 54, 1623-1633. 26.

Michel, J.; Tirado-Rives, J.; Jorgensen, W. L. Prediction of the water content in protein

binding sites. J. Phys. Chem. B 2009, 113, 13337-13346. 27.

Le Roux, J.; Leriche, C.; Chamiot-Clerc, P.; Feutrill, J.; Halley, F.; Papin, D.; Derimay,

N.; Mugler, C.; Grepin, C.; Schio, L. Preparation and optimization of pyrazolo[1,5-a]pyrimidines as new potent PDE4 inhibitors. Bioorg. Med. Chem. Lett. 2016, 26, 454-459. 28.

Bortolato, A.; Tehan, B. G.; Bodnarchuk, M. S.; Essex, J. W.; Mason, J. S. Water

network perturbation in ligand binding: adenosine A(2A) antagonists as a case study. J. Chem. Inf. Model. 2013, 53, 1700-1713. 29.

Chrencik, J. E.; Patny, A.; Leung, I. K.; Korniski, B.; Emmons, T. L.; Hall, T.; Weinberg,

R. A.; Gormley, J. A.; Williams, J. M.; Day, J. E.; Hirsch, J. L.; Kiefer, J. R.; Leone, J. W.; Fischer, H. D.; Sommers, C. D.; Huang, H. C.; Jacobsen, E. J.; Tenbrink, R. E.; Tomasselli, A. G.; Benson, T. E. Structural and thermodynamic characterization of the TYK2 and JAK3 kinase domains in complex with CP-690550 and CMP-6. J. Mol. Biol. 2010, 400, 413-433. 30.

Mason, J. S.; Bortolato, A.; Congreve, M.; Marshall, F. H. New insights from structural

biology into the druggability of G protein-coupled receptors. Trends Pharmacol. Sci. 2012, 33, 249-260. 31.

Pearlstein, R. A.; Sherman, W.; Abel, R. Contributions of water transfer energy to

protein-ligand association and dissociation barriers: watermap analysis of a series of p38alpha MAP kinase inhibitors. Proteins 2013, 81, 1509-1526. 32.

Mondal, J.; Friesner, R. A.; Berne, B. J. Role of desolvation in thermodynamics and

kinetics of ligand binding to a kinase. J. Chem. Theory Comput. 2014, 10, 5696-5705.

55

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

33.

Wei, B. Q.; Weaver, L. H.; Ferrari, A. M.; Matthews, B. W.; Shoichet, B. K. Testing a

flexible-receptor docking algorithm in a model binding site. J. Mol. Biol. 2004, 337, 1161-1182. 34.

Abel, R.; Salam, N. K.; Shelley, J.; Farid, R.; Friesner, R. A.; Sherman, W. Contribution

of explicit solvent effects to the binding affinity of small-molecule inhibitors in blood coagulation factor serine proteases. ChemMedChem 2011, 6, 1049-1066. 35.

Knight, J. L.; Krilov, G.; Borrelli, K. W.; Williams, J.; Gunn, J. R.; Clowes, A.; Cheng,

L.; Friesner, R. A.; Abel, R. Leveraging data fusion strategies in multireceptor lead optimization MM/GBSA end-point methods. J. Chem. Theory Comput. 2014, 10, 3207-3220. 36.

Lyne, P. D.; Lamb, M. L.; Saeh, J. C. Accurate prediction of the relative potencies of

members of a series of kinase inhibitors using molecular docking and MM-GBSA scoring. J. Med. Chem. 2006, 49, 4805-4808. 37.

Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.;

Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235-242. 38.

Lazaridis, T. Inhomogeneous Fluid Approach to Solvation Thermodynamics. 1. Theory.

J. Phys. Chem. B 1998, 102, 3531-3541. 39.

Wang, L.; Berne, B. J.; Friesner, R. A. Ligand binding to protein-binding pockets with

wet and dry regions. Proc. Natl. Acad. Sci. U.S.A. 2011, 108, 1326-1330. 40.

Dougherty, D. A. The cation-pi interaction. Acc. Chem. Res. 2013, 46, 885-893.

41.

Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures

and rising. Acta Cryst. B 2002, 58, 380-388. 42.

Harder, E.; Damm, W.; Maple, J.; Wu, C.; Reboul, M.; Xiang, J. Y.; Wang, L.; Lupyan,

D.; Dahlgren, M. K.; Knight, J. L.; Kaus, J. W.; Cerutti, D. S.; Krilov, G.; Jorgensen, W. L.;

56

ACS Paragon Plus Environment

Page 56 of 85

Page 57 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Abel, R.; Friesner, R. A. OPLS3: a force field providing broad coverage of drug-like small molecules and proteins. J. Chem. Theory Comput. 2015, 12, 281-296. 43.

Mobley, D. L.; Chodera, J. D.; Dill, K. A. The confine-and-release method: obtaining

correct binding free energies in the presence of protein conformational change. J. Chem. Theory Comput. 2007, 3, 1231-1235. 44.

Lin, Y. L.; Meng, Y.; Jiang, W.; Roux, B. Explaining why Gleevec is a specific and

potent inhibitor of Abl kinase. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 1664-1669. 45.

Lovera, S.; Sutto, L.; Boubeva, R.; Scapozza, L.; Dolker, N.; Gervasio, F. L. The

different flexibility of c-Src and c-Abl kinases regulates the accessibility of a druggable inactive conformation. J. Am. Chem. Soc. 2012, 134, 2496-2499. 46.

Meng, Y.; Lin, Y. L.; Roux, B. Computational study of the "DFG-flip" conformational

transition in c-Abl and c-Src tyrosine kinases. J. Phys. Chem. B 2015, 119, 1443-1456. 47.

Shan, Y.; Seeliger, M. A.; Eastwood, M. P.; Frank, F.; Xu, H.; Jensen, M. O.; Dror, R.

O.; Kuriyan, J.; Shaw, D. E. A conserved protonation-dependent switch controls drug binding in the Abl kinase. Proc. Natl. Acad. Sci. U S A 2009, 106, 139-144. 48.

Ghosh, A.; Rapp, C. S.; Friesner, R. A. Generalized born model based on a surface

integral formulation. J. Phys. Chem. B 1998, 102, 10983-10990. 49.

Zhu, K.; Shirts, M. R.; Friesner, R. A. Improved methods for side chain and loop

predictions via the protein local optimization program: variable dielectric model for implicitly improving the treatment of polarization effects. J. Chem. Theory Comput. 2007, 3, 2108-2119. 50.

Ravindranathan, K.; Tirado-Rives, J.; Jorgensen, W. L.; Guimaraes, C. R. Improving

MM-GB/SA scoring through the application of the variable dielectric model. J. Chem. Theory Comput. 2011, 7, 3859-3865.

57

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

51.

Halgren, T. New method for fast and accurate binding-site identification and analysis.

Chem. Biol. Drug Des. 2007, 69, 146-148. 52.

Halgren, T. A. Identifying and characterizing binding sites and assessing druggability. J.

Chem. Inf. Model 2009, 49, 377-389. 53.

Sastry, G. M.; Adzhigirey, M.; Day, T.; Annabhimoju, R.; Sherman, W. Protein and

ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 2013, 27, 221-234. 54.

Brown, S. P.; Muchmore, S. W.; Hajduk, P. J. Healthy skepticism: assessing realistic

model performance. Drug Discovery Today 2009, 14, 420-427. 55.

Truchon, J. F.; Bayly, C. I. Evaluating virtual screening methods: good and bad metrics

for the "early recognition" problem. J. Chem. Inf. Model. 2007, 47, 488-508. 56.

Sherman, W.; Day, T.; Jacobson, M. P.; Friesner, R. A.; Farid, R. Novel procedure for

modeling ligand/receptor induced fit effects. J. Med. Chem. 2006, 49, 534-553. 57.

Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.;

Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 2015, 137, 2695-2703.

58

ACS Paragon Plus Environment

Page 58 of 85

Page 59 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Figure 1. Pose from cross docking the ABL1 ligand from 2v7a into the receptor from 1opk with the receptor cavity displayed as a surface. Very close intramolecular contacts are observed of 1.19 and 1.58 Å. The ligand is tightly packed into the receptor cavity and is unable to relieve these unphysical contacts by rotation of the methoxymethylbenzene group. Thus, WScore applies a strain energy penalty of 3.0 kcal/mol to this pose.

59

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. This docked pose from self-docking PDB complex 2ywp retains the expected 1,6internal hydrogen bond (atoms shown in ball representation) involving a pyridine nitrogen atom. By forming the 1,6-internal hydrogen bond in the docked pose a penalty of up to 3.0 kcal/mol is avoided. No reward term for forming the internal hydrogen bond is assessed.

60

ACS Paragon Plus Environment

Page 60 of 85

Page 61 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Figure 3. Pose of a decoy ligand docked into PDB entry 1ydr. The 1,6-internal hydrogen bond that exists in solution (insert) between the carbonyl oxygen and guanidine NH (shown in ball representation) is broken by the rotation of the C=NC=O torsion. To avoid clashing with ALA70 (shown in ball and stick) the angle between planes formed by CC=O and HNC goes from 8.8° in solution to 44.9° in the protein structure. A penalty of 3 kcal/mol is assigned for breaking this internal hydrogen bond.

61

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4. Binding sites structures for p38α DFG-in are shown in A (PDB ID 1a9u) and for DFG-out are shown in B (PDB ID 1w82) conformations. The DFG residues are shown in ball and stick representation to illustrate how positions of the ASP and PHE residues are flipped relative to the ATP-binding site. The ATP-binding site is occupied by the native ligand in the catalytically active DFG-in complex. The ligand in the catalytically inactive DFG-out conformation occupies an allosteric site only accessible in the DFG-out conformation.

62

ACS Paragon Plus Environment

Page 62 of 85

Page 63 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Figure 5. Plot of p38α DFG-in and DFG-out native ligand docking WScores, sans the contribution from the protein strain energy, against the experimental binding affinity. R2 of 0.71 and 0.45 for the DFG-in and DFG-out subsets are observed. The linear least squares best fit lines to the DFG-in and –out native ligands differ in energy by about 3 kcal/mol. This is interpreted as the strain energy of the DFG-out conformation relative to the DFG-in conformation.

63

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

64

ACS Paragon Plus Environment

Page 64 of 85

Page 65 of 85

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Medicinal Chemistry

Figure 6. Examples showing the impact of WaterMap water displacement. (A) For CDK2 2b53 a WaterMap water bridges the docked ligand pose and GLU81 and LEU83. (B) For CDK2 2bhh the WaterMap water bridging ASP86 and GLN131 is displaced by the docked ligand pose shown. The ligand does not replace WaterMap water interactions with the protein. This displacement contributes 3.0 kcal/mol resulting in a WScore of -9.1 kcal/mol compared to the

65

ACS Paragon Plus Environment

Journal of Medicinal Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

experimental ∆G of -7.9 kcal/mol. (C) For DPP4 2ogz a WaterMap site bridges GLU205 and GLU206 and is displaced by the docked decoy ligand pose. The ligand does not replace WaterMap site interactions with the protein. Displacement of the WaterMap site results in a 3.0 kcal/mol desolvation penalty. Ligands with