Thermodynamic Characterization of Hydration Sites from Integral

May 31, 2017 - Here, the nature of water interactions in protein binding sites is investigated by ... 3D-RISM and 3D convolutional neural network for ...
0 downloads 0 Views 4MB Size
Subscriber access provided by Binghamton University | Libraries

Article

Thermodynamic Characterization of Hydration Sites from Integral Equation-Derived Free Energy Densities: Application to Protein Binding Sites and Ligand Series Stefan Gussregen, Hans Matter, Gerhard Hessler, Evanthia Lionta, Jochen Heil, and Stefan M. M. Kast J. Chem. Inf. Model., Just Accepted Manuscript • Publication Date (Web): 31 May 2017 Downloaded from http://pubs.acs.org on June 1, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Thermodynamic Characterization of Hydration Sites from

Integral

Equation-Derived

Free

Energy

Densities: Application to Protein Binding Sites and Ligand Series Stefan Güssregen,†,* Hans Matter,† Gerhard Hessler,† Evanthia Lionta,† Jochen Heil‡ Stefan M. Kast‡,* †

Sanofi-Aventis Deutschland GmbH, R&D, IDD, Structure, Design and Informatics, Building G

877, 65926 Frankfurt am Main, Germany ‡

Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227

Dortmund, Germany KEYWORDS Water, protein binding sites, 3D RISM theory, solvation free energy, structure-activity relationship (SAR)

ACS Paragon Plus Environment

1

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 53

ABSTRACT

Water molecules play an essential role for mediating interactions between ligands and protein binding sites. Displacement of specific water molecules can favorably modulate the free energy of binding of protein-ligand complexes. Here, the nature of water interactions in protein binding sites is investigated by 3D RISM (3D reference interaction site model) integral equation theory to understand and exploit local thermodynamic features of water molecules by ranking their possible displacement in structure-based design. Unlike molecular dynamics-based approaches, 3D RISM theory allows for fast and noise-free calculations using the same detailed level of solute-solvent interaction description. Here we correlate molecular water entities instead of mere site density maxima with local contributions to the solvation free energy using novel algorithms. Distinct water molecules and hydration sites are investigated in multiple protein-ligand X-ray structures, namely streptavidin, factor Xa and factor VIIa, based on 3D RISM-derived free energy density fields. Our approach allows the semi-quantitative assessment of whether a given structural water molecule can potentially be targeted for replacement in structure-based design. Finally, PLS-based regression models from free energy density fields used within as 3D-QSAR approach (CARMa - Comparative Analysis of 3D RISM Maps) is shown to be able to extract relevant information for the interpretation of SAR trends, as demonstrated for a series of serine protease inhibitors.

ACS Paragon Plus Environment

2

Page 3 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

1. INTRODUCTION Experimental and theoretical evidence underscores the importance of water molecules as partners mediating interactions between ligands and their respective protein binding sites.1 In specific cases, displacement of individual water molecules from binding sites is possible by parts of ligands, if such a process favorably contributes to the (standard) Gibbs free energy of binding (hereinafter denoted by ∆bindG, omitting the symbol referring to the standard for simplicity) for the resulting complex.2 One classic example for the replacement of structurally conserved water is the cyclic urea series as HIV-1 protease inhibitors.3 Therefore, a better understanding of the thermodynamic characteristics of water molecules might impact structure-based drug design since direct protein-ligand interaction patterns are modulated by features of the solvent close to the binding region.4 Thus it is crucial to determine the propensity of structurally conserved water molecules within a binding site to be replaced by a part of novel ligands. The main question for considering water molecules explicitly in structure-based design is which of those can be favorably targeted for displacement by any part of a putative ligand molecule. Earlier work identified some common characteristics for crystallographic water sites in proteins, namely involvement in a large number of hydrogen-bonds, low experimental B-factors plus localization in grooves on the protein surface.2,5,6 While these initial studies did not discriminate between tightly bound and ligand-displaceable water molecules, the programs Consolv7 and WaterScore8 were reported by different groups towards such a classification, both based on statistical analysis of descriptors characterizing the water environment in protein sites. A recent contribution9 involves the program HINT10 to identify conserved and displaceable water molecules in a protein-apo structure. Water placement/ranking algorithms are also provided by commercial software vendors, such as SZMAP (Openeye) or WaterFlap (Molecular Discovery).

ACS Paragon Plus Environment

3

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 53

These works aim at predicting whether or not a water molecule can be displaced when the apostructure interacts with a ligand. The present study will extend this toward a more general hydration profile of a protein binding site by a classification of hydration sites into favorable and non-favorable ones for ligand displacement, which will not only depend on the nature of the binding site, but also of the ligand. To address this question from a theoretical perspective, it is necessary to model not only the protein-ligand complex, but also the aqueous environment on an atomic-level basis. The technique of classical molecular dynamics (MD) simulation provides an adequate computational framework due to its capability to calculate thermodynamic properties of the entire system or parts of it, such as water molecules.11 This feature is based on the statisticalmechanical connection between distribution functions, determined by counting and normalizing the frequency of occurrence of solvent species in certain spatial regions over time, and energetic and entropic properties. The fundamental theory of solvation thermodynamics behind such an approach has been formulated by Lazaridis12,13 in the form of the “inhomogeneous solvation theory” (IST). Later studies employed this concept to model local contributions of water molecules and link them to thermodynamic quantities. For protein-ligand complexes the WaterMap approach14,15,16 was introduced based on explicit solvent MD simulations of the active site combined with approximations for thermodynamic properties based on IST.12,13 While MD-based analyses like WaterMap focus on properties of individual water molecules, it is also possible to discretize the IST-based MD estimates of solvation energy and entropy on three-dimensional (3D) grids, as has been introduced by Gilson and co-workers (“grid inhomogeneous solvation theory, GIST).17,18 The obtained data have also been employed to develop scoring functions15,16,19 to estimate ∆bindG of protein-ligand complexes, to predict the scalar solvation free energy20 by computing the total

ACS Paragon Plus Environment

4

Page 5 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

change of water molecules’ energy and entropy in comparison with an appropriate reference state, and for characterizing enthalpic and entropic spatially-resolved contributions to hydrophobicity.21 For further MD-based applications, see Refs.22,23,24 Two further MD methods to determine water entropy are worth mentioning in this context, namely the 2PT (two-phase thermodynamics) technique by Goddard III25 and co-workers and the “cell theory” ansatz developed by Henchman.26 Other approaches to address this question include Monte-Carlo (MC) simulations to compute the absolute free energy of binding for specific water molecules in protein binding sites. This approach was used e.g. by Barillari et al.4 for water molecules from multiple proteins employing the double decoupling method with replica exchange thermodynamic integration in MC simulations. From statistical analysis, the probability that a water molecule can be displaced by a ligand was estimated from the computed water binding free energy. Interestingly, it was found that structurally conserved water is, on average, more tightly bound than displaceable water.4 However, adequate sampling of water dynamics during MC or MD simulations for obtaining sufficiently converged distribution functions and derived thermodynamics requires considerable computer resources. Alternative approaches based on the same solute-solvent interaction potentials as in MD and retaining the level of molecular granularity in the solvent description are therefore highly desirable. Here we advocate the use of molecular integral equation theory, the solute molecule-solvent site formulation of the Ornstein-Zernike equation, as a suitable candidate of an analytical liquid state theory that is not hampered by the sampling problem. In particular, we apply the 3D variant of the “reference interaction site model”27,28 (3D RISM29,30) that yields solvent-site distribution functions around solutes of arbitrary shape. Although such a liquid state approach derived from classical density functional theory is an approximation, it essentially

ACS Paragon Plus Environment

5

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 53

yields correct (measured by comparison with experimental and explicit MD data) distribution functions of water and ions around and within biomolecular systems, as has been demonstrated by us and others (e.g.31,32,33,34,35,36), also in the context of protein-ligand binding.37 A particularly attractive feature of 3D RISM theory represents an advantage for predicting solvation thermodynamics, namely its ability to compute the free energy of solvation analytically from distribution functions via the excess chemical potential. More specifically, the excess chemical potential is represented as integral over 3D space, which allows for an unambiguous assignment of the contribution of certain spatial regions to the total solvation free energy via the thus-defined free energy density (FED) appearing as integrand, as worked out by several groups.38,39,40,41,42 We here proceed further by combining the FED with a topographical analysis of solute-water distribution function using a novel, simpler yet more flexible algorithm compared to ref.40 This way, we gain essentially similar information as MD-based approaches by computing the thermodynamic relevance of individual localized water sites. Similar to MD-based approaches but at considerably less computational cost, we classify water molecules into two distinct groups: Those that can be replaced by a part of the ligand and those which are essentially an integral part of the receptor and cannot be replaced, but should potentially be targeted in structure-based design. Our approach allows the quantitative assessment of whether or not some specific structural water molecule could indeed be potentially addressed. We note in passing that a feature of 3D RISM theory not immediately accessible by (simple) MD is that FED data is obtained in regions where no water molecules are found with finite trajectories. These contributions are relevant for estimating the cavity formation free energies (entropically dominated) and will be analyzed in the context of the thermodynamics of water sites elsewhere.

ACS Paragon Plus Environment

6

Page 7 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

In order not to draw conclusions only about experimentally detectable water molecules from Xray structures with varying resolutions, the employed method generates a hydration profile for protein binding sites computed on a regular grid. The interpretation of potential hydration sites then allows identifying binding site regions with negative hydration free energy (∆aqG) contributions for individual water molecules, i.e. favorable hydration sites, where individual water molecules should be included in the design process, and regions with positive ∆aqG values for water molecules. The latter then refers to more unfavorable hydration sites, where placement of specific parts of ligand molecules should result in improved ligand binding affinity. In the following sections, we will first outline the basic theory of 3D RISM calculations and the associated thermodynamic implications to the study of features of water molecules. We then investigate hydration sites in a model system, followed by a selection of water molecules in Xray structures of relevant protein binding sites, namely streptavidin, factor Xa and factor VIIa. We will show that the location of hydration sites from 3D RISM calculations is confirmed by experimental water positions from X-ray crystal structure analyses. Specific structure-activity relationship trends for both serine proteases factor Xa and factor VIIa will then be discussed to illustrate which hydration positions can be replaced. We will then describe an extension of this approach by generating hydration profiles for a series of superimposed ligands derived from Xray crystallographic information of reference compounds. These hydration profiles, obtained in a similar manner to those computed for binding sites, are used as molecular interaction fields capturing information about solvation free energy for a comparative analysis of 3D RISM maps (CARMa) as novel 3D QSAR approach in analogy to the CoMFA method.43 These hydration profiles obtained from 3D RISM FED fields contain information about the ligand’s solvation free energy. Partial least squares (PLS)44,45 based regression models from ligand solvation fields

ACS Paragon Plus Environment

7

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 53

then result in interpretable statistical models with information on relevant SAR features. This will be demonstrated for a congeneric series of serine protease inhibitions. We complement the discussion by a formal analysis of design concepts derived from the local FED decomposition.

2. COMPUTATIONAL METHODS 2.1. Integral Equation Theory. The 3D RISM equations for modeling solvation thermodynamics predict total correlation functions between solute molecule (in infinite dilution) and solvent sites γ, hγ(r), on a spatial grid defined by vectors r via the convolution product29,30 ρ γ0 hγ (r ) = ∑ c γ ' (r ) ∗χ γγ ' (| r |)

(1)

γ'

with gγ(r) = hγ(r)+1 defining the corresponding pair distribution functions. The product of g and the bulk site density ρ γ0 ,

ργ0 g γ (r ) = ρ γ (r ) ,

(2)

yields the local solvent site density that is spatially inhomogeneous due to the presence of a solute molecule. cγ(r) is the so-called direct correlation function (which is formally defined by the integral equation) which physically represents (vaguely) a measure for the modulation of the direct solute-solvent interactions due to the presence of embedding solvent species. In addition, (1) contains the pure solvent site-site susceptibility χγγ’(|r|). For a pure solvent it is given by

χ γγ ' (| r |) ≡ χ γγ ' (r ) = ργ0 [ωγγ ' (r ) + ρ γ0 hγγ ' (r )]

(3)

ACS Paragon Plus Environment

8

Page 9 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

where ωγγ’ is the solvent molecule’s intramolecular correlation function (essentially Dirac delta functions for rigid species) and hγγ’ the respective total correlation function between solvent sites. This function can be pre-computed for a given solvent model (for instance an established water model potential and associated structure) and can be derived from a simpler 1D RISM integral equation solution or from MD simulations.46 The integral equations have to be closed by combining them with additional “closure” relations between c and h that also contain the solute-solvent interaction potential. Here, we approximate the closure by our previously developed “partial series expansion” of order n (PSE-n)47 that satisfies the “hypernetted chain closure” (HNC)48,49 approximation for n → ∞, ∑ k (t γ* (r )) i / i! − 1 ⇔ t γ* > 0 hγ (r ) =  i = 0 * ⇔ t γ* ≤ 0 exp(t γ (r )) − 1

(4)

where t γ* (r ) = − βu γ (r ) + hγ (r ) − c γ (r ) defines the so-called renormalized indirect correlation function along with the interaction between solute and solvent sites specified by uγ(r), made dimensionless by multiplication with the reciprocal temperature β. Orders 3 and 4 have been found to represent a reasonable compromise between numerical stability and accuracy with respect to HNC by us and others.50 2.2. Free Energy Calculation and Thermodynamic Implications. A particular benefit of PSE and HNC closure approximations is the availability of an analytical expression for the excess chemical potential in terms of the converged solution to the set of nonlinear equations.47 Conceptually, the result is derived from the coupling parameter integral

ACS Paragon Plus Environment

9

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

µ ex = ∫ dλ∑ ∫ dr ρ γ (r, λ) 0

γ

du γ (r, λ) dλ

Page 10 of 53

(5)

that can be shown to be an integral over an exact differential in the case of PSE and HNC approximations by considering a constrained variational formalism,47,51 yielding

ex µ PSE - k = ∫ dr ∑ γ

ρ γ0  1 2 (t γ* (r )) k +1  1 h ( r ) − c ( r ) − h ( r ) c ( r ) − Θ ( h ( r ))  γ  γ γ γ γ β  2 2 (k + 1)! 

(6)

where Θ is the Heaviside step function. Thus, only the endpoint solution is required without the need to compute the coupling dependence numerically, i.e. one only needs to solve the coupled equations (1) and (4) for λ = 1. Since the solvation free energy is ensemble-independent52 we can define the latter l.h.s. as integral over the FED, i.e. ρG(r), via ex µ PSE - k ≡ ∫ dr ρ G (r )

(7)

where ρG(r) represents the contribution of spatial regions around the solute to the total free energy of solvation. In earlier work on hydration and transfer free energies, such a solvation free energy density has been approximated empirically and mapped to surface contributions of solute sites.53,54 Besides the direct analysis of the FED,39,40 this strategy is similar to a local 3D RISM free energy assignment strategy developed later to map contributions to individual solute atoms.38 With the free energy density, we now have access to a physical quantity that can be interpreted locally, if we correlate spatial regions of high solvent density with the associated free energy contribution upon integration over subvolumes. This is the key innovation offered by the present paper. 2.3. Preparation of Protein Structures. The Protein Preparation Wizard of the Maestro

ACS Paragon Plus Environment

10

Page 11 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

software suite55 was used as first step for protein preparation. Protein X-ray structures were taken from the RCSB56,57,58 or from our internal repository for fVIIa. All non-protein atoms (ligands, ions, water) were initially retained. Subsequent steps include addition of hydrogen atoms, addition of missing side chains, and optimization of the hydrogen bonding network at pH 7.4. The resulting structure was subjected to a restrained energy minimization. Default parameters were used unless noted otherwise. Then all non-protein atoms were removed and names of residues and atoms were checked to be compliant with the Amber naming convention. 2.4. Computational Details for 3D RISM Fields. Force field parameters from the AMBER ff99SB force field were assigned to the prepared protein structures using the module tleap from AmberTools.59,60 For ligands, the nonpolar interactions with the solvent were modeled by a 12-6 Lennard-Jones potential with the “General Amber Force Field” GAFF (version 1.4, March 2010)61,62 parameters. Ligand charges were computed by Antechamber using the AM1BCC charge set. To be consistent with earlier work, the original conformation was kept rigid during charge calculation. Water was described as in earlier work63,64 by a modified variant of the SPC/E65 model, representing ambient conditions at 298.15 K and 1 bar. All 3D RISM calculations were performed with the software developed by us on cubic grids with 2003 points in each direction and a grid spacing of 0.4 Å. The PSE-3 closure was applied throughout; the convergence threshold was set to a maximum deviation of 10-5 of the direct correlation functions between successive iteration steps. A typical calculation including field analysis (fXa) requires 30-40 min on 12 CPU cores. 2.5. Analysis of 3D RISM Fields. As result of the 3D RISM calculations we obtain the pair distribution functions gO(r) and gH(r) that describe the individual densities of the solvent sites (water-O- and H-atoms). Values for gO(r) and gH(r) near 1.0 correspond to the bulk density of

ACS Paragon Plus Environment

11

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 53

the solvent while values below 1.0 represent a depletion of solvent as compared to bulk. This situation typically occurs inside the solvent-accessible surface of the solute. Values greater than 1.0 indicate regions where the solute induces a higher density of solvent atoms as compared to the bulk. In our experience gO(r) values above 20 typically occur at locations of crystal waters, but that varies from protein to protein. Furthermore, we found the average peak height of gO(r) roughly to be twice that of gH(r). For consistency we decided to use the 99.9 percentile values as cutoff for the identification of local gO/H(r) maximum regions. Similarly, we used the 99.9 percentile to define local maxima and the 0.1 percentile to define local minima of the FED. Energy density values were converted to local energy contributions by multiplying with the cube size of ∆V = 0.064 Å 3 such that the direct summation of voxel elements yields the total solvation free energy. 2.6. Identification and Thermodynamic Characterization of Water Molecules. We employed two different approaches to identify the position of water molecules and to determine their thermodynamic characteristics. The first one (“water-sphere” algorithm) is just based on the gO(r) pair distribution function, representing the position of the water molecules by the oxygen sites only. All volume elements with gO(r) values below a specified threshold (typically the 99.9 percentile) are removed from the candidate list. The remaining volume elements are then iterated in descending order of their gO(r) values. In turn, all volume elements that surround the water position identified at each step within 2 Å are removed from the candidate list until the list is empty or a specified number of water molecules was found. This approach is rather simple; it closely reproduces the positions of X-ray water molecules, but might be less suitable for the thermodynamic characterization of water molecules. Therefore we developed a second approach (“water-region” algorithm) that is not just based on the maxima of

ACS Paragon Plus Environment

12

Page 13 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

the gO(r) pair distributions, but takes information from gH(r) into account as well. Both functions are used to identify regions of water molecules which form the basis for their thermodynamic characterization. In the first step all volume elements where either the value of gO(r) or gH(r) exceeds its respective 99.9 percentile cutoff are determined and compiled into a set of disconnected “raw” water regions. For each “raw” water region, the position is determined as the geometric center of all volume elements and an energy score is calculated by summing up the absolute values of ρG(r)∆V for the corresponding volume elements. This free energy score is used in the second step to generate a sorted list of water regions that is iterated starting from the highest score and merging all regions that fall within a 2.5 Å distance of their respective center of mass. By doing so, we aim at defining an agglomerated “water region” by the gO(r) maximum region combined with the respective gH(r) maximum regions of the same water molecule. The thermodynamic properties of these water regions are calculated easily in the third step by summing ρG(r)∆V over the respective volume elements. The final spatial position representing the water region is recomputed from the geometric center of the agglomerated volume elements. Both algorithms “water-sphere” and “water-region” were implemented in the Python programming language (version 2.7) using the OEChem, NumPy66 and SciPy67 toolkits. 2.7. 3D RISM Calculations for (Q)SAR Analysis. In order to study how the 3D RISM approach can help elucidating structure-activity relationship (SAR) trends or can be used as descriptors for 3D QSAR, we investigated a known dataset of factor Xa (fXa) inhibitors representing one of the proteins herein. The ligand alignment in the binding site was taken from the literature.68,69 The structure-based alignment resulted from docking ligands into fXa crystal structures using the program QXP70 with a modified AMBER force field,71 followed by optimization of the protein-ligand complex using MMFF94s72,73 in Sybyl.74

ACS Paragon Plus Environment

13

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 53

3D RISM calculations for proteins and each ligand were performed and analyzed as follows in order to study how structural differences influence solvation free energies and how those capture binding affinity differences for a particular protein binding site. First, we estimated the local free energy of desolvation for the binding event by adding integrals of the local free energy density

ρG(r) for ligand water displaced by the protein and for protein water displaced by the ligand: ∆ aq|bind G ≈

∫ρ

G

(lig) dr +

V ( prot )

∫ρ

G

(prot ) dr .

(8)

V ( lig )

The volumes that represent the displaceable parts for both protein (prot) and ligands (lig) are modeled as a set Gaussian spheres as implemented in OEChem75 at each atom position. The cutoff-values of 0.2 for ligands and 0.01 for the protein have been chosen such that the volume obtained closely resembles the one that is enclosed by a solvent-accessible surface. Results will be discussed in Section 3.5. Second, the results for the ligands are used to introduce CARMa (Comparative Analysis of 3D RISM Maps) as 3D-QSAR field type for PLS analyses44,45 complementing the CoMFA43 approach. The region for computing 3D molecular interaction fields was defined as in previous studies68 to ensure comparability of results. A grid spacing of 2 Å was used. The values at the grid points were taken from the 3D RISM-generated gO(r), gH(r) and ρG(r) fields and imported into Sybyl74 for statistical analysis without modification. The detailed statistical analysis for these novel field types is described in the next section. Default settings were used for CoMFA analysis, if not otherwise indicated. Steric and electrostatic energies were calculated at grid points with 2 Å spacing using a positively charged carbon atom as probe. A distance-dependent dielectric constant was used and MMFF9472,73 molecular atomic charges were used as for the

ACS Paragon Plus Environment

14

Page 15 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

original CoMFA study. Results are shown in Section 3.6. 2.8. Statistical Analysis. Biological data on fXa inhibition are reported as dissociation constants Ki, they were transformed as –log10(Ki) (pKi) for statistical analysis. The dataset was divided into a training set of 80 compounds and a test set of 27 compounds according to the original publication.68 Leave-one-out (LOO) cross-validation analyses76,77,78 were performed using SAMPLS79 up to 9 components. The optimal number of PLS44,45 components was determined from the first minimum of the standard error for the corresponding LOO models. Equal weights for hydration-based molecular property fields from 3D RISM were assigned using CoMFA_STD scaling.80,81,82 No column filtering was used for 3D RISM-fields. The overall quality of statistical analyses was expressed using the cross-validated r2(cv), defined as r 2 (cv) = ( SD − PRESS ) / SD

(9)

where SD is the variance of the biological activities around the mean values and PRESS represents the sum of squared differences between predicted and target property values.

3. RESULTS AND DISCUSSION 3.1. Classification of Water in Binding Sites. First we discuss the relevance of FED analysis for ligand design. In contrast to approaches such as WaterMap that yield solvation free energy information indirectly, the 3D RISM approach works effectively the other way around. Here, an approximation to the solvation free energy follows from distribution functions, which allows for an a posteriori assignment of free energy contributions with density maxima that are associated with localized hydration sites. Therefore integration of the FED within regions of high gO/H(r),

ACS Paragon Plus Environment

15

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 53

i.e. in those where the propensity for water molecules is highest, results in the contribution of localized hydration sites to the total free energy of solvation ∆aqG. These contributions can be used for a thermodynamic water classification via 3D RISM by relating the local components of the solvation free energy of a protein-ligand complex, ∆aqGPL, to direct (enthalpic) protein-ligand interactions, ∆bindHPL, and individual species solvation free energies, ∆aqGP and ∆aqGL (as worked out in the Supporting Information). This analysis leads to the following cases: (a) Hydration sites with either positive or negative local ρG integrals are not important for the total binding free energy, if they are not displaced during the binding process, except if a ligand can favorably interact with these sites via ∆bindHPL. (b) The solvation contribution to ∆bindGPL is favorable if the ligand-free binding site contains hydration sites with positive local ∆aqG that are displaced upon ligand binding. The results for fXa and fVIIa (next sections) support this for 3D RISM free energy densities. (c) If water molecules with negative local ∆aqG are displaced by ligands despite an overall spontaneous binding process, this penalty has to be compensated by negative contributions to ∆bindHPL. 3.2. 3D RISM Results for para-Chlorobenzoic Acid. To illustrate the properties of 3D RISM fields the simple yet interesting model system para-chlorobenzoic acid in its neutral form is studied first, with the results shown in Figure 1. Values for gO(r) range from 0 to ~37, with 1 referring to the bulk value by normalizing to the density. Structuring of the solvent layers by the solute is clearly visible in the distribution of the gO(r) values in the aromatic plane of the molecule (Figure 1A). The maxima for both gO(r) and gH(r) fields are shown as isodensity surfaces in Figure 1B.

ACS Paragon Plus Environment

16

Page 17 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

As expected, the maxima resemble a structured solvation region next to the polar carboxylic acid, where the energy gain upon solvation is high, while the lipophilic chlorophenyl moiety does not significantly organize the solvent shells and only weak maxima of gO(r) are found above and below the aromatic plane.

Figure 1. 3D RISM results for para-chlorobenzoic acid. (A) Distribution of water-O in the x-y plane intersecting the molecular plane. (B) Isosurfaces of the H-(white, level=2.5) and O-atoms (red, level=6.0) of the water molecules. (C) Solvation free energy density (multiplied by the voxel volume) in the x-y plane intersecting the molecular plane. Moreover, the FED plotted in Figure 1C demonstrates that not all solvent atom positions contribute favorably to the free energy of solvation for this model system. There is a deep minimum near the solute carboxylic acid donor-acceptor pair as expected. Solvation of the lipophilic chlorophenyl-moiety in contrast is energetically penalized. Thus, interactions of this chlorophenyl-moiety with more lipophilic protein binding site areas are assumed to be energetically more favorable. In addition, this representation indicates the (unfavorable) cavity formation contribution of the molecular core, which cannot be determined easily from similar MD-based approaches. While structural fluctuations only accessible by MD allow in principle for populating even buried sites, the statistical problem persists that configurations with very low

ACS Paragon Plus Environment

17

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 53

probability p contain substantial noise which makes the p log p expression dominating the entropy ill-defined for limited simulation time. The inspection of the free energy of solvation map (Figure 1C) also implies that small movements of the solute relative to an interacting partner (e.g. protein binding site) can have a significant impact on ∆aqG and thus on the binding affinity in protein-ligand complexes. Small geometric changes in hydrogen bond patterns therefore might significantly change the binding free energy in protein-ligand complexes, as also observed experimentally.83,84 3.3. 3D RISM Results for Streptavidin. Next we investigate the binding site of the protein streptavidin in complex with biotin (PDB 1stp,85 resolution 2.60 Å). This protein-ligand complex has one with the highest (i.e. most negative) free energy of binding currently known. This unusually high affinity results from several factors including formation of multiple hydrogen bonds and van der Waals interactions between streptavidin and biotin, in addition to the ordering of surface polypeptide loops that bury the ligand in the protein interior.85 The binding affinity of biotin in this system is significantly larger than expected on the basis of most current theoretical models, which could be attributed to the favorable effect of a hydrogen bonding network within a hydrophobic enclosure in this protein binding site.14 Biotin binds in a pocket at the end of the streptavidin β-barrel, lined by aromatic and polar residues. While these side chains are solventexposed in the apo-streptavidin site with water molecules in the site, ligand binding involves displacement of water, formation of multiple interactions between the ligand and binding site residues in addition to burial of biotin by ordering of a surface loop. The polar interactions with the ligand involve an extensive pattern of hydrogen bonds with its ureido-group to five protein side chains, an interaction of the biotin sulfur to the Thr90-OH group plus hydrogen-bonding

ACS Paragon Plus Environment

18

Page 19 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

interactions of the biotin terminal carboxylate to the backbone NH of Asn49 and the side chain of Ser88. Interactions with the ureido-ring system, which is involved in an extensive hydrogenbond network, predominate in stabilizing the protein-ligand complex.85

Figure 2. 3D RISM results for the binding site of the biotin-streptavidin complex (PDB 1stp). Protein carbon atoms are colored in grey, ligand carbon atoms are shown in green, a Connolly surface is shown for the protein. The water structure is shown as local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). (A) Positions of hydration sites within the five-membered water ring, as determined by the “water-sphere” algorithm are displayed as red spheres and highlighted by dashed yellow connections. (B) Minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED ρG(r). (C) Hydrogen bonding network within the protein-ligand binding site and minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED. (D) Hydrogen bonding network within the protein-ligand binding site and positions of hydration sites as determined by the “water-region” algorithm. Regions with lipophilic displaceable hydration sites are shown in red, those containing polar displaceable sites are shown in blue.

ACS Paragon Plus Environment

19

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 53

Our results for the water structure within the binding site are shown in Figure 2. Local maxima of gO(r) are found throughout the binding site, while local maxima of gH(r) are predominantly located near polar residues that are involved in fixating the ureido group (Asp128, Asn23, Ser27) and the carboxylic acid tail of biotin (Ser88 and Asn49). This reflects that those water molecules are more restricted in their rotational degrees of freedom by being engaged in hydrogen bonding interactions with polar residues. In order to shed further light on the thermodynamic properties of those hydrations sites, minima and maxima from the FED have been added in Figure 2B and 2C and overlaid with biotin. It is worth noting that the FED extremes coincide with gO/H(r) maxima. FED minima (blue solid surfaces) are found near residues where the both the ureido group and the carboxylic acid moiety of biotin is located. FED maxima (red solid surfaces) are found together with FED minima in regions where water is highly structured by charged residues like Asp128. In contrast, regions near non-polar residues like Trp108 and Trp79 are characterized by large elongated regions of high gO(r) values where only FED maxima are present. The simultaneous absence of gH(r) maxima indicates that water in these regions does not form strong localized hydrogen bonds. Although gO/H(r) and FED combined provide detailed information on water structure and thermodynamics, a simplified representation that facilitates visualization and design is needed. We calculate representative positions for hydration sites using the “water-region” algorithm (see section 2) that goes beyond the simple calculation of water-O positions: The sites are displayed in Figure 2D as spheres at the corresponding center of mass for each site colored by energetic values representing the solvation free energy of the protein according to the corresponding integral of ρG(r).

ACS Paragon Plus Environment

20

Page 21 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Focusing on gO(r) only we show the exact positions of the corresponding local maxima (as determined by the "water-sphere" algorithm) as red spheres in Figure 2A for comparison. The maxima form a distinct five-membered ring that is described in the literature14 and that was found by MD simulations in earlier work by Young et al.14 The gO(r) maxima 1 and 2 correspond to blue hydration sites 387 and 449 that have both negative local ∆aqG and are replaced by the polar ureido group of biotin. gO(r) maximum 3 corresponds to red hydration site 77. This site being replaced by the non-polar tetrahydrothiophene moiety is in line with a positive local ∆aqG. The two remaining gO(r) maxima 4 and 5 correspond to the large and stretched red hydration site 26. It shows a strongly positive local ∆aqG and is replaced by the alkyl part of biotin. Overall, the results from 3D RISM calculations are in line with the notion that strong biotin binding is linked to the replacement of water molecules.

ACS Paragon Plus Environment

21

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 53

3.4. 3D RISM Results for Serine Proteases fXa and fVIIa. Next we study X-ray structures of two serine proteases, namely factor Xa (fXa) and factor VIIa (fVIIa). The location of hydration sites from 3D RISM is confirmed by experimental X-ray water positions. In the following, structure-activity relationship trends for both proteases will be discussed to illustrate which 3D RISM-derived hydration sites can be displaced. For fXa, an X-ray structure with a 3-oxybenzamide inhibitor was investigated (PDB 2bmg, cpd 50 in Scheme 1, resolution 2.70 Å).68 For fVIIa an internal X-ray structure in complex with an amidinophenylurea inhibitor (cpd 29 in Scheme 1,86 resolution 2.70 Å) was used.86,87 In Figure 3, 3D RISM results are visualized for the fXa / 50 X-ray complex (green ligand carbon atoms). In panel A, red crosses indicate experimental water oxygen positions from 26 public fXa X-ray-structures88 with a resolution < 2.5 Å, retrieved using ReliBase.89,90,91 These positions reveal a remarkable agreement with 3D RISM hydration sites from maxima of the gO(r) function (density of oxygen atoms, orange mesh surface), as shown in Supporting Figure S2. In all panels, water positions, protein and hydration sites are displayed within a radius of 5 Å around the ligand. In few areas, experimental water oxygen atoms are clustered, e.g. in the S1 pocket on top of the Tyr228 aryl ring or next to the Asp189 carboxylate (dotted circles) in agreement with 3D RISM. In the fXa complex, the S1 pocket is filled with a lipophilic dichlorophenyl ring with its para-chlorine (right) on top of the Tyr228 aryl-ring.68 At this position, an experimental water molecule is present in complexes without a ligand moiety. Chlorine displaces this water molecule in the fXa / 50 X-ray complex. The agreement in other fXa pockets is also good, namely in the ester-binding pocket (EPB), S4 site and the region connecting S1 and S4 with residues Trp215 and Gly216.

ACS Paragon Plus Environment

22

Page 23 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Figure 3. 3D RISM results for the X-ray structure of the fXa / 50 complex from the PDB file 2bmg showing the FED of water regions calculated within the ligand binding site using the “water-region” algorithm. The water structure is shown as local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). (A) Positions of experimentally determined oxygen atoms from water molecules representing 26 public domain X-ray structures (small red crosses) in comparison to local maxima of gO(r) (orange mesh surface). (B) Minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED in addition to local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). (C) Positions of 3D RISM-derived hydration sites as determined by the “water region” algorithm in addition to local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). Regions with lipophilic displaceable hydration sites are shown in red, those containing polar displaceable hydration sites are displayed in blue. (D) Simplified representation of the fXa binding site with lipophilic (red) and polar displaceable (blue) numbered hydration sites from 3D RISM analysis.

ACS Paragon Plus Environment

23

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 53

In Figure 3B, minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED using the “water-region” algorithm are displayed in addition to local maxima (99.9 percentiles) of gO(r) (orange mesh surface, shown also in A) and gH(r) (blue mesh surface). Energy values and positions for hydration sites are obtained as described above from mesh surfaces and displayed as spheres at the corresponding center of mass for each site in Figure 3C to summarize the isosurfaces from Figure 3B. Hydration sites colored in red represent water molecules which could be replaced by lipophilic ligand moieties (positive free energy), while those representing polar displaceable water molecules are colored in blue (negative free energy). These energies and colors represent protein solvation according to the integral of the FED over the respective regions. Figure 3D provides a simplified representation showing only colored spheres as 3D RISM hydration sites. A view on fXa pockets is in accord with the oxybenzamide SAR. In particular the S1 pocket is filled by hydration sites with positive FED values for which a displacement with lipophilic ligand parts is expected to result in binding affinity improvement if no clashes are generated. Two high energy hydration sites (red) in S1 (site 114 and 162) are present in areas, where nonpolar and polarizable substituents bind favorably.68,92,93 It is of interest to note that the lipophilic para-chlorine atom (right) of the dichlorophenyl-substituent in 50 displaces the high energy hydration site 114 in S1 on top of Tyr228 (dotted circle), which indicates a positive contribution to affinity by displacement with lipophilic ligand parts.15,68 This contribution of the dichlorophenyl ring was earlier attributed in parts to a chlorine-π interaction to the Tyr228 ring.68,69,92 Additional affinity is caused by displacing this high energy hydration site (red). A comparative analysis for oxybenzamides and a series of indoles (Scheme 1)68,92 uncovers contributions for hydration site displacements and Cl-π interactions.

ACS Paragon Plus Environment

24

Page 25 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Ligands with undecorated phenyl ring in S1 show Ki values of 478 nM for the oxybenzamide and 204 nM for the indole series (1b and 1a in ref.92). Addition of a para-methyl substituent increases affinity, which corresponds to a change in free energy of binding (∆∆bindG) of ~ –3 kJ/mol (6a and 6b in ref.92). This enhancement relates to displacement of the hydration site next to Tyr228 by the para-methyl group. Derivatives with para-fluoro substitution (4a and 4b in ref.92) also show an enhanced affinity with ∆∆bindG values of ~ –3.5 kJ/mol. Fluorine is larger than hydrogen, but smaller than higher halogen atoms (F: 1.47, Cl: 1.75, Br: 1.85, but H: 1.20 Å).94 Therefore, a favorable displacement of this high energy hydration site (red) might be possible. Interestingly the para-chloro atom decoration led to very active inhibitors with a ∆∆bindG change of –10.5 and –8.5 kJ/mol for 3a and 3b in ref.92 (Scheme 1), respectively. Our analysis shows that replacement of this unfavorable hydration site in S1 improves affinity by -3.0 to -3.5 kJ/mol. This suggests that 5-7 kJ/mol affinity improvement is related to the chlorine-π interaction. The 3D RISM calculations facilitate understanding of this affinity enhancement and discerning the improvement into different interaction components for two factor Xa ligand series. In contrast, displacements of low energy hydration sites (blue) should be performed using polar ligand substituents to compensate for the loss of hydrogen bonds to water or protein. As alternative, interactions involving such hydration sites might be favorable, as it is the case for site 699 close to Asp189 (dotted circle). Early fXa inhibitors with benzamidines displace this site by favorable hydrogen bonding interactions in parallel to the terminal Asp189 carboxylate plus Gly218-CO, respectively.95 The S4- and ester-binding pockets are exposed to bulk solvent. Here, hydration sites display a lower free energy of solvation in agreement with the observation that polar substituents bind

ACS Paragon Plus Environment

25

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 53

favorably to S4.68,95 Nevertheless, the high energy hydration sites 152 and 161 (red) are located in the S4 pocket in a more hydrophobic region surrounded by aromatic residues (Tyr99, Phe174, Trp215) and are displaced by the distal pyridyl ring in the fXa inhibitor. Of particular interest is the low energy hydration site 640 (blue, Figure 3D), which indicates a structurally conserved water engaged in hydrogen bonding to the pyridyl ring in agreement to ligand SAR analysis.68 For the ester-binding pocket, the interpretation is less clear due to slight side chain disordering plus the polar hydration site 708 next to the aromatic Arg143 guanidine carbon atom in this solvent-exposed area. Interestingly this hydration site can be replaced by lipophilic groups, such as -OMe in compound 50, but also by polar group like -NH2 (31 in ref.68). The entire polar area is large (orange mesh in Figure 3C); hydrogen bonds to the Arg143 guanidine group are for both cases not possible. A similar analysis for fVIIa with the amidinophenylurea inhibitor 2986 is given in Figure 4. 3D RISM-derived hydration sites from maxima of the gO(r) function are displayed as orange mesh surfaces in Figure 4A, while red spheres indicate experimental water oxygen atoms positions from X-ray-structures with 29 or benzamidine (PDB 2aer, resolution 1.87 Å).96 Comparing experimental water positions reveals a good agreement with 3D RISM predictions. In Figure 4B, minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED (from the “water-region” algorithm) are displayed, while hydration sites are shown as colored spheres at the center of mass for each site in Figures 4C and 4D. High energy hydration sites (red) indicate areas for favorable displacement by lipophilic ligand parts, while polar displaceable hydration sites are colored in blue. For fVIIa, red hydration sites are present in the S1 pocket. Those sites include site 50 next to Tyr228, site 21 displaced by the

ACS Paragon Plus Environment

26

Page 27 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

benzamidine ring and site 37 displaced by the amidine, respectively. Site 50 is located at a position similar to the corresponding red hydration site in fXa (site 114, dotted circle in Figure 3D). In contrast to fXa, replacement of this site by halo-aromatic ring systems is difficult,97 while it was possible for fXa98,99 and thrombin.100 Attempts for displacement by chlorine and fluorine atoms result in a loss of activity between 4-140 fold (cf. cpds 24-27 in ref.87). Interestingly, a study by Cheney et al.97 reports X-ray structures of lipophilic fragments binding to fVIIa. In some structures bromine or chlorine is oriented towards the Tyr228 aryl ring, thus favorably displacing the red hydration site. Although serine proteases are conserved in S1, one difference is the amino acid in position 190, which is either Ala190 in fXa or Ser190 in fVIIa.101 Ser190 is positioned to allow for an additional hydrogen bond to the substrate arginine or lysine, thus stabilizing these residues.102,103 The larger and polar Ser190 side chain results in polar hydration sites (site 372) in fVIIa. Furthermore the pocket is smaller in fVIIa due to this bulkier side chain. Collectively, these differences account for the SAR differences in the S1 pocket for both proteases. Additional blue hydration sites in the fVIIa S1 pocket are displaced by the polar amidino-group in 29. The lipophilic benzamidine ring itself displaces a red hydration site (21) in the S1 pocket. The paraurea linker attached to the benzamidine next to the protease catalytic triad replaces two blue hydration sites (396, 418) with its polar urea NH groups, both involved in hydrogen bond interactions to Ser195-Oγ and to a structurally conserved water molecule.86,87 Interestingly, the fVIIa S2 pocket is occupied by a lipophilic methyl group in S-configuration in 29, which affects the orientation of the neighboring amide bond and displaces the high energy hydration site 35 (red) at this position. The analysis of SAR data86,87 shows that only this configuration of the methyl group results in high activity, while the lack of a methyl-group or a

ACS Paragon Plus Environment

27

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 53

much larger phenyl substituent results in lower affinity. Hence the interpretation of 3D RISM hydration sites in serine protease X-ray structures is in agreement with SAR data, suggesting its usefulness for structure-based design.

Figure 4. 3D RISM results for the X-ray structure of the fVIIa / 29 complex. Experimental water oxygen atom positions in (A) shown as red spheres are taken from X-ray structures of fVIIa with 29 or benzamidine (PDB 2aer). (B) Minima (0.1 percentile, blue solid surface) and maxima (99.9 percentile, red solid surface) of the FED in addition to local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). (C) Positions of 3D RISM-derived hydration sites in addition to local maxima (99.9 percentiles) of gO(r) (orange mesh surface) and gH(r) (blue mesh surface). (D) Simplified representation with lipophilic (red) and polar displaceable (blue) numbered hydration sites from 3D RISM. For further details see Figure 3.

ACS Paragon Plus Environment

28

Page 29 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

3.5. Qualitative Analysis of SAR Trends in Congeneric Series. Next we investigate the ability of 3D RISM to reproduce SAR trends. For the dataset of 3-oxybenzamide fXa inhibitors68, all pairs of similar molecules (Dice similarity coefficient > 0.9 using RDkit104 Morgan circular fingerprints with a radius of 2 and feature-based invariants) are extracted; activity difference is given as ∆pKi. For each ligand, the local free energy of solvation ∆aq|bindG according to Eq. (8) estimates the contribution of solvation free energy to affinity; it is obtained by integrating FED over both ligand water displaced by protein atoms and protein water molecules displaced by ligand atoms. We then compare the experimental trend (sign of ∆pKi) with the trend in ∆aq|bindG differences (predicted trends) for each of 494 pairs. Figure 5 shows a histogram of predicted trends for increasing pKi differences105 on the x-axis using ∆∆aq|bindG values in the upper left panel. Black bars indicate correct SAR trends, while open bars indicate incorrect trends, respectively. The ratio of correct vs. incorrect trends is displayed in the upper right panel. Values for the local solvation free energy < 0 for most cases show that desolvation of protein and ligand positively contributes to affinity, while a strong correlation between ∆∆aq|bindG and affinity is not expected due to the additional influence of protein-ligand interactions. However, the more active ligand typically displays a more negative ∆∆aq|bindG. Both graphs in Figure 5 (upper panel) indicate that these trends are correctly reproduced for most pairs. Although for large experimental differences, these ratios are statistically less significant, the positive effect of desolvation to affinity is obvious.

ACS Paragon Plus Environment

29

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 53

Figure 5. Contribution of the local solvation free energy to the free energy of binding for similar molecular pairs of a series of 3-oxybenzamide-based fXa inhibitors. A Tanimoto coefficient of 0.9 using RDkit Morgan2 circular fingerprints was used as similarity threshold. Upper left: Histogram of predicted SAR trends for increasing pKi differences using ∆∆aq|bindG values (in kJ/mol) for similar compound pairs. Black bars indicate a correct SAR trend ∆∆aq|bindG < 0), open bars indicate incorrect trends. Upper right: Ratio of correct vs. incorrect trends from division of the number of correct by incorrect pairs. Lower panel: Three compound pairs with 3D RISMderived hydration sites colored from red (lipophilic) to blue (polar displaceable) to illustrate the influence of desolvation on binding affinity. Hydration sites serve for visualization, while ∆∆aq|bindG values were computed from the FED. The compound IDs for all examples is used according to ref.68 In the lower panel of Figure 5, three ligand pairs with 3D RISM hydration sites illustrate the influence of desolvation on SAR. Hydration sites serve for visualization, while ∆∆aq|bindG values are computed from the local FED. Within the first ligand pair in the lower panel of Figure 5, the more active ligand 53 (green carbon atoms, pKi 7.34, compound ID for all examples according to

ACS Paragon Plus Environment

30

Page 31 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

ref.68) displaces a highly energetic hydration site above the ring plane of Tyr288 in the fXa S1 pocket. This is not the case for the less active ligand 10 (orange carbon atoms, pKi 5.71). In this case the affinity difference can be partially attributed to the difference in solvation free energy in S1, as the binding mode of the two ligands is very similar in all other parts. For a second pair of ligands in the lower middle panel, the replacement of a chlorine in 54 (green carbon atoms, pKi 7.89) by a methyl group in 7 (orange carbon atoms, pKi 6.90) results in a reduction of the affinity by one pKi unit, although local solvation free energies are comparably low for both molecules. This discrepancy is due to the halogen-π interaction between the parachlorine and the Tyr228 aryl-ring,92 which also contributes to binding affinity. In the third example in the lower right panel, the more active ligand 50 (green carbon atoms, pKi 7.74) can form an attractive interaction involving its para-pyridyl nitrogen atom to a low-energy water in S4, as indicated by a blue hydration site. Due to the replacement of the para-pyridylmoiety by a para-chloro-phenyl side chain in the less active ligand 76 (orange carbon atoms, pKi 5.00), this interaction with a structurally conserved water in S4 is no longer possible. Collectively these data suggest a marked influence of desolvation to the free energy of binding in cases which are not dominated by steric or protein-ligand interaction effects. The presented 3D RISM-based analysis is able to capture these trends. 3.6. Use of 3D RISM for Quantitative SAR Analysis. The suitability of 3D RISM-derived interaction fields for ligand-based 3D-QSAR is further investigated for the series of 3oxybenzamides. 3D QSAR descriptors are computed as described above for ligands in dockingbased superposition. Field values on a regular grid with 2 Å spacing are extracted from the gO(r), gH(r) and ρG(r) 3D RISM fields. Fields for the aligned ligands are then used for CARMa as field

ACS Paragon Plus Environment

31

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 53

type for PLS statistical analysis. This workflow results in statistically significant PLS models for our training set of 80 3-oxybenzamides (Table 1). The ligand-based gH/O(r) fields result in significant models that are in agreement with observed structure-activity relationships. Table 1. Statistics for PLS models using CoMFA or 3D RISM molecular interaction fields.

2

Model

a

q (train, n=80)

CoMFA all

2

c

r (train, n=80)

pred. r (test, n=27)

0.71

4

0.90

0.69

CoMFA steric

0.65

3

0.80

0.72

CoMFA electrostatic

0.46

5

0.82

0.69

FED

0.46

5

0.78

0.74

FED + CoMFA steric

0.58

6

0.91

0.68

gO(r)a

0.60

4

0.85

0.60

gH(r)

0.60

3

0.80

0.71

gO(r) + gH(r)

0.63

4

0.87

0.66

gO(r) + CoMFA steric

0.63

4

0.87

0.62

gH(r) + CoMFA steric

0.62

3

0.80

0.72

2

Contour map for model based on “standard deviation * coefficient” (Std*coeff) in Figure 6.

Using FED−derived interaction fields, a PLS model with a q2 value (crossvalidated r2) of 0.46 for 5 components and an r2 value of 0.78 results. Adding conventional CoMFA steric fields then yields an improved model with a q2 value of 0.58 (6 components) and an r2 value of 0.91. The situation improves when analyzing fields derived from gO(r) or gH(r). For the gO(r) field, a model with a q2 value of 0.60 (4 components) and an r2 value of 0.85 results, for which only a

ACS Paragon Plus Environment

32

Page 33 of 53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

slightly better model with a q2 of 0.63 (4 components) and an r2 value of 0.87 is obtained when the CoMFA steric field is added. This is due to the fact that the structure of the solvation shells around the solute calculated by 3D RISM implicitly incorporates information on the solute shape. For the gH(r) field, the statistics and improvement is comparable. Combining both gO(r) or gH(r) derived field types also results in an acceptable model with a q2 of 0.63 (4 components) and an r2 value of 0.87, respectively.

Figure 6. Contour maps for a PLS model using the gO(r) field in combination with compound 50 (fXa Ki = 18 nM).68 (A) “Standard deviation * coefficient” (Std*coeff) contour map. Blue contours (>75% contribution) represent more polar or less lipophilic interactions favorable for binding affinity. Red contours (75% contribution) represent more polar interactions or less lipophilic favorable for binding affinity. Red contours (