Hydration in Discrete Water (II): From Neutral to Charged Solutes - The

Apr 21, 2015 - In our previous work, we introduced a solvation model based on discrete solvent representation and demonstrated its ability to estimate...
1 downloads 11 Views 1MB Size
Article pubs.acs.org/JPCB

Hydration in Discrete Water (II): From Neutral to Charged Solutes Piotr Setny* Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland S Supporting Information *

ABSTRACT: In our previous work, we introduced a solvation model based on discrete solvent representation and demonstrated its ability to estimate hydration free energies for neutral solutes. Here, we present modifications extending the applicability of the model to charged solutes. They include improvements in the representation of the first hydration shell and systematic treatment of long-range interactions. While sharing computational efficiency of implicit solvent models, our approach avoids some of their important limitations, both in the context of electrostatic and nonpolar hydration effects: it naturally captures hydration asymmetry of opposite charges, it relies on solute description by standard all atom force fields instead of utilizing specialized sets of atomic parameters, it predicts solvent distribution in space without the need to geometrically define solvent accessible surface. By combining discrete solvent representation in vicinity of a solute with continuum description of long-range interactions, the model addresses two distinct aspects of biomolecular hydration: complex, short-range effects arising due to molecular nature of aqueous solvent, and bulk contributions. We demonstrate that the model provides good agreement with experimental results for an extensive set of roughly 700 diverse compounds, including neutral and charged solutes with hydration free energies ranging from +3.4 to −536 kcal/mol.



INTRODUCTION Computational methods aiming at investigation of living matter at atomistic level need to take into account the influence of aqueous environment on biomolecules and their interactions.1,2 One way of doing this is to perform simulations in which water molecules are explicitly modeled together with the system of interest. Such explicit solvent approaches provide most detailed insights; however, large computational effort makes their use prohibitive in many cases.3 As an alternative one can estimate changes in hydration free energya free energy cost of transferring a solute from gas to aqueous phasebetween the states of interest of a given system and include them into calculations.4 To date multiple approaches have been devised in this latter respect, starting from the evaluation of the type and area of hydrated molecular surface,5,6 the use of continuum electrostatic calculations7−10 together with less or more sophisticated estimates of nonpolar hydration free energies,11−14 the application of the density functional theory of fluids,15 to methods based on physicochemical descriptors.16,17 Owing to computational efficiency, such implicit solvent methods can be used to account for the role of water in receptor−ligand binding, macromolecular docking, protein folding, and other problems,18,19 which are currently hard to investigate with explicit solvent simulations. Most implicit solvent approaches were introduced, parametrized and validated based on relatively simple, small molecules with experimentally determined hydration free energies.20 This does not guarantee, however, their trans© 2015 American Chemical Society

ferability to systems where peculiar solvent effects take place. For instance, water arrangement in protein interiors may vary from the absence in sterically hydratable cavities that remain dry due to hydrophobic nature of the surrounding residues,21,22 to persistent residence of solitary water molecules embedded in hydrogen bond network within a macromolecular structure.23 While of particular relevance for docking or protein folding studies, such effects are not captured by most implicit solvent approaches−in this respect, the development of 3D-RISM (three-dimensional reference interaction site model) based methods is of particular promise,24 yet their routine application for high throughput calculations is still limited. Another important issue is related to charged solutes such as atomic or molecular ions, including ubiquitous charged side chains in proteins or phosphate groups in nucleic acids. First, the representation of charged species in typical training sets for the estimation of hydration free energies is sparse due to experimental problems associated with determining thermodynamic data for extremely hydrophilic solutes.25 As a result the accuracy of methods parametrization toward ionic solutes is limited, or such solutes are totally excluded from consideration. Second, the asymmetry of charge hydration (i.e., different hydration of positive and negative charges due to the asymmetry of water molecule)26 is not captured by popular continuum electrostatic methods such as Poisson−Boltzmann Received: February 28, 2015 Revised: April 20, 2015 Published: April 21, 2015 5970

DOI: 10.1021/acs.jpcb.5b01982 J. Phys. Chem. B 2015, 119, 5970−5978

Article

The Journal of Physical Chemistry B

allow calculations for charged solutes. Finally, we present the assessment of model efficacy on extended test sets.

or generalized Born approaches. In an attempt to address this issue, empirical corrections can be introduced to both formalisms,27,28 yet their standard, unmodified versions are still used in the core of most implicit solvent calculations. The complexity of hydration effects calls for methods suitable to represent solute-induced peculiarities in aqueous environment better than continuum models. A possible direction is to combine elements of explicit water description with methods allowing for simplified estimation of thermodynamic properties. In this spirit, a Langevin dipoles model29 accounts for electrostatic effects by considering free energy of a grid of polarizable dipoles exposed to electric field generated by a solute. This approach explicitly captures water polarization (and its saturation in strong electric fields) as well as mutual interaction between solvent dipoles. A more recent method, called semiexplicit assembly30 makes use of tabularized free energy components (nonpolar and polar) of first hydration shell water determined in explicit solvent simulations for various model spheres. For an arbitrary solute, relevant free energy components are selected based on particular attributes sampled along its solvent accessible surface (curvature, electrostatic field, Lennard-Jones potential), and summed up to give the total hydration free energy. The method has been recently upgraded31 to provide improved treatment of charged solutes by introducing solvent accessible surface that is iteratively adopted to account for closer solvent approach to solute regions producing strong electric field. Pursuing the idea of combining explicit water model with simplified sampling of solvent configurations, we recently introduced a semiexplicit hydration model based on discrete solvent representation and mean field approach.32 Most solvent properties are simplified to the extreme, but those particularly important for molecular hydration: water hydrogen bonding, as well as water-solute electrostatic and dispersion interactions are explicitly taken into account. While maintaining computational efficiency typical for fast implicit solvent approaches, the model addresses some of their important deficiencies. Model predictions include not only hydration free energies, but also solvent distribution in the presence of a solute. The predicted solvent distribution is not dependent on any (pre)defined solvent accessible surface construction, but adjusts selfconsiestently to solute topography and physicochemical properties, thus capturing the aforementioned effects such as water expulsion from hydrophobic cavities or localization in hydrogen-bonded networks. Relying on atomistic force field description of water molecule, the model has build in asymmetry with respect to charge inversion. Furthermore, it is readily applicable to solutes described by standard force field parameters used for explicit solvent simulations. In this report we present extensions of the method allowing for predictions of hydration free energies for charged solutes. Along with model simplification, leading to the reduction in the number of adjustable parameters from 6 to 5, they include the introduction of electrostatic reaction field, boundary corrections for long-range dispersion interactions, and improved treatment of the first hydration shell density. The model was reparametrized and tested based on an extensive set of 642 neutral20 and 44 charged25,29 solutes, giving quantitatively sound predictions for hydration free energies spanning the range from +3.4 to −536 kcal/mol. As the preliminary version of the model was described in detail before,32 in the following we briefly review key model assumptions and focus on the improvements introduced to



THEORY AND METHODS Basic Assumptions. The model is based on semiexplicit solvent representation, which is a combination of atomistic treatment of solute−water interactions with mean field solvent description. The space available for water molecules is discretized into three-dimensional body centered cubic (BCC) lattice with 1 Å distance between the nearest neighbors. Given the tetrahedral arrangement of the nearest neighbors in BCC lattice, it allows the placement of three atoms of SPC/E water model33 (SPC/E - single point charge extended−one of popular atomistic water models used in explicit solvent molecular dynamics simulations, with OH distance of 1 Å, and HOH angle of 109.5°) in such a way that with an oxygen atom located at a given lattice point, two hydrogen atoms occupy two of its eight nearest neighbors (Figure 1). For each

Figure 1. Section of BCC lattice with build in SPC/E water molecule (red, oxygen atom; white, hydrogen atoms). Green spheres represent lattice points considered for hydrogen bonding with the central water particle at its current orientation.

oxygen position, there are 12 unique combinations of hydrogen placements, corresponding to 12 possible orientations of a water molecule. A lattice point is considered as occupied by water (with density ρ > 0), if it holds an oxygen atom, or empty (with density ρ = 0) otherwise. In the absence of a solute, all lattice points are assumed to represent uniform density of bulk water at standard state ρb = 0.0334 Å−3 (997 kg/m3). A solute, considered in all atom representation, is treated as a source of external potential acting on the solvent lattice. For a given position of solute atoms, electrostatic and Lennard-Jones potentials are calculated at each lattice point using parameters taken from atomistic force field (GAFF force field34 with AM1BCC charges35 is used for small organic molecules), adjusted for interaction with SPC/E water atoms by standard force field combination rules. Effective Hamiltonian. The energy of a single solvent lattice point located at r is described by an effective Hamiltonian, which includes solute−solvent term, HUV, and solvent−solvent term, HUU, both ensemble-averaged over 12 possible orientations, θ, of an SPC/E water probe with oxygen atom placed at r: 5971

DOI: 10.1021/acs.jpcb.5b01982 J. Phys. Chem. B 2015, 119, 5970−5978

Article

The Journal of Physical Chemistry B Heff (r, {n}) = −kBT ln(∑ e−βHUV (r, θ) − βHUU(r, θ ,{n})) θ

(1)

Here, kBT, and β are the Boltzmann constant times temperature (T = 300 K is assumed), and its inverse respectively, and {n} denotes the instantaneous distribution of occupied and empty solvent cells. The solute−solvent part is based on electrostatic, Eel, and Lennard-Jones, ELJ, energy terms of SPC/E water molecule in respective solute potentials: HUV (r, θ ) = Eel(ϵ(r), r, θ ) + ELJ (r)

(2)

The electrostatic term includes position dependent dielectric constant, ϵ(r). It is either equal to 1 for water probes within the first hydration shell (the first hydration shell is constituted by occupied lattice points having at least one neighbor from which water was displaced by the solute), or adopts a constant value of ϵg (ϵg is one of five adjustable parameters of the model) for the rest of occupied lattice points. The ELJ term does not depend on orientation, since the SPC/E water has only one Lennard-Jones interaction center at the oxygen atom. The solvent−solvent term is based on the assumption that solvent lattice points interact with each other only through hydrogen bonds, whose creation depends on lattice occupancy. The number of hydrogen bonds maintained by a water probe at its given position and orientation is determined by the occupancies of four lattice points (Figure 1): two of them are three grid spacings away from the position of water probe oxygen in the direction of hydrogen atoms (which corresponds to a distance of 3 Å from oxygen center, and is well within the length limit for water−water hydrogen bond36), and two others are at the same distance, at the direction of oxygen lone pairs. Under these assumptions 1 HUU (r, θ , {n}) = ϵHBN (r, θ , {n}) 2

Figure 2. Schematic representation of spherical semiexplicit solvent region of radius R, extending by Rc from the position of most offcenter solute atom, and an outer region, Vout, treated as a continuum with water dielectric constant ϵout.

centered at r = 0, and embedded in uniform medium of dielectric permittivity ϵout:37,38 ∞

Φ(R , r) =

∑ qi ∑ i

n=0

n 1 ⎛ rri ⎞ ⎜ ⎟ A P (cos γ ) n n i R ⎝ R2 ⎠

(4)

where An = (((2n + 1) ϵin)/(nϵin + (n + 1)ϵout)) − 1, γi is the angle between r and ri (or 0, if rri = 0), and Pn are associated Legendre polynomials. For such reaction field, electrostatic bulk correction for a given cavity radius is37,38 qi 1 Gbel(R ) = ∑ Φ(R , ri) 2 i 4π ϵ0ϵin (5) where ϵ0 is vacuum permittivity. For a single point charge located at r = 0, and assuming ϵin = 1, this expression reduces to the well-known Born formula for the free energy of transferring a charge from vacuum to the center of a spherical cavity of radius R inside uniform dielectric.39 Fast convergence of Φ(R, r) calculation with the use of eq 4 is assured by the fact that all solute charges are at least a few Å away from the boundary of the semiexplicit solvent region, resulting in R considerably larger than any of ri. In practice, including the first ten terms of the electrostatic potential expansion is enough to reach converged Gelb values. For hydration free energy calculations we assume ϵout = 78, as for bulk water at standard state. The value of internal dielectric permittivity, ϵin, is determined during the parametrization of the model, in order to ensure that the sum of electrostatic contributions due to semiexplicit solvent region and bulk correction does not depend on the actual radius of spherical boundary (Figure 3). Note that such construction implies that ϵin is not a freely adjustable parameter of the model, but is dependent on to the actual values of variable model parameters. The details of ϵin adjustment are given in the Supporting Information. Bulk correction to dispersion interactions, GLJ b , is based on the assumption that the aqueous environment outside of semiexplicit solvent sphere can be treated as a uniform fluid with density ρb, interacting with solute atoms only thorough long-range, attractive part of the Lennard-Jones potential. The resulting interaction energy can be thus described by a volume integral over the infinite outer region or, following the application of Gauss theorem,11 converted to surface integral over the spherical boundary:

(3)

were ϵHB is the assumed energy of a single hydrogen bond (a parameter of the model), and N(r, θ, {n}) ∈ [0..4], is the number of occupied (i.e., with solvent density ρ > 0) lattice points at locations suitable for making a hydrogen bond with water probe at (r, θ). Bulk Corrections. In the original model, whose application was limited to neutral solutes, calculations were performed on a grid region sufficiently large to confine the influence of cutoff effects within the desired level of accuracy. In the case of charged species, such an approach is prohibitive due to longrange nature of electrostatic effects, motivating the introduction of boundary conditions. Accordingly, the semiexplicit solvent region is restricted to a sphere of radius R, centered on solute’s charge center (the geometric center of absolute atomic charges; mass center is used if all charges are zero). This radius is adopted to the actual solute size, by adding an offset, Rc, beyond the position of the most off-center solute atom (Figure 2). The region outside of the sphere is treated as a continuum with bulk water density and macroscopic dielectric constant, ϵout. The spherical shape of semiexplicit−continuum boundary, allows the use of analytical solutions for bulk corrections. First, electrostatic correction, Gelb , is evaluated as an energy of solute charges in the reaction field Φ, generated by the outer dielectric region. Such reaction field is obtained by the solution of Poisson equation for i solute charges, qi, located at ri, inside a spherical cavity of radius R and dielectric permittivity ϵin, 5972

DOI: 10.1021/acs.jpcb.5b01982 J. Phys. Chem. B 2015, 119, 5970−5978

Article

The Journal of Physical Chemistry B

chemical potential. It is assumed that for pure water, in the absence of external solute fields, Heff  Hb, where Hb represents bulk excess chemical potential, and is an adjustable parameter of the model. In the preliminary model version, developed for neutral compounds that typically induce only moderate variations in Heff with respect to Hb in solvent occupied areas, good estimates of hydration free energies were obtained with the assumption of only binary solvent density, that is ρ  ρb in all occupied regions, and ρ = 0 in regions where solvent is displaced by solutes.32 Such approach proved unsatisfactory, however, when applied to charged solutes. A set of model parameters providing hydration free energies in good agreement with experiment for neutral compounds, resulted in underestimated predictions for ions (Figure 4). The observed discrepancies were particularly

Figure 3. Electrostatic (Eel) and dispersion (ELJ) contributions (green, total; blue, from semiexplicit region; orange, from bulk corrections) to hydration free energy for different sizes of semiexplicit grid region. Electrostatic contributions were evaluated for acetic acid (note that bulk correction values are shifted by −75 kcal/mol to fit into the plot) and dispersion contributions for benzene.

GbLJ = ρb

∫v

out

attr U LJ (r) dr = ρb

∮∂V

out

(A · n ) d S

(6)

−6 Here, Uattr LJ (r) is a sum of attractive (∼r ) Lennard-Jones contributions due to all solute atoms, A ∼ r/(3r6) is a vector field, chosen such that Uattr LJ = ∇A, and n is a surface normal vector. The calculation of A·n is particularly easy owing to the spherical symmetry of the boundary. The integral is performed numerically using three-dimensional spherical t-design with t = 240 points.40 We found that in order to obtain total dispersion contribution to hydration free energy (i.e., the sum of grid contribution and bulk correction) that is independent of the actual size of semiexplicit solvent region, GLJ b needs to be evaluated for spherical boundary whose radius is shifted with respect to R by δR = 1.3 Å toward the solute. Such δR value was determined by trial and error and appears to be transferable to all investigated solutes. The construction of bulk corrections described above, makes the results independent of the size of semiexplicit solvent region (Figure 3), as long as the Rc offset is large enough to contain at least a continuous shell of occupied lattice points around the solute. Obviously, smaller Rc values speed up the calculations, but at the same time closer proximity of boundary surface to solute atoms may increase inaccuracies of bulk corrections. We observed consistent results for Rc as small as 5 Å, but for model parametrization and all tests described below we used Rc = 6 Å. In the case of neutral compounds, such margin of semiexplicit solvent lattice typically results in GLJ b accounting for roughly 15% of total dispersion contribution to hydration free energy, and Gelb which is negligible. Opposite proportions are observed for charged compounds, whose net charge is responsible for strong reaction field resulting in noticeable Gelb , while van der Waals interactions are dominated by large solute−solvent repulsion captured by the first, semi explicit hydration shell. Solvent Distribution. In a simplified, mean field solvent treatment, its average density at a given point, ρ(r), is governed by the local excess chemical potential, μ(r), by the relation ρ(r) ∼ e−μ(r)/kBT, which implies that solvent accumulates in regions of favorable (i.e., low) excess chemical potential.41 In the case of semiexplicit solvent lattice, water density is assigned to a given lattice point depending on its effective Hamiltonian (eq 1), which, for the proposed model, is equivalent to local excess

Figure 4. Results for the training set before (left) and after (right) the introduction of density scaling for the first hydration shell, illustrating dramatic improvement for charged (q ≠ 0) solutes.

pronounced for small monovalent, or doubly charged ions, suggesting their link to the strength of electric field and its influence on the compactness of hydration shell. In order to account for the effect of strong solute−solvent attraction, we introduced the following coupling between Heff and solvent density: ρ(r) = f (Heff (r, {n})) ⎧ ρ (1 + a(H − H )), if 1 + a(H − H ) b eff b eff ⎪ b =⎨ ≥η ⎪ 0, otherwise ⎩ (7)

Here η is a ”drying threshold” (a fraction of bulk density, below which a given cell is considered as no longer occupied), and a > 0 is a density scaling factor, which are another two parameters of the model. Under those assumptions solvent distribution in the presence of a solute is determined in the following way: having calculated solute potentials for all lattice points, we start with uniform solvent density ρ = ρb for the whole lattice. Then, Heff is calculated for each lattice point, and solvent density is changed according to eq 7. The last two steps are iterated until stationary distribution of occupied points is reached. Finally, solute hydration free energy, ΔGs, is evaluated using the following formula: ΔGs =

∑ vb(ρ(r)Heff (r) − ρb Hb) + Gbel + GbLJ {r}

(8)

Here, the summation extends over all occupied lattice points, {r}, vb is a volume per BCC lattice point. Convergence. Owing to discrete nature of the solvent lattice, ΔGs values for a given, fixed solute conformation 5973

DOI: 10.1021/acs.jpcb.5b01982 J. Phys. Chem. B 2015, 119, 5970−5978

Article

The Journal of Physical Chemistry B

was truncated at 10 Å. Temperature was maintained at 300 K with a Nose−Hoover thermostat, and pressure was set to 1 bar with Parrinello−Rahman barostat, using standard GROMACS implementations of both methods. During production runs, solute structures were saved every 5 ps, and than clustered according to all atom RMSD using gromos algorithm with 1 Å cutoff, as implemented in GROMACS. A middle structure of the most populated cluster was used for calculations with the model. Parametrisation. The model has five adjustable parameters: ϵHB, the energy of hydrogen bond in water; Hb, bulk water free energy; η, a ”drying threshold”; a, a density scaling factor; and ϵg, the effective dielectric constant of semiexplicit solvent region. Their values were determined based on extensive search of parameter space, with the goal to minimize root mean standard error (RMSE) of ΔGs predictions with respect to experimental values. A training set, containing 119 rigid, neural organic molecules (molecules with no rotatable bonds were selected in order to limit the dependence of hydration free energies on solute conformations), and 33 randomly selected charged compounds was used. The estimation of hydration free energies for the entire training set takes ∼6 s on a single processor with 8-cores, allowing a complete scan of 105 combinations within a few days. Three rounds of nested searches with progressively finer spacing in parameter space were conducted, leading to a final set of parameters: ϵHB = −2.65 kcal/mol, Hb = −9.94 kcal/mol, η = 0.28, a = 0.112 (kcal/mol)−1, and ϵg = 7.0. Poisson−Boltzmann Calculations. Hydration free energies obtained with the model were compared with predictions based on implicit solvent calculations. They were carried out with the adaptive Poisson−Boltzmann solver (APBS),43 using zero bulk ionic strength, and a solvent dielectric of 78. The rest of physical parameters was either adjusted to give the best results or taken as default, as described in the Results. A grid spacing of 0.25 Å was used, and electrostatic contribution to hydration free energies was obtained as a difference between electrostatic energies in solvent medium and vacuum.

depend on its orientation on the lattice. In order to limit the lattice-related bias, N independent runs with random solute placements are performed, and ΔGs values are Boltzmann averaged to give the final estimate of hydration free energy. The convergence of such procedure depends on chemical complexity of the solute and the magnitude of its hydration free energy. For neutral solutes, standard deviation of hydration free energy estimates obtained upon Boltzmann averaging is typically confined to