Hydration in Discrete Water. A Mean Field, Cellular Automata Based

The model utilizes a mean field approach for the calculation of solute−solvent and solvent−solvent interaction energies and a cellular automata ba...
2 downloads 0 Views 287KB Size
J. Phys. Chem. B 2010, 114, 8667–8675

8667

Hydration in Discrete Water. A Mean Field, Cellular Automata Based Approach to Calculating Hydration Free Energies Piotr Setny*,† and Martin Zacharias Physics Department, Technical UniVersity Munich, 85748 Garching, Germany ReceiVed: March 18, 2010; ReVised Manuscript ReceiVed: May 26, 2010

A simple, semiheuristic solvation model based on a discrete, BCC grid of solvent cells has been presented. The model utilizes a mean field approach for the calculation of solute-solvent and solvent-solvent interaction energies and a cellular automata based algorithm for the prediction of solvent distribution in the presence of solute. The construction of the effective Hamiltonian for a solvent cell provides an explicit coupling between orientation-dependent water-solute electrostatic interactions and water-water hydrogen bonding. The water-solute dispersion interaction is also explicitly taken into account. The model does not depend on any arbitrary definition of the solute-solvent interface nor does it use a microscopic surface tension for the calculation of nonpolar contributions to the hydration free energies. It is demonstrated that the model provides satisfactory predictions of hydration free energies for drug-like molecules and is able to reproduce the distribution of buried water molecules within protein structures. The model is computationally efficient and is applicable to arbitrary molecules described by atomistic force field. Introduction Virtually all biophysically interesting phenomena occur in an aqueous environment. In most cases, water is not just a passive ingredient but determines the dynamics and the outcome of the considered processes. The associated thermodynamic contribution, change in hydration free energy of the evolving objects, is particularly important in studies of molecular association, conformational equilibria, reaction kinetics, or phase partitioning. Along with significant advances in computational biophysics in recent years, a number of theoretical approaches have been developed for solvation modeling, presenting varying degrees of complexity.1 Unfortunately, in many cases, the currently available methods still do not provide satisfactory results both when accuracy and efficiency are considered.2 In principle, molecular dynamics (MD) simulations with the use of an explicit solvent provide the most detailed description of hydration and usually the accurate values of thermodynamic quantities.3 Their practical use is hampered, however, by significant computational cost required to model a large number of water molecules in order to reproduce bulk solution. Thus, methods treating water as a continuum medium, whose description is reduced to macroscopic parameters such as surface tension and dielectric constant, are of great practical value. In the majority of such implicit solvent models, the free energy of hydration is divided into two components: nonpolar and polar. The first one describes both: the work required to create a cavity for the solute within the solvent and solute-solvent van der Waals interaction energy. This component is most often assumed to be proportional to some measure of solute-solvent interface area, with the proportionality constant playing the role of microscopic surface tension. In some approaches, the dispersion contribution is calculated explicitly for each solute atom as an integral of r-6 type function over solvent occupied * To whom correspondence should be addressed. E-mail: piotr.setny@ tum.de. † On leave from Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw 02-089, Poland.

area (or equivalently: the interface surface).4-6 The second, “polar” component, accounts for solute-solvent electrostatic interaction, and is usually evaluated with the use of generalized Born7 or Poisson-Boltzmann8 models. Such approaches, although physically well motivated,1 critically depend on a somewhat arbitrary choice of the solute-solvent boundary. Typically, a solvent accessible surface9 is used, which is defined geometrically as a surface visited by the center of a spherical probe rolled over the solute van der Waals surface. Its shape and area are tuned by adjustment of the probe and atomic radii, and a variety of methods are used to obtain proper surface tension parameters: from a single “experimental” value,10 through atomic11 or group12 specific coefficients, to sophisticated, geometry dependent atomic surface tensions.13 Nevertheless, the solvent accessible surface remains a geometric construct and as such is not sensitive to subtleties of the solute-solvent interaction. It is known, however, that particular combinations of solute topography with local solute-solvent dispersion and electrostatic interaction may cause water expulsion from sterically hydratable areas, effectively shifting the “solvent accessible” surface.14,15 Evidence of such dewetting has been found experimentally in hydrophobic protein cavities,16,17 observed in numerous explicit solvent simulations,18-20 and also envisioned by theoretical approaches.21 Reproduction of such effects, and hence applications involving large biomolecules with potentially dewettable hydrophobic structural motifs, seems to be currently beyond the scope of most implicit solvent models. Recently, Dzubiella et al. developed a variational implicit solvent model,22 in which the solvent boundary is a dynamic construct, whose optimal configuration is obtained by minimization of surface dependent solvent free energy functional. The ability of the method to predict a drying event and the related thermodynamic effect in hydrophobic receptor-ligand system was demonstrated,23,24 but its applicability to electrostatically interacting solutes requires further development. At the opposite end of the hydration phenomena spectrum are buried water molecules or hydrogen-bonded networks of localized water molecules that exist in confined areas within

10.1021/jp102462s  2010 American Chemical Society Published on Web 06/16/2010

8668

J. Phys. Chem. B, Vol. 114, No. 26, 2010

protein structures and intermolecular interfaces. They often provide specific interactions, critical for protein function or ligand recognition,25,26 and also, when released upon binding, affect association thermodynamics.27,28 An a priori prediction of buried hydration sites by straightforward explicit solvent simulations is rather impractical due to water permeation into macromolecular structures being too slow for currently available submicrosecond simulation time scales. An interesting alternative is grand canonical Monte Carlo simulations,29 but its practical use is still limited by the computational effort involved. On the other hand, the applicability of implicit solvent approaches, both for prediction of water presence and estimation of its free energy contribution, is questionable as the continuum solvent representation breaks down when isolated molecules are considered. To date, relatively little has been done to tackle this problem. Most of the available and practically usable methods rely on placing a water probe at different positions on the protein surface or inside the binding cavity and applying an empirical scoring function to determine whether a given location should be hydrated.30-33 A common drawback of such approaches is that interactions between neighboring hydration sites are not taken into account, and thus the stabilizing effect of the water-water hydrogen bond network is neglected. Also, those methods are usually parametrized for proteins, when both placement of hypothetical hydration sites and their scoring is considered, and hence are not easily transferable to other arbitrary biomolecular systems. A notable success in the modeling of macromolecular hydration was reported recently with the use of a three-dimensional reference interaction site model (3D-RISM).34,35 The method was shown to correctly predict positions of localized water molecules in several protein structures, demonstrating a very promising application of integral equation theories of solutions36 to large biomolecules. In this paper, a considerably simpler approach is presented, based on a semiheuristic, discrete solvent model. The idea of considering discrete water is not new. There exists a class of models based on a partially filled, body-centered cubic lattice (BCC).37-40 Such a lattice offers geometry that is particularly suitable for water modeling as each cell has its nearest neighbors arranged in tetrahedral configuration, which corresponds to the assumed ideal geometry of water hydrogen bonding. Those lattice-based approaches, benefiting from simplified solvent description with clearly defined discrete spatial and orientational configurations, arrive at series of analytical expressions for the thermodynamic quantities of interest. They were predominantly used for the investigation of anomalous water properties, but the hydration of model solutes was also studied.41,42 Yet, despite providing interesting results, those methods are not suitable for the treatment of the polyatomic solutes of arbitrary shape, size, and chemical nature. The approach adopted in this work, although also utilizing the BCC lattice, is rather different. It aims at the prediction of hydration free energies for drug-like solutes and the detection of localized water molecules in complex biomolecular structures, with the use of a single, generally applicable solvent model. Here, the main goal is to obtain the desired quantities of interest and the estimate of solvent distribution in a computationally efficient manner, perhaps at the expense of the depth of physical insight. Most solvent properties are simplified to the extreme, but those particularly important for molecular hydration: water-solute electrostatic and dispersion interactions, and water hydrogen bonding are

Setny and Zacharias explicitly taken into account. The solvent distribution around the solute is obtained using a cellular automata type of evolution, and is not dependent on any (pre)defined solvent accessible surface construction. Standard force field parameters are used for solute-solvent interactions. Owing to the discrete nature of the model, calculations are rather fast, making the method applicable both to small compounds and large biomolecules such as proteins. Model Description General Idea. Given an arbitrary solute, its free energy of hydration is estimated as the cost of obtaining the corresponding solvent distribution in the external field of the solute. The approach is loosely based on mean field theory. In case of a simple, continuous fluid with mean interparticle interaction field u(r), subjected to the external potential φ(r), the average density at a given point can be expressed as43

〈F(r)〉 ) Fb exp[-β(φ(r) +

∫ dr' 〈F(r')〉u(|r - r'|))] (1)

where β ) (kBT)-1 is the inverse of the Boltzmann constant times temperature T, and Fb corresponds to the bulk number density. In the current context, the external potential describes van der Waals and electrostatic interactions with the solute atoms. Instead of continuous density field F(r), a discrete solvent distribution is introduced, and the effective mean field solute-solvent and solvent-solvent interactions are calculated by placing a water probe at the center of each solvent cell and considering an ensemble of its orientation-dependent energies. This way, an estimate of the right-hand side of eq 1 can be obtained, and then the solvent distribution in the presence of solute can be determined in an iterative way. Here, instead of a formal procedure, a semiheuristic algorithm, based on cellular automata like approach is used for propagating the solvent evolution. A central role in the prediction of solvent distribution and subsequent evaluation of hydration free energy plays an effective Hamiltonian of a solvent cell. Effective Hamiltonian. The proposed discrete solvent model is implemented on a three-dimensional grid, topologically corresponding to the body-centered cubic (BCC) lattice. Due to arrangement of the nearest neighbors, such a lattice reproduces well the tetrahedral geometry of water hydrogen bonding. Also, the 109.5° angle between the two bonds corresponds to the HOH angle in the widely used singlepoint charge (SPC) family of three point molecular water models,44 which will be utilized in the construction of the effective Hamiltonian. Each grid point is considered as a center of solvent cell that can be either occupied or empty, having density of 1 or 0, respectively. Pure solvent is assumed to have a uniform density of 1. An average interaction of a solvent cell with the external (solute) field and the remaining solvent serves as an estimate of local excess chemical potential, and is regarded as an effective Hamiltonian (Heff) for the sake of solvent evolution. It is calculated by introducing a hypothetical, freely rotatable water molecule located at the center of the solvent cell. Given the external field, such effective Hamiltonian is the function of the cell position r and the occupancy of the surrounding cells, denoted as {n}: Heff ) Heff(r,{n}). The solute-solvent component of the effective Hamiltonian consists of van der Waals and electrostatic terms. The first one

Hydration in Discrete Water

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8669

Figure 1. Fragment of BCC lattice showing cells considered for the evaluation of the interaction energy of hypothetical water molecule occupying the blue cell. Van der Waals energy is obtained, based on LJ potential value (eq 2) at the center of the blue cell. Electrostatic energy (eq 3) is calculated based on water atomic charges and electrostatic potential at the blue cell (for oxygen atom) and two red cells (for hydrogen atoms). Hydrogen bonding (eq 5) is determined, based on the occupancy of green cells.

is obtained in a straightforward way, using Lennard-Jones (LJ) potential energy of SPC/E44 water probe, whose oxygen is placed at point r

ELJ(r) )

∑ Us(LJ)(|r - rs|)

(2)

In order to estimate the solvent-solvent contribution to the effective Hamiltonian, it is assumed that water molecules interact with each other only through hydrogen bonds. An occupied solvent cell can participate in up to four such bonds, depending on orientation of its hypothetical water molecule and occupancy of the surrounding cells. For each of the 12 unique water orientations, there are four cells whose occupancy determines the actual number of hydrogen bonds: two of them are 3 grid spacings away (which corresponds to 3 Å, and is well within the length limit for water-water hydrogen bond45) from the central cell, in the direction of hydrogen atoms (Figure 1), and two other are at the same distance, roughly at the direction of oxygen lone pairs. Due to geometry of the BCC lattice, the resulting spatial arrangement of “hydrogen bonds” is exactly tetrahedral (i.e., with 109.5° angle between each bonds pair). The energetic contribution of each hydrogen bond (εHB) is assumed to be equal, and hence, the interaction of the hypothetical water molecule at a given orientation with the rest of the solvent reads

1 EHB(r, θ, {n}) ) εHBN(r, θ, {n}) 2

(5)

Here, N(r, θ,{n}) ∈ {0...4}, is the number of occupied cells suitable for making hydrogen bond with the considered cell, and 1/2 factor provides that a given hydrogen bond is not counted twice when contributions from the two bonded cells are added. The sum of contributions described by eqs 2, 3, and 5 provide the orientation dependent energy of the hypothetical water molecule occupying a solvent cell located at r

s

where summation extends over all s solute atoms. ULJ contributions are calculated using the standard 6-12 LJ potential with standard force field parameters and mixing rules. The calculation of the orientation dependent electrostatic energy relys on grid geometry. The oxygen-hydrogen distance in SPC/E water model is 1.0 Å. Using the BCC grid with the same distance between the nearest neighbors, and placing the center of oxygen atom at an arbitrary grid point, it is possible to find 12 unique orientations of water molecule for which both hydrogen atoms fall at the 2 of 8 neighboring grid points (see Figure 1). Then, the corresponding electrostatic energy of the hypothetical water molecule at a given orientation θ can be expressed as

Eel(r, θ) ) QOV(r) + QHV(rH1(r, θ)) + QHV(rH2(r, θ)) (3) where QO and QH are force field charges of water oxygen and hydrogen respectively, and V(r) is the electrostatic potential due to solute atoms given by

V(r) )

kQ

∑ ε(r)|r -s rs|

(4)

s

Here, s denotes solute atoms, k is a constant dependent on the choice of units (for “biochemical units”, i.e., kcal/mol and Å, used here, k ) 332), and ε(r) is the position dependent dielectric constant, whose actual form will be discussed in the next section.

H(r, θ, {n}) ) EHB(r, θ, {n}) + Eel(r, θ) + ELJ(r)

(6) The effective Hamiltonian of the solvent cell is then calculated by averaging over all 12 unique orientations:

(∑e

-βH(r,θ,{n})

Heff(r, {n}) ) -kBTln

θ

)

(7)

Solvent Evolution. In principle, the solvent distribution in the presence of an external field can be obtained by iteratively solving eq 1 (usually its more sophisticated version) as it is done in the framework of integral equation theory of liquids.36 Here, making use of the discrete nature of solvent and its internal interactions, a simple, semi heuristic iterative algorithm is developed (Figure 2). Adopting a cellular-automata approach, it is assumed that, given the arbitrary solvent distribution, the state of each cell in the next iteration step is determined by its effective Hamiltonian and, hence, is also a function of occupancy of some near neighbors (those with whom the cell can make hydrogen bonds). In order to determine if the cell should remain occupied, its effective Hamiltonian is compared with the mean effective Hamiltonian of bulk solvent: Hb, and the following rule is used:

F(r) ) f(Heff(r, {n})) )

{

1 if Heff - Hb e δ 0 otherwise

(8)

with δ playing the role of a “drying threshold”, which is one of the parameters of the model. The bulk free energy is calculated

8670

J. Phys. Chem. B, Vol. 114, No. 26, 2010

Setny and Zacharias

Figure 2. Left side: the algorithm for solvent evolution and hydration free energy estimation. Right side: schematic representation of solvent evolution around a hypothetical solute (balls and sticks). White, occupied cells; red, cells whose Heff - Hb > δ at the given step; stripes, empty cells. (a) uniform initial solvent distribution, (b) at first steps cells are deleted mostly due to unvavorable LJ energy, (c) initial cavity around the solute, (d) at further steps, some cells are deleted mostly due to their unvavorable interaction with the rest of solvent, and (e) final solvent distribution.

neglected, and the necessary screening effect is obtained by introducing a distance dependent dielectric constant. Even though it is possible to implement the iterative scheme in the model proposed here, it would require introduction of a solvent electrostatic potential and its recalculation in each step of solvent evolution. Since computational efficiency is the main goal of the developed approach, the distance dependent scheme for the electrostatic screening was considered. With respect to this, the model brings an interesting perspective of introducing the solvent dielectric constant which depends on the distance from the actual solute-solvent interface and not on the distance to the solute atoms. However, as it would require additional assumptions regarding the actual functional form of such dielectric constant, this approach was not pursued further in the current early model version. Instead, it occurred that satisfactory agreement with experimental results can be obtained when only the first shell of occupied solvent cells around the solute is taken into account for evaluation of electrostatic and hydrogen bonding solvent energies. In this case only a single value of dielectric constant is necessary, and this value was treated as an adjustable model parameter. Given that the LJ energy component (eq 2) of Heff is not orientation dependent, the effective Hamiltonian can be split into the LJ part and the term describing electrostatic and hydrogen bonding (EH) energy together: Heff ) ELJ + HEH. The solvation free energy can be then expressed as a sum of two terms (dependence on {n} is omitted for clarity)

∆G ) λLJ using eqs 5 and 7 (i.e., without the external field) and putting N(r, θ,{n}) ≡ Nb. The Nb corresponds to the average number of hydrogen bonds maintained by water molecule in the bulk, and is also an adjustable parameter of the model. Solvent evolution is started with the uniform density of 1 (Figure 2a). In each step, Heff is calculated for occupied cells, based on the actual solvent distribution. Then, eq 8 is evaluated for each such cell, and some cells become unoccupied (Figure 2b-d). The algorithm is iterated until no more cells change their state (Figure 2e). Finally, the free energy of hydration can be estimated using the following formula (see the Appendix):

∆G )

∑ VFbF(ri,j,k)(Heff(ri,j,k, {n}) - Hb)

(9)

i,j,k

where summation extends over all cells in the lattice. Here, V is volume per solvent cell (V ) d3/2, where d ) (23)/3 Å is the lattice constant, providing 1 Å distance between the nearest neighbors) and Fb ) 0.0334 Å -3 is the number density of bulk water. The treatment of electrostatic interactions is of primary importance for obtaining quantitatively satisfying results. The adopted approach is most similar to the Langevin dipole (LD) model, developed by Warshel and his co-workers.46-48 In that model, a discrete grid of solvent Langevin dipoles is placed around the solute, and the electrostatic contribution to solvation free energy is evaluated as energy of their average polarization in the solute electrostatic field. In the original LD model version, the dipoles are allowed to interact with each other, and their final magnitudes and orientations are determined in an iterative way. In a computationally more efficient but slightly less accurate model version, explicit dipole-dipole interactions are

∑ VFbF(ri,j,k)ELJ(ri,j,k) + i,j,k

λEH



VFb(HEH(ri,j,k) - Hb)

(10)

{i, j, k}sl

with the second sum extending only over the first shell of occupied cells. The two parameters λLJ and λEH govern the magnitudes of LJ and EH contributions respectively. Those parameters are fitted to best reproduce the experimental hydration free energy values. Summing up, the model has four adjustable parameters that govern the solvent distribution: energy of hydrogen bond εHB, number of hydrogen bonds in the bulk Nb, dielectric constant for the first hydration shell ε, drying threshold δ, and two parameters λEH and λLJ that, for a given solvent distribution, govern the magnitudes of LJ and EH contributions respectively to hydration free energy. Implementation. The method is implemented in a computer program written in C. Initially, the solute in its all atom representation is placed on the grid with random orientation. Then, LJ and electrostatic potentials are calculated for each grid point. Standard OPLS49 force field parameters with BCC-AM150,51 partial atomic charges for nonpeptide compounds are used to describe solute atoms, and SPC/E model44 is used for water. Evaluation of solute-solvent interaction potentials is the most time-consuming part of all calculations, but is performed only once. The subsequent solvent evolution usually converges after 3-5 iterations. Even though the current model implementation is not optimized for computational efficiency, the calculations are rather fast. For a standard 1003 grid, they take from 0.1 s for small drug-like molecules up to 1 min for 300-residue proteins on a single 2.8 GHz core. Upon completion of solvent evolution, the first “hydration” shell of grid points surrounding the solute can be visualized

Hydration in Discrete Water

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8671 TABLE 1: Ranges of Control Parameters Considered during Model Parametrization, and the Best Combination Founda min max step best

Figure 3. Examples of results visualization: (A) the first shell of grid points around a small solute, (B) singular grid points (small balls) in the narrow deep pocket of lysosyme (pdb 193L); blue translucent spheres correspond to van der Waals contours of four localized crystallographic water molecules (id: 137, 138, 142, 155), gray wireframe is a solvent accessible surface. Points are colored according to Heff, with low values in blue and high values in red.

(Figure 3), allowing for analysis of solute-solvent interface or inspection of localized hydratable sites in case of more complex structures. As the method is based on a discrete array of cells, the results of calculations depend on solute position and orientation. Thus, in order to limit the grid-related uncertainty all considered numerical values of hydration free energies are Boltzmann averaged over 40 independent runs with random solute placements. This procedure provides consistent values with less than 1% relative error. Parametrization. The model was parametrized based on 156 neutral compounds with experimental hydration free energies ranging from -20.3 to +4.3 kcal/mol. A total of 56 of them were drug-like molecules used recently for the OpenEye’s Statistical Assessment of Modeling of Proteins and Ligands challenge (SAMPL1, 2008).2 This set is considered as particularly “difficult” for prediction of hydration free energies,52-55 as it contains relatively large, usually polar molecules with a wide range of diverse functional groups. The rest of the compounds were taken from sets commonly used for parametrization of other solvation models12,48 and were selected to provide possibly wide spectrum of hydration free energies and functional groups. As an independent test set, 16 drug-like molecules prepared for another challenge56 were used (the original set contained 17 molecules, but one of them: imidazole, was found already in the training set). All molecules were subjected to a common preparation procedure. The initial steps were performed with the aid of Schro¨dinger software suite: hydrogen atoms were assigned in configuration providing neutral charge with the LigPrep module, and 10 low energy conformers were generated with MacroModel. In each case, the conformer having the lowest energy (that included also GB estimate of hydration free energy) was selected for subsequent calculations. Partial charges were generated using the AM1-BCC method, and LJ potential parameters from the OPLS force field were assigned with the use of Antechamber program. As described above, the proposed model has six control parameters. Four of them (energy of hydrogen bond εHB, number of hydrogen bonds in the bulk Nb, dielectric constant ε, and drying threshold δ) govern the solvent evolution, and two (λEH and λLJ) are used once the final solvent distribution is obtained for scaling LJ and EH contributions to hydration free energy. In order to find the combination of parameters providing the best reproduction of experimental data, an exhaustive search procedure was applied. First, the calculations were performed for all combinations of four main control parameters within the

εHB

Nb

ε

δ

λEH

λLJ

-7.0 -3.0 0.5 -4.0

3.0 4.0 0.1 3.5

1.0 2.6 0.2 2.2

3.0 6.0 0.5 4.5

0.01 2.50 0.01 1.51

0.01 2.50 0.01 2.18

a εHB is the energy of hydrogen bond (kcal/mol), Nb is the number of hydrogen bonds in the bulk, ε is the dielectric constant, δ is the drying threshold (kcal/mol), and λEH and λLJ are the magnitudes of LJ and EH contributions to hydration free energy.

ranges presented in Table 1. Then, for each such combination, λEH and λLJ values minimizing the root-mean-square error (RMSE) with respect to the experimental free energies were obtained, using 10-fold crossvalidation fitting procedure. During this procedure, compounds from the training set were randomly assigned to one of 10 groups. Fitting was then performed for 9 such groups, and the remaining one group was used as a test set. This process was repeated 10 times, allowing each of 10 groups to serve as a test set once. In order to estimate the fit of the considered combination of four main parameters, a sum of mean RMSE obtained for the test sets with three times of its standard error (〈RMSE〉 + 3δ〈RMSE〉) were used. Final values of λEH and λLJ were then obtained based on the entire training set. The combination of control parameters providing the minimum value of 〈RMSE〉 + 3δ〈RMSE〉 is presented in Table 1, and the corresponding values of hydration free energies are discussed in the Results and Discussion section. This combination was used for all subsequent calculations, including the estimation of hydration free energies for test set molecules, and the detection of localized water molecules. Results and Disucssion Hydration of Hard Sphere Solutes: Water Surface Tension. Investigation of simple, hard sphere solutes (HS) provides insight into size dependence of predicted hydration free energies and also, by considering the limit of infinite curvature radii, allows us to compare predictions of the model with experimental surface tension. The size dependence of effective water surface tension has been a long-standing research topic.57-59 Recent advances, on both the theoretical21,60 and simulation61,62 side, highlight fundamental differences between small and large scale hydration, with the former one being dominated by volumeproportional cost of cavity creation and the latter by surface area-proportional cost of interface formation. The apparent crossover between those two hydration regimes occurs at subnanometer scale, and the corresponding curvature dependence of surface tension is approximately captured by formula derived on the ground of scaled particle theory (SPT):58 γ(R) ) γ∞(1 - 2τ/R). Here, γ∞ corresponds to planar limit of surface tension, and τ is the so-called Tollman length, expected to be of molecular size.63 The obtained surface tension for HS solutes as a function of interface radius (calculated based on distribution of grid points from the first hydration shell around the solute) is presented in the inset of Figure 4. For comparison, γ(R) resulting from the analytical SPT formula is plotted, with γ∞ ) 103 cal/mol/Å2 and τ ) 0.9 Å, obtained by fitting to results of explicit solvent MD utilizing SPC/E water61 that fairly well reproduce experimental water surface tension. As the model was not parametrized with the goal of the reproduction of water surface tension, the obtained results agree

8672

J. Phys. Chem. B, Vol. 114, No. 26, 2010

Figure 4. Calculated vs experimental hydration free energies for the training set (TR, open circles) and the test set (TE, black squares); RMSE values are in kcal/mol. Inset: surface tension coefficient based on calculations for hard sphere solutes (dots) compared to the explicit solvent MD results (dashed line) for SPC/E water model described by analytic SPT formula with γ∞ ) 103 cal/mol/Å2 and τ ) 0.9 Å.61 R denotes the average radius of solvent interface.

surprisingly well with the phenomenological SPT formula. The model predicts correctly the initial increase of surface tension with the solute size, and the crossover to interfacial regime around R ) 7 Å, indicating that the assumed geometry of water intermolecular interactions is reasonable. Furthermore, the value of surface tension in the infinite radii limit seems to be close to the experimental one of 103 cal/mol/Å2. This is particularly interesting as long-range water-water interactions are not included into the model. On the other hand, as the contribution of water-water interactions to the hydration free energy is parametrized based only on the single shell of grid points, perhaps the effect of interfacial water molecules loosing neighbors from the considerable part of the solid angle is able to account for the observed surface tension. At the small radii limit, it is clearly visible that the predicted values of surface tension become increasingly underestimated. It points to the inherent weakness of the adopted, surface-based approach to the calculation of hydration free energies, which focuses on solvent reorganization energy in the hydration shell, but neglects the important entropic effect of the solvent excluded volume. Indeed, according to the current view64 water molecules close to small, methane-like solutes do not loose much of their interactions, and also, the dominant part of unfavorable solvation entropy does not come from hydrogen bond rearrangement but rather from suppression of spatial solvent fluctuations due to presence of solute.65,66 The aforementioned problem may be particularly important for very small hydrophobic solutes like methane whose unfavorable hydration free energies are underestimated (see Table 3 in the Supporting Information). Free Energies of Hydration. The overall RMSE for hydration free energies obtained for the training set with the best combination of model parameters is 1.88 kcal/mol, and the correlation coefficient with the experimental data is 0.92. The RMSE value may seem to be relatively large, but one should keep in mind that more than one-third of molecules used in the training set are substantially more complex than compounds usually considered in the context of solvation free energy predictions. RMSE obtained for the SAMPL1 molecules alone is 2.30 kcal/mol (and 1.62 kcal/mol for the rest of the set). As a comparison, extensive explicit solvent molecular dynamics simulations, employing Bennet acceptance ratio (BAR) method for free energy estimation, yielded RMSE of 3.53 kcal/mol for

Setny and Zacharias those 56 compounds.52 Similarly, predictions obtained with several widely used PB-based models gave RMSE ranging from 2.44 to 3.64 kcal/mol.2 Certainly, comparing results of blind SAMPL1 challenge with results obtained upon fitting is not fair, even though SAMPL1 molecules constitute only part of the training set. Nevertheless, as most solvation models usually perform much better than in case of SAMPL1 set, it illustrates how demanding this set is. The actual performance of the model was evaluated with the use of the test set. Here again, relatively complex drug-like molecules were used, with hydration free energies ranging from -11 to +1 kcal/mol.56 RMSE of 1.48 kcal/mol and correlation coefficient of 0.88 were obtained for 16 compounds, including two outliers having errors greater than 2 kcal/mol. For comparison, PB-based implicit solvent model yielded RMSE of 1.87 and 2.57 kcal/mol depending on atomic radii parametrization, and explicit solvent simulation, using Antechamber AM1-BCC charges and BAR method gave RMSE of 1.53 kcal/mol. The overall performance of the model in hydration free energy predictions for small molecules seems to be within the range of PB based implicit solvent models. It should be noted, that standard force field parameters were used here for all calculations, and no attempt was done toward their optimization. As it was indicated by performance of Nicholls group in the SAMPL1 challenge,55 RMSE of 2.6 kcal/mol obtained in the blind test for a subset of 40 compounds by their PB-based implicit solvent model, dropped to 1.42 kcal/mol in retrospective study after modification of charge calculation scheme and adjustment of some atomic parameters. Although the model presented here is not expected to be as sensitive to radii modifications as PB based methods, alternative radii sets may be considered for further optimization along with different charge calculation algorithms. Localized Water Molecules. In order to test the ability of the model to predict the locations of buried water molecules, a set of protein structures was considered that met the following criteria: obtained with the use of X-ray diffraction at resolution equal to or better than 1.00 Å and contained no ligands or associated nonpeptide macromolecules. A total of 32 such structures were found in the PDB database. They were downloaded and processed with the pdb2pqr program.67 The preparation procedure included the addition of missing heavy atoms and hydrogen atoms (pH of 7.0 was assumed), structure refinement, hydrogen bonds optimization, and the assignment of Amber 99 force field parameters, including partial charges. Buried water molecules were detected with the use of VMD program.68 A water molecule was considered as buried when its solvent accessible surface area (SASA), evaluated while all other water molecules were removed from the structure, was equal to zero. A standard probe of 1.4 Å radius was used for those calculations. The results were then visually inspected, and some water molecules occupying closed cavities large enough to allow the fragment of their “solvent accessible” surface to appear (also due to the removal of neighboring buried water) were manually assigned as buried. In a similar manner, two other sets, containing water molecules located in deep pockets at protein surface, were defined as follows: W5 containing water molecules whose SASA ranged from 0 to 5% of the total water molecule surface area, and W10, corresponding to SASA between 5 and 10% of the total surface area. Out of 32 considered protein structures, buried water molecules (88 in total, see Table 2) were found in 16, and they were used in subsequent calculations. For those structures, W5 and W10 sets comprised 108 and 96 molecules respectively.

Hydration in Discrete Water

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8673

TABLE 2: Buried Water Molecules Detected in the Considered Protein Structuresa PDB i.d.

water i.d.

1AHO 1CEX 1MN8(A) 1P1X(A) 1UCS 1V0L

44 6, 7, 8, 9, 10, 12 1, 28 7, 10, 20, 26, 27, 60, 305 4 49, 109, 147, 151, 258, 267, 269, 271, 277, 279, 314, 323, 348, 350, 351, 359, 383, 414, 544, 545, 583, 584, 589 1X6Z 1, 2, 6, 15 1ZLB 1, 2, 7 2A6Z 2, 6, 7, 11, 12, 14, 17, 18, 22, 23, 46, 100, 136, 140, 149, 266, 84 2GKG 22, 24 2H3L(A) 15, 20, 109 2P5K 2 2PND 7, 82, 235 2PPN 1, 2, 3 2PPP 1, 2, 3 3F1L(A) 11, 47, 48, 87, 131, 134, 210, 265, 368 a In the case of multiple identical chains found in the PDB file, only one of them was used (ID given in parentheses).

In order to determine the rate of false positive predictions (i.e., situations when buried water molecules are placed where they should not be according to X-ray data), the presence of unoccupied, closed cavities within the considered proteins was determined with the use of CASTp server.69 Before uploading to the server, all existing water molecules and hydrogen atoms were removed from the structures. To ensure good exploration of potential void regions, calculations were performed with a spherical probe of 1.3 Å radius, instead of a standard 1.4 Å (see comment below). On the basis of the CASTp results, geometric centers of cavities were marked with pseudoatoms. Some large void regions had more than one pseudoatom assigned, and in such cases minimum distance of 2.8 Å between pseudoatoms was assured by deleting the intermediate points. As there were only a few such cases, no specific algorithm was employed, and the pseutoatoms were removed manually. Finally, pseudoatoms belonging to hydrated cavities were discarded if they were closer than 2.8 Å to position of crystallographic water molecule. In total, 157 internal cavities were found, of which 81 (52%) contained buried water molecules. Interestingly, the locations of 7 buried water molecules were not detected as cavities, indicating their tight fit into host protein structures. Calculations with the use of the model were carried out for pure polypeptide chains (i.e., without water molecules and any crystallographic cofactors) in their all atom representation. Upon the completion of solvent evolution, hydration sites or cavities were considered as occupied, when at least one occupied grid point was found within 1.4 Å from the position of corresponding water molecule or cavity pseudoatom. It was also checked that each occupied grid point found in protein interior had its assignment, so that no false positive results were omitted. It is worth noting, that some unassigned grid points were found when 1.4 Å probe was used for cavity detection, hence motivating the use of 1.3 Å probe. Ten independent runs with random grid orientations were performed for each protein structure, and an average occupancy was calculated for all sites. As it is presented in Figure 5A, 25% of buried water molecules were detected in each run, and 77% were detected at least once. Out of W5 water molecules, 41% were always found and 94% were found at least once, whereas for the W10 set those numbers read 78 and 99%, respectively. At the same time, 61% of cavities that should remain empty did not contain any grid point in all 10 runs.

Figure 5. (A) Occupancy distributions for the considered sets, obtained after 10 independent runs. (B) Statistical descriptors for a given occupancy cutoff: PREC, precision; ACC, accuracy; TPR, true positive rate; FPR, false positive rate.

Limiting analysis to buried water molecules, and using the occupancy of 0.4 as a threshold above which a given cavity is considered as hydrated, the true positive rate (i.e., the probability of detecting a hydrated cavity) is 64% with an overall accuracy (i.e., the probability of proper distinction between hydrated and empty cavity) of 71%. At the same threshold, the precision of the method (i.e., the probability that the found cavity is indeed hydrated) is 77%, and the false positive rate (i.e., the probability of erroneously indicating an empty cavity as hydrated) is 21%. As illustrated in Figure 5B, greater precision and a lower false positive rate can be obtained when a larger threshold is used but at the expense of true positive rate. Conclusions and Outlook A simple, semiheuristic solvation model has been presented. The model is based on a discrete BCC grid of solvent cells, and solute-solvent and solvent-solvent interactions are calculated adopting a mean field approach. The construction of the effective Hamiltonian for a solvent cell provides an explicit coupling between orientation-dependent water-solute electrostatic interactions and water-water hydrogen bonding. In contrast to most implicit solvent approaches, the model does not rely on any arbitrary definition of solute-solvent interface. Instead, solvent distribution in the presence of solute is determined using a cellular automata based algorithm and is sensitive to the details of local solute-solvent interactions. Similarly, no microscopic surface tension parameter is needed for the calculation of the nonpolar hydration free energy component. The necessary contributions are obtained based on changes in local water-water interactions and depend on the geometry of the solute-solvent interface. It was demonstrated that the model provides satisfactory results for the calculations of hydration free energies of druglike molecules and the detection of localized water molecules within protein structures. Also, the estimated surface tension of hard sphere solutes agreed surprisingly well with the theoretical predictions over a wide range of the considered radii. All of the presented results were obtained with a single parameter set, optimized for predicting free energies of hydra-

8674

J. Phys. Chem. B, Vol. 114, No. 26, 2010

tion. Adjusting some parameters (in particular δ, the drying threshold) should further improve the model performance in the detection of localized water molecules. On the other hand, the universal parameter set is beneficial, as potentially the model may be applied in docking studies, providing estimates of changes in hydration free energies upon binding, with the effects of potentially localized water molecules included “on the fly”. It is worth mentioning that, as opposed to some other approaches aimed at the predictions of localized water molecules, the model does not rely on the knowledge based rules for water placement in the protein environment but is “compatible” with all forcefield described molecules, equally on the receptor and the ligand side. Furthermore, it takes into account direct water-water interactions, which should facilitate the detection of quite frequently encountered groups of buried water molecules connected by hydrogen bonds network. Certainly, for the proper treatment of the energetic effects related to the presence of such localized water molecules, it would be necessary to treat the relevant solvent grid points differently as those belonging to the hydration shell. Otherwise, the currently used, common scaling factor based on the bulk water number density will lead to the underestimation of their contributions. Perhaps, a clustering algorithm may be introduced in the postprocessing of the obtained solvent distribution, and different weights may be assigned to the contributions of buried grid points recognized as localized water molecules. The idea of applying the model in docking studies, although not yet tested, seems to be particularly appealing, as given the rigid receptor structure its electrostatic and LJ potentials can be calculated only once and reused for multiple ligands (or multiple ligand poses) owing to the additivity of the potentials. This should significantly speed up the calculations, as the initial determination of interaction potentials is the most timeconsuming step of the method. It should be stressed that, although the idea for calculating the effective Hamiltonian of a solvent cell may be treated as a general basis of the presented approach, the proposed methods for obtaining the solvent distribution and estimating the free energy of hydration may be further developed and tuned. In particular, obtaining hydration free energies based on a single shell of occupied grid points may not provide satisfactory results for charged solutes. In the most similar approach to the one presented here, the Langevin dipole model, grid points up to the distance of 20 Å from the van der Waals solute surface are considered; however, most of them are discarded in the actual calculations, on the basis of the magnitude of the electrostatic field.47 Similarly, a bulk correction, accounting for long-range electrostatic effects due to embedding a charge distribution in infinite dielectric solvent medium is currently not considered. Although small for neutral solutes (estimated to be in the range of -0.1 kcal/mol47), it would be certainly important for ionic species. At the current stage, no attempts were made to use the model for the prediction of hydration free energies for charged solutes. Nevertheless such a possibility will be explored in the future, with the hope that the model will be able to reproduce the well-known asymmetry in hydration free energies of oppositely charged ions,70 benefiting from the asymmetry of water charge distribution and the coupling between electrostatic interactions and hydrogen bonding, being explicitly built into the effective Hamiltonian. Acknowledgment. P.S. is supported by a grant (Za153/191) from DFG (Deutsche Forschungsgemeinschaft) to M.Z.

Setny and Zacharias Supporting Information Available: Table summarizing experimental and calculated hydration free energies for all molecules included in the training and the test set. This material is available free of charge via the Internet at http://pubs.acs.org. Appendix Simple Formula for Calculating Hydration Free Energy. Assume that, in the presence of solute, the solvent can adopt different spatial configurations {n}, corresponding to different distributions of i occupied cells. For a given configuration, the Hamiltonian of the solvent can be expressed as a sum of i effective Hamiltonians

H({n}) )

∑ Heff(ri, {n})

(11)

i

hence the partition function of the system is

Z)

∑ e-βH({n}) ) ∑ ∏ e-βH {n}

{n}

eff(ri,{n})

(12)

i

Assuming that the solvent configuration {n}s obtained upon solvent evolution has the lowest H (there is no strict proof for this assumption, and that is why the model remains heuristic) and limiting the summation in eq 12 only to this configuration, the free energy of the system is

Gs ) -kBT ln

∏ e-βH

eff(ri,{n}s)

i

)

∑ Heff(ri, {n}s) i

(13) Taking the free energy of uniformly distributed i solvent cells (unperturbed by the solute) as G0 ) ∑iHb, the free energy difference due to solute insertion is

∆G ) Gs - G0 )

∑ (Heff(ri, {n}s) - Hb)

(14)

i

which, after switching from summation over occupied cells to summation over grid points weighted by a binary density F(r) and introducing a scaling factor of VFb, gives eq 9 in section “Model Description”. Note that the number of occupied solvent cells is assumed to be the same before and after the solute insertion, hence the solvent effectively expands. It should remain clear, however, that all the entropic effects due to spatial solvent rearrangement are neglected in the current approach. References and Notes (1) Roux, B.; Simonson, T. Biophys. Chem. 1999, 78, 1–20. (2) Guthrie, J. P. J. Phys. Chem. B 2009, 113, 4501–4507. (3) Chipot, C.; Pohorille, A. Free Energy Calculations: Theory and Applications in Chemistry and Biology; Springer: New York, 2007. (4) Zacharias, M. J. Phys. Chem. A 2003, 107, 3000–3004. (5) Gallicchio, E.; Levy, R. M. J. Comput. Chem. 2004, 25, 479–499. (6) Wagoner, J. A.; Baker, N. A. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 8331–8336. (7) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc. 1990, 112, 6127–6129. (8) Sharp, K. A.; Honig, B. J. Phys. Chem. 1990, 94, 7684–7692. (9) Lee, B. K.; Richards, F. M. J. Mol. Biol. 1971, 55, 379–400. (10) Sitkoff, D.; Sharp, K. A.; Honig, B. J. Phys. Chem. 1994, 98, 1978– 1988. (11) Eisenberg, D.; McLachlan, A. D. Nature 1986, 319, 199–203.

Hydration in Discrete Water (12) Wang, J.; Wang, W.; Huo, S.; Lee, M.; Kollman, P. J. Phys. Chem. B 2001, 105, 5055–5067. (13) Chambers, C.; Hawkins, G.; Cramer, C.; Truhlar, D. J. Phys. Chem. 1996, 100, 16385–16398. (14) Berne, B. J.; Weeks, J. D.; Zhou, R. Annu. ReV. Phys. Chem. 2009, 60, 85–103. (15) Rasaiah, J. C.; Garde, S.; Hummer, G. Annu. ReV. Phys. Chem. 2008, 59, 713–740. (16) Qvist, J.; Davidovic, M.; Hamelberg, D.; Halle, B. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 6296–6301. (17) Barratt, E.; Bingham, R. J.; Warner, D. J.; Laughton, C. A.; Phillips, S. E. V.; Homans, S. W. J. Am. Chem. Soc. 2005, 127, 11827–11834. (18) Liu, P.; Huang, X.; Zhou, R.; Berne, B. J. Nature 2005, 437, 159– 162. (19) Zhou, R.; Huang, X.; Margulis, C. J.; Berne, B. J. Science 2004, 305, 1605–1609. (20) Huang, X.; Margulis, C. J.; Berne, B. J. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 11953–11958. (21) Lum, K.; Chandler, D.; Weeks, J. D. J. Phys. Chem. B 1999, 103, 4570–4577. (22) Dzubiella, J.; Swanson, J. M. J.; McCammon, J. A. J. Chem. Phys. 2006, 124, 084905. (23) Cheng, L.-T.; Wang, Z.; Setny, P.; Dzubiella, J.; Li, B.; McCammon, J. A. J. Chem. Phys. 2009, 131, 144102. (24) Setny, P.; Wang, Z.; Cheng, L.-T.; Li, B.; McCammon, J. A.; Dzubiella, J. Phys. ReV. Lett. 2009, 103, 187801. (25) Computational and Structural Approaches to Drug DiscoVery; Stroud, R. M., Finer-Moore, J. , Eds.; The Royal Society of Chemistry: London, 2007; pp 95-110. (26) Li, Z.; Lazaridis, T. Phys. Chem. Chem. Phys. 2007, 9, 573–581. (27) Dunitz, J. D. Science 1994, 264, 670. (28) Hamelberg, D.; McCammon, J. A. J. Am. Chem. Soc. 2004, 126, 7683–7689. (29) Resat, H.; Mezei, M. Biophys. J. 1996, 71, 1179–1190. (30) Goodford, P. J. J. Med. Chem. 1985, 28, 849–857. (31) Rarey, M.; Kramer, B.; Lengauer, T. Proteins 1999, 34, 17–28. (32) Schymkowitz, J. W. H.; Rousseau, F.; Martins, I. C.; FerkinghoffBorg, J.; Stricher, F.; Serrano, L. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 10147–10152. (33) Jiang, L.; Kuhlman, B.; Kortemme, T.; Baker, D. Proteins 2005, 58, 893–904. (34) Imai, T.; Hiraoka, R.; Kovalenko, A.; Hirata, F. Proteins 2007, 66, 804–813. (35) Yoshida, N.; Imai, T.; Phongphanphanee, S.; Kovalenko, A.; Hirata, F. J. Phys. Chem. B 2009, 113, 873–886. (36) Hansen, J. P.; McDonald, I. R. Theory of simple liquids; Elsevier: Amsterdam, The Netherlands, 2006. (37) Bell, G. M. J. Phys. C: Solid State Phys. 1972, 5, 889–905. (38) Meijer, P. H. E.; Kikuchi, R.; Papon, P. Phys. A: Stat. Theor. Phys. 1981, 109, 365–381. (39) Besseling, N. A. M.; Lyklema, J. J. Phys. Chem. 1994, 98, 11610– 11622. (40) Roberts, C. J.; Debenedetti, P. G. J. Chem. Phys. 1996, 105, 658– 672.

J. Phys. Chem. B, Vol. 114, No. 26, 2010 8675 (41) Besseling, N.; Lyklema, J. J. Phys. Chem. B 1997, 101, 7604– 7611. (42) Eads, C. D. J. Phys. Chem. B 2002, 106, 12282–12290. (43) Chandler, D. Introduction to modern statistical mechanics; Oxford University Press: Oxford, U.K., 1987. (44) Berendsen, H. J. C.; Grigera, J. C.; Straatsma, T. P. J. Phys. Chem. 1987, 91, 6269–6271. (45) Kumar, R.; Schmidt, J. R.; Skinner, J. L. J. Chem. Phys. 2007, 126, 204107. (46) Warshel, A. J. Phys. Chem. 1979, 83, 1640–1652. (47) Florian, J.; Warshel, A. J. Phys. Chem. B 1997, 101, 5583–5595. (48) Florian, J.; Warshel, A. J. Phys. Chem. B 1999, 103, 10282–10288. (49) Jorgensen, W.; Maxwell, D.; Tirado-Rives, J. J. Am. Chem. Soc. 1996, 118, 11225–11236. (50) Jakalian, A.; Bush, B. L.; Jack, D. B.; Bayly, C. I. J. Comput. Chem. 2000, 21, 132–146. (51) Jakalian, A.; Jack, D. B.; Bayly, C. I. J. Comput. Chem. 2002, 23, 1623–1641. (52) Mobley, D. L.; Bayly, C. I.; Cooper, M. D.; Dill, K. A. J. Phys. Chem. B 2009, 113, 4533–4537. (53) Sulea, T.; Wanapun, D.; Dennis, S.; Purisima, E. O. J. Phys. Chem. B 2009, 113, 4511–4520. (54) Klamt, A.; Eckert, F.; Diedenhofen, M. J. Phys. Chem. B 2009, 113, 4508–4510. (55) Nicholls, A.; Wlodek, S.; Grant, J. A. J. Phys. Chem. B 2009, 113, 4521–4532. (56) Nicholls, A.; Mobley, D. L.; Guthrie, J. P.; Chodera, J. D.; Bayly, C. I.; Cooper, M. D.; Pande, V. S. J. Med. Chem. 2008, 51, 769–779. (57) Tolman, R. C. J. Chem. Phys. 1949, 17, 333–337. (58) Reiss, H.; Frisch, H. L.; Helfand, E.; Lebowitz, J. L. J. Chem. Phys. 1960, 32, 119–124. (59) Stewart, M. C.; Evans, R. Phys. ReV. E Stat. Nonlin. Soft Matter Phys. 2005, 71, 011602. (60) Hummer, G.; Garde, S.; Garcia, A.; Pratt, L. R. Chem. Phys. 2000, 258, 349–370. (61) Huang, D. M.; Geissler, P. L.; Chandler, D. J. Phys. Chem. B 2001, 105, 6704–6709. (62) Rajamani, S.; Truskett, T. M.; Garde, S. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 9475–9480. (63) Stillinger, F. H. J. Solution Chem. 1973, 2, 141–158. (64) Chandler, D. Nature 2005, 437, 640–647. (65) Chandler, D. Phys. ReV. E 1993, 48, 2898–2905. (66) Hummer, G.; Garde, S.; Garcia, A. E.; Pohorille, A.; Pratt, L. R. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8951–8955. (67) Dolinsky, T. J.; Nielsen, J. E.; McCammon, J. A.; Baker, N. A. Nucleic Acids Res. 2004, 32, W665-W667. (68) Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graph. 1996, 14, 33–38. (69) Dundas, J.; Ouyang, Z.; Tseng, J.; Binkowski, A.; Turpaz, Y.; Liang, J. Nucleic Acids Res. 2006, 34, W116-W118. (70) Marcus, Y. Chem. ReV. 2009, 109, 1346–1370. (71) Cabani, S.; Gianni, P.; Mollica, V.; Lepori, L. J. Solution Chem. 1981, 10, 563–595.

JP102462S