Comparison Study of Polar and Nonpolar ... - ACS Publications

Article Cite This: J. Chem. Inf. Model. 2017, 57, 2539-2553

pubs.acs.org/jcim

Comparison Study of Polar and Nonpolar Contributions to Solvation Free Energy Redona Izairi† and Hiqmet Kamberaj*,‡ †

Department of Physics, Faculty of Mathematics and Natural Sciences, State University of Tetovo, Street Ilinden, 1200, Tetovo, Republic of Macedonia ‡ Department of Computer Engineering, Faculty of Engineering, International Balkan University, Tashko Karadza, 11A, 1000, Skopje, Republic of Macedonia S Supporting Information *

ABSTRACT: In this study, we compared the contributions of polar and nonpolar interactions to the solvation free energy of a solute in solvent, which is decomposed into four different terms based on the nature of interactions: (i) electrostatic solvation free energy term counting for the work done to move solute charges from fixed points in some reference environment to their configuration positions in solvent; (ii) solute−solvent van der Waals dispersion interactions; (iii) change on solvent−solvent interactions and solvent entropy due to reorganization of solvent around solute cavity in solvent; and (iv) compensation of electrostatic forces acting on the dielectric surface boundary between solvent and solute. We compared these contributions to each other for a data set of 573 proteins, which were prepared using CHARMM22 and AMBER force fields. In addition, we compared the calculated with experimental hydration free energies for a data set of 642 small molecules, which were prepared using the general AMBER force field. Our results indicated the significance of each term to the total solvation free energy.

1. INTRODUCTION The nonpolar contribution of the solvation free energy is associated with attractive short-range dispersion interactions for creating a cavity in solvent where the solute resizes and solute− solvent dispersion interactions occur:1 solv solv,cav solv ΔGnonpolar = ΔGnonpolar + ΔGvdW

solvent and solute for creation of the cavity in solvent and the second term is the work done for moving a partial charge of a solute placed at some fixed point origin in a reference medium to another fixed point in of the solute in a solvent medium.5−8 The first term of eq 2 is discussed in more detail below. The cavity nonpolar contribution of the solvation free energy is often given as a sum of atomic contributions, assumed proportional to the molecular surface area, Si:2,9

(1)

ΔGsolv,cav nonpolar

where is the free energy cost for creation of a cavity in solvent with the size of a macromolecule, which includes the change of solvent−solvent interaction energy and change of solv solvent entropy. 2 ΔG vdW is the free energy cost of compensating for solute−solvent van der Waals dispersion interactions. Both terms have opposite signs and are anticorrelated with each other. In addition, both these interactions are short-range in nature and are, thus, associated with only the first solvent shell.1,3,4 Moreover, the nonpolar solv,cav term, ΔGnonpolar , often is used to include the entropic contribution due to the changes in solvent structures near the solute. Length-scale dependence of the cavity creation and solvent screening of solute−solvent dispersion interactions have to be properly described to have an accurate representation of the implicit modeling of nonpolar solvation free energy. In this study, the polar contribution to the solvation free energy is written as solv solv,cav solv ΔGpolar = ΔGpolar + ΔGelec

solv,cav ΔGnonpolar =

i

(3)

where the proportionality constant γi depends on the type of atom, calibrated using the experimental data by a fitting procedure. In most of generalized Born molecular surface area (GB/SA) and Poisson−Boltzmann molecular surface area (PB/ SA) models, further approximations are made by assuming a universal Γ value for all atom types, and eq 3 can be written as2,9−15 solv,cav ΔGnonpolar = ΓS

(4)

where S is the total molecular surface area of the macromolecule. Often, the coefficient Γ and parametrization of the surface effects in eq 4 are carefully designed to also include short-range effects, dominated by the first solvation shell, such as nonlinear response of solvent to the local electric field near

(2)

where the first term indicates the compensation of electrostatic forces acting on the dielectric surface boundary between the © 2017 American Chemical Society

∑ γiSi

Received: June 18, 2017 Published: September 7, 2017 2539

DOI: 10.1021/acs.jcim.7b00368 J. Chem. Inf. Model. 2017, 57, 2539−2553

Article

Journal of Chemical Information and Modeling solute and charge transfer to or from solvent4 (and the reference therein16). Equation 4 is also used to model other effects rather than non polar contributions to solvation free energy, for instance, accessible surface area has been used as a measure of thermodynamic parameters of peptide hydration.2 Thus, based on the empirical nature of the factor Γ, different values has been used for its parametrization, underlying different effects, including macromolecular force field, electrostatic solvation model, and choice of the macromolecule− solvent boundary.4 Indeed, the values of Γ range from 5 to 7 cal/(mol Å2),8,10,17 or from 40 to 70 cal/(mol Å2).18,19 This wide range of the values of Γ could be interpreted as different models used to represent the molecular surface.4 However, small values of Γ indicate that molecular surface contribution has only a small impact on the free energy changes due to large scale conformation changes. It should be noted that precise definition of the solute− solvent interface is an important physical property for both nonpolar and electrostatic solvation free energies. Although, molecular surface nonpolar term contains empirical corrections to compensate for various effects related to the first hydration shell and force field terms. Moreover, the optimal choice of the solute−solvent interface boundary for non polar term of the solvation free energy may not necessary coincide with that of the electrostatic contribution to solvation free energy.20 The evaluation of solvation free energies can be carried out by the computationally more intensive free energy perturbation (FEP) method.21 Although the continuum methods, such as the generalized Born and Poisson−Boltzmann discussed in this study, are less accurate, they have shown to be attractive methods, because they are computationally less expensive. Thermodynamic decomposition of hydration free energy into enthalpy and entropy terms has been introduced for a small number of organic molecules,22 which has indicated the importance of force field parametrization to achieve an accurate decomposition. Interesting decomposition of the free energy into terms arising from the contributions of different interaction components of the potential energy function using the FEP simulations has also been proposed.23 This study was conducted for complex systems of deoxyribonucleoside triphosphates in aqueous solution. Further study described in details the necessary conditions for obtaining an additive decomposition of the free energy in the framework of FEP method.24 We think that intuitive reasons for decomposition of the free energy into terms that may be additive originates from the pairwise nature of the most of force fields developed for computer simulations of (bio)molecular systems. The FEP computation method involves calculation of higher order potential energy mixed terms,23,24 since it practically requires computation of logarithm of ensemble average of exponential function,21 and hence the additivity may be violated unless longer FEP simulations are performed.24 Another issue on performing FEP simulation in the case of decomposition of the free energy into different terms is the accuracy on assessing the average value of each term due to the sampling inefficiency in each window, in particular, when the fluctuations of potential energy terms of decomposition are high, such that the width of probability energy distribution exceeds a few kBT (where T is the simulation temperature of system and kB is the Boltzmann’s factor). However, we can suggest thermodynamic integration (TI)25 as an alternative method for performing free energy simulations, which is based on a linear coupling of the initial and final Hamiltonian states, and hence can trivially be

decomposed into different energy and/or degrees of freedom terms. Moreover, to improve the sampling of TI simulations, the so-called λ-dynamics simulation can practically be performed26 (and the references therein). In this study, we consider decomposition of the solvation free energy into four different physical terms: (i) the first term takes into account only the change on solvent−solvent interaction energy and solvent entropy contributions; (ii) the second term compensates for short-range dispersion interactions; (iii) the third term compensates for electrostatic forces acting at the dielectric surface between solvent and solute; and (iv) electrostatic solvation free energy term counting for the work done to move solute charges from fixed points in some reference environment to their positions in solute configuration in water. Testing is performed on two benchmark systems: a data set of 573 proteins and a data set of 642 small molecules. The results obtained here are also compared with experimental hydration free energies for the data set of small molecules using a fitting procedure with two and three free parameters.

2. THEORY 2.1. Molecular Surface Nonpolar Term. Molecular surface nonpolar term can be computed using eq 4. This term represents the nonpolar contribution to the cavity creation. It includes the change on solvent-solvent interaction energy of the solvent molecules inside the solute size cavity merging into solvent medium. In addition, it also includes the change on entropy due to reorganization of solvent molecules around the solute.1−4,9 Using the definition of molecular surface (see Appendix A.2), eq 4 can be written as ; solv ΔGnp,cav = Γ ∑ δσt

(5)

t=1

where the sum runs over all triangle molecular surface patches ;. 2.2. van der Waals Term. The van der Waals contribution to nonpolar solvation free energy is derived using a continuum model for interpreting the solvent screening of solute−solvent dispersion interactions.1,3,4 In this model, the solvent (e.g., water) is represented by a single site (e.g., oxygen) with a constant density in the region outside solute volume, and the dispersion interaction energy between the solute and solvent is evaluated as a sum over all M solute atomic sites of the following volume integrals:1,3 M solv ΔGvdW = ρw ∑ i=1

3 ∫solvent ui(vdW) , w (|r − ri|) d r

(6)

where ρw is the solvent bulk number density at standard (vdW) conditions, ui,w is the dispersion interaction potential between the atom i of the solute and solvent, which is defined as the attractive part of the following decomposition of the Lennard-Jones potential:1,3 ⎧− ε , r ≤ 21/6σij ij ⎪ ⎪ (r ) = ⎨ ⎡⎛ σij ⎞12 ⎛ σij ⎞6 ⎤ ui(vdW) ,j ⎪ 4εij⎢⎜ ⎟ − ⎜ ⎟ ⎥ , r > 21/6σij ⎪ ⎣⎝ r ⎠ ⎝r ⎠⎦ ⎩

(7)

and 2540


Article

Journal of Chemical Information and Modeling ⎧ ⎡⎛ σ ⎞12 ⎛ σ ⎞6 ⎤ ij ij 1/6 ⎪ ⎪ 4εij⎢⎜⎝ ⎟⎠ − ⎜⎝ ⎟⎠ ⎥ + εij , r ≤ 2 σij (rep) r r ⎦ ui , j (r ) = ⎨ ⎣ ⎪ ⎪ 0, r > 21/6σij ⎩

dielectric due to the electrostatic forces of charge distribution in macromolecule is given by29 f = ρfE + (8)

Thus, eq 6 can be written as

solv ΔGvdW

∫

∫

(9)

4εiwσ12 iw

4εiwσ6iw

where Ai = and Bi = are the force constants, which depend on the atom i and solvent molecule site (e.g., oxygen for water) determined using the rules specified for the Lennard-Jones type of potential.25 Ωw denotes the entire solvent volume. The integral can be evaluated using different approaches, as discussed in ref 4, or, alternatively, other literature.3,27,28 Here, we are going to introduce another approach, which converts the integral over solvent volume (see eq 6) into an integration over the surface enclosing solute volume. Calculation of ΔGsolv vdW includes an integral of the form described in Appendix A.1. Therefore, ΔGsolv vdW from eq 9 takes the following form: 1. For |r − ri| ≥ 21/6σiw: M solv ΔGvdW =

∑

ρw Ai 9

i=1 M

−

∑

∮S

ρw Bi

|r − ri|12

⎛ ∂ε(g , T ) ⎞ dε ≡⎜ ⎟ dg ⎝ ∂g ⎠T It is a good approximation to assume that ε does not vary significantly with the mass density g of the dielectric media for typical range of g values, and hence dε/dg ≪ 1. Ignoring the second term and using the Maxwell equation ∇·D = ρf, then eq 14 can be written as f = (∇·D)E −

dS

ε0 ∇(ε E·E) + ε0(ε E·∇)E 2 1 = ∇·(D ⊕ E) − (D·∇)E − ∇(D·E) + (D·∇)E 2 1 = ∇·(D ⊕ E) − ∇(D·E) 2 ⎤ ⎡ 1 = ∇·⎢D ⊕ E − (D·E)I⎥ ⎦ ⎣ (15) 2

|r − ri|6

dS (10)

2. For |r − ri| < 21/6σiw: M solv ΔGvdW =

∑

ρw εiw 3

i=1

∮S (r − ri)·n dS

(11)

where D ⊕ E is a tensor of third rank with elements (D ⊕ E)ij = DiEj, for i,j = 1, 2, 3, and I is the unitary diagonal tensor:

Since the molecular surface of solute can be represented by a set of discrete surface elements (see Appendix A.2), we can rewrite the above surface integrals as the following: 1. For |rt − ri| ≥ 21/6σiw: M solv ΔGvdW

=

∑

ρw Ai 9

i=1 M

−

∑

;

∑ t=1

ρw Bi 3

i=1

(rt − ri) ·n t 12

|rt − ri|

;

∑ t=1

⎧ 0 if i ≠ j Iij = ⎨ ⎩ 1 if i = j ⎪ ⎪

δσt

(rt − ri) ·n t |rt − ri|6

δσt

solv ΔGvdW

=

∑ i=1

ρw εiw 3

(12)

∇·(D ⊕ E) − (D·∇)E = 0

(17)

∇·(D ⊕ E) = (D·∇)E

(18)

;

∑ (rt − ri)·n tδσt t=1

(16)

D and E are the electric displacement and field vectors, respectively, related as D = ε0εE. Note that eq 15 is also valid for the dielectric regions where ρf = 0. In such cases

2. For |rt − ri| < 21/6σiw: M

ε0 (E·E)∇ε 2

= ∇·(D ⊕ E) − (D·∇)E −

(r − ri) ·n

∮S

3

i=1

(r − ri) ·n

(14)

where ρf is the fixed charge density in dielectric, g is the mass density, and ε is the dielectric constant of solvent. In eq 14, the first term gives the force acting on the fixed charges in dielectric, the second term is related to inhomogeneity of the field, and the third term is related to the inhomogeneity of dielectric media. In general, for a dielectric media, ε = ε(g, T), where T is the temperature, can be considered as a state equation. Here, we are discussing the processes for which the temperature T is constant, thus

⎧ ⎡ A Bi ⎤ 3 i ⎪ ⎢ ⎥ d r, |r − ri| ≥ 21/6σiw − ⎪ Ω w ⎣ |r − ri|12 |r − ri|6 ⎦ = ρw ∑ ⎨ i=1 ⎪ εiw d3r, |r − ri| < 21/6σiw ⎪− ⎩ Ωw M

ε0 ⎛ ε dε ⎞ ∇⎜E·Eg ⎟ − 0 (E·E)∇ε 2 ⎝ dg ⎠ 2

or

(13)

2.3. Molecular Surface Polar Term. Calculation of the cav term ΔGsolv, involves the forces acting on the dielectric polar surface boundary between the solvent and solute using the relationship of forces on the dielectric medium and the Maxwell stress tensor σ. The forces acting on an elementary volume of

Therefore, for the regions where ρf = 0 (e.g., at the dielectric boundary between solvent−solute), the force density acting on dielectric media is given by 2541


Article

Journal of Chemical Information and Modeling ε0 (E·E)∇ε 2 ε − 0 ∇(ε E·E) + ε0(ε E·∇)E 2 1 − ∇(D·E) + (D·∇)E 2 1 ∇·(D ⊕ E) − ∇(D·E) 2 ⎤ ⎡ 1 ∇·⎢D ⊕ E − (D·E)I⎥ ⎦ ⎣ 2

3

f=− = = = =

P=

1 (D·E)I 2

=

∮S

σ · n dS

(19)

Et(1) = Et(2) ,

(20)

=

(22)

=

2 ε0εw ⎛ εm ⎞ (2) (2) ε0εw (2) (2) Et Et ⎜ ⎟ En En + 2 ⎝ εw ⎠ 2

⎡⎛ ε ⎤ ⎞ ε0 (εw − εm)⎢⎜ m − 1⎟En2 + E2 ⎥ ⎢⎣⎝ εw ⎥⎦ 2 ⎠

(29)

where En and E are the normal component and total electric field on the second medium, i.e., solute. Similarly, the tangential component St is determined as

(23)

3

St =

3

∑ (∑ (σij(2) − σij(1))nj)ti i=1 3

=

j=1 3

∑ ∑ ⎛⎝Di(2)E(2) j − ⎜

i=1 j=1

(24)

+

1 (2) (2) D E δij − Di(1)E(1) j 2

1 (1) (1) ⎞⎟ D E δij njti ⎠ 2

= Dt(2)En(2) − Dt(1)En(1) = ε0εmEt(2)En(2) − ε0εwEt(1)En(1) = ε0εmEt(2)En(2) − ε0εmEt(2)En(2) =0

(30)

where εm and εw are the solute and solvent dielectric constants, respectively. Here, En and Et are the normal and tangential components of electric field in solute medium, respectively. These results indicate that only the normal component (pressure) of the surface force exist, whereas the tangential component (stress) vanishes, which could be explained with the fact that fixed charge density is zero close to the dielectric boundary surface, and only the surface polarization induced charges exist in this boundary. Based on the molecular surface representation (see Appendix A.2), the electrostatic contribution to cavity creation in the solvent can be calculated as the following:

3

∑ FS ,iti (25)

Using eq 24, we can determine 3

∑ (σij(2) − σij(1))nj j=1

(28)

ε0εm (2) (2) ε0εm (2) (2) ε2 En En − Et Et − ε0 m En(2)En(2) 2 2 εw +

i=1

FS , i =

εwEn(1) = εmEn(2)

ε0εm (2) (2) (En En + Et(2)Et(2)) 2 ε 1 − ε0εmEn(2) m En(2) + ε0εw(En(1)En(1) + Et(1)Et(1)) 2 εw

∑ FS ,ini

i=1

Dn(1) = Dn(2) ,

P = ε0εmEn(2)En(2) −

3

St = FS ·t =

(27)

assuming that there is no free charge at the dielectric boundary surface, eq 27 reduces to

where σ(2) and σ(1) are the Maxwell stress tensors defined in the solute and solvent media, respectively. P is the normal pressure to the molecular solute surface and St is the tangential stress, which is the tangent component of FS. t is the tangential unit vector to the element surface dS. The normal pressure and the stress are defined as P = FS ·n =

1 (2) (2) D E δij − Di(1)E(1) j 2

Using the dielectric boundary conditions

where n is the unit vector outward the surface element dS. Since the normal component of the electrical field E is discontinuous at the boundary between solvent and solute media, the product σ·n in eq 23 exhibits a jump. Therefore, the difference between σ·n in the two media will give the surface force FS, which is a measure of the force per unit of area, and it can be written as FS = P n + St t = (σ (2) − σ (1)) ·n

⎜

1 (1) (1) ⎞⎟ D E δij njni ⎠ 2 1 1 = Dn(2)En(2) − D(2)E(2) − Dn(1)En(1) + D(1)E(1) 2 2

Using the Gauss’s theorem, the volume integral can be expresses as the surface integral surrounding the dielectric body

F=

∑ ∑ ⎛⎝Di(2)E(2) j − +

(21)

∫V f dV = ∫V ∇·σ dV

3

i=1 j=1

In terms of physics, the Maxwell stress tensor is the force acting on the unit area (or stress), that is, σij is the force per unit area in the ith direction acting on the surface element oriented in the jth direction. Thus, the diagonal elements (σxx, σyy, σzz) represent the pressures and off-diagonal elements are the shear stresses (referred to as simply stress in the following discussion). Time-average force acting on the dielectric body is F=

j=1

3

Equation 19 can then be written as

f = ∇·σ

∑ (∑ (σij(2) − σij(1))nj)ni i=1

Defining the Maxwell stress tensor σ as29 σ=D⊕E−

3

(26)

Combining expression for P from eq 25 with FS,i from eq 26, we obtain 2542


Article

Journal of Chemical Information and Modeling solv,cav ΔGpolar = P ΔV =

where I is the ionic strength, kB is Boltzmann’s constant, and T is the temperature. Ri is calculated according to the surface integral (see Appendix A.1):

∫ ([∮S (σ(m) − σ(w))·n dS]·n) dr

;

=

∑ (FS ,i ·ni)δVi

1 1 = Ri 4π

i=1 ;

=

∑ PiδVi (31)

1 1 = Ri 4π

where δVi is an element volume of the solute, calculated as δVi = [(rs , i − rc) ·ni]δσi

(32)

;

∑i = 1 rs , iδσi ;

∑i = 1 δσi

(33)

M

M

∑∑

−

1⎤ ⎥ εm ⎥⎦

i=1 j=1

⎡ exp( −ακf (rij)) GB ⎢ 4πε0fGB (rij) ⎢⎣ εw qiqj

(34)

where qi is the partial atomic charge of atom i and εm and εw are the dielectric constants of solute and solvent, respectively. Ri is the effective Born radius given by (see eq 83 in Appendix A.3) 1 1 = Ri 4π

∫exterior

d3r |r − ri|4

In eq 34, f GB is given by the following expression (see eq 86 in Appendix A.3)

The constants α and κ are explained in Appendix A.3 and are given by α=

;

∑ t=1

rt − ri |rt − ri|4

·n t δσt

(36)

4. RESULTS AND DISCUSSION 4.1. Data Set 1. Figure 1 shows a scatter plot of the electrostatic solvation free energy calculated using GB (x-axis) and PB (y-axis) approaches, respectively. In addition, a straight fitting line is shown, along with corresponding Pearson coefficient of determination, R2. It can be calculated that the Pearson correlation coefficient is R = 0.975, indicating a very good agreement between the two methods. On the other hand, the computation complexity (numerical algorithm and execution time) of the PB is higher compare to GB method,

1/2 ⎛ ⎛ rij2 ⎞⎞ 2 ⎜ ⎟ ⎟ fGB (rij) = ⎜rij + R iR jexp⎜⎜ − ⎟⎟ ⎝ 4R iR j ⎠⎠ ⎝

⎛ I ⎞1/2 κ=⎜ ⎟ , ⎝ εwε0kBT ⎠

(35)

3. ANALYZED DATA A data set of 573 proteins31 is used for calculations, which are prepared using the CHARMM22 force field.32 The Poisson− Boltzmann (PB) solvation free energies are calculated using the CHARMM program.33 For the sake of comparison, we also prepared the protein structures using the AMBER force field.34 Molecular surfaces are constructed using Connolly’s method35−37 with density of surface points per atom equal to 25 points/atom and a solvent probe radius of 1.4 Å. The dielectric constant of proteins is fixed to εm = 4 and that of solvent to εw = 80. In our calculations, the number density of water molecules is taken to be ρw = 0.033 atoms/Å3, potential energy depth of water molecule oxygen atom is εw = 0.1521 kcal/mol, and the minimum value of radius is Rmin,w = 1.7682 Å. The second data set is a set of small molecules for which the experimental values of the hydration energies are known38 (and the references therein). The molecules are prepared using the general AMBER force field.39 The atomic partial charges are assigned according to ref 38. Two benchmarks are created from this data set, one with 100 small molecules and the other including all 642 molecules. The dielectric constant of solute molecule, εm, is considered as fitting parameter, in this study. On the other hand, the dielectric constant of solvent is fixed to εw = 80. The other parameters are similar to the proteins data set. Software implementing the methods discussed in this study is free for download from the website http://hkamberajibu. wikidot.com/solvation. In addition, from the same website, the database of all molecular structure and topology files, for all data sets used in our calculations prepared using either CHARMM22 or general AMBER force field, can be downloaded for free. Numerical values (including the graphs) are also available.

Note that Pi is the pressure at surface point i given by expression in eq 29, where En and E are, respectively, the normal electric field component and total electrical field at the surface of solute−solvent interface, calculated in the solute side at the surface point i. To the best of our knowledge incorporation of this term in calculation of the solvation free energy is for the first time in this study. However, incorporation of the Maxwell stress tensor on correction for calculation of the Poisson−Boltzmann forces has already been discussed in literature.30 2.4. Generalized Born Model. The electrostatic contribution to solvation free energy, ΔGsolv elec, includes the work done to move solute charges from fixed points in some reference environment to their configuration positions in solvent, e.g., water. For calculation of ΔGsolv elec on the case of a solute of M atoms, we can use the generalized Born model, described in Appendix A.3, for which electrostatic contribution to solvation free energy can be written as: 1 2

|r − ri|4

where rt represents the position of the molecular surface point t and ri represents the position of the solute atomic site i. nt is the outward unit vector to the solute molecular surface point t and δσt is the surface area of element t. The sum runs over all surface points ; .

where rc is the solute center of gravity defined as

solv ΔGGB,elec =

(r − ri) ·dS

which can be written in a discrete form as

i=1

rc =

∮solute

1 + 0.0169 I 1 + 0.075 I 2543


Article

Journal of Chemical Information and Modeling

interactions between the solvent and solute. As expected, a much lower correlation was observed between SASA and electrostatic solvation energy obtained using the generalized Born method, Δsolv GB,elec (Pearson correlation coefficient was R = 0.600) as shown in Figure 2D. We also performed a fit between different solvation free energy terms and SASA, according to yfit = aS + b

(37)

where S is the value of SASA (nm2) and a and b are fitting constants to be determined. The results are summarized in Table 1. It can be seen that the values of a for ΔGsolv,cav and polar

Figure 1. Comparison between generalized Born (x-axis) and Poisson−Boltzmann models (y-axis) used for calculation of the 2 electrostatic solvation free energy, ΔGsolv elec (kcal/mol). R represents the Pearson coefficient of determination.

Table 1. Summary of the Values of Fitting Parameters a and b Obtained by a Regression Linear Line Fitting between Different Solvation Free Energy Terms (kcal/mol) and SASA (nm2)

making the later one more attractive from a computation point of view. In addition, we calculated the correlations between solventaccessible surface area (nm2), SASA, and each other component of the solvation free energy. Our results are shown in Figure 2. The Pearson coefficients of determination R2 are also shown. From our results, a very high correlation has been obtained between the SASA and ΔGsolv vdW with a value of Pearson correlation coefficient of R = 0.985, as depicted in Figure 2C. Similarly, a good correlation is observed between SASA and ΔGsolv,cav polar , with a Pearson correlation coefficient of R = 0.922 (see Figure 2A). These results, interestingly, indicate that each of these two terms can separately be parametrized as proportional to solvent-accessible surface area using eq 4, as it has often been suggested.40 However, as it can be seen from our results, these two terms have opposite sign and they are not correlated. Therefore, one can ask the question if sum of the two terms would show any correlation with SASA. Indeed, we observed that the correlation between SASA and the sum of solv,cav two terms, namely ΔGsolv vdW + ΔGpolar , is relatively lower; it was obtained that the Pearson correlation coefficient was R = 0.768 (see also Figure 2B). This indicates that these two terms have to be considered separately in calculation of the solvation free energy, since they represent, in fact, different nature of

parameter ΔGsolv,cav polar ΔGsolv vdW ΔGsolv,cav polar + ΔGsolv GB,elec

ΔGsolv elec

a (kcal/(mol nm2))

b (kcal/mol)

R2

5.69 −3.32 2.37 −4.47

100.50 0.23 −100.27 −62.75

0.85 0.97 0.59 0.36

2 ΔGsolv vdW vary in the interval [−4.47, 5.69] kcal/(mol·nm ); see Table 1. In agreement with the above discussion, the broad range of values for a indicates that these two terms can not practically be parametrized as a linear function of SASA accurately, which is often done. On the other hand, the value of a for the sum of these two terms, ΔGsolv,cav + ΔGsolv polar vdW, is 2.37 2 kcal/(mol·nm ), as shown in Table 1, which is in the range of values taken often in the literature8,10,17−19 to characterize Γ in calculations of nonpolar term, ΔGsolv cav . It is worth noting that calculations are also performed using the AMBER force field.34 For sake of comparison, we summarized the results of CHARMM22 and AMBER in Tables S2 and S3 of the Supporting Information, respectively. It can clearly be seen that results of both force fields are very similar, indicating that they are parametrized to reproduce the

solv,cav Figure 2. Correlations between the SASA (nm2) and different contribution terms to solvation free energy (kcal/mol): (A) ΔGsolv,cav polar ; (B) ΔGpolar + solv solv ; (C) ΔG ; (D) ΔG . In addition, the fitting straight lines to the data points are shown, along with Pearson coefficient of determination, ΔGsolv vdW vdW GB,elec R2.

2544


Article


deviations of both experimental and calculated free energies, which are both measurements, and hence they are both independent variables from statistical point of view. Calculated values of rmse and nrmse are presented in Figure 3A−D. It is worth noting that the values of rmse and nrmse indicate smaller residual variance between the calculated and experimental hydration free energies for a ≤ 0.02 kcal/(mol Å2). However, with increasing a slightly larger values of rmse (2.89 kcal/mol) and nrmse (17.2%) are obtained, indicating the importance of choosing the value of a, as will further be discussed below. Since the dielectric constant of solute molecule and a are free parameters, we fitted the calculated values to the following expression:

solvation properties of proteins when compared with experimental results. 4.2. Data Set 2. Figure 3 shows graphically correlation between the calculated free energy ΔGcalc (kcal/mol) and

fit solv solv solv,cav ΔGcalc = b·ΔgGB,elec + aS + ΔGvdW + ΔGpolar

(42)

where solv ΔgGB,elec

Figure 3. Correlation between the calculated solvation ΔGcalc (kcal/ mol) and the experimental hydration free energies ΔGexp (kcal/mol) for a data set of 100 small molecules: a = (A) 0.007; (B) 0.01; (C) 0.02; (D) 0.04 kcal/(mol Å2). The error bars represent the standard deviations of the experimental values. The rmse and nrmse are also presented for each case. In addition, the fitting straight lines to the data points are shown, along with Pearson coefficient of determination, R2.

b=

1 1 − εm 80

Ndata solv solv solv,cav 2 ∑ [ΔGexp − (b·ΔgGB,elec )] + aS + ΔGvdW + ΔGpolar i=1

(44)

where Ndata is the number of data points (i.e., Ndata = 100). From the fitting, we found that

(38)

b = 0.8618; (39)

i=1

a = 0.0278 kcal/(mol Å2)

Equivalently, the dielectric constant of molecules is

We calculated ΔGsolv,cav nonpolar for different values of a: 0.007, 0.01, 0.02, and 0.04 kcal/(mol Å2), Figure 3A−D, respectively. The dielectric constant of all solute molecules was fixed at εm = 2. The Pearson correlation coefficient, R, between ΔGcalc and ΔGexp takes different values: R = 0.902 (a = 0.007 kcal/(mol Å2)), R = 0.916 (a = 0.01 kcal/(mol Å2)), R = 0.941 (a = 0.02 kcal/(mol Å2)), and R = 0.907 (a = 0.04 kcal/(mol Å2)), as shown in Figure 3A−D. These results, interestingly, indicate high correlations. We also calculated the root-mean-square error (rmse) defined as:

∑ (ΔGi ,exp

(43)

:=

where

⎞1/2 − ΔGi ,calc) ⎟ ⎠

1 80

Hence, we considered εm and a as two fitting parameters, which minimize the following quantity:

solv solv,cav calc,cav solv ΔGcalc = ΔGGB,elec + ΔGnonpolar + ΔGpolar + ΔGvdW

Ndata

−

with being the value of generalized Born electrostatic contribution calculated with εm = 2. Thus, it can be seen that

experimental hydration free energy (and the references therein), ΔGexp (kcal/mol), for a data set of 100 small molecules characterized by rigid configuration structures, i.e., the solute entropy contribution may be small and, thus, may be ignored. Calculated free energy is the sum of these terms:

solv,cav ΔGnonpolar = aS

1 2

ΔGsolv GB,elec

38

⎛ 1 rmse = ⎜⎜ ⎝ Ndata

=

solv ΔGGB,elec

εm = 1.1438

In Figure 4, we present graphically the experimental values (ΔGexp) versus the calculated values from the fit (ΔGfit calc), along with the Pearson coefficient of determination. The error bars in the graph represent standard deviations of the experimental

2⎟

(40)

and the normalized root-mean-square error (nrmse), in percentage, obtained as 100 ·rmse nrmse (%) = max min max min (ΔGexp − ΔGexp )(ΔGcalc − ΔGcalc ) (41)

Figure 4. Correlation between the fitted solvation ΔGfit calc (kcal/mol), using two parameters, and the experimental hydration free energies ΔGexp (kcal/mol). The error bars represent the standard deviations of the experimental values. These results are for a data set of 100 small molecules.

min where ΔGmax exp/calc and ΔGexp/calc denote the maximum and minimum values of the experimental/calculated hydration free energies, respectively. It is interesting to note that nrmse assesses the relative error with respect to the maximum

2545


Article

Journal of Chemical Information and Modeling values. Our results indicate a very high correlation between the calculated and experimental values of the hydration free energies, with a Pearson correlation coefficient of R = 0.971. In additional, the root-mean-square is found to be, rmse = 1.19 kcal/mol, and the normalized root-mean-square error is: nrmse = 6.05%. Hence, our results, interestingly, indicate a small residual variance between the fitted calculated and the experimental values of hydration free energies. An explanation of these findings is that the parameters of our fitting model, a and b, should be considered as empirical parameters of the fitting model predicting the total free energy. Here, these parameters are used to compensate for solute force field, solute−solvent surface interface definition, solvation model for electrostatic contribution, solute dielectric constant. Hence, we expect these parameters to vary for different data sets of molecules, depending on structure flexibility, electronic polarization, and other properties. Therefore, in the following discussion, we are also going to consider other fitting models using different number of parameters. Next, we considered a larger data set of 642 small molecules with varying complexity of molecular structure, for which the experimental hydration free energies are known. In order to consider the entropy contribution of solutes and other factors such as solute force field, solute−solvent surface interface definition, and solvation model for electrostatic contribution, we parametrized the calculated hydration free energies as the following:

Figure 5. Correlation between the fitted solvation ΔGfit calc (kcal/mol), using three parameters, and the experimental hydration free energies ΔGexp (kcal/mol). The error bars represent the standard deviations of the experimental values. These results are for a data set of 642 small molecules.

In addition, we computed the relative error (in percentage) between the calculated and experimental hydration free energies according to ϵ = 100

fit ΔGcalc − ΔGexp fit ΔGcalc

(47)

Results are summarized in Table S1 for the data set of 311 molecules with the smallest relative error, typically, ϵ < 50% in the Supporting Information. Figure 6 shows our results

fit solv solv solv,cav ΔGcalc = bΔgGB,elec + aS + ΔGvdW + ΔGpolar +γ

(45)

where b determines the dielectric constant of solute, a is a factor of the change on solvent−solvent interaction energy and solvent entropy, and γ compensates for the solute configuration entropy and maybe other contributions not included in our approach. Again, to determine a, b, and γ, we minimized the following sum of squares of deviations between the calculated and experimental hydration free energies: Figure 6. Correlation between the fitted solvation ΔGfit calc (kcal/mol) and the experimental hydration free energies ΔGexp (kcal/mol), using a three parameter fitting model. The error bars represent the standard deviations of the experimental values. These results are for a data set of 311 small molecules with the smallest value of relative error between the calculated and experimental hydration free energies (ϵ < 50%).

Ndata

:=

solv solv ∑ [ΔGexp − (b·ΔgGB,elec + aS + ΔGvdW i=1 solv,cav + ΔGpolar + γ )]2

(46)

where Ndata = 642. From the fitting, we found that b = 0.5061,

graphically as a scatter plot, along with a straight fitting line. It can be seen that a very good correlation is obtained between the calculated and experimental hydration free energies with a Pearson correlation coefficient of R = 0.961. We also calculated the rmse and normalized rmse, nrmse, for the subset of 311 molecules. We found a rmse value of 1.31 kcal/mol and nrmse value of 8.94%, indicating a small residual variance between the experimental and calculated hydration free energies. These results indicate the importance of solute entropy contribution on the hydration free energy, as also discussed elsewhere.41 In addition, we analyzed the (percentage) contribution of each free energy decomposition term to overall calculated value. The results are summarized in Table 2 for the data set of 642 small molecules using three fitting models: solv solv,cav (1) ΔGsolv GB,elec + aS + γ + ΔGvdW + ΔGpolar , where the solute dielectric constant, εm, is fixed to either 1 or 2. solv solv,cav (2) bΔGsolv GB,elec + aS + ΔGvdW + ΔGpolar , for εm = 1.

a = 0.0328 kcal/(mol Å2),

γ = −2.7145 kcal/mol

In addition, the average dielectric constant of molecules is εm = 1.9283

In Figure 5, we present graphically the calculated Gfit calc and the experimental values ΔGexp as a scatter plot. The error bars indicate the standard deviations of the experimental values. In addition, the straight fitting line is shown to indicate the goodness of above fit. Our results show a Pearson correlation coefficient of R = 0.895, which indicates a high correlation taking into account the diversity on molecular structures of the data set. Furthermore, the calculated rmse value is 1.93 kcal/ mol and the normalized root-mean-square error is 21.7%. 2546


Article

Journal of Chemical Information and Modeling Table 2. Summary of the Contribution from Each Term to the Total Computed Free Energya model

parameters

ΔGsolv GB,elec (%)

aS + γ (%)

ΔGsolv vdW (%)

ΔGsolv,cav polar (%)

rmse (kcal/mol)

nrmse (%)

(1)

a = 0.0448 γ = −1.5311 a = 0.0324 γ = −2.7627 a = 0.0135 b = 0.5463 a = 0.0328 b = 0.5061 γ = −2.7145

40.9 ± 1.4

34.0 ± 1.0

22.7 ± 0.6

2.4 ± 0.1

1.51

5.51

37.4 ± 1.3

16.5 ± 0.8

41.8 ± 1.1

4.3 ± 0.2

1.28

8.96

36.1 ± 1.2

24.1 ± 0.7

36.0 ± 0.9

3.8 ± 0.2

1.27

8.14

37.9 ± 1.3

17.0 ± 0.8

40.9 ± 1.1

4.2 ± 0.2

1.31

8.94

(2) (3) (4)

a

= Calculated as P(%) i

100 Ndata

|cij|

N

∑ j =data 1

∑k4= 1 |ckj|

with cij (i = 1, 2, 3, 4) being the values of free energy terms of the jth molecule, which are averaged over

the dataset of 642 small molecules, along with standard errors. The results are presented for each fitting model as explained in the text. For each model the values of fitting parameters are also shown, where γ is measured in kilocalories per mole, b in arbitrary units, and a in kilocalories per mole squared angstrom. In addition, the rmse and nrmse for only the subset of molecules with relative error ϵ < 50% are presented.

since the dielectric constant depends on structure flexibility and electronic polarization of molecules. In order to count for all these approximations of our model and further improve the predictions, we performed cluster analysis. For this, we developed an algorithm, which clusters the molecules of data set into smaller subsets, where each cluster consists of molecules with high correlations between calculated and experimental hydration free energies, obtained after performing a fitting using linear regression method. The details of algorithm are described in the Supporting Information. We hope that this analysis will divide the data set into clusters of molecules with approximately the same structure properties, and hence improve the correlations between calculated and experimental hydration free energies. The functional forms given by eqs 48 and 49 were the scoring functions for a Monte Carlo algorithm, which attempts to cluster a data set of molecules based on the goodness of fit between the calculated and experimental values of hydration free energies. Here we used 10 clusters for optimization. The rmse (see eq 40) and mean error (me), defined as

solv solv,cav (3) bΔGsolv GB,elec + aS + γ + ΔGvdW + ΔGpolar , for εm = 1. We have also presented the values of fitting parameters obtained for each model, where γ is measured in kilocalories per mole, b in arbitrary units, and a in kilocalories per mole squared angstrom (see Table 2). In addition, the rmse and nrmse are reported, considering the subset of molecules with relative error ϵ < 50%. Interestingly, our results are consistent for all fitting models, where the largest contributions are solv,cav solv obtained from ΔGsolv GB,elec, ΔGnonpolar ≡= aS + γ and ΔGvdW terms. On the other hand, a smaller contribution (10−20 fold) is observed for ΔGsolv,cav term (see Table 2). It is also polar interesting to note that the changes on ΔGsolv,cav nonpolar follow these solv,cav on ΔGsolv vdW, for instance, an increase of ΔGnonpolar is followed by a decrease of ΔGsolv vdW, and vice-versa, by approximately the same amount, as presented in Table 2. These findings indicate that a, b, and γ are empirical parameters of the fitting models predicting the total free energy, compensating for other effects, such as solute force field, solute−solvent surface interface definition, solvation model for electrostatic contribution, solute dielectric constant, for which it is difficult to find an explicit form. In this study, we are showing that the surface term, which is often used to count for nonpolar contribution to total free solv solv,cav energy, is decomposable into ΔGsolv,cav nonpolar, ΔGvdW, and ΔGpolar . To further demonstrate the importance of these two additional terms, we considered other experiments, by trying to ignore each of these terms or both from our fitting functional form. In particular, if we ignore ΔGsolv,cav polar , the fitting function becomes

fit solv solv ΔGcalc = bΔGGB,elec + aS + γ + ΔGvdW

me =

Ndata

∑ (ΔGi ,exp − ΔGifit,calc) i=1

(50)

are used as a measure of goodness of the fit. Our results, presented in Figure 7A, indicate that if both terms ΔGsolv vdW and ΔGsolv,cav polar are included in scoring function, the rmse value is 0.79 kcal/mol, which is much smaller than 2.9 kcal/mol in the case when ΔGsolv,cav is excluded from scoring function, and even polar smaller than 3.0 kcal/mol in the case when both terms ΔGsolv vdW and ΔGsolv,cav are excluded. An interesting behavior is of the polar mean error (eq 50), which is a measure of symmetry for the distribution of calculated values around the fitting straight line. Running averages of mean error, see Figure 7B, indicate that convergence is faster if all terms are included in the free energy decomposition; on the other hand, ignoring either ΔGsolv,cav polar or both ΔGsolv,cav and ΔGsolv polar vdW makes the convergence slower. Overall mean error over all 642 molecules is me = 0.0 kcal/mol; if all terms are included in free energy decomposition, me = −0.1 and 0.04 kcal/mol, respectively, if only ΔGsolv,cav polar or both solv ΔGsolv,cav polar and ΔGvdW are ignored. In addition, we evaluated the correlations between ΔGfit calc and ΔGexp for each case and presented results graphically in Figure 7C−E for the complete data set of 642 small molecules, along with equations of straight lines and Pearson coefficient of

(48)

and if we ignore both terms, then fit solv ΔGcalc = bΔGGB,elec + aS + γ

1 Ndata

(49)

It is important noting that in our approximation, all the calculations are performed using single molecular structure, and hence, we have not considered in our analysis different possible configurations of the solute. In practice, an ensemble average should have been performed, which could have improved the predictions. Moreover, different possible conformations of the molecules yield a configuration entropy contribution term (see also ref 41 for a more detailed discussion), which is not considered in our approach. It is also worth noting that our calculations are performed using one value of solute dielectric constant for all of molecules, which is a crude approximation, 2547


Article


parametrized as ΔGsolv,cav = aS, with a taken from the fitting polar models of small molecules data set. In particular, for a = 0.0324 and 0.0328 kcal/mol, the rank according to the relative solv,cav solv contribution is as the following: ΔGsolv GB,elec, ΔGpolar , ΔGvdW, solv,cav and ΔGnonpolar, consistently for both force fields. For a = 0.0135 kcal/mol, the same ranking is observed; however, as expected, the contribution of ΔGsolv,cav nonpolar term is around 2−3 fold smaller in magnitude compare to larger values of a. The most interesting observation is that the reduced value of ΔGsolv,cav nonpolar, due to smaller a, is almost equally distributed among other solv solv,cav terms, namely ΔGsolv GB,elec, ΔGvdW, and ΔGpolar (see also Table solv,cav 3), indicating that ΔGnonpolar term is empirical, and practically it is used to compensate for either missing terms or physics underlaying each of the other terms included in decomposition. It is worth noting that for large a, typically a = 0.0448 kcal/mol, solv,cav and ΔGsolv ΔGsolv,cav nonpolar becomes slightly larger than ΔGpolar vdW, solv,cav solv but still Gpolar > ΔGvdW (Table 3). Interestingly, the amount of increase on Gsolv,cav nonpolar equals the total amount of decrease in other terms of free energy decomposition (equally distributed for each term), as can be observed from Table 3. An explanation of all these findings is that ΔGsolv,cav polar depends on the surface electrostatic potential of the solvent−solute interface. This potential will contribute to the distribution of charges at the interface, and hence it will contribute to the solvation energy. To demonstrate this, we evaluated the and other terms of the free correlations between ΔGsolv,cav polar solv solv,cav energy decomposition, Gsolv GB,elec, ΔGvdW, and ΔGnonpolar. Our results show (see Figure S1 in the Supporting Information) a solv high correlation between ΔGsolv,cav polar and ΔGGB,elec with Pearson correlation coefficient of R = 0.855, and on the other hand, solv,cav and lower correlations were obtained between ΔGpolar solv,cav solv ΔGnonpolar and ΔGvdW with Pearson correlation coefficients of 0.694 and 0.712, respectively.

Figure 7. Summary of the results for the 642 small molecules data set as obtained from the clustering Monte Carlo algorithm using 10 clusters. (A) Running averages of root-mean-square error and (B) running averages of mean error using the three parameter fitting model solv,cav for the following free energy decompositions: (i) (ΔGsolv GB,elec, ΔGnonpolar solv,cav solv solv,cav = aS + γ, ΔGsolv , ΔG ); (ii) (ΔG , ΔG = aS + γ, vdW polar GB,elec nonpolar solv solv,cav ΔGsolv vdW); (iii) (ΔGGB,elec, ΔGnonpolar = aS + γ). Correlations between the calculated Gfit calc (kcal/mol) and the experimental hydration free energies ΔGexp (kcal/mol) are also presented for the following free energy decompositions: as in (C) i; (D) ii; (D) iii.

determination. From our graphs, it can clearly be seen that the data points are very well symmetrically distributed around the straight line if all terms of free energy decomposition are included, with a very high Pearson correlation coefficient R = 0.989. Lower correlations and less symmetric distributions of data points around the straight line are obtained if either ΔGsolv,cav or both terms, ΔGsolv,cav and ΔGsolv polar polar vdW, are ignored as presented in Figure 7D and E. We also did the same analysis for the proteins data set, using AMBER and CHARMM22 force fields and summarized the results in Table 3. Surprisingly, observed ranking is not the same as for small molecules data set. The nonpolar term is

5. CONCLUSIONS In this study, we presented a way of how to decompose the solvation free energy in different contributions. We also discussed a new term to the solvation free energy which is introduced to compensate for the forces acting on the dielectric surface boundary between the solvent and solute based on macroscopic theory of the Maxwell stress tensor on the dielectric medium. For interpreting the solvent screening of solvent−solute dispersion interactions, we discussed the socalled van der Waals contribution to the solvation free energy. In order to evaluate the volume integrals, we converted them into surface integrals using the Gauss theorem, which are performed over a closed surface representing the molecular surface of solute in solvent. For calculation of the electrostatic contribution to the solvation free energy we used the generalized Born approximation, where the effective Born radii are also computed using the surface integrals. Our results show a very good correlation between the GB and PB methods, where PB method is considered to be a more accurate approach. Moreover, our observations indicated that both polar and van der Waals contributions are important to the solute cavity in solvent, taking into account both shortrange dispersion interactions and the forces acting on the dielectric surface boundary between the solvent and solute. Both terms show very high correlations with SASA of solute; however, they are not correlated to each other, and thus, they should not parametrized together for proper considerations of different physical contributions to the cavity creation in solvent.

Table 3. Summary of the Contribution from Each Term to the Total Computed Free Energya a (kcal/mol Å2)

a

ΔGsolv GB,elec (%)

0.0448 0.0324 0.0135 0.0328

30.3 32.6 36.7 32.5

0.0448 0.0324 0.0135 0.0328

32.0 34.6 39.3 34.5

ΔGsolv,cav nonpolar(%)

ΔGsolv vdW (%)

Amber Force Field ± 0.8 25.6 ± 0.3 20.3 ± 0.8 19.9 ± 0.3 21.9 ± 0.9 9.4 ± 0.1 24.9 ± 0.8 20.1 ± 0.3 21.9 CHARMM22 force field ± 0.8 26.9 ± 0.3 19.9 ± 0.8 21.0 ± 0.3 21.5 ± 0.9 10.0 ± 0.1 24.6 ± 0.8 21.2 ± 0.3 21.5

= Calculated as P(%) i

100 Ndata

N

∑ j =data 1

|cij| ∑k4= 1 |ckj|

ΔGsolv,cav polar (%)

± ± ± ±

0.2 0.3 0.4 0.3

23.8 25.6 29.0 25.5

± ± ± ±

0.5 0.6 0.6 0.6

± ± ± ±

0.3 0.3 0.4 0.3

21.2 22.9 26.1 22.8

± ± ± ±

0.5 0.5 0.6 0.5

with cij (i = 1, 2, 3, 4) being

the values of free energy terms of the jth protein, which are averaged over the dataset of Ndata = 573 proteins, along with standard errors. The results are presented for both AMBER and CHARMM22 force fields with ΔGsolv,cav nonpolar = aS, where a is obtained from the fitting models of small molecules. 2548


Article

Journal of Chemical Information and Modeling Application of the method to a data set of small molecules showed a significant correlation between calculated and experimental hydration free energies. The Pearson correlation between the calculated and experimental hydration free energies for a subset of 311 molecules with smallest relative error ϵ (ϵ < 50%) was R = 0.961. Our results, interestingly, indicate the role of solute entropy on the hydration free energy.41 The analysis of relative contribution for each term of decomposition indicated that ranking based on their magnitudes was surprisingly different for small molecules and proteins data sets. This study showed that the surface term, which is often used to count for nonpolar contribution to total solv free energy, is decomposable into ΔGsolv,cav nonpolar, ΔGvdW, and solv,cav ΔGpolar . We hope and believe that the parametrization of hydration free energy in this study can be used as a scoring function for screening a large data set of molecules based on their values of solvation free energy.

■

I=

∮S −

1 3 n d r |r − ri|

I=

(51)

∫Ω |r −1 r |n d3r − ∫Ω i

1 3 nd r + | r − r | 0 i 1 3 − n d r Ωs |r − ri|

∫V

s

1 d3r |r − ri|n

∫Ω−V

0

(52)

(53)

We can now use Gauss’s theorem:

∫Ω ∇·A d3r = ∮S A·n dS

(54)

where Ω is the integration volume and S is surface enclosing that volume. n is the outward unit vector to the surface element dS. In our case, when 1 |r − ri|n

(55)

it can be found that A=

r − ri 1 · 3 − n |r − ri|n

(57)

(59)

1 n−3

∮S

(r − ri) ·n dS |r − ri|n

(60)

Different methods have been introduced to calculate the molecular surfaces.35−37,42−45 In general, molecular surface can be seen as a set of overlapping spheres, each having the van der Waals radius of its constituent atom, which is the so-called van der Waals surface. The so-called solvent-accessible surface (SASA) of a molecule is defined as the van der Waals envelope of a molecule expanded by the radius of the solvent sphere about each atom center.42,43 Then, the so-called molecular surface which is defined as the contour drawn by a sphere of radius rp, representing the solvent molecule, rolling over a set of the van der Waals beads centered at the atomic positions.35−37 More recently,45 a so-called smooth invariant molecular surface is also defined using a smoothing sphere with radius rs rolling over the generated molecular surface. The molecular surface comprises the convex spherical patches, saddle-shaped toroidal patches and the concave patches, which then is partitioned in a set of triangles.35−37 From the definition, the solvent-accessible surface has no reentrant sections, that is the molecule comprises only convex spherical patches. This may yield a loss of information since the ratio of contact-to-reentrant surface could be a measure of molecular surface roughness.44 On the other hand, the choice of the problem radius will affect the value of the area of molecular surface in each case. A small probe radius reveals a large number of features from the molecular surface and for rp = 0, the solvent-accessible surface is equal to the molecular surface. Since the probe sphere mimics the solvent molecule, i.e., water molecule, the smallest physically accepted radius is 1.4 Å, equal to the radius of a water molecule. As the probe radius increases, the molecular surface and solvent-accessible surface become smoother and for rp = ∞, the molecular surface is the convex hull of the set of atomic spheres. From the geometrical point of view, molecular surface tends towards a finite limiting value as the probe radius increases, whereas the solvent-accessible surface tends towards infinite. Therefore, the

1 d3r |r − ri|n

∫

∇·A =

∮S

∞

(r − ri) ·n∞ dS |r − ri|n

A.2. Molecular Surface

where Ω is the all space volume and Ωs is the solute volume. The first term of integral can further be split into two other terms. The first is integral over volume of the ith solute atom, V0, and the second term is over all space excluding the volume of that atom, Ω − V0. Thus, we obtain I=

1 3−n

(r − ri) ·n dS |r − ri|n

∮S

Therefore, we can finally write that

over solvent volume, Ωw. This integral can be split into two terms, where the first term is the integral over entire space, Ω, and the second one over solute volume, Ωs, as: I=

(r − ri) ·ñ 0 1 dS + n |r − ri| 3−n

1 →0 |r − ri|n

Let us consider the integrals of the following form:

w

0

(r − ri) ·n 0 1 dS + 3−n |r − ri|n

and hence, the first two integrals cancel out. Here, n∞ is the outward unit vector normal to dS element of S∞ surface and n is the outward unit vector normal to dS element of S surface. The third integral is zero, since at infinity we get

A.1. Volume Integrals

∫Ω

0

∮S

where S is the molecular surface of solute, S0 is spherical surface of ith solute atom, S∞ is the surface at infinity distance of solvent, n0 is the outward unit vector normal to dS element of S0 surface, and ñ0 is the inward unit vector normal to dS element of S0 surface, which is given as ñ 0 = −n 0 (58)

APPENDIX

I=

1 3−n

(56)

Therefore, we obtain that 2549


Article


q 4πε0εwa q ϕr = 4πε0a

molecular surface may be a better representation of the surface of a molecule. The accuracy of the calculation of surface of a molecule is a geometrical problem. In order to accurately represent the surface of a molecule, several conditions are required to be satisfied:45 maximum homogeneity of dot distribution, a smoothing surface near the singular points, stability with variation of the dot density, independence on the rotation of molecule, and stability with changing the molecular conformation. Different physical properties on the molecular surface and integration of functions over the molecular surface require a numerical representation of the molecular surface as a set of {(xi , yi , zi), ni , δσi}

ϕw =

where εr = 1. Then, the electrostatic part of solvation free energy is given by the well-known Born formula47 solv ΔGelec =

A.3. Generalized Born Model

solv ΔGelec =

The generalized Born (GB) model is another approach to describe the electrostatic interactions in a multiple-dielectric environment in less computation efforts.46 To obtain the electrostatic potential ϕ in such model, one has to solve the Poisson equation 1 ρ(r) ε0

1 ρ(r), ε0

ε(interior) = εm ,

1 ρ(r), ε0

ε(interior) = εm ,

ε(exterior) = εw ∇[ε(r)∇ϕr (r)] = − ε(exterior) = εr

W=

1 2

(64)

∫

d3rϕreac(r)ρ(r)

1 = 2

i=1

1 2

⎛

⎞

∫ E·D d3r = 12 ∫ ⎜⎝ εDε ⎟⎠·D d3r 0

(70)

(71)

∫ ∇·D d3r = ∫ ρ d3r = qi

(72)

∮ D·dS = qi

(73)

or

(65)

since the electric displacement vector D has a radial symmetry, then we can write

M

∑ qiϕreac(ri)

(69)

Integrating according to the volume and using the Gaussian theorem, we get

Approximating the macromolecule charge distribution by a set of partial atomic point charges qi for i = 1, 2, ..., M, then solv ΔGelec

M

∑

∇·D = ρ

and the electrostatic contribution in solvation free energy is given by solv ΔGelec =

⎞ qi2 ⎛ 1 ⎜ − 1⎟ ⎠ i = 1 4πε0ai ⎝ εw M qjqj ⎛ 1 ⎞ 1 + ∑ ⎜ − 1⎟ 2 i ≠ j 4πε0rij ⎝ εw ⎠ 1 2

Let us consider the charge qi which is assumed to be placed at the origin of some coordinate system, representing the partial charge of atom i. Using the Maxwell equation, we can write

(63)

with solutions, respectively, ϕw and ϕr. The difference between these two potentials gives the reaction field

ϕreac = ϕw − ϕr

(68)

where the first term is the sum of individual Born terms and the second term is the sum of pairwise Coulomb interactions. In the GB theory, we attempt to find the same relatively simple analytical formula as in eq 69 by solving directly the Poisson equation. First we discuss the linear Poisson equation (i.e., no salt effects are included in description), which provide a ΔGsolv elec which is quadratic in the charges, but as we will see the effect of dielectric constants of solvent and macromolecule is different. From classical electrostatics, assuming a linearly polarized medium (i.e., isotropic medium), the work needed to assemble a charge distribution can also be formulated in terms of the scalar product of the electric field E and electric displacement D:

(62)

where ρ is the charge distribution and ε is the dielectric constant, which is equal to εm in the macromolecular interior, and it is equal to εw elsewhere. Note that if the macromolecule is immersed in some reference environment (e.g., gas phase), then the dielectric constant of the exterior region is εr (e.g., for the gas phase it is 1), and thus ε = εr. Then, we can solve eq 62 under these two conditions: ∇[ε(r)∇ϕw(r)] = −

⎞ q2 ⎛ 1 ⎜ − 1⎟ 8πε0a ⎝ εw ⎠

If then the macromolecule would have been considered as a set of charges q1, ..., qM embedded in spheres of radii a1, ..., aM, and assuming that the separation between any of these two spheres i and j is rij is sufficiently large in comparison to their radii, the solvation free energy can be written as

(61)

where (xi, yi, zi), ni, δσi are, respectively, the coordinates, outward unit vector, and area of a small element of the molecular surface.

∇[ε(r)∇ϕ(r)] = −

(67)

(66)

∮ D(r) dS = qi

In the case of a single ion of radius a and charge q, the potentials can be found analytically from eq 63 as

(74)

where dS = d(4πr2) = 8πr dr, then we obtain the solution as 2550


Article


D(r ) =

qi 4πr

2

1 1 = Ri 4π

(75)

As the vector, it can be written as qr D= i 3 4πr

1 1 = 4π Ri (77)

=

The work done to place a charge qi at the origin within some macromolecule whose interior part has a dielectric constant εm, surrounded by a reference medium with dielectric constant εr with no other charges placed yet, is 1 Wi = 2 =

1 8πε0

=

q2

∫interior d3r ε |r −i r |4 m

+

i

∫exterior d3r ε |r − r |4 r

∫interior

i

d3r + εm|r − ri|4

∫exterior

d3r ⎤ ⎥ εr |r − ri|4 ⎦

solv ΔGelec =

If the reference medium has the same dielectric constant is the macromolecule, εr = εm, then

+

∫interior

∫exterior

Wi (wat) =

⎡ ⎢ 8πε0 ⎣ +

∫interior

∫exterior

3

(80)

Then, the electrostatic contribution to solvation free energy, which is the work done for moving a partial charge of a macromolecule placed at some fixed point origin in a reference medium to another fixed point in of the macromolecule in a solvent medium, is solv ΔGelec

solv ΔGelec =

=

8πε0

⎡1 1 ⎤ d3r ⎢ − ⎥ exterior ⎣ εw εm ⎦ |r − ri|4

∫

solv ΔGelec

i

i

dr ⎤ ⎥ |r − ri|4 ⎦

i

∫interior(|r−r |>a ) i

i

d3r |r − ri|4

(84)

1 2

M

M

∑∑ i=1 j=1

⎡1 1⎤ ⎢ − ⎥ εm ⎦ 4πε0fGB (rij) ⎣ εw qiqj

(85)

(86)

1 2

M

M

∑∑ i=1 j=1

⎡ exp( −ακf (rij)) 1⎤ GB ⎢ ⎥ − εw εm ⎥⎦ 4πε0fGB (rij) ⎢⎣ qiqj

where κ is the Debye−Hückel screening parameter, given by ⎛ I ⎞1/2 κ=⎜ ⎟ ⎝ εwε0kBT ⎠

(81)

which can also be written in the form 2 1 qi ⎡ 1 1⎤ = ⎢ − ⎥ εm ⎦ 2 R iε0 ⎣ εw

∫|r−r |≤a

i

(87)

= Wi (wat) − Wi (ref) qi2

1 4π

i

3

It can be seen that both dielectric constants, of the solvent and macromolecule, appear in the expression for ΔGsolv elec, where εm of macromolecule appears in the second term of eq 85 because it has been taken as the dielectric constant of the reference environment system. The GB model can be extended to consider low salt concentrations at the Debye−Hückel approximation level as discussed in ref 48. In this approach eq 85 is written as49

dr εm|r − ri|4

⎤ d3r ⎥ 4 εw|r − ri| ⎦

i

3

∫interior(|r−r |>a ) |r d− rr |4

1/2 ⎛ ⎛ rij2 ⎞⎞ 2 ⎟⎟ fGB (rij) = ⎜⎜rij + R iR jexp⎜⎜ − ⎟⎟ ⎝ 4R iR j ⎠⎠ ⎝

(79)

If the reference medium is the solvent, εr = εw, then qi2

⎡ 1 −⎢ ⎣ 4π

∫allspace |r d− rr |4

where f GB(rij) is a function, which if i = j becomes the effective Born radius Ri and if i ≠ j (i.e., in pairwise terms) becomes the effective interaction distance. The form chosen for this functions is17

d3r εm|r − ri|4

⎤ d3r ⎥ εm|r − ri|4 ⎦

i

3

1 1 − 4π ai

(78)

qi2 ⎡ ⎢ Wi (ref) = 8πε0 ⎣

3

∫exterior |r d− rr |4

where the integration over all space gives zero because limr→∞ 1/|r − ri|4 = 0. The second term in eq 84 is the so-called excluded volume integral. It can be seen that if the molecular boundary is simply the sphere of radius ai, then Ri = ai. For a macromolecular system of M partial atomic charges, the electrostatic term of the solvation free energy for bringing the system of charges for a fixed point in surrounding medium with dielectric constant εr = εm to a fixed point in surrounding medium being the solvent with dielectric constant εw is

1 8πε0

qi2

qi2 ⎡ ⎢ = 8πε0 ⎣

1 4π +

D d3r ·D ε0ε

∫

(83)

i

In general, the effective Born radius is written in terms of the integral over interior region of the macromolecule, excluding a radius ai around the point of charge qi46

(76)

The electric field vector is qi r E= 4πε0εr 3

3

∫exterior |r d− rr |4

(88)

where I is the ionic strength and α is a constant that depends on the ionic strength:49 (82)

α=

where Ri is the effective Born radius, 2551

1 + 0.0169 I 1 + 0.075 I

(89) DOI: 10.1021/acs.jcim.7b00368 J. Chem. Inf. Model. 2017, 57, 2539−2553

Article


(8) Sitkoff, D.; Sharp, K. A.; Honig, B. Accurate Calculation of Hydration Free-Energies Using Macroscopic Solvent Models. J. Phys. Chem. 1994, 98, 1978−1988. (9) Eisenberg, D.; McLachlan, A. D. Solvation energy in protein folding and binding. Nature 1986, 319, 199−203. (10) Simonson, T.; Brunger, A. T. Solvation Free Energy Estimated from Macroscopic Continuum Theory: An Accuracy Assessment. J. Phys. Chem. 1994, 98, 4683−4694. (11) Chothia, C. Hydrophobic bonding and accessible surface area in proteins. Nature 1974, 248, 338−339. (12) Nemethy, G.; Scheraga, H. A. The structure of water and hydrophobic bonding in proteins. III. The thermodynamics properties of hydrophobic bonds in proteins. J. Phys. Chem. 1962, 66, 1773− 1789. (13) Pierotti, R. A. A scaled particle theory of aqueous and nonaqueous solutions. Chem. Rev. 1976, 76, 717−726. (14) Ashbaugh, H. S.; Kaler, E. W.; Paulaitis, M. E. A “Universal” surface area correlation for molecular hydrophobic phenomena. J. Am. Chem. Soc. 1999, 121, 9243−9244. (15) Raschke, T. M.; Tsai, J.; Levitt, M. Quantification of the hydrophobic interaction by simulations of the aggregation of small hydrophobic solutes in water. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 5965−5969. (16) Cramer, C. J.; Truhlar, D. G. Implicit solvation models: equilibria, structure, spectra, and dynamics. Chem. Rev. 1999, 99, 2161−2200. (17) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 1990, 112, 6127−6129. (18) Tanford, C. Interfacial free energy and the hydrophobic effect. Proc. Natl. Acad. Sci. U. S. A. 1979, 76, 4175−4176. (19) Sharp, K. A.; Nicholls, A.; Fine, R. F.; Honig, B. Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects. Science 1991, 252, 106−109. (20) Wang, C.; Xiao, L.; Luo, R. Numerical interpretation of molecular surface field in dielectric modeling of solvation. J. Comput. Chem. 2017, 38, 1057−1070. (21) Zwanzig, R. W. High-temperature equation of state by a perturbation method. I. nonpolar gases. J. Chem. Phys. 1954, 22, 1420−1426. (22) Kubo, M. M.; Gallicchio, E.; Levy, R. M. Thermodynamic Decomposition of Hydration Free Energies by Computer Simulation: Application to Amines, Oxides, and Sulfides. J. Phys. Chem. B 1997, 101, 10527−10534. (23) Bren, U.; Martínek, V.; Florián, J. Decomposition of the Solvation Free Energies of Deoxyribonucleoside Triphosphates Using the Free Energy Perturbation Method. J. Phys. Chem. B 2006, 110, 12782−12788. (24) Bren, M.; Florián, J.; Mavri, J.; Bren, U. Do all pieces make a whole? Thiele cumulants and the free energy decomposition. Theor. Chem. Acc. 2007, 117, 535−540. (25) Leach, A. R. Molecular Modelling. Principles and Applications, 2nd ed.; Pearson Education Limited: Prentice Hall, 2001. (26) Burusco, K. K.; Bruce, N. J.; Alibay, I.; Bryce, R. A. Free Energy Calculations using a Swarm-Enhanced Sampling Molecular Dynamics Approach. ChemPhysChem 2015, 16, 3233−3241. (27) Lee, M. S.; Salsbury, F. R.; Brooks, C. L. Novel generalized Born methods. J. Chem. Phys. 2002, 116, 10606−10614. (28) Zacharias, M. Protein-protein docking with a reduced protein model accounting for side-chain flexibility. Protein Sci. 2003, 12, 1271− 1282. (29) Sykja, H. Bazat e elektrodinamikës; Publishing House of University Book: Tirana, 2006. (30) Lu, B.; McCammon, J. A. Improved boundary element methods for Poisson-Boltzmann electrostatic and force calculations. J. Chem. Theory Comput. 2007, 3, 1134−1142. (31) Luo, R. UC Irvine: Department of Molecular Biology and Biochemistry. rayl0.bio.uci.edu/data/ (accessed July 8, 2016).

The GB methods discussed in the literature do not have any other dependence on the macromolecule and solvent dielectric constants beyond the scaling factor shown in eq 85. In fact, the linear Poisson−Boltzmann solvation energy method has more complicated dependencies on εm and εw.49 The effective Born radii in GB models can be derived from continuum dielectric models and Coulomb field approximation. For decreasing the computational time other approximations are introduced as well, which attempt to adjust these parameters by fitting either to experimental data or numerical continuum dielectric data. For relatively small molecules, the extension of standard GB models were introduced by carrying out integrals over the macromolecule dielectric regions numerically.16,50,51 Instead, the pairwise methods have shown to be fast in applications to macromolecules. In this approach the integral in eq 84 is replaced by the sum of contributions for each atom. Thus, based on this model, the molecule consists of a set of spheres with non-vanishing radius ai.49,52,53

■

ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00368. Results of free energy decomposition calculations and analyses are provided for the proteins and small molecules data sets (PDF)

■

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Hiqmet Kamberaj: 0000-0001-7357-7490 Notes

The authors declare no competing financial interest.

■ ■

ACKNOWLEDGMENTS H.K. thanks the International Balkan University for support. REFERENCES

(1) Levy, R. M.; Zhang, L. Y.; Gallicchio, E.; Felts, A. K. On the Nonpolar Hydration Free Energy of Proteins: Surface Area and Continuum Solvent Models for the Solute-Solvent Interaction Energy. J. Am. Chem. Soc. 2003, 125, 9523−9530. (2) Ooi, T.; Oobatake, M.; Némethy, G.; Scheraga, H. A. Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides. Proc. Natl. Acad. Sci. U. S. A. 1987, 84, 3086− 3090. (3) Gallicchio, E.; Levy, R. M. AGBNP: an analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling. J. Comput. Chem. 2004, 25, 479−499. (4) Chen, J.; Brooks, C. L., III Implicit modelling of non polar solvation for simulating protein folding and conformational transitions. Phys. Chem. Chem. Phys. 2008, 10, 471−481. (5) Sharp, K. A.; Honig, B. Calculating total electrostatic energies with non-linear Poisson-Boltzmann equation. J. Phys. Chem. 1990, 94, 7684−7692. (6) Sharp, K. A.; Honig, B. Electrostatic interactions in macromolecules: Theory and Applications. Annu. Rev. Biophys. Biophys. Chem. 1990, 19, 301−332. (7) Nicholls, A.; Honig, B. Electrostatic interactions in macromolecules: Theory and Applications. J. Comput. Chem. 1991, 12, 435. 2552


Article

Journal of Chemical Information and Modeling (32) MacKerell, A. D. J.; Feig, M.; Brooks, C. L. I. Extending treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem. 2004, 25, 1400−1415. (33) Brooks, B. R.; et al. CHARMM: The bimolecular simulation program. J. Comput. Chem. 2009, 30, 1545−1614. (34) Ponder, J. W.; Case, D. A. Force fields for protein simulations. Adv. Protein Chem. 2003, 66, 27−85. (35) Connolly, M. L. Solvent-accessible surfaces of proteins and nucleic acids. Science 1983, 221, 709−713. (36) Connolly, M. L. Analytical molecular surface calculation. J. Appl. Crystallogr. 1983, 16, 548−558. (37) Connolly, M. L. Molecular Surface Triangulation. J. Appl. Crystallogr. 1985, 18, 499−505. (38) Mobley, D. L. Experimental and Calculated Small Molecule Hydration Free Energies. UC Irvine: Department of Pharmaceutical Sciences, UCI. http://www.escholarship.org/uc/item/6sd403pz (accessed July 19, 2015). (39) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004, 25, 1157−1174. (40) Liwo, A.; Oldiej, S.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S.; Scheraga, H. A. A united-residue force field for off-lattice proteinstructure simulations: I. Functional forms and parameters of longrange side-chain interaction potentials from protein crystal data. J. Comput. Chem. 1997, 18, 849−873. (41) Mobley, D. L.; Dill, K. A.; Chodera, J. D. Treating entropy and conformation changes in implicit solvent simulations of small molecules. J. Phys. Chem. B 2008, 112, 938−946. (42) Lee, B.; Richards, F. M. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 1971, 55, 379−400. (43) Richards, F. M. Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng. 1977, 6, 151−176. (44) Richards, F. M. Optical matching of physical models and electron density maps: Early developments. Methods Enzymol. 1985, 115, 145−154. (45) Vorobjev, Y. N.; Hermans, J. SIMS: computation of a smooth invariant molecular surface. Biophys. J. 1997, 73, 722−732. (46) Bashford, D.; Case, D. Generalized Borb model of macromolecular salvation effects. Annu. Rev. Phys. Chem. 2000, 51, 129−152. (47) Born, M. Volumes and hydration warmth of ions. Eur. Phys. J. A 1920, 1, 45−48. (48) Srinivasan, J.; Trevathan, M. W.; Beroza, P.; Case, D. A. Theor. Chem. Acc. 1999, 101, 426−434. (49) Tjong, H.; Zhou, H. X. GBr6NL: A generalised Born method for accurately reproducing solvation energy of the nonlinear PoissonBoltzmann equation. J. Chem. Phys. 2007, 126, 195102. (50) Luo, R.; Moult, J.; Gilson, M. K. Dielectric screening treatment of electrostatic solvation. J. Phys. Chem. B 1997, 101, 11226−11236. (51) Majeux, N.; Scarsi, M.; Apostolakis, J.; Ehrhardt, C.; Caflisch, A. Proteins: Struct., Funct., Genet. 1999, 37, 88−105. (52) Tjong, H.; Zhou, H. X. GBr6: A parametrization-free, accurate, analytical generalised Born method. J. Phys. Chem. B 2007, 111, 3055− 3061. (53) Grycuk, T. Deficiency of the Coulomb-field approximation in the generalised Born model: An improved formula for form radii evaluation. J. Chem. Phys. 2003, 119, 4817−4826.

2553


Comparison Study of Polar and Nonpolar ... - ACS Publications

Recommend Documents