Implicit Solvation Using the Superposition ... - ACS Publications

Nov 9, 2017 - Department of Chemistry, Colorado State University, Fort Collins, ... protein folding, peptide self-assembly, and oil−water emulsion f...
4 downloads 0 Views 2MB Size
Subscriber access provided by READING UNIV

Article

IS-SPA: An Implicit Treatment of the Non-polar Component to Solvation for Simulating Molecular Aggregation Peter T. Lake, and Martin McCullagh J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00698 • Publication Date (Web): 09 Nov 2017 Downloaded from http://pubs.acs.org on November 11, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table of Contents Graphic 88x34mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

IS-SPA: An Implicit Treatment of the Non-polar Component to Solvation for Simulating Molecular Aggregation Peter T. Lake and Martin McCullagh∗ Department of Chemistry, Colorado State University, Fort Collins, CO, USA E-mail: [email protected]

Abstract Non-polar solute–solvent interactions are the driving force for aggregation in important chemical and biological phenomena including protein-folding, peptide selfassembly and oil-water emulsion formation. Currently, the most accurate and computationally efficient description of these processes requires an explicit treatment of all solvent and solute atoms. Previous computationally feasible implicit solvent models, such as solute surface area approaches, are unsuccessful at capturing aggregation features including both structural and energetic trends while more theoretically rigorous approaches, such as Reference Interaction Site Model (RISM), are accurate but extremely computationally demanding. Our approach, denoted Implicit Solvation using the Superposition Approximation (IS-SPA), builds on previous theory utilizing the Kirkwood superposition approximation to approximate the mean force of the solvent from solute parameters. We introduce and verify a parabolic first solvation shell truncation of atomic solvation, fitting water distributions around a molecule, and a Monte ∗

To whom correspondence should be addressed

1

ACS Paragon Plus Environment

Page 2 of 45

Page 3 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Carlo integration of the mean solvent force. These extensions allow this method to be implemented as an efficient non-polar implicit solvent model for molecular simulation. The approximations in IS-SPA are first explored and justified for the homodimerization of an array of different sized Lennard-Jones spheres. The accuracy and transferability of the approach are demonstrated by its ability to capture the position and relative energies of the de-solvation barrier and free energy minimum of various alkane homodimers. The model is then shown to reproduce the phase separation and solubility of cyclohexane and water. These promising results, coupled with two orders of magnitude speed-up for dilute systems as compared to explicit solvent simulations, demonstrate that IS-SPA is an appealing approach to boost the time- and length-scale of molecular aggregation simulations.

1

Introduction

Non-polar solvation effects are pivotal components of aqueous aggregation in processes such as protein folding, 1 protein-ligand binding and molecular self-assembly. 2,3 Often referred to as the hydrophobic effect, the physics leading to this phenomenon has been and continues to be the focus of a significant body of work (see recent reviews, Refs. [ 3,4]). Despite some informative experiments by Ben-Naim et al., it is a challenge to decompose the interactions governing aqueous aggregation using experimental techniques. 5 All-atom explicit solvent simulations have been utilized to provide insight into the underlying physics of non-polar aggregation 6–8 but suffer from sampling issues due to the long length- and time-scales of these processes. Implicit solvent models attempt to resolve this limitation by removing the solvent degrees of freedom from the simulation. Despite the extent of previous work in the theory of implicit solvent models, there is no model that accurately and efficiently capture non-polar solvent interactions in simulation. It is of importance, therefore, to develop physically accurate implicit solvent models to better sample molecular aggregation via molecular simulations. The division of solvation interactions into those that are hydrophilic and hydropho-

2

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

bic can more accurately be divided into electrostatic and non-polar interactions. The focus of much previous work on implicit solvent models has been on decomposing the free energy of solvation, ∆Gsolv , into these two components: ∆Gsolv = ∆Gelec + ∆Gnonpolar . 9,10 This division relies on considering solvation in two steps; first the energy required to solvate an uncharged solute followed by the energy required to charge the system. The implicit solvent models most readily available for simulations use two independent theories for the two terms such as the Poisson-Boltzmann, 11 generalized Born (GB) 12 or others 13–16 to estimate ∆Gelec and solvent accessible surface areas (SASA) to estimate ∆Gnonpolar . 17–19 More recent studies show the importance of having a theory that consistently describes the two processes. 20 More accurate theories either avoid dividing the free energy into two terms, such as density functional theory (DFT) studies, 21 reference interaction site models (RISM), 22,23 and others, 24 or have an explicit division of the polar and non-polar interactions that are calculated from the same input. 25,26 Whereas these theories are more accurate than typical GB and SASA implicit solvent models, they require substantially more computation. Our goal is to capture the physics of these higher level theories while being computationally efficient enough to be used in molecular simulations. Developing an implicit solvent model for non-polar interactions is a first and necessary step in creating a more general theory that includes polar and charged solutes. The work of Weeks, Chandler and Andersen (WCA) demonstrates that the non-polar solvation free energy can be decomposed into a part due to the hard sphere repulsive forces and longer-ranged attractive forces of the solvent: ∆Gnonpolar = ∆Ghs + ∆Gdisp . 27 Additionally, using hard spheres as a reference system, 28 the WCA theory shows that ∆Gnonpolar is dominated by contributions from the excluded volume effects, ∆Ghs . Implicit solvent models designed to capture the non-polar solvation behavior have mainly focused on a simple picture of a constant density of solvent around the solutes, leading to ∆Ghs being proportional to some combination of solute surface area 17–19 and/or volume. 29–32 More recently, improvements to these approaches have been achieved by including the attractive component of the solvation free energy in the form of dispersion

3

ACS Paragon Plus Environment

Page 4 of 45

Page 5 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

interactions. 31,33–38 Unfortunately, due to computational efficiency and availability, solute surface area models are the most prevalent non-polar implicit solvent model despite an increasing body of literature describing their failings. 37–39 Implicit solvent models such as generalized Born for electrostatics and surface area arguments for non-polar interactions rely on the approximation that there is no variation in solvent density around the solute beyond excluded volumes. The variation in solvent density plays an important role in molecular aggregation where enhancements of solvent density between two constituents give rise to barriers. The work of Hummer et al. 40 demonstrates how the Kirkwood superposition approximation 41 can be used to predict the density of solvent around a solute. Further work by Ashbaugh et al. 42 and Pellegrini et al. 43,44 shows how these concepts can be incorporated into an implicit solvent model but only for systems with limited degrees of freedom. These methods focus on reproducing the interactions of two non-polar solutes as opposed to approaches which attempt to capture the full solvation process from vacuum to solution. Using the language of Ben-Amotz, this can be termed treating non-polar interaction rather than non-polar solvation. 3,45 This approach is more intuitive for designing an implicit solvent simulation for aggregation whereby we are more concerned with how the system changes given that the molecules are already solvated as opposed to considering the solvation process itself. Our work extends the previous work on using the Kirkwood superposition for implicit solvation resulting in approximations that can be used to run a molecular simulation. We denote this simulation method Implicit Solvation using the Superposition Approximation (IS-SPA). Whereas the model we consider in this work is specifically developed for non-polar solvation effects, it provides a consistent framework for the addition of electrostatic effects in future studies. In Section 2 we develop our formalism and introduce the approximations necessary to employ it in a computationally efficient manner. Section 3 describes the methods used to study these theories. Section 4 shows the accuracy and probes the limitations of our approximations using both solvated Lennard-Jones spheres as a simple system and more physically relevant alkane

4

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 45

systems.

2

Theory

The goal of any implicit solvent model in a simulation is to accurately predict the mean force of the solvent impinging on the solutes without explicitly simulating all the solvent degrees of freedom. The relevancy of the mean force hfi i is due to the fact that the average distribution function of N solutes, G(RN ), is directly related to the potential of mean force (PMF) of a single solute i

  hfi i = ∇i T ln G(RN ) ,

(1)

where T is the temperature in units of energy. The mean force on solutes can similarly be written in terms of the mean distribution of the solvent at position r given the solute positions RN , g(r; RN ). Namely, in the case of pairwise additive interactions,

hfi i = ρ

Z

fi,solv (r − Ri )g(r; RN )dr +

X j6=i

fij (Rj − Ri )

= hfi isolv + fi,solute

(2)

where ρ is the number density of the solvent, fi,solv is the direct force between a solvent molecule and solute i, and fij is the direct force between solute atom i and j. This relation demonstrates how the solvent effects can be treated separately from the solute-solute interactions, which are calculated explicitly in simulation. In the classical force-fields considered in this study, the solvation force is a sum of Coulomb and Lennard-Jones contributions allowing the average solvent force to be expressed as: hfi isolv = hfi iCoulomb +hfi iLennard−Jones . These two averages are coupled solv solv by the distribution function, g(r; RN ), over which they are averaged. For the sake of ≈ 0. this work we will consider solutes with little to no charge implying that hfi iCoulomb solv A study of the limit of charge on solute species allowed by the approximations presented

5

ACS Paragon Plus Environment

Page 7 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

in the next two subsections is provided in Supporting Information. While the approximations introduced below are motivated by the non-polar solutes studied here, the above formulation of predicting the distribution of water around a solute configuration to measure the mean solvent force is still valid for the case of charged solutes. The presence of polar or charged solutes give rise to two separate challenges to be addressed in future work. The first is to predict the enhancement of solvent density around a polar or charged solute relative to the uncharged system. The second challenge is to efficiently treat the long-ranged nature of the Coulomb force. These challenges can both be approached by considering not only the density of the solvent but also the alignment of the water in the presense of charged solutes. The current work considers the first step of developing an implicit solvent model where the orientational degrees of freedom can be disregarded.

2.1

Superposition approximation

Our approach is to find appropriate approximations for the solvent distribution function for a given solute configuration that allows for calculation of the mean solvent force to be used in simulation. The first approximation we use is the Kirkwood superposition approximation (SPA), 41 where the many-body distribution function is reduced to a product of two-body distribution functions gj ,

N

g(r; R ) ≃

N Y

gj (r; Rj ).

(3)

j=1

The implication of this equation is that knowledge of the mean solvent distribution around a single separated atom can be used to predict the solvent distribution around any collection of them. The SPA is the result of an exact first-order Taylor expansion of the distribution function, where the nth order terms are related to (n + 1)-body interactions. Whereas higher order terms will be demonstrated as being important in the following section, we keep our approximations in the form of equation Equation 3 and only consider pairwise decomposition.

6

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

This theory is general for any solvent, but in the present work we consider aqueous solutions specifically. The TIP3P model of water is used for all explicit solvent simulations in this work. 46 This choice is motivated by the many thermodynamic properties that this water model reproduces. That said, the work here is not limited to the use of TIP3P and alternative parameters can be developed for any desired water model. Many simplifications can be made when considering a solvent such as TIP3P water. Namely, there are no internal degrees of freedom in a solvent molecule. Furthermore, only the water oxygen–solute distribution function gO (r) is needed to calculate the solvent forces since there are no Lennard-Jones parameters associated with the hydrogen atoms.

2.2

Parabolic approximation

Previous studies have used the SPA to calculate the distribution of water around an arbitrary solute configuration but were applied only to systems with limited degrees of freedom. 40,42–44 These restrictions are due to the fact that the distribution functions of each atom would need to be tabulated and a cut-off for interaction would need to be implemented to limit the number of necessary calculations. Furthermore, as will be demonstrated for the simple case of two Lennard-Jones spheres in the next section, shortcomings of the SPA prevent it from being used to accurately construct solvent distribution functions around a molecule from those of each atom. We develop further approximations of the SPA in order to create an implicit solvent model that can be used in molecular simulation. The SPA suggests that the only information needed to parameterize each atom type is the solvent distribution function around each atom in isolation gi (r). An additional approximation is motivated by the short range nature of van der Waals forces where the force between a solute and a solvent molecule is significant only within the first solvation shell. Even though there is structure in the solvent distribution beyond the first solvation shell, as can be seen at distances beyond 5 ˚ A in Figure 1(a), the mean force of the solvent becomes diminishingly small past that point, as seen in Figure 1(b).

7

ACS Paragon Plus Environment

Page 8 of 45

Page 9 of 45

We introduce a parabolic approximation as a simple functional form for the atomic radial distribution function

gi (r) ≃

    0    

|r| < r1,i

−αi (|r| − r0,i )2 + g0,i       1

(4)

r1,i < |r| < r2,i |r| > r2,i

where αi , r0,i and g0,i are free parameters and r1,i and r2,i are chosen to have gi be continuous. This approximation results in needing only three fitted parameters for a given atom type opposed to a fully tabulated gi (r). Figure 1(a) is an example of the water distribution function around a solvated Lennard-Jones sphere the size of a small molecule (rmin = 5 ˚ A) and the corresponding parabolic approximation. (b)

2.5

simulation approximation

2 g0

( f solv · rˆ )gO (r) (kcal/mol/Å)

(a) gO (r)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1.5 1 0.5 0

12

simulation approximation

10 8 6 4 2 0

0

1

2

3

r1

4

r0 r2

5

6

7

-2

8

0

1

2

|r| (Å)

3

r1

4

5 r3

6

7

8

|r| (Å)

Figure 1: (a) The radial distribution function of water oxygens around a Lennard-Jones sphere. (b) The magnitude of the mean force upon a Lennard-Jones sphere from water oxygen as a function of distance. Various parameters in the approximation have physical interpretations that are indicated in the graphs. This approximation of gi , combined with the SPA, can be used to calculate the mean force of the solvent on a solute particle using only two-body distribution functions. The mean force due to the solvent on particle 1, for example, reduces to:

hf1 isolv = ρ

Z

f1,solv (r − R1 )g1 (r − R1 )

8

N Y

j=2

gj (r − Rj )dr.

ACS Paragon Plus Environment

(5)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 45

Here, f1,solv is the force exerted on solute 1 by a solvent molecule in the force field used in the explicit solvent simulation. A cut-off for the range of the solvent interactions to a given solute is introduced by giving a different functional form for fi,solv gi . The force of a solvent molecule times the distribution function is approximated as

fi,solv (r)gi (r) ≃

    0    

|r| < r1,i

  fi,solv (r) −αi (|r| − r0,i )2 + g0,i       0

r1,i < |r| < r3,i

(6)

|r| > r3,i

where fi,solv is the explicit solute-solvent force and r1,i and r3,i are the roots of the parabola making the equation continuous. Note that the range that the solvent force acts is larger than the range of the correlation found in g(r), seen in the fact that r3,i > r2,i . An example of the solvent force projected onto the solute-solvent vector (fi,solv · rˆ)gi measured in simulation and the corresponding parabolic approximation are shown in Figure 1(b). The domination of the positive force at close separations is directly connected to the WCA theory and the role of repulsive forces in a liquid. 27 The dispersion force is also included in this theory as seen by the negative force at larger distances.

2.3

Extension to molecular dynamics simulations

There are a number of considerations that must be addressed to extend the theory outlined above to the aggregation of molecular systems. These considerations include fitting solvent distribution functions around molecules, sampling internal degrees of freedom of molecules, and approximating the integral necessary to solve fi,solv (r − Ri ). A discussion of how to fit molecular distribution functions is given in the next section and the benefits to this approach are provided in the Results section. The solution to sampling internal degrees of freedom and approximating fi,solv (r − Ri ) are addressed by developing a simulation code to perform a Monte Carlo integration for a given solute configuration. The details of the optimization of this protocol are broken down

9

ACS Paragon Plus Environment

Page 11 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

into considerations of the sampling distribution with which we perform the Monte Carlo integration and the number of sampling points necessary to perform an efficient simulation.

2.3.1

Fitting solvent distribution functions around molecules

As will be discussed in the Results section, using the solvated atom distribution functions does not accurately predict the distribution function around a molecule composed of those atoms. Instead of using the atomic distribution functions, as is done in previous studies, we fit the molecular distribution function directly. This is done by simulating a single molecule with all the internal degrees of freedom fixed and finding the average solvent distribution for a specific configuration. We keep the parabolic functional form for each atomistic component to predict the total distribution. Then these parabolic functions can be used via SPA to predict the enhancements of water around any set of molecules in any conformation. The solvent distribution function around a molecular solute is used to fit the parabolic parameters used for our model. The distribution functions are made using a three dimensional 0.1 ˚ A grid and then a least-squares algorithm is used to fit the parameters for the parabolic approximation. The chi-squared function to be minimized as a function of the fit parameters α is a sum over all grid points:

χ2 =

X

i∈grid



v i g i −

N Y

j=1

2

gj (|rj − ri |; α) .

(7)

In this equation, vi is the volume of the cell at position ri and gi is the measured value of the distribution function as measured from simulation.

2.3.2

Implicit solvent simulations

The parabolic functional forms of gi (r) and fi,solv (r)gi (r) can be used to solve the integral in Equation 2. Unfortunately, it becomes impractical to obtain analytic results for this integral when there are multiple overlaps of atomic solvation shells, analogous

10

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 45

to what is found in calculating molecular solvent accessible surface areas. One option in solving the integral is to develop a closed form approximation just as the linear combination of pairwise overlap approximation used in calculating solute surface areas. 19 Another option, and the one used in this work, is to solve the integral approximately by Monte Carlo importance sampling in the volume of r1,i –r3,i of a given solute atom. A molecular dynamics simulation code was developed to run IS-SPA. Other than the Monte Carlo integration which will be discussed in detail in the subsequent sections, this code is meant to replicate the algorithms of the Amber 14 code 47 with a few alterations beyond the addition of the implicit solvent force. As opposed to using the SHAKE algorithm, the hydrogen atoms are allowed to move freely but with a mass equivalent to carbon. This changing of the mass allows for a 2 fs time-step with reasonable energy conservation. 48 No cut-off to the non-bonding interactions is applied since the number of interactions is greatly reduced in these simulations. An Andersen thermostat is used to maintain a constant temperature. It was not necessary to remove center-of-mass force and torque for any of the examples described in this paper but these contributions are predicted to become larger for more complex systems.

2.3.3

Monte Carlo sampling distributions

The premise of Monte Carlo integration is to use pseudorandom numbers to approximate an unknown integral. In the case of IS-SPA, the unknown integral is the mean force of the solvent on a given atom as predicted using the SPA and parabolic approximation, hfi isolv = ρ

Z

fi,solv (r)gi (r)

Y j6=i

gj (r − Rj )dr.

(8)

By introducing a probability distribution p(r), hfi isolv is written as an ensemble average of sampling from this distribution,

hfi isolv =

*

Y ρ fi,solv (r)gi (r) gj (r − Rj ) p(r) j6=i

11

ACS Paragon Plus Environment

+

. p

(9)

Page 13 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Any distribution p can be used to approximate the integral as long as its domain contains the entire domain of the integrand function. The difference between using various distribution functions is in the efficiency of converging the average. Three distributions are considered over which to sample the integral. Ultimately the distribution that minimized the variance for a fixed number of sample points is chosen. For a more detailed discussion of the distributions investigated see the Supporting Information. The distribution chosen is essentially |fi,solv | g0,i but a full domain description is given as:

p(r)r2 dr =

    0    

1  A

    0

r < r1,i   |fi,solv | −αi (r − r0,i )2 + g0,i r2 dr

r1,i < r < r3,i .

(10)

r > r3,i

The normalization factor, A, is complicated since the integral contains six different powers of r and is provided in Supporting Information. Given this distribution, a set of NM C uncorrelated points is chosen and each point rn is given a weight wn equal to the argument of the integral in Equation 9,

wn =

Y ρ fi,solv (rn )gi (rn ) gj (rn ), p(rn )

(11)

j6=i

leading to hfi isolv = hwi. Since the points are uncorrelated, the efficiency of approximating the integral is directly related to the variance of w,

2 σhf i =

2 σw . NM C

(12)

Using 106 sample points, the variance of the mean force averaged over all atoms in a cyclohexane molecule is sigma2w = 6.59 (kcal/mol/˚ A)2 . A way to further reduce the variance of wn is to let the points generated for measuring the mean force on particle i also be considered in the measurement of mean force of all other atoms. This approach is in effect a different distribution function being

12

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 45

sampled. For example, using the distribution function above, p, a second distribution can be defined as N 1 X pB (r) = pi (r), N

(13)

i=1

where N is the number of atoms. In this way, N × NM C points are being sampled for each atom, albeit many points are sampled outside of the domain of non-zero force for a given atom. This method outperforms the previous distribution considerably with 2 = 3.17 (kcal/mol/˚ σw A)2 . We choose this distribution to run IS-SPA simulation since,

as is discussed next, lowering the error in the mean force measurement is crucial not only for the speed of the code but also the performance of the simulation.

2.3.4

Choice of Number of Monte Carlo Sample Points

The choice of NM C in IS-SPA is dictated by a balance between computational cost and integral convergence. The nature of Monte Carlo integration, however, makes it unavoidable to have error in the measurement of the mean force, embodied in Equa√ tion 12, where the error of the measurement is proportional to 1/ NM C . This error is uncorrelated in time and manifests itself as heating in a simulation, requiring the use of a thermostat to avoid system catastrophe. The use of the Andersen thermostat introduces an additional free variable, ν, the probability of velocity renormalization in each time step. Equations for the computational cost and tolerance of system heating are developed to specify the choice of NM C and ν. Velocity renormalization with the probability ν mitigates the heating of the system arising from a random force of variance hδFi2 i. For a simulation with a random force from Monte Carlo sampling and utilizing a velocity Verlet algorithm with an Andersen thermostat, the system obtains a temperature higher than the thermostat of (see Supporting Information for a more detailed derivation)   N X ∆t2 2 1 1 ∆T = , σ −1 3N mi w,i ν NM C

(14)

i=1

where ∆t is the Verlet integration time step, N is the number of particles, T is the

13

ACS Paragon Plus Environment

Page 15 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

2 temperature, mi is the mass of particle i and σw,i is determined from the choice of

distribution discussed in the previous section. As is shown in Supporting Information, the expected heating in Equation 14 as a function of NM C and ν agrees well with heating measured in IS-SPA simulations. In practice, a tolerance for ∆T is chosen and then the value of νNM C is determined by Equation 14. Since an Andersen thermostat will maintain the correct temperature regardless of the value of ν, it still remains to show how to further choose NM C . 49 We relate this decision to the overall computational cost of the simulation. The computational cost of an IS-SPA simulation can be divided into two parts, the time it takes to run a single time step tstep times the number of steps for uncorrelating the system Ncor , cost ∝ tstep Ncor .

(15)

Since any values of ν and NM C are viable, given that their product is small enough to prevent excess heating, minimizing the cost of the simulation determines the optimal values. The time of a single time step is roughly proportional to NM C since each Monte Carlo point requires some time of calculation for each time step. On the other hand, the thermostat parameter ν has little effect on the simulation timing; setting ν = 1 does induce more calculations, but at a minimal cost compared to changing NM C . The correlation time is more vague and problem dependent. We consider the autocorrelation function of the collective variable from an umbrella simulation of cyclohexane dimer for the purpose of quantifying Ncor in this work. This choice of metric is motivated by the fact that the value of the collective variable is the relevant observable to calculate the potential of mean force. The choice of a different metric produces slightly different results, but not so much as to change the general conclusions. Based on arguments developed further in Supporting Information, Ncor is strictly a function of ν. The correlation time tcor can be numerically calculated to be when |hδx(0)δx(t)i|/hδx2 i < 0.001 (or some other limiting value) for any t > tcor . Combining the correlation time and the cost of calculating a single time step, the cost of an IS-SPA

14

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

6

.

relative cost ∝ Ncor /ν

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 45

5 4 3 2 1 0

0

0.05

0.1

0.15

0.2 ν

Figure 2: The cost of an IS-SPA simulation relative to the most efficient parameters as a function of the probability of renormalizing a particle’s velocity. simulation is cost ∝ tstep Ncor (ν) ∝

1 Ncor (ν), ν

(16)

The cost of the simulation relative to the most efficient code as a function of ν is found in Figure 2. The most efficient IS-SPA simulation occurs at a value of ν that also corresponds to the most efficient Andersen thermostat. Surprisingly, using ∆T = 0.3 K, a one in a thousand accuracy at T = 300 K, and the most efficient value of ν for the motion of the collective variable for a cyclohexane dimer leads to NM C = 10. This small number of sampling points for each atom does not produce a complete sampling of the mean force each time step, but is enough to maintain the desired ensemble to the quoted precision. As discussed in the Supporting Information, increasing NM C provides no benefit since the exploration of phase space, as characterized by the positional autocorrelation function, is solely dictated by the thermostat and not the noise in the Monte Carlo sampling of the mean force (Figure SI3). A plot of the potentials of mean force generated using NM C = 10 and NM C = 1000 for the dimerization of aqueous cyclohexane is shown in Supporting Information to further demonstrate that the number of Monte Carlo points does not change the average distribution of states (Figure SI5). In general, the efficiency of IS-SPA over an explicit solvent simulation is concentra-

15

ACS Paragon Plus Environment

Page 17 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

tion dependent. As a rough approximation, the difference in the two calculations are found in the number of non-bonded interactions that are calculated using a neighbor list. If there are M water atoms per N solute atoms in the case of explicit solvent and NM C Monte Carlo points per solute atom in IS-SPA, then IS-SPA will outperform explicit solvent when NM C < M . The exact performance of the two codes is more complicated than this simple picture and is dependent on the optimizations for each of the codes. Using ν = 0.035 and NM C = 10 produces timing of 1.09 µs/day for a dimer of cyclohexane on a single CPU core. The same calculation with explicit water and using PME produce timing of 2.66 ns/day on a single CPU core. IS-SPA thus gives a 400 times increase in timing for a simulation with 36 solute atoms in a volume of 2500 water molecules, corresponding to a 40 mM concentration.

3

Computational Methods

The explicit solvent simulations used to parameterize and benchmark IS-SPA are performed using Amber 14 GPU code. 50 Each system is solvated with about 2500 TIP3P water molecules producing equilibrated box sizes of linear length over 40 ˚ A. Nonbonded interactions are truncated at 12 ˚ A and the particle mesh Ewald method is used to handle long-range electrostatic interactions. Restraining the bond length of bonds including a hydrogen atom using the SHAKE algorithm allows for an integration timestep of 2 fs. The simulations with a single solute molecule used to parameterize the IS-SPA simulations are performed with the solute fixed in space. The simulations are performed in an NVT ensemble using a Langevin thermostat at 298 K for 30 ns, writing the coordinates to file every 200 ps. All simulations with solute pairs are studied using umbrella sampling restraining the center of mass distances of the two solutes using a force constant of 20 kcal/mol/˚ A2 and a window separation of 0.5 ˚ A. Initial configurations for each window are generated by running unbiased simulations after an equilibration time of 3 ns. Each window is run

16

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

for 45 ns for the Lennard-Jones particles and for 15 ns for the alkanes. The simulations are performed in the NPT ensemble using a Monte Carlo barostat at 1 bar. WHAM is used to reweight the histograms from the umbrella sampled distributions to generate the potential of mean force, using bootstrapping to evaluate the uncertainties. 51 The all-atom simulations performed in this study correspond to umbrella sampling of two solvated Lennard-Jones spheres, umbrella sampling of two solvated alkane molecules and simulations of alkane monomers used to parameterize the theory. Specific details for each of these different types of simulations are found in Supporting Information.

4

Results and Discussion

4.1

Two solvated Lennard-Jones spheres

Solvated Lennard-Jones spheres of varying sizes are first considered to demonstrate the effects of two approximations used in IS-SPA: SPA and the parabolic approximation. The solvent distribution functions that are used both in the SPA and to fit the parameters in the IS-SPA model are measured from explicit solvent simulations of isolated Lennard-Jones spheres. The mean solvent force on a pair of Lennard-Jones spheres as a function of separation is then computed using Equation 2 for both models and compared to explicit solvent simulation in Figure 3. Despite the range of solute sizes spanning unphysically small systems of rmin = 1 ˚ A to those of molecules with rmin = 7 ˚ A, the behavior of the solvent force is qualitatively similar for all systems depicted in Figure 3. Starting at small distances for the solid lines in Figure 3, there is a strong attractive solvent force with a minimum around 1˚ A. The mean force becomes less attractive roughly linearly between distances of 1 ˚ A and past the rmin of each system. Moving to larger distances, the solvent force becomes repulsive which gives rise to the de-solvation barrier characteristic of solvated systems. The solvent force has a maximum that is relatively constant in magnitude of

17

ACS Paragon Plus Environment

Page 18 of 45

Page 19 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0.25 kcal/mol/˚ A and occurs about 2 ˚ A past the rmin of each system. Beyond the first maximum, the correlations oscillate and go to zero at large distances. The SPA results are the dashed lines in Figure 3 and show consistent trends with the explicit solvent results. The largest discrepancy between the SPA and explicit solvent results are at typical bonded distances of R < 2 ˚ A for all particle sizes where the SPA produces a more attractive force than is seen in simulation. The SPA accurately describes the slope of the mean solvent force as a function a separation distance for typical distances from equilibrium distances to the first maximum, albeit being less attractive in the SPA as compared to explicit solvent. This less attractive force in the SPA becomes worse for larger solutes, being negligible for rmin = 1 ˚ A and ∼0.5 kcal/mol/˚ A for rmin = 7 ˚ A. Beyond the first maximum in the mean force, the SPA is quantitatively accurate with the explicit solvent results. The parabolic approximation results are the dotted lines in Figure 3, and are seen to be very similar to the SPA results. The largest difference between SPA and the parabolic approximation occurs at about 3 ˚ A where the SPA produces a shoulder in the solvent mean force as a function of distance for all values of rmin . In this respect, the parabolic approximation better captures the physics of the explicit solvent simulation where the slope of the mean force remains relatively constant. Between R =4 ˚ A to the first maximum of the mean force for each system, the differences in the parabolic approximation and explicit solvent is completely described in the deficiencies of SPA. The oscillations in the mean force beyond the first maximum are not described by the parabolic approximation since it is a first solvation shell approximation. In moving forward in developing a model for molecular simulations, Figure 3 shows that using the SPA and the parabolic approximation will not be effective to describe bonded systems but is more effective to describe the solvent force between molecules at non-bonded distances. The increasing discrepancy of the approximations compared to explicit solvent simulations as a function of particle size raises the question of whether different physics is occurring for the different systems. In particular, work on the hydrophobic effect sug-

18

ACS Paragon Plus Environment

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

h f i solv (kcal/mol/Å)

Journal of Chemical Theory and Computation

Page 20 of 45

0.5 0 -0.5 -1 rmin rmin rmin rmin

-1.5 -2

0

2

4

6

= 1.0 Å = 3.0 Å = 5.0 Å = 7.0 Å 8

10 R (Å)

12

Figure 3: The mean solvent force on a pair of Lennard-Jones particles of various sizes at a given distance. The solid lines are from explicit solvent simulations, dashed lines are the SPA and dotted lines are the parabolic approximation used in IS-SPA. gests that the nature of solvation changes for large and small systems with a crossover transition around 10 ˚ A. 32 Since a dimer of the largest system considered here is larger than this transitional size, it remains to be seen if it is this phenomenon that gives rise to the SPA to be a worse approximation for larger systems. To explore this idea, Figure 4 shows in more detail the differences of the simulation results compared to the SPA for the four particle sizes considered here. For each subfigure, the top of plot shows the solvent distribution function from the simulation on the right to be compared with the SPA on the left. The bottom plot shows the difference in the solvent force acting on the solute to the right between the two distribution functions plotted above, (f · rˆ1 )∆gO (r). At these equilibrium distances, it is observed that the biggest discrepancy between SPA and explicit solvent for all the particle sizes occurs in the overlapping regions of the first solvation shells of the two particles. This region occurs near the cusp where the two particles intersect (the blue to black in the upper plots of Figure 4 just to the right and left of the dotted lines) with the SPA overpredicting the enhancement in this position.

19

ACS Paragon Plus Environment

Page 21 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

This overprediction of density leads to a larger effective repulsive force between the two solvents (corresponding purple areas of the lower plots of Figure 4). The volume of this region grows as a function of particle size leading to the larger discrepancy between SPA and explicit solvent seen in Figure 3 for solutes with larger rmin . The presence of additional discrepancies between SPA and explicit solvent for rmin = 1 ˚ A and to some extent rmin = 3 ˚ A are due to the fact that these solutes are smaller than the solvent molecules; other problems of the SPA appear at the small equilibrium distances of these particles. In this sense, the SPA actually performs better for larger particle sizes. Thus, while there are some cancellations of errors, the dominant component leading to the worse performance of SPA as particle size grows is due to the increasing volume of the overlap region rather than a crossover transition in the solvation behavior of hydrophobic solutes. For the purpose of this work, we use the parabolic approximation to describe systems separated at non-bonded distances. Future work will focus on using alternative approximations of the solvent distribution function around solutes, but these initial studies show that one approximation can be developed to describe all system sizes. It is important to separate the effects of the two distinct approximations of the SPA and the parabolic approximation. It is the first approximation of the SPA that gives rise to most of the observed discrepancies with the explicit solvent simulation, particularly at bonded distances. This is in contrast to the parabolic approximation, which has a high fidelity to the SPA. The implication is that confining the approximation to be a first solvation shell effect is accurate, but the mixing rule that SPA supposes is not completely accurate. The next section demonstrates how the SPA can be altered to be applied to the bonded atoms in molecules.

20

ACS Paragon Plus Environment

4

(a)

rmin = 1 Å

(b)

SPA

Explicit

rmin = 3 Å

SPA

Explicit

3 2.5

2

2

0

1.5 1

-2 -4

( f · rˆ1 )∆gO (r)

-6 6 4

(c)

0.5

( f · rˆ1 )∆gO (r) (d)

rmin = 5 Å

SPA

Explicit

0

rmin = 7 Å

SPA

Explicit

2 0 -2 -4 -6

( f · rˆ1 )∆gO (r)

( f · rˆ1 )∆gO (r) -8

-6

-4

-2

0

2

4

6

8

-8

-6

-4

-2

0

2

4

6

8

2 1.5 1 0.5 0 -0.5 -1 -1.5 -2

( f · rˆ1 )∆gO (r) (kcal/mol/Å)

y (Å)

6

y (Å)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 45

gO (r)

Journal of Chemical Theory and Computation

x (Å)

x (Å)

Figure 4: The average distribution of water oxygens around a pair of Lennard-Jones spheres with different rmin values at their respective equilibrium distance compared to the SPA. The size of rmin are (a) 1 ˚ A, (b) 3 ˚ A, (c) 5 ˚ A,and (d) 7 ˚ A. In each figure, the top right is the result of the explicit solvent simulations, the top left is the result of the SPA and the bottom shows the difference in solvent force between the two distributions shown in the top plot acting on the particle to the right.

21

ACS Paragon Plus Environment

Page 23 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

4.2

Dimerization of Alkanes

4.2.1

IS-SPA

The next step is to extend the theory to molecular systems. The goal here is to design an implicit solvent model while keeping the atomic detail of the solute. We use atomically resolved methane as an initial test system. The high symmetry and limited number of internal degrees of freedom make methane closely related to the Lennard-Jones systems described above. In fact, an entire molecule of methane can be approximated as a Lennard-Jones sphere with ǫ = 0.294 kcal/mol and rmin = 4.187 ˚ A. 52 We want to be able to model every atom in methane and still get consistent results of this united atom approximation. Three alterations that need to be made from the Lennard-Jones system are the introduction of atomic charges, corrections to the SPA for bonded systems, and a method to apply the model in a simulation. The first difference between molecules and the Lennard-Jones system is the presence of atomic charges. Even though the non-polar systems studied herein have non-zero atomic charges, the charges are small and can be treated with a constant dielectric. A preliminary study of how IS-SPA performs as a function of charge is presented in Supporting Information. The effect of charge is observed to impact the results of ISSPA for charges of magnitude greater than ∼ 0.4 e, well below the magnitude of charges in these systems. Further work is needed to extend IS-SPA to more highly charged systems. The second effect that needs to be addressed in solvated molecules is the failure of SPA to correctly predict the solvent distribution around atoms at bonded distances. The SPA suggests the following prescription for modeling the solvation of methane. Each atom in methane is a Lennard-Jones particle and can be artificially studied as a single solvated atom to measure gH (r) and gC (r) . The resulting atomic solvent distribution functions are then used to predict the solvent distribution around a molecule of methane: gCH4 (r) ≃ gC (|RC − r|)

22

4 Y i=1

gH (|RHi − r|).

ACS Paragon Plus Environment

(17)

Journal of Chemical Theory and Computation

An immediate problem with this equation is that it will greatly overpredict values of gCH4 . This problem of the SPA is anticipated by the results of the Lennard-Jones study, where the SPA fails to describe the solvent distribution for a pair of atoms at bonded distances of about 1 ˚ A. For example, the maximum values of gH and gC are both approximately gmax ≃ 1.7, as seen in the points in Figure 5. There is a point outside of methane that is the distance of these maxima from three hydrogen atoms and the carbon atom. The anticipated water density enhancement at this point would be gmax ≃ 1.74 ≃ 8.4. In light of the previous discussion of Lennard-Jones spheres, this result is an order of magnitude above the anticipated result of a single Lennard-Jones sphere used to describe methane. (a) Carbon

(b) Hydrogen

2 gO (r)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 45

1.5

1

0.5

0

0

1

2

3

4

5

6 7 r (Å)

8

0

1

2

3

4

5

6 7 r (Å)

8

Figure 5: The fitted parabolas for (a) carbon atoms and (b) hydrogen atoms found upon fitting the solvent distribution around methane. The points are the distribution function measured for a bare Lennard-Jones sphere using the respective atomic parameters. Instead of delving into the exact correction terms associated with the SPA, we take an empirical approach of directly measuring gCH4 from simulation and using that as input to find appropriate parameters to the approximation. For this purpose, a simulation is performed of a single solvated solute fixed in space. In this fashion, the average solvent distribution around the solute is measured. The resulting gO (r) is used to fit the parameters in the parabolic approximation directly. The need to fit the parabolic approximation instead of using a tabulated gO (r) comes from the fact that the solvent distribution around the solute changes for different molecular configurations;

23

ACS Paragon Plus Environment

Page 25 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

these changes are subtle for the case of a rigid molecule like methane, but become more acute for larger systems such as butane where multiple dihedral states are possible (see Supporting Information for a comparison of molecular states observed in butane). Parameters for the parabolic approximation in IS-SPA are found for the two atom types of methane, since the hydrogen atoms are degenerate, by fitting to the molecular gCH4 (r) computed from explicit solvent simulation. Fitting the resulting six parameters produce the parabolas that describe the atomic distributions shown in the purple curve of Figure 5. These parabolas can be directly compared to the distribution measured for a solvated atom depicted as open circles in Figure 5. Furthermore, a two dimensional slice of the water density around a methane molecule as measured in simulation and the resulting parabolic approximation are shown in Figure 6. The parabolic approximation is seen to capture the correct excluded volume of the molecule (in red), along with the enhancements of the first solvation shell (in blue). The difference between the two models at large distances is due to poor sampling from the explicit solvent simulation; on average the density is equal to bulk and thus should be white, but both blue and red cells are seen in equal probability. Figures 5 and 6 lend a physical intuition about the fit parameters for the atom types. For example, in Figure 5, the radius of the excluded volume is 2.8 ˚ A for a A for a hydrogen atom, which is recaptured in the parabolic apcarbon atom and 2.3 ˚ proximation parameters. Also, the position of the maximum of the first solvation shell is relatively conserved. What is starkly different is the magnitude of the distributions; to compensate for the overprediction of the SPA, the fit parameters yield smaller magnitudes for the atomic contributions. As seen in Figure 6, this decrease of magnitudes is what gives rise to a maximum in the solvent density of gmax ≃ 2 opposed to the value of gmax ≃ 8.4 predicted by the SPA due to atomic solvation contributions. The third extension of the SPA beyond that of the Lennard-Jones system is to have a method to calculate the mean forces in simulation. The only relative degree of freedom between two Lennard-Jones spheres is the radial distance between the two, making it possible to study all configurations of the system. The multitude of relative

24

ACS Paragon Plus Environment

8 y (Å)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

3 Explicit

6

IS-SPA

2.5

Page 26 of 45

gO (r)

Journal of Chemical Theory and Computation

4 2

2 0

1.5

-2

1

-4 0.5

-6 -8

-8

-6

-4

-2

0

2

4 6 x (Å)

8

0

Figure 6: A slice of the water oxygens distribution function around a methane molecule. The plane crosses the centers of the carbon atom and two hydrogen atoms located in the positive y-direction. The left side of the plot is the results of explicit solvent simulation and the right is the prediction from the SPA using the fitted parabolas.

25

ACS Paragon Plus Environment

Page 27 of 45

degrees of freedom available for a pair of solvated molecules makes direct calculation of all configurations infeasible. Instead, molecular dynamics simulations are used to sample the phase space available for the system. For this purpose, IS-SPA uses Monte Carlo sampling of the parabolic approximation to measure the mean force of the solvent on each atom at each time step of a simulation as discussed previously in the Theory section. An IS-SPA simulation of two methane molecules can now be performed using these fit parameters and the potential of mean force (PMF) of dimerization measured. The resulting PMF is labeled ‘IS-SPA’ in Figure 7(a) and shown along with the PMFs measured in a vacuum and in explicit solvent. (a)

(b)

1

1

vacuum explicit IS-SPA

0.5

h f i solv (kcal/mol/Å)

uPMF (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0

-0.5

-1

explicit IS-SPA

0.5

0

-0.5

0

2

4

6

8

10

12 R (Å)

-1

14

0

2

4

6

8

10

12 R (Å)

14

Figure 7: The (a) PMF and (b) mean solvent force measured for a dimer of methane molecules from explicit solvent and IS-SPA simulations. The vertical line at ∼ 3.9 ˚ A in (b) corresponds to the distance of the minimum of the explicit solvent PMF. To better compare with the results presented on the Lennard-Jones systems, the mean solvent force in the three solvent models is shown in Figure 7(b). This force is approximated by taking the difference of the PMF with that of the vacuum system and taking the negative derivative. Agreement is seen between explicit solvent and IS-SPA for both the force and the free energy in Figure 7 at the barrier of dimerization (6 ˚ A) and at equilibrium distance (4 ˚ A). Furthermore, the difference between IS-SPA and the explicit solvent simulation are consistent with what is observed in the case of

26

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

solvated Lennard-Jones spheres; the magnitude of the predicted force is smaller both in the barrier and at equilibrium distances. This observation further suggests that the SPA is the source of largest error in our theory, and not the parabolic approximation. A similar procedure can be carried out to parameterize and perform IS-SPA simulations for the dimerization of any non-polar solute in water. This has done for linear alkanes from methane to hexane and cylcoalkanes from cyclopropane to cyclohexane. The results are given in Supporting Information. It is demonstrated that IS-SPA is able to recapture the thermodynamics of the dimerization events for all of these systems. Specifically, the position and height of the barriers as well as the position and depth of the minima in IS-SPA are found to be in good agreement with explicit solvent and significantly better than SASA. Furthermore, the IS-SPA parameters are demonstrated to be transferable among all of the alkane systems suggesting the inherently physical nature of these parameters.

4.2.2

Comparison to Other Methods

IS-SPA can now be compared to the results generated by other implicit solvent models. For this end, we study the homodimerization PMF of methane from a variety of models. These systems are SASA, 19 RISM, 53 and alternative descriptions of g(r) using the ISSPA method. 43,44 Ultimately, comparing the accuracy and efficiency of the models shows the superiority of IS-SPA over typical implicit solvent models for these systems. We use all atom explicit solvent simulations as a reference state in order to compare the accuracy of the various implicit solvent models. Ideally, our results would be compared to experiment, but alkane homodimerization PMF’s are not readily measured in experiment. Furthermore, many implicit solvent models, including IS-SPA and RISM, are parameterized based on results from a specific explicit solvent. We use relative entropy Srel to quantify the accuracy of these models. 54,55 The relative entropy is a measure of how different two probability distributions are; smaller values of relative entropy correspond to more similar distributions with Srel = 0 only occurring when the distributions are the same. The relative entropy using two PMF’s is calculated to

27

ACS Paragon Plus Environment

Page 28 of 45

Page 29 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

be

Srel

4π = V

Z

0

∞



    1 1 uT (R) exp − (uM (R) − uT (R)) + (uM (R) − uT (R)) − 1 exp − R2 dR T T T (18)

where V is the volume, T is the temperature in units of energy, uM (R) and uT (R) are the model and target PMF’s, respectively. In this case, the target system is explicit solvent and the model is the attempt to get the same physics. This function goes to zero in the limit of infinite volume, but lim(V →∞) V Srel is finite and reported in Table 1. The speed of the simulation running on a single CPU is used to compare the computational cost of the different models. The values are specifically for the methane dimer studied in this section and can be expected to produce different relative values for different systems. Also, the fact that Amber is used for all the simulations except for the IS-SPA method further complicates comparison to computational efficiency. These timings are meant to represent a general sense of the computational cost of the different methods. Also, simulations using the same code are quoted to have the same timing, such as the two SASA simulations and the two systems using the IS-SPA code, as discussed below. SASA descriptions of solvation free energies are attractive for their computational efficiency, as seen in the relative speed values in Table 1. This efficiency comes at the cost of accuracy, as seen in the large values of relative entropy in Table 1. Two results for SASA are shown corresponding to different values of the surface tension constant γ. The line labeled “SASA” is calculated using a typical surface tension in implicit solvent simulations, γ = 0.005 kcal/mol/˚ A2 . This PMF is barely any different from the system in vacuum and completely devoid of any structure such as the enhanced free energy minimum or a de-solvation barrier. The surface tension constant can be tuned to at least add enough of an effect to correctly describe the free energy minimum of the PMF. We find using γ = 0.03 kcal/mol/˚ A2 accomplishes this goal and the results for this optimized SASA are labeled “SASA opt” in Table 1 and Figure 8. Despite

28

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

correctly describing the minimum in the PMF, the other features of the solvated PMF such as the de-solvation barrier at ∼5.5 ˚ A in the explicit solvent PMF are missing. Also note that the relative entropy for this optimized SASA is only 10.% lower than that of SASA using a typical surface tension. This is in contrast to the 72.% drop in Srel of IS-SPA compared to SASA. The relatively small change in Srel for the two SASA simulations emphasizes the fault in the logic that describing the free energy minimum suffices in getting the thermodynamics correct. RISM represents the opposite in terms of theoretical accuracy and computational efficiency as compared to SASA. In fact, using RISM to run a simulation is costly to the extent that it takes about six times the computational time as compared to explicit solvent. The resulting PMF shows the expected features of an enhanced well and a de-solvation barrier, but considerably shifted in the distance coordinate. This includes the barrier being shifted by 0.5 ˚ A and the minimum by 0.2 ˚ A towards closer distances. This mismatch in the RISM PMF relative to that of explicit solvent results in a large relative entropy. IS-SPA is able to outperform the accuracy of RISM because the input of the molecular solvent distribution function; RISM attempts to construct this function with no a priori information whereas it is input necessary to run IS-SPA. The IS-SPA method does not preclude using other approximations to the solvent distribution function. For example, the original approximation proposed in the work of Pellegrini et al. can be used in a simulation using our proposed method. 43,44 The work of Pellegrini et al. considers the SPA using the atomic radial distribution functions. To avoid the large densities this produces, as discussed above, the distribution function is capped at a value of 3. The results in Table 1 and Figure 8 are labeled “capped SPA”. The results show enhanced features relative to the explicit solvent PMF; the barrier is ∼3.05 times larger and the minimum is ∼1.57 times larger. The lower performance in accuracy as compared to IS-SPA further demonstrates the significance of fitting the molecular solvent distribution function as opposed to each atom separately. Of all the implicit solvent models considered here, IS-SPA retains the highest fidelity to all atom simulations. Furthermore, the computational cost is comparable to SASA.

29

ACS Paragon Plus Environment

Page 30 of 45

Page 31 of 45

This enhancement of both accuracy and efficiency of computation highlight the power of this new technique. uPMF (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1.5 1.25 1 0.75 0.5 0.25 0 -0.25 -0.5 -0.75 -1 -1.25 -1.5

explicit SASA SASA opt RISM capped SPA IS-SPA

0

2

4

6

8

10 R (Å)

12

Figure 8: PMF’s for the homodimerization of methane for a variety of solvent models. See text for details.

Table 1: Relative entropies and timing of simulation for methane dimer PMF’s using different solvent models. The relative timing is for running the simulations on a single CPU as compared to explicit solvent simulation in units of ns/day. Model explicit solvent SASA SASA opt RISM capped SPA IS-SPA

4.3

V Srel (˚ A3 ) 0 22.7(4) 20.4(4) 19.7(5) 31.2(5) 6.4(3)

relative speed 1. 3300. 3300. 0.17 3900. 3900.

Aggregate of Cyclohexane

Up to this point we have demonstrated the accuracy and efficiency of IS-SPA for dimer systems. It remains to be seen whether many-body interactions can be appropriately described using our version of the SPA. Aspects of these many-body interactions are mitigated by using the molecular distribution function as opposed to those of each independent atom, but the density variations due to interacting molecules is still a concern.

30

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

With the intention of using this implicit solvent model to study aqueous aggregation we turn our attention now to a collection of solute molecules. In particular, we consider 500 cyclohexane molecules phase separated from an aqueous phase simulated for 100 ns. We demonstrate that IS-SPA captures the trends of both the water density surrounding the organic phase and also the density profile of the aggregate itself. In order to capture the appropriate bulk behavior of the cyclohexane phase, a number of modifications to the IS-SPA simulation code are made. The first is the inclusion of non-solvated bubbles. Due to the bulk density of cyclohexane, there are vacuum bubbles that form inside the cyclohexane phase in which there should be no water. Without modification, IS-SPA would place significant solvent density in these areas causing non-physical behavior of the bulk. An analysis of the number of solute atoms close to a Monte Carlo point demonstrates that these buried bubbles have significantly more neighbors than actual solvated regions. The holes in the bulk cyclohexane phase are found to be removed by setting the solvent density to zero when there are 12 or more solute atoms within the distance r2 to the Monte Carlo point. The second modification that is necessary is an increase in the cutoff length used for the solvent force calculation. The relatively short cutoff length (around 5.5 ˚ A for carbon atoms) in the dimer simulations is motivated by the rapidly decaying Lennard-Jones force between a water molecule and the solute atom. As is depicted in Figure 1(b), this short cutoff captures most of the force weighted by the solvent density but does neglect small attractive component at larger distances. This small omission becomes extremely important at the interface between the alkane phase and the aqueous phase. We thus do not use Equation 6 to approximate the extent of the solvent force but rather increase the cutoff length of the solvent sampling Equation 4 to 8.0 ˚ A for all atom types directly in the IS-SPA aggregate simulations. The addition of a neighbor list in calculating both the non-bonded and solvent forces becomes important for studying a system 250 times the size of the dimer systems. The geometry of the system is a rectangular box with the z-dimension being twice as long as the other two. This container, with periodic boundary conditions, creates

31

ACS Paragon Plus Environment

Page 32 of 45

Page 33 of 45

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

a slab of cyclohexane molecules that span the two smaller dimensions to minimize the phase boundaries, as seen in Figure 9(a). Before performing an IS-SPA simulation, the solvent density prediction of IS-SPA can be compared to that from explicit solvent. A snapshot from an explicit solvent simulation is chosen and the average solvent density is computed for both IS-SPA and explicit solvent simulations. The results are depicted in Figure 9(b) and (c). Much like in the previous two sections, the solvent density for a particular two dimensional slice is depicted with red indicating zero density, white indicating bulk density and blue-black indicated increased solvent density relative to bulk. This figure demonstrates that IS-SPA does overpredict the solvent density in certain areas at the interface but overall captures it with qualitative agreement. The ability of SPA to describe the density of an extended surface of hydrophobic solutes is consistent of previous studies of the water density across a collection of alkanes. 58 (a)

(b)

12

vacuum explicit IS-SPA

10

ρ (mol/L)

ρ (mol/L)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 45

8

102

vacuum explicit IS-SPA

101 100 10−1

6 10−2 4

10−3

2 0

10−4 0

5

10

15

20

25

30 z (Å)

10−5

35

0

5

10

15

20

25

30 35 z (Å)

Figure 10: Density profiles of the aggregate from the different simulations. (a) Normal scale to show the liquid structure and (b) the density on a logarithmic scale to show the density of the sparse phase. Unbiased simulation is used to study the phase coexistence between the organic and aqueous phases. Namely, we consider the density profile of the organic phase from the center of mass of the cyclohexanes. These profiles are also compared to that of liquidvapor coexistence of cyclohexane at T = 298 K to emphasize the effects of the aqueous phase. The resulting density profiles are shown in Figure 10 using both a normal, (a),

33

ACS Paragon Plus Environment

Page 35 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

and logarithmic, (b), scale to show details of the organic (liquid) and aqueous (vapor) phases, respectively. The results are also summarized in Table 2. The all-atom simulations quantitatively predict the experimental densities of the dense phases, may it be the liquid phase for the liquid-vapor coexistence or organic phase for the two component system. The presence of the aqueous phase gives rise to a sharper interface and also causes ordering of the dense phase relative to this interface. IS-SPA, in comparison, compacts the organic phase as seen both in the 10% increase of density and the 1.5 ˚ A decrease of the interface position. This increase of density is consistent with the heightened water density at the aggregate surface, which gives rise to larger forces at the interface in the IS-SPA simulation than found from explicit solvent. The density of the sparse phase, may it be the vapor phase for the liquid-vapor coexistence or the solvated phase for the two component system, changes by orders of magnitude in the presence of the aqueous phase. A short coming of the cyclohexane model used in this work is that it predicts a fifty fold decrease in solubility as compared to experiment. Regardless, our goal is to correctly replicate the explicit solvent result since it is the basis of the IS-SPA parameters. Unfortunately, the sparse phase was not resolved from the IS-SPA simulation, suggesting that IS-SPA predicts at least another order of magnitude smaller solubility than predicted by explicit solvent simulation. Table 2 also contains data for the speed of the simulations being run on a single CPU in units of ns/day. The speed of our IS-SPA code is 1.3 times slower than the Amber code for explicit water simulation. Although, it should be noted that IS-SPA is not being developed for systems with this high a concentration of solutes; about one in every three atoms belong to a cyclohexane molecule in the all-atom simulation. Furthermore, our code is not optimized to the same degree as the Amber software. For example, the system of cyclohexane liquid-vapor coexistence has a timing for both our code and the Amber code, with the Amber code running faster despite the addition of Ewald sums. Assuming that the IS-SPA algorithm can be implemented in Amber at the same relative cost, we anticipate IS-SPA to be 1.1 times faster than explicit solvent

34

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

at the high concentrations considered here. The qualitative agreement between IS-SPA and explicit solvent indicates that the SPA, in part, contains the relevant physics in the solvent mean force. Aspects such as the enhanced water density at the aggregate surface, the structuring of the aggregate relative to the interface, and the decrease of density in the sparse phase relative to liquid-vapor coexistence are all captured by IS-SPA. The lack of quantitative agreement suggest that higher order terms to the SPA are necessary to accurately describe the water density around the aggregate.

5

Conclusions

We have demonstrated that the non-polar component of the solvation force can be captured implicitly using the IS-SPA approach. The IS-SPA formalism utilizes the Kirkwood SPA and a parabolic first solvation shell approximation to allow for the derivation of solvation parameters from monomer explicit solvent simulations. The IS-SPA parameters for each atom type are comprised of three physically interpretable quantities: the radius of excluded volume, the peak of the first solvation shell and the position at which the water distribution function goes to bulk. IS-SPA also represents a unique method of running an implicit solvent simulation given a solvent distribution function. It becomes untenable to exactly calculate the mean solvent force for an arbitrary solvent distribution. We introduce a method to use Monte Carlo importance sampling requiring only 10 sampling points per atom to generate the mean force. We show that despite the mean force not being resolved to high precision in each time step, the associated statistics remain unaltered. This poor sampling of the mean force leads to simulation of up to 3900 times faster than explicit solvent for the dimer PMFs considered in this work. The method is first tested against the homodimerization of solvated Lennard-Jones spheres of varying sizes. It is found that the explicit solvent force is qualitatively captured by IS-SPA over the full range of interparticle separations. Quantitative dis-

35

ACS Paragon Plus Environment

Page 36 of 45

Page 37 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

agreement is observed at the de-solvation barrier and distances equivalent to that of a covalent bond (< 3 ˚ A). These deviations are found to be due to the SPA rather than the parabolic first solvation shell approximation. Additionally, these deviations are found to be robust to particle size suggesting that a correction for molecular systems is possible. Building on the insights from the Lennard-Jones systems, IS-SPA is tested on the homodimerization of ten alkanes in water. Parabolic parameters for the radial distribution function of water around carbon and hydrogen are found from monomer simulations of methane. IS-SPA simulations using these parameters show excellent agreement with respect to explicit solvent simulations of the homodimerization of these two molecules in water. Quantitative agreement for solvent forces is observed for all ranges of interparticle separations with the most significant deviations seen at the de-solvation barrier in agreement with the trends seen in the Lennard-Jones spheres. Furthermore, IS-SPA performs better in both accuracy and computational efficiency than other typical implicit solvent models such as SASA and RISM. Not only does IS-SPA describe the dimerization of non-polar solutes, it can also be used to study aggregation of a collection of such molecules. We considered an aggregate of cyclohexane phase separated from an aqueous phase and show how ISSPA qualitatively captures the effects of the aqueous phase on the aggregate. This study also shows the importance of accounting for many-body effects that are not correctly captured by the SPA. The accuracy and benefits of IS-SPA are demonstrated for millimolar concentrations of non-polar molecules solvated in water. The magnitude of the atomic charges in these systems is maximal for cyclopropane with absolute value of the carbon charges as 0.26 e, suggesting that somewhat polar molecules can be treated with this approach. A more thorough investigation of the limits of this approach to charged systems is necessary. The computational efficiency of IS-SPA depends only on the number of solute atoms implying that it is ideally suited to simulate the aggregation of solutes with the micromolar concentrations that are found in typical experimental conditions.

36

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Acknowledgement We gratefully acknowledge the Army Research Office for financial support (ARO Fund number ARO 70799LS). M.M. would like to thank Colorado State University for startup funding. Additionally, we gratefully acknowledge the support of NVIDIA Corporation with the donation of one of the Titan X Maxwell GPUs used for this research.

Supporting Information Available Additional information is provided including: values obtained for the IS-SPA fit parameters, the resulting PMFs for all ten alkane systems, tables of values for the measured PMFs, and preliminary studies of parameter transferability and charged systems. All code used to fit the distribution functions and to run an IS-SPA simulation can be found at: http://github.org/mccullaghlab/IS-SPA/.

This material is available

free of charge via the Internet at http://pubs.acs.org/.

References (1) Dill, K. A. Dominant forces in protein folding. Biochemistry 1990, 29, 7133–7155. (2) Berne, B. J.; Weeks, J. D.; Zhou, R. Dewetting and hydrophobic interaction in physical and biological systems. Annu. Rev. Phys. Chem. 2009, 60, 85–103. (3) Ben-Amotz, D. Water-Mediated Hydrophobic Interactions. Annu. Rev. Phys. Chem. 2016, 67, 617–638. (4) Pratt, L. R.; Chaudhari, M. I.; Rempe, S. B. Statistical Analyses of Hydrophobic Interactions : A Mini-Review. J. Phys. Chem. B 2016, 120, 6455–6460. (5) Ben-Naim, A.; Wilf, J. A direct measurement of intramolecular hydrophobic interactions. J. Chem. Phys. 1979, 70, 771–777. (6) Chaudhari, M. I.; Pratt, L. R.; Paulaitis, M. E. Communication : Direct observation of a hydrophobic bond in loop closure of a capped ( OCH CH ) oligomer in

37

ACS Paragon Plus Environment

Page 38 of 45

Page 39 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

water Communication : Direct observation of a hydrophobic bond in loop closure of a capped ( OCH 2 CH 2 ) n oligomer in water. J. Chem. Phys. 2010, 133, 231102. (7) Zhu, S.; Elcock, A. H. A Complete Thermodynamic Characterization of Electrostatic and Hydrophobic Associations in the Temperature Range 0 to 100 C from Explicit-Solvent Molecular Dynamics Simulations. J. Chem. Theory Comput. 2010, 6, 1293–1306. (8) Bandyopadhyay, D.; Choudhury, N. Characterizing hydrophobicity at the nanoscale : A molecular dynamics simulation study Characterizing hydrophobicity at the nanoscale : A molecular dynamics. J. Chem. Phys. 2012, 136, 224505. (9) Roux, B.; Simonson, T. Implicit solvent models. Biophys. Chem. 1999, 78, 1–20. (10) Gallicchio, E.; Kubo, M. M.; Levy, R. M. Enthalpy - Entropy and Cavity Decomposition of Alkane Hydration Free Energies : Numerical Results and Implications for Theories of Hydrophobic Solvation. J. Phys. Chem. B 2000, 104, 6271–6285. (11) Baker, N. A. Improving implicit solvent simulations: a Poisson-centric view. Curr. Opin. Struc. Bio. 2005, 15, 137–143. (12) Onufriev, A.; Bashford, D.; Case, D. A. Modification of the Generalized Born Model Suitable for Macromolecules. J. Phys. Chem. B 2000, 104, 3712–3720. (13) Gong, H.; Hocky, G.; Freed, K. F. Influence of nonlinear electrostatics on transfer energies between liquid phases: Charge burial is far less expensive than Born model. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 11146–11151. (14) Gong, H.; Freed, K. F. Electrostatic Solvation Energy for Two Oppositely Charged Ions in a Solvated Protein System: Salt Bridges Can Stabilize Proteins. Biophys. J. 2010, 98, 470–477.

38

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(15) Liu, Y.; Haddadian, E.; Sosnick, T. R.; Freed, K. F.; Gong, H. A Novel Implicit Solvent Model for Simulating the Molecular Dynamics of RNA. Biophys. J. 2013, 105, 1248–1257. (16) Che, J.; Dzubiella, J.; Li, B.; McCammon, J. A. Electrostatic free energy and its variations in implicit solvent models. J. Phys. Chem. B 2008, 112, 3058–3069. (17) Reynolds, J. A.; Gilbert, D. B.; Tanford, C. Empirical Correlation Between Hydrophobic Free Energy and Aqueous Cavity Surface Area. Proc. Natl. Acad. Sci. U.S.A. 1974, 71, 2925–2927. (18) Richmond, T. J. Solvent Accessible Surface Area and Excluded Volume in Proteins. J. Mol. Biol. 1984, 178, 63–89. (19) Weiser, J.; Shenkin, P. S.; Still, W. C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 1999, 20, 217–230. (20) Dzubiella, J.; Swanson, J. M. J.; McCammon, J. A. Coupling Hydrophobicity , Dispersion , and Electrostatics in Continuum Solvent Models. Phys. Rev. Lett. 2006, 96, 087802. (21) Ramirez, R.; Borgis, D. Density Functional Theory of Solvation and Its Relation to Implicit Solvent Models. J. Phys. Chem. B 2005, 109, 6754–6763. (22) Andersen, H. C.; Chandler, D. Optimized Cluster Expansions for Classical Fluids . II . Theory of Molecular Liquids. J. Chem. Phys. 1972, 57, 1930–1937. (23) Beglov, D.; Roux, B. Solvation of complex molecules in a polar liquid : An integral equation theory Solvation of complex molecules in a polar liquid : An integral equation theory. J. Chem. Phys. 1996, 104, 8678–8689. (24) Setny, P. Hydration in Discrete Water ( II ): From Neutral to Charged Solutes. J. Phys. Chem. B 2015, 119, 5970–5978.

39

ACS Paragon Plus Environment

Page 40 of 45

Page 41 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(25) Chen, Z.; Baker, N. A.; Wei, G. W. Differential geometry based solvation model I : Eulerian formulation. J. Comput. Phys. 2010, 229, 8231–8258. (26) Wang, Z.; Che, J.; Cheng, L.-t.; Dzubiella, J.; Li, B.; McCammon, J. A. Level-Set Variational Implicit-Solvent Modeling of Biomolecules with the Coulomb-Field Approximation. J. Chem. Theory Comput. 2012, 8, 386–397. (27) Weeks, J. D.; Chandler, D.; Andersen, H. C. Role of Repulsive Forces in Determining the Equilibrium Structure of Simple Liquids. J. Chem. Phys. 1971, 54, 5237–5247. (28) Andersen, H. C.; Weeks, J. D.; Chandler, D. Relationship between the HardSphere Fluid and Fluids with Realistic Repulsive Forces. Phys. Rev. A 1971, 4, 1597–1607. (29) Wagner, F.; Simonson, T. Implicit solvent models: combining an analytical formulation of continuum electrostatics with simple models of the hydrophobic effect. J. Comput. Chem. 1999, 20, 322–335. (30) Lum, K.; Chandler, D.; Weeks, D., John Hydrophobicity at Small and Large Length Scales. J. Phys. Chem. B 1999, 103, 4570–4577. (31) Wagoner, J. A.; Baker, N. A. Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and volume terms. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 8331–8336. (32) Chandler, D. Interfaces and the driving force of hydrophobic assembly. Nature 2005, 437, 640–647. (33) Huang, D. M.; Chandler, D. The Hydrophobic Effect and the Influence of Solute– Solvent Attractions. J. Phys. Chem. B 2002, 106, 2047–2053. (34) Levy, R. M.; Zhang, L. Y.; Gallicchio, E.; Felts, A. K. On the Nonpolar Hydration Free Energy of Proteins : Surface Area and Continuum Solvent Models for the Solute - Solvent Interaction Energy. J. Am. Chem. Soc. 2003, 125, 9523–9530.

40

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(35) Zacharias, M. Continuum Solvent Modeling of Nonpolar Solvation: Improvement by Separating Surface Area Dependent Cavity and Dispersion Contributions. J. Phys. Chem. A 2003, 107, 3000–3004. (36) Labute, P. The generalized Born/volume integral implicit solvent model: estimation of the free energy of hydration using London dispersion instead of atomic surface area. J. Comput. Chem. 2008, 29, 1693–1698. (37) Harris, R. C.; Pettitt, B. M. Effects of geometry and chemistry on hydrophobic solvation. Proc. Natl. Acad. Sci. U.S.A. 2014, 111, 14681–14686. (38) Harris, R. C.; Pettitt, B. M. Reconciling the understanding of ’hydrophobicity’ with physics-based models of proteins. J. Phys. Condens. Matter 2016, 28, 083003. (39) Chen, J.; Brooks, C. L. Implicit modeling of nonpolar solvation for simulating protein folding and conformational transitions. Phys. Chem. Chem. Phys. 2008, 10, 471–481. (40) Hummer, G.; Soumpasis, D. M. Computation of the water density distribution at the ice-water interface using the potentials-of-mean-force expansion. Phys. Rev. E 1994, 49, 591–596. (41) Kirkwood, J. G. Statistical mechanics of fluid mixtures. J. Chem. Phys. 1935, 3, 300–313. (42) Ashbaugh, H. S.; Garde, S.; Hummer, G.; Kaler, E. W.; Paulaitis, M. E. Conformational Equilibria of Alkanes in Aqueous Solution : Relationship to Water Structure Near Hydrophobic Solutes. Biophys. J. 1999, 77, 645–654. (43) Pellegrini, M.; Doniach, S. Modeling solvation contributions to conformational free energy changes of biomolecules using a potential of mean force expansion. J. Chem. Phys. 1995, 103, 2696–2702.

41

ACS Paragon Plus Environment

Page 42 of 45

Page 43 of 45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(44) Pellegrini, M.; Gronbech-Jensen, N.; Doniach, S. Potentials of mean force for biomolecular simulations : Theory and test on alanine dipeptide Potentials of mean force for biomolecular simulations : Theory and test on alanine dipeptide. J. Chem. Phys. 1996, 104, 8639–8648. (45) Ben-Amotz, D. Hydrophobic Ambivalence: Teetering on the Edge of Randomness. J. Phys. Chem. Lett. 2015, 6, 1696–1701. (46) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926. (47) Case, D. A.; Babin, V.; Berryman, J.; Betz, R.; Cai, Q.; Cerutti, D.; Cheatham, III, T. E.; Darden, T.; Duke, R. E.; Gohlke, H.; Goetz, A. W.; Gusarov, S.; Homeyer, N.; Janowski, P.; Kaus, J.; Kolossvary, I.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Luchko, T.; Luo, R.; Madej, K. M.; Merz, K. M.; Paesani, F.; Roe, D. R.; Roitberg, A.; Sagui, C.; Salomon-Ferrer, R.; Seabra, G.; Simmerling, C.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Kollman, P. A. AMBER 14 ; University of California, San Fransisco, 2014. (48) Hopkins, C. W.; LeGrand, S.; Walker, R. C.; Roitberg, A. E. Long-time-step molecular dynamics through hydrogen mass repartitioning. J. Chem. Theory Comput. 2015, 11, 1864–1874. (49) Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature. J. Chem. Phys. 1980, 72, 2384–2393. (50) Salomon-ferrer, R.; Go, A. W.; Poole, D.; Grand, S. L.; Walker, R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs . 2 . Explicit Solvent Particle Mesh Ewald. J. Chem. Theory Comput. 2013, 9, 3878–3888. (51) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A.

42

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Weighted Histogram Analysis Method for Free-Energy Calculations on Biomolecules. I. The Method. J. Comput. Chem. 1992, 13, 1011–1021. (52) Jorgensen, W. L.; Madura, J. D.; Swenson, C. J. Optimized Intermolecular Potential Functions for Liquid Hydrocarbons. J. Am. Chem. Soc. 1984, 106, 6638–6646. (53) Omelyan, I.; Kovalenko, A. Multiple time step molecular dynamics in the optimized isokinetic ensemble steered with the molecular theory of solvation: Accelerating with advanced extrapolation of effective solvation forces. J. Chem. Phys. 2013, 139, 244106. (54) Johnson, R. W.; Shore, J. E. Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37. (55) Shell, M. S. The relative entropy is fundamental to multiscale and inverse thermodynamic problems. J. Chem. Phys. 2008, 129, 144108. (56) McAuliffe, C. Solubility in Water of Paraffin, Cycloparaffin, Olefin, Acetylene, Cycloolefin, and Aromatic Hydrocarbons. J. Phys. Chem. 1966, 70, 1267–1275. (57) Penoncello, S. G.; Jacobsen, R. T.; Goodwin, A. R. H. A thermodynamic property formulation for cyclohexane. Int. J. Thermophys. 1995, 16, 519–531. (58) Ashbaugh, H. S.; Pratt, L. R.; Paulaitis, M. E.; Clohecy, J.; Beck, T. L. Deblurred Observation of the Molecular Structure of an Oil - Water Interface. J. Am. Chem. Soc. 2005, 127, 2808–2809.

43

ACS Paragon Plus Environment

Page 44 of 45

Page 45 of 45

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment