Efficient Strategy for the Calculation of Solvation ... - ACS Publications

Sep 21, 2017 - ... the Calculation of Solvation Free Energies in. Water and Chloroform at the Quantum Mechanical/Molecular. Mechanical Level. Meiting ...
0 downloads 0 Views 625KB Size
Subscriber access provided by UNIVERSITY OF ADELAIDE LIBRARIES

Article

An Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level Meiting Wang, Pengfei Li, xiangyu jia, Wei Liu, Yihan Shao, Wenxin Hu, Jun Zheng, Bernard R. Brooks, and Ye Mei J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00001 • Publication Date (Web): 21 Sep 2017 Downloaded from http://pubs.acs.org on September 24, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

An Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level Meiting Wang,†,# Pengfei Li,†,# Xiangyu Jia,∗,†,@ Wei Liu,† Yihan Shao,‡,¶ Wenxin Hu,§ Jun Zheng,§ Bernard R. Brooks,k and Ye Mei∗,†,⊥ †State Key Laboratory of Precision Spectroscopy, School of Physics and Materials Science, East China Normal University, Shanghai 200062, China ‡Q-Chem Inc., 6601 Owens Drive, Suite 105, Pleasanton, CA 94588, USA ¶Department of Chemistry and Biochemistry, University of Oklahoma, Norman OK 73019, USA §The Computer Center, School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China kLaboratory of Computational Biology, National Institutes of Health, National Heart, Lung and Blood Institute, 5635 Fishers Lane, T-900 Suite, Rockville, MD 20852, USA ⊥NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China #These authors have contributed equally to this work @Current address: NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China E-mail: [email protected]; [email protected]

Abstract

1 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics (MM) rather difficult. In contrast, quantum mechanics (QM) can provide more reliable predictions. In this work, the partition coefficients of the side chain analogs of some amino acids between water and chloroform were computed. The QM solvation free energies were calculated indirectly via a series of MM states using Multistate Bennett Acceptance Ratio (MBAR) and the MM-to-QM corrections were applied at the two end points using Thermodynamic Perturbation (TP). Previously, it has been shown (Jia, X.; Wang, M.; Shao, Y.; K¨ onig, G.; Brooks, B. R.; Zhang, J. Z. H.; Mei, Y. Calculations of Solvation Free Energy through Energy Reweighting from Molecular Mechanics to Quantum Mechanics. Journal of Chemical Theory and Computation 2016, 12, 499-511) that this method provides the minimal variance in the results without running QM simulations. However, if there is insufficient overlap in phase space between the MM and QM Hamiltonians, this method fails. In this work, we propose, for the first time, a quantity termed the reweighting entropy that serves as a metric for the reliability of the TP calculations. If the reweighting entropy is below a certain threshold (0.65 for the solvation free energy calculations in this work), this MM-to-QM correction should be avoided and two alternative methods can be employed by either introducing a semiempirical state or conducting nonequilibrium simulations. However, the results show that the QM methods are not guaranteed to yield better results than the MM methods. Further improvement of the QM methods are imperative, especially the treatment of the van der Waals and the electrostatic interactions between the QM region and the MM region in the first shell. We also propose a scheme for the calculation of the van der Waals parameters for the solute molecules in nonaqueous solvent, which improves the quality of the computed thermodynamic properties. Furthermore, the force field parameters for the sulfur-containing molecules are also optimized.

2 ACS Paragon Plus Environment

Page 2 of 52

Page 3 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Introduction The partition coefficient of a molecule between two solvents with different physicochemical characteristics is a fundamental quantity, and knowledge of the partition coefficient is highly desirable in drug design, separation technology, and environmental science. 1–4 For instance, the partitioning of the amino acid side chains between water and a membrane is of biological significance. 5 However, the simulation of biological membranes is complicated and experimental measurements are also limited. 5–8 Therefore, the partition coefficient between water and a membrane is usually approximated by that between water and an organic solvent, such as 1-octanol, n-hexane, or chloroform. 6,9–13 Wolfenden 10,12 and Wimley 9 have measured the partition coefficients for some amino acids between water and the organic solvents octanol, cyclohexanol, and chloroform at room temperature, which has provided much valuable reference data. From a computational perspective, the partition coefficient can be obtained from the difference in the solvation free energies of a molecule in two solvents. 14–16 While the solvation free energies of side chain analogs (SCA) of some amino acids in water have been extensively studied, 17–21 studies in organic solvents 18,19 are relatively scarce. The sovation free energy of SCAs in chloroform 18 and cyclohexane 18,19 have been calculated. These calculations were based on thermodynamic integration (TI) free energy calculations 22 using molecular dynamics simulations. Both studies yielded some deviations from the experimental values, especially for the polar analogs. The reliability of the results from molecular mechanics (MM) simulations is always limited by the accuracy of the force field employed. Force fields for organic molecules such as the General AMBER Force Field (GAFF) 23 and the CHARMM General Force Field (CGenFF) 24 have been used in many studies. 25–29 Nevertheless, because of the complexity of the chemical space under investigation, reliability is not always guaranteed. In the parameterization of these force fields, more attention has been paid to the thermodynamic properties in water than in other solvents. Thus, the applicability of these parameters to studies in organic solvents and biomembranes requires further validation because of the different dielectric 3 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

constants of these media. However, the optimization of MM parameters is tedious and extremely computationally expensive. In the past few decades, polarizable force fields, such as the Atomic Multipole Optimized Energetics for Biomolecular Application (AMOEBA) 30 force field in AMBER and the Drude 31 polarizable force field in CHARMM, have been developed to improve the accuracy of the interatomic potentials for biomolecular interactions. Polarizable force fields can capture the electronic response of the molecules under external perturbation, thereby yielding accurate predictions of the interaction energetics across a variety of systems. However, in a recent study of solvation free energies calculations in four organic solvents with dielectric constants spanning a relatively wide range, 32 the performance of AMOEBA was slightly worse than that of GAFF, which is an additive force field. This is direct evidence that the use of these polarizable force fields is premature. Quantum mechanical (QM) calculations provide a more parameter independent way to study organic molecules. Quantum mechanical calculations of the solvation free energy in implicit solvent, such as the polarizable continuum model (PCM) 33 and its variants, have been widely used for small organic molecules 34–37 and large biomolecules like proteins. 38,39 However, implicit solvent models lack the molecular details of the solvent and a time-dependent response to perturbations. For the calculation of the solvation free energy in explicit solvent, large scale phase space sampling is necessary, and intermediate states bridging the gas phase and the solution phase are always required. This may significantly increase the computational expense. In the following, we do not strictly distinguish QM and QM/MM, and they will be used interchangeably. Fortunately, the free energy is a state function and the free energy difference between two states is independent of the transformation path. Therefore, the reference potential scheme can be employed, 40–47 in which inexpensive and less accurate reference Hamiltonians can be utilized for sampling and a one-step correction from the reference Hamiltonian to the target Hamiltonian (the QM Hamiltonian in this work) using thermodynamic perturbation (TP) 48 can be applied to obtain the ensemble averages under the target Hamiltonian. Recently, this

4 ACS Paragon Plus Environment

Page 4 of 52

Page 5 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

scheme has been employed for the calculation of solvation free energies and host–guest binding energies for the molecules in the 4th Statistical Assessment of the Modeling of Protein and Ligands (SAMPL4). 49–53 Dybeck et al. showed that the multistate Bennett acceptance ratio (MBAR) 54 analysis of all states provides the overall free energies with minimal uncertainties. 51 However, this requires QM calculations for the samples harvested in all the states, which is still too expensive for current computational facilities. In a previous study by some of the authors, the free energies of all the MM intermediate states were estimated using the Bennett acceptance ratio (BAR) 55 or MBAR, and the MM-to-QM corrections were carried out only at the two ends of the MM states. 50 Although this method was quite successful in calculating the solvation free energy, it still relies on error cancellation. We noticed that the BLYP functional performed better in that work than some more sophisticated density functionals and second-order Møller–Plesset perturbation theory (MP2). More importantly, we also noticed that the uncertainty mainly originates from the TP 48 step, in which the exponential of the energy difference between the target and reference Hamiltonians shows slow convergence. This TP correction step may perform unsatisfactorily or even fail in some simple systems because of insufficient overlap of the important regions in phase space, such as for Asn and Gln in this work (for convenience, the names of the SCAs are replaced by the three letter abbreviations of the amino acids, as shown in Table 1). Therefore, the assessment of the TP calculations is crucial. When this single-step MM-to-QM/MM TP correction fails to provide a numerically stable result, two alternative schemes can be utilized. In the first scheme, a cheap (yet more reliable than MM) Hamiltonian is introduced to bridge the reference and the target Hamiltonians. In this work, the PM6 semi-empirical method 56 was used. BAR and MBAR can be used again to calculate the free energy difference between the reference and the bridging Hamiltonians, and a single-step TP correction is used to calculate the free energy cost for the transfer from the bridging Hamiltonian to the target Hamiltonian. If the bridging Hamiltonian is carefully selected and has sufficient overlap with the target Hamiltonian in

5 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 52

phase space, this scheme can accelerate the convergence of the computation. In the second scheme, nonequilibrium (NE) 57–62 molecular dynamic simulations are performed to convert the system from the reference Hamiltonian to the target Hamiltonian. Although energy and force evaluations under the target Hamiltonian are required, this scheme is still remarkably cheaper than direct equilibrium simulations under the target Hamiltonian. 63 Many previous studies have been carried out to improve the nonequilibrium calculations. 64–75 Nonequilibrium molecular dynamic simulations can not only be used to calculate the potential of mean force (PMF) along the collective coordinate correlated with λ 76–79 but also be combined with alchemical transformations to calculate the binding free energies in ligand-receptor systems. 80,81 Furthermore, nonequilibrium free energy calculations can be highly parallel and are readily carried out on distributed computing systems. In this work, the partition coefficients of the SCAs of some neutral amino acids in water and chloroform were calculated. A new force field for chloroform was developed based on the general AMBER force field but with optimized van der Waals (vdW) parameters. To optimize the response of the solute molecules to the perturbation from chloroform, the Lennard-Jones -parameters of the solute molecules were scaled according to the effective atomic dipole polarizabilities of the solute molecules in two solvents.

Method Solvation free energy estimators The partition coefficient 14 can be calculated as

log P =

∆A(c) − ∆A(w) , ln(10)RT

(1)

where ∆A(w) and ∆A(c) are the solvation free energies in water and chloroform, respectively, R is the gas-constant, and T is the absolute temperature.

6 ACS Paragon Plus Environment

Page 7 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Free energy is a state function, and the free energy difference between two states is independent of the transformation path. The thermodynamic cycle utilized in the present work is depicted in Fig. 1. Within these cycles, a series of MM intermediate states defined by H(λ), where H(λ = 0) = H0 and H(λ = 1) = H1 , were introduced and the solute molecule is gradually decoupled from the solvent with λ growing from 0 to 1. The free energy changes associated with this process was estimated by the MBAR method. In MBAR, with Ni uncorrelated equilibrium samples from each of the K thermodynamic states, the free energy of the ith state can be written as a function of the free energies of all the states as Nj K exp [−βUi (xjn )] 1 X X , Ai = − ln PK β j=1 n=1 k=1 Nk exp [βAk − βUk (xjn )]

(2)

where β is the reciprocal of the thermodynamic temperature. Therefore, these equations must be solved self-consistently until convergence is reached. The corresponding statistical variance of ∆Aij , σij2 is calculated using Eq. 8 and 12 in Ref. 54. The QM corrections are applied only at the two MM end states using the reference potential scheme. Only the solute molecules were described by the QM Hamiltonian, while the solvent molecules were treated as MM atoms. With this scheme, the sampling is only performed on the inexpensive low-level potential energy surface and the accurate QM energies can be obtained via single-point energy calculations of the harvested configurations. The free energy difference between the MM state and the QM state can be obtained in several ways. In this work, three different schemes were utilized (shown in Fig. 1). Using TP, the free energy difference can be formulated as

∆A =AQM,0.0 − AM M,0.0 1 ln hexp (−β (UQM,0.0 − UM M,0.0 ))iM M,0.0 β 1 = − ln hexp (β∆U )iM M,0.0 , β =−

7 ACS Paragon Plus Environment

(3)

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 52

where UM M,0.0 and UQM,0.0 are the potential energies of the system at the MM and QM/MM levels, respectively, with λ = 0.0, and ∆U is their difference. The corresponding variance of ∆A, σ 2 , between two states can be estimated using the standard point estimation theory as 82 1 σ = 2 β N 2



σx hxi

2 , x ≡ exp [β∆U ] .

(4)

This scheme is denoted as QM/MM-D hereafter, where D means direct. A precondition of using this QM/MM-D scheme is that the important region in phase space under the MM Hamiltonian has significant overlap with that of the QM/MM Hamiltonian. However, this condition is not always satisfied. To characterize the reliability of the TP calculation, the “reweighting entropy” is introduced here, which is defined as N 1 X S=− Wj ln Wj , ln N j=1

(5)

exp(−β∆Uj ) . Wj = PN exp(−β∆U ) i i=1

(6)

where

Here, N is the number of frames that are harvested under the MM Hamiltonian. As the reweighting entropy increases, the TP calculation becomes more reliable. S has a maximum value of 1.0, where all the samples have equal weights. The maximum of Wj of N frames is called the “maximal weight,” Wmax . Generally, as the maximal weight becomes smaller, the TP calculation becomes more reliable. TP requires significant overlap in the phase space of the sampled (MM) Hamiltonian and the target (QM/MM) Hamiltonian. 83 Otherwise, the TP calculation is unreliable. Although using the total energy difference as ∆U in Eq. 3 and Eq. 4 is rigorous, the convergence can sometimes be rather slow. As has been shown in some previous studies by us and some other groups, 44,50,84,85 the use of the interaction energy difference yields well-

8 ACS Paragon Plus Environment

Page 9 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

converged results easily. Therefore, the interaction energy difference was used in this work, which is calculated as   QM MM WFD ∆U = ∆Uele − ∆Uele + ∆UM .

(7)

QM MM and ∆Uele are the electrostatic interaction energies between the solute molecule ∆Uele

and the solvent at the MM and QM/MM levels, respectively. The electrostatic interaction energies were calculated by summing up the products of the MM atomic charges and the QM , the electrostatic potentials on the MM atoms generated by the solute molecules. For ∆Uele

electrostatic potentials were calculated using the electronic structures of the solute molecules WFD in the condensed phase. ∆UM is the wave function distortion energy of the solute molecule

at QM/MM level. It is defined as hΨ(1)|H0 |Ψ(1)i − hΨ(0)|H0 |Ψ(0)i, where Ψ(1) and Ψ(0) are the wave functions of the solution molecules in the condensed phase and in the gas phase respectively, and H0 is the Hamiltonian of the solute molecules. More details can be found in Ref. 50. The distributions of ∆U are shown in Fig. S2. The QM calculations were carried out at the BLYP/6-31G* level, because this functional has performed better than other functionals and MP2 in previous studies. 50,86 In the second scheme, which is termed QM/MM-B (where B here means bridged), a semiempirical (SQM) method, which was PM6 in this study, was used to connect the MM and QM/MM Hamiltonians. Similarly to the QM/MM calculations, the SQM Hamiltonian only applied to the solute molecules, and the solvent molecules were described by the MM force field. As shown in Fig. 1b, the green circles represent the SQM states. Because sampling at a semiempirical level costs far less than that at the QM level, the free energy difference between the MM and PM6 Hamiltonian can be calculated with BAR using trajectories from both Hamiltonians and that between PM6 and QM Hamiltonian can be calculated with TP. In the third scheme, the NE simulation was applied to calculate the free energy difference between the MM and QM Hamiltonians, as shown in Fig. 1c. This scheme is denoted as QM/MM-NE in the current work, where NE means nonequilibrium. In 1997, Jarzynski proved for the first time that the equilibrium free energy difference can be expressed as the 9 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 52

exponential average of the work performed on the system during irreversible transformations between two states, 57,58 viz.

exp(−β∆A) = hexp(−βW )i .

(8)

Here, the angled bracket h· · · i denotes the average over an ensemble of non-equilibrium transformation processes initiated from equilibrium states. At each jump of the Hamiltonian from H(λ(t)) to H(λ(t + δt)), the instantaneous work done to the system is

δW (t → t + δt) = U (r(t), λ(t + δt)) − U (r(t), λ(t)),

(9)

where U is the potential energy of the system, which is a function of both the coordinates and λ. The non-equilibrium work W is thus accumulated as 59

W =

τX −δt

U (r(t), λ(t + δt)) − U (r(t), λ(t)).

(10)

t=0

The variance of ∆A can be calculated in a similar way as Eq. 4.

Optimization of the chloroform solvent model The initial structure of the chloroform solvent molecule (CHCl3 ) was optimized at the B3LYP/6-311++G** level of theory using Gaussian 09. 87 The atomic charges of CHCl3 were obtained by a dual-step restrained electrostatic potential (RESP) 88,89 charge fitting method, using the electrostatic potential computed at the B3LYP/6-311G(2df, p) level. This basis set is larger than the recommended one for the parameterization of GAFF, but it allows sufficient flexibility for Cl atoms. 90 Other parameters were taken from GAFF, 23 except for the vdW parameters for the chlorine atom, which were further optimized to reproduce some properties of liquid chloroform, such as the density, enthalpy of vaporization, and permittivity. The force field parameters are listed in Table 2. With the updated parameters, a 10 ACS Paragon Plus Environment

Page 11 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

3 box of CHCl3 with a dimension of 55.5 × 58.4 × 57.0 ˚ A was generated using the LEaP

program in AmberTools 1.5. The box contained 6,985 CHCl3 molecules. After initial energy relaxation, the system was heated to 298 K and was equilibrated for 1 ns, followed by a 10-ns production simulation in the NPT ensemble for the calculation of the physical properties of liquid chloroform. All simulations were performed with AMBER14 91 with a 2-fs time step. The temperature was coupled to a heat bath of 298 K using Langevin 92 dynamics and the pressure was set to 1 atm using Berendsen’s barostat. 93 The vdW interaction and the electrostatic interaction were truncated at 12 ˚ A in real space and the long-range electrostatic interactions were calculated with the particle mesh Ewald (PME) 94 method. Snapshots were saved every 2 ps. The density, enthalpy of vaporization, and permittivity were calculated using established protocols. 95–97 The density (ρ) in a constant pressure simulation can be computed trivially from the total mass (M ) of the system divided by its volume (V )

ρ=

M . V

(11)

The enthalpy of vaporization can be computed from

∆Hvap = H (g) − H (l) = E (g) − E (l) + P (V (g) − V (l)) ,

(12)

where g and l refer to gas and liquid phases, respectively, and P is the pressure. For an ideal gas, P V (g) equals the product of the gas-constant (R) and the temperature (T ). The molar volume in the liquid phase is negligible compared to that in gas phase. Therefore,

∆Hvap = (Epot (g) + RT ) − Epot (l).

(13)

Finally, the static dielectric constants (ε) were computed from the fluctuation in the total

11 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 52

dipole moment (M) of the simulation box,

2 4πβ M2 − hMi ε=1+ . 3 V

(14)

The vdW parameters for solute molecules in chloroform The vdW parameters in GAFF are typically taken from the Optimized Potentials for Liquid Simulation (OPLS) model, 98 which was parameterized mainly for polar environments. GAFF has been quite successfully employed in many studies for the description of organic molecules solvated in aqueous solution or complexed with proteins. 29,99–101 Water is a representative polar medium, and the mutual polarization between water molecules and the solute molecule distorts the electronic structure of the solute molecule. However, if the same molecule is placed in an apolar environment, such as chloroform used in this work, the electronic response might be significantly different from that in water. 102,103 Therefore, using standard GAFF parameters for the solute molecule in chloroform is not optimal. The determination of the long-range correlation effects, such as dispersion interactions, depends on the electronic structure of the system, and significant efforts 104–108 have been devoted to the refinement of the vdW parameters that depend explicitly on the electronic structure. 106,109–112 For example, Tkatchenko et al. proposed a method to determine the C6 coefficient from the ground-state electronic density of the molecule. 106 In their scheme, the C6 term between two atoms (A and B) is determined using the static polarizabilities of the two atoms:   ηA ηB 3 α0 α0 . C6 = 2 (ηA + ηB ) A B

(15)

Here, η and α0 are the effective frequency and the static polarizability, respectively, of atoms A and B. This idea has been adopted in the present work. The Lennard-Jones -parameters of the solute molecules in chloroform can be obtained by scaling those in water by the square

12 ACS Paragon Plus Environment

Page 13 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

of the ratio of the atomic polarizabilities in chloroform and in water as α0 (c)  (c) =  (w) 0 α (w) 

2 ,

(16)

where α0 (c) and α0 (w) are the atomic polarizabilities in chloroform and water, respectively. The atomic polarizabilities are calculated from the response of atomic dipole moments in an external electric field in a four-step procedure. First, the electrostatic potential (ESP) around the solute molecule is calculated, where the electronic structure calculated in each continuous medium is characterized by its dielectric constant without external perturbation. Second, external electric fields with a strength of 0.001 a.u. are applied to the solute molecule from six directions (±x, ±y, ±z), which perturbs the electronic structure of the solute molecule and changes the ESP. Third, the induced atomic dipole moments (µ~i ) are calculated by fitting to the ESP change (∆V ) by least-squares minimization: Ngrid

χ2 =

X

∆Vj −

MX atom i=1

j=1

~ ij µ ~i · R (Rij )3

!2 .

(17)

Finally, the effective atomic dipole polarizability tensor can be calculated from the induced atomic dipole moment and the external electric field using the finite difference method: 113 

 ∂µi αij = ∂Ej E=0 µi (Ej ) − µi (−Ej ) = lim . Ej →0 2Ej

(18)

The isotropic atomic polarizability is, thus, calculated using α = (αxx + αyy + αzz ) /3. The electrostatic potentials around the solute molecules in the external electric fields were carried out at B3LYP/aug-cc-pVTZ level using Q-Chem, 114 and the solvent effect was described utilizing the PCM with corresponding dielectric constant. A similar scheme based on the atoms-in-molecules electron density partitioning has been utilized by Cole et al. in their recent work. 115 13 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

It is worth noting that this rescaling scheme for the vdW parameters only works for molecules in a homogeneous environment. For inhomogeneous environments, which are equally important for biological systems such as protein-ligand complexes and membranewater interfaces, more rigorous schemes with explicit consideration of the electronic response are required. For large solute molecules for which the internal vdW interactions play an essential role in determining the conformation of the solute molecules, this scaling method may also fail. Efforts to address this failure are in progress.

Molecular dynamics simulation details The molecules studied in this work are shown in Fig. 2. These are the neutral SCAs of the naturally occurring amino acids, formed by truncating the side chain at Cβ and replacing the Cα atom with a hydrogen atom. For all the analogs, the structures were optimized at the B3LYP/6-311++G** level and were then solvated in periodic boxes of solvent molecules (either TIP3P water molecules or chloroform) with the distance between the solute molecule and boundary of the unit cell no less than 15 ˚ A. The atomic charges were fitted using the restrained ESP charges calculated at the HF/6-31G* level with the ESP grids defined using the Merz-Kollman scheme. 116 Other MM parameters were taken from GAFF, except for those of Cys and Met. Although GAFF is frequently used for small molecules, the vdW parameters of all the atom types for sulfur atom are the same, ignoring the differences in its bonding conditions, and it has been shown the default parameters for sulfur atoms in the AMBER force field are not satisfactory. 117–119 The force field parameters of sulfur atoms used in this work were calibrated according to the interaction energy between water and Cys and Met (see Table S1 and Fig. S1 in the Supporting Information for details). To ensure the phase space overlap, eleven alchemical windows were used to decouple the SCA from the solvent environment gradually, and the soft-core potentials 120 were used to turn off the non-bonded interactions smoothly. The van der Waals interaction with the soft-core

14 ACS Paragon Plus Environment

Page 14 of 52

Page 15 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

potential became 

 VSC−vdW

   = 4ij (1 − λ)  "  

1  αλ +

rij σij

 6 #2 −

1  αλ +

   6  , rij   σij

(19)

where ij and σij are the Lennard-Jones parameters in the force field, rij is the distance between atom i and j, and α is the parameter determining the softness of van der Waals potential. In this work, α was set to 0.5. The electrostatic interaction with the soft-core potential became VSC−EEL = (1 − λ)

qq qi j , 2 4π0 λβ + rij

(20)

where qi and qj are the atomic charges and β is the parameter that adjust the softness of the 2

potential. In this work, β was set to 12 ˚ A . In each window, the system was heated to 298 K and was then equilibrated for 300 ps before a 10-ns production NPT simulation was carried out. The temperature was regulated using Langevin 92 dynamics and the pressure was set to 1 atm using Berendsen’s barostat. 93 The non-bonded interactions were truncated in real space at 12.0 ˚ A, but the long-range Coulomb interactions were calculated using the particle mesh Ewald (PME) 94 method. SHAKE was used and the time step was set to 2 fs. Altogether, 1000 snapshots were harvested for QM/MM single point energy calculations, which were taken into the TP calculations in order to obtain the free energy difference between the MM and the QM/MM Hamiltonians using the QM/MM-D scheme. All the MD simulations were carried out with AMBER14 91 and the QM single point energy calculations were carried out with Gaussian 09. 87 For the QM/MM-B scheme, a water sphere with a radius of 15 ˚ A was added to the system, centered on the heavy atom closest to the center-of-mass of the solute molecule, and was ˚2 . 121 In restrained by a soft half-harmonic potential with a force constant of 10 kcal/mol/A the SQM(PM6)/MM calculations, the system was optimized for 5000 steps using the steepest 15 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

descent algorithm and 5000 steps using the conjugate gradient minimization method. The relaxed structure was heated to 300 K in 100 ps and was further relaxed in 150 ps, followed by a production run in 10 ns. The temperature was regulated using Langevin dynamics. The vdW interaction and the electrostatic interaction were fully counted without a truncation. SHAKE was used and the time step was set to 2 fs. The free energy difference between the MM and SQM(PM6)/MM potential was calculated with BAR using the trajectories from both Hamiltonians. Periodicity was disabled. The samples from the non-periodic simulations under the SQM(PM6)/MM Hamiltonian were used to calculate the free energy between the PM6 and QM potentials using TP. For the QM/MM-NE scheme, one hundred transformation measurements were used together to calculate the free energy difference between the MM and QM Hamiltonians using NE. In each non-equilibrium trajectory, λ was changed uniformly from 0 to 1 in 100 stages with an interval of 0.2 ps between two adjacent jumps of λ. Therefore, the total transformation time for each trajectory was 20 ps. These non-equilibrium simulations were performed in a water sphere without periodicity. The systems were minimized using the steepest decent and conjugate gradient methods. Then, the systems were heated to 298 K and equilibrated in a 1-ns simulation followed by a 2-ns production simulation. In all these steps, the systems were solvated in a water droplet without any periodic conditions. The time step was set to 2 fs, and the snapshots (both the coordinates and the velocities) were saved every 20 ps. The temperature was regulated by Langevin dynamics with a collision frequency of 2.0 ps. The non-equilibrium simulations were started from the coordinates and velocities saved during the equilibrium simulation in a canonical ensemble at the initial state. In AMBER, there is a keyword “dynlmb”, which can be used to perform simulations with dynamically changing the λ values from 0 to 1. This can be combined with “icfe” keyword, which controls the free energy calculations and “ifmbar” keyword, which controls whether to print the potential energy of different Hamiltonians.

16 ACS Paragon Plus Environment

Page 16 of 52

Page 17 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Results and Discussion Properties of the chloroform model The simulated density (ρ), permittivity (ε), molecular dipole moment (µ), and the enthalpy of vaporization (∆Hvap ) for liquid chloroform are listed in Table 3. As shown in this table, the properties are nearly invariant with respect to the cutoff radius for the nonbonded interactions, indicating that the calculated properties have converged with a cutoff radius of 10 ˚ A. The liquid properties of chloroform are also well reproduced. The density is very close to the experimental value, 122,123 and the relative error is less than 1.1%. The dipole moment of the solvent molecule also agrees with the calculated result at the QM level. The permittivity of the low-dielectric solvent is critical to the calculation of the solvation free energy. Using the final parameter sets, the simulated permittivity is well reproduced, and the deviation from the experimental value is only 0.3. The quality of computed dipole moment and permittivity also serves as a validation of the atomic charges. The calculated enthalpy of vaporization for liquid chloroform is 7.31 kcal/mol, which is only 0.17 kcal/mol smaller than the experimental value (7.48 kcal/mol).

Solvation free energies Solvation in water The computed hydration free energies of the thirteen SCAs are listed in Table 4, as are the experimental values. The polarizable continuous model (PCM) is nowadays widely used in the calculation of solvation free energies for small molecules because of its simplicity in computation and high accuracy, which arises from the excellent calibration of the parameters. In this work, we only used one conformation for each solute molecule, which was obtained by energy-minimization in the gas phase. The solvation free energies were calculated at the M06-2X/cc-pVTZ level using the integral equation formalism variant of PCM with radii and non-electrostatic terms from Truhlar and coworkers (SMD). 35 These adiabatic solvation free 17 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

energy calculations yielded quite satisfactory results with a root mean squared deviation (RMSD) of 0.76 kcal/mol and a mean signed error (MSE) of 0.49 kcal/mol, which is slightly worse than the results of the MM and the QM/MM methods in which structural ensembles were used. The largest deviations from the experimental measurement were seen for Trp (1.28 kcal/mol) and Tyr (1.33 kcal/mol), which contain bulky aromatic groups. The results computed using the MM potential, as listed in the second column (denoted as MM), agree well with the experimental value with an RMSD of 0.54 kcal/mol and a systematic shift of 0.38 kcal/mol. For Asn, Cys, Ile, Leu, Phe, and Trp, the differences between the MM results and the experimental values are all less than 0.30 kcal/mol. However, for Gln, Thr, and Tyr, the results show relatively larger deviations from the experimental ones. A study by Shirts et al. using TI with a long range vdW correction observed a similar trend. 17 Notably, the results of Cys and Met were obtained using the optimized parameters, whereas the use of the original GAFF parameters led to even worse results, as listed in Table S2 in the Supporting Information. The hydration free energies at the QM/MM level using the first scheme (QM/MM-D) are listed in Table 4. The average QM correction energy is -0.72 kcal/mol. The correction energies of the polar molecules, Asn, Cys, Gln, Met, Ser, Thr, and Tyr, are over -0.6 kcal/mol. For Ala, Thr, Tyr, and Val, the QM correction improved the calculated results significantly. Especially for Tyr and Val, the QM corrections reduced the deviation from the experimental values to less than 0.15 kcal/mol. However, for Asn and Gln, this direct QM correction caused the hydration free energies of Asn and Gln to be overestimated, and the deviations from the experimental hydration free energies became larger. As shown in Table 4, the S values of Asn and Gln are lower than the others, and the corresponding maximum weights (Table S3 in the Supporting Information) are larger. This indicates an insufficient overlap in the phase space under the MM and QM potentials. Therefore, the direct TP calculations for these two molecules were unreliable, and the second and third QM correction schemes are required. For the apolar SCAs, all the correction energies (in absolute values) are no

18 ACS Paragon Plus Environment

Page 18 of 52

Page 19 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

larger than 0.5 kcal/mol, which are less significant than those for most of the polar SCAs. Because of the polarization from the water molecules, the electronic distribution of the solute molecule may respond to the thermal motion of the water molecules surrounding it. QM corrections inherently account for this polarization effect, although the mutual polarization effect between the solute and the solvent has not been fully accounted for in the QM/MM framework. For polar solute molecules, the polarization effect is more remarkable, and the QM correction is larger than that of the apolar solute molecules. Therefore, the QM corrections are necessary for the calculation of solvation free energies for molecules in water. For Phe, the MM-to-QM/MM correction reduces (less negative) the calculated hydration free energy. In contrast, for the other apolar SCAs (Ala, Ile, Leu, and Val), this correction energy increases (less positive) the calculated results. As shown in Table 4, this MM-to-QM/MM correction leads to much worse results for Ile, Leu, and Phe, which can be explained by the incompatibility between the solute-solvent vdW interaction and the electrostatic embedding potential in the QM/MM Hamiltonian. A more rigorous vdW potential that depends on the electronic structure of the solute molecule is necessary to alleviate this incompatibility. The RMSD for the QM/MM-D scheme (0.55 kcal/mol) is almost the same as that of MM method (0.54 kcal/mol). However, as mentioned before, the largest deviation in the QM/MM-D method comes from the unreliable results of Asn and Gln. If these two molecules are excluded, the RMSD reduces to 0.42 kcal/mol with an MSE of -0.23 kcal/mol, which are better than those of the MM method (0.53 and 0.36 kcal/mol for RMSD and MSE, respectively, after excluding Asn and Gln). As suggested by one of the referees, we compared the reweighting entropy S with the Kish’s effective sample size K. The numbers are listed in Table S3 and are plotted in Fig. S3. The correlation between the nature logarithm of the Kish’s effective sample size 124 (ln K) and the reweighting entropy (S) is 0.99, indicating that the reweighting entropy can also effectively measure the reliability of the ensemble average. To deal with the insufficient overlap for Asn and Gln, the other two schemes were used.

19 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Using the QM/MM-B scheme, the results for Asn and Gln improved noticeably. Using this correction method for these two molecules and QM/MM-D for others, the RMSD is reduced by 0.15 kcal/mol from 0.55 kcal/mol to 0.40 kcal/mol and the MSE changes from -0.35 kcal/mol to -0.22 kcal/mol. The increased values of S for Asn and Gln indicate that the TP calculations from PM6 to BLYP are more reliable than the direct correction from the MM state to BLYP. The phase space overlap between the PM6 Hamiltonian and BLYP Hamiltonian is more significant compared to that between the MM Hamiltonian and BLYP Hamiltonian, which is supported by the overlap matrices 125 in Fig. S4. However, improvement is not guaranteed by adding a SQM state. For some of the molecules, QM/MM-B performs worse than QM/MM-D, for instance Tyr (see Table S3). The reweighting entropy of QM/MM-D for Tyr is 0.75, while that of QM/MM-B is only 0.53. This is not surprising, because the SQM method does not always stand between the DFT Hamiltonian and the MM Hamiltonian. Sometimes, SQM is farther away from the DFT Hamiltonian than the MM Hamiltonian. The QM/MM-NE method also improved the results. For Asn and Gln, the deviations from the experimental data were reduced by 0.64 kcal/mol and 0.26 kcal/mol, respectively. The S values are very close to 1.0, indicating a narrow distribution of the work (Wi ). With the NE correction applied only to Asn and Gln, the RMSD and MSE of the QM/MM solvation free energy for all molecules reduced to 0.44 kcal/mol and -0.28 kcal/mol, respectively. By comparing the results of QM/MM-D, QM/MM-B and QM/MM-NE, it clearly shows that for the calculations with a large S value, these three schemes yielded, statistically, the same results. The optimal wall times for the simulations are also compared. Taking Asn as an example, for which the wall times are listed in Table 5. The wall time for QM/MM-D is about 0.13 hours using five nodes. Because of the additional cost in the simulations at the PM6 level, the wall time increased by about 6 hours for QM/MM-B. The energy and gradient calculations at the QM/MM level in the nonequilibrium simulations were much more expensive than

20 ACS Paragon Plus Environment

Page 20 of 52

Page 21 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

those at the MM or SQM levels. The total wall time for QM/MM-NE is about 4.7 days. However, this is still much cheaper than the equilibrium QM/MM/MD simulations, which takes over 90 days. Besides, the QM/MM-NE calculations decreases almost linearly with the number of computer nodes. The pulling trajectories can be independently performed on multiple computers, and no communication between the computers is required.

Solvation in chloroform Although thirteen SCAs were studied in water, only five of them (Asn, Gln, Ser, Phe, and Trp) have experimental solvation free energies measured in chloroform. The permittivity of liquid chloroform is 4.81, which is much smaller than that of water (80.2). 122 Therefore, the electronic response of the solute molecule in chloroform would be different from that in water. To improve the calculation of the solvation free energies in chloroform, the LennardJones -parameters of each atom of the solute molecules were scaled with a factor computed according to Eq. 16, and the scale factors are listed in Table 6. The coordinates and the molecular polarizability scalars of all the molecules are listed in Table S5. The polarizability of each atom type is very transferable, except for atom type hn (hydrogen atoms bonded to a nitrogen atom). However, the transferability of each element is relatively low. Notably, only the Lennard-Jones -parameters of the solute molecules were scaled and the other MM parameters for solute were taken directly from GAFF. The calculated solvation free energies in chloroform with the original GAFF parameters are listed in the second column of Table 7. With the original force field parameters, most of the calculated solvation free energies are overestimated and the RMSD is 1.04 kcal/mol with a systematic shift of -0.85 kcal/mol, which are not within chemical accuracy. The deviations from the experimental values are larger than 0.50 kcal/mol for all the SCAs, except for Ser. Especially for Phe and Trp, which possess bulky aromatic groups, the deviations are more than 1.0 kcal/mol. QM corrections were carried out with the first scheme. Unlike the results of the calculations in water, the correction energies are much smaller. Even for the

21 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

polar molecule Ser, the correction value, which is the largest, is only 0.35 kcal/mol. The overall deviations between the calculated results and the experimental values do not reduce remarkably. The RMSD and the MSE of the QM/MM-D method are 0.89 kcal/mol and -0.63 kcal/mol, respectively, which are still unsatisfactory. Using the scaled vdW parameters, the calculated solvation free energies are lower. For the molecules containing an aromatic group (Phe and Trp), the calculated solvation free energies are larger (less negative) by more than 1.0 kcal/mol. Even for the small molecules (Gln and Asn), the solvation free energies in chloroform are also improved by 0.83 kcal/mol and 0.39 kcal/mol. On the other hand, the calculated result for Ser is slightly worse (only 0.15 kcal/mol). The predicted results are significantly improved on scaling the vdW parameters of the solute molecules: the RMSD decreases to 0.37 kcal/mol and the MSE is only -0.01 kcal/mol. Therefore, optimizing the Lennard-Jones -parameters of the solute molecules can improve the calculated free energies in an apolar environment, especially for large solute molecules that have strong vdW interactions with the solvent. As mentioned before, the QM corrections from the MM Hamiltonian to the QM Hamiltonian are very weak in chloroform. For all the SCAs, the correction energies are no larger than 0.40 kcal/mol, which is smaller than that in water (-0.71 kcal/mol) for the five molecules. The QM correction improves the calculated results of the SCAs other than Ser. For Ser, the deviation is slightly increased. In general, the QM correction for the scaled MM calculations shifts the solvation free energy up by 0.21 kcal/mol on average (from -0.01 kcal/mol to 0.20 kcal/mol for the MSE). However, the RMSD increases slightly (from 0.37 kcal/mol to 0.40 kcal/mol). The solvation free energies in chloroform were only calculated with the first QM correction scheme. As is shown in the Table 7, the S values for all the five molecules are over 0.93 and the corresponding maximum weights are less than 0.02, indicating the overlaps in the phase space under MM and QM potentials are sufficient and the TP calculations are reliable. Therefore, the calculation of the second and third QM/MM correction schemes were not necessary.

22 ACS Paragon Plus Environment

Page 22 of 52

Page 23 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Partition coefficient between water and chloroform The calculated partition coefficients with different methods are presented in Table 8, as are the experimental measurements. From a computational perspective, error cancellation plays an important role in the calculations. As shown in Table 8, the worst predicted results come from the MM calculations using the original GAFF parameters. For these values, the RMSD is 0.90 and the MSE is -0.78. The unreliable results of the original MM potential are mainly due to the inaccurate solvation free energies in chloroform (MSE = -0.85 kcal/mol). For instance, the hydration free energies of Asn, Phe, and Trp are well reproduced. However, their solvation free energies in chloroform deviate substantially. For Ser, the solvation free energies in both media are well reproduced, and the partition coefficient is, thus, accurate. When the scaled vdW parameters for the solute molecules in chloroform are employed, the partition coefficients are well reproduced, and the RMSD decreases to 0.27 and the MSE changes from -0.78 to -0.16. Employing the MM-to-QM/MM correction does not improve the calculated partition coefficients significantly, the results even becoming worse when using the scaled vdW parameters. The failure of QM/MM methods might arise from the vdW parameters being incompatible with the electrostatic embedding potential.

Conclusion In this paper, we have reported the use of a reference potential scheme named QM/MMD, an SQM-bridged scheme named QM/MM-B, and an NE scheme named QM/MM-NE to estimate solvation free energies in water and chloroform and the partition coefficients between these two solvents for some amino acid side chain analogs. Due to the path-independent nature of the free energy, the QM/MM-D scheme avoids expensive sampling on the QM/MM potential energy surface and yields results of high accuracy at much lower cost. However, the QM/MM-D method may fail for some systems, where the phase space overlap between MM and QM/MM Hamiltonian potential is not sufficiently large. In this situation, the

23 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

QM/MM-B and QM/MM-NE schemes can be employed instead. In the case of the solvation free energy in chloroform, a more efficient solvent mode was applied, and the LennardJones -parameters of the solute molecules were scaled using the atomic polarizabilities. The observations in this work show that • The QM/MM methods is capable of providing hydration free energies of comparable accuracy as the MM method, and the reference-potential scheme is an efficient scheme for performing QM corrections. However, the MM method yielded better results in chloroform and better partition coefficients than the QM/MM method. Further improvement of the QM/MM method is imperative now, especially the treatment of the vdW interaction and the electrostatic interaction between the QM region and the MM region in the first shell. • The reweighting entropy S defined in this work can be used to evaluate the reliability of TP calculations. If the reweighting entropy is below a certain threshold (0.65 for the solvation free energy calculations in this work), this MM-to-QM correction is not reliable. • When the direct TP calculation scheme fails because of the weak phase space overlap between the two states, the QM/MM-B and QM/MM-NE schemes can be used to improve the results. • Scaling the vdW parameters for the solute molecules in chloroform improves the quality of the computed thermodynamic properties.

Acknowledgement This work was supported by the National Natural Science Foundation of China (Grant No. 21173082) and the Large Instruments Open Foundation of East China Normal University.

24 ACS Paragon Plus Environment

Page 24 of 52

Page 25 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

We thank Dr. Gerhard K¨onig for helpful discussion. CPU time was provided by the Supercomputer Center of East China Normal University and the LoBoS cluster of the National Institutes of Health.

Supporting Information Available The optimization of the sulfur parameters, optimized parameters for sulfur-containing systems, comparison of the calculated solvation free energies of Cys and Met using the optimized and the original GAFF parameters, assessment of the reliability of the TP calculations, and the coordinates of the molecules studied.

25 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References (1) Leo, A.; Hansch, C.; Elkins, D. Partition Coefficients and Their Uses. Chem. Rev. 1971, 71, 525–616. (2) Levy, Y.; Onuchic, J. N. Water Mediation in Protein Folding and Molecular Recognition. Annu. Rev. Biophys. Biomol. Struct. 2006, 35, 389–415. (3) Essex, J. W.; Reynolds, C. A.; Richards, W. G. Theoretical Determination of Partition Coefficients. J. Am. Chem. Soc. 1992, 114, 3634–3639. (4) Bannan, C. C.; Calabr´o, G.; Kyu, D. Y.; Mobley, D. L. Calculating Partition Coefficients of Small Molecules in Octanol/Water and Cyclohexane/Water. J. Chem. Theory Comput. 2016, 12, 4015. (5) Klamt, A.; Huniar, U.; Spycher, S.; Keldenich, J. COSMOmic: A Mechanistic Approach to the Calculation of Membrane-Water Partition Coefficients and Internal Distributions within Membranes and Micelles. J. Phys. Chem. B 2008, 112, 12148–12157. (6) Gobas, F. A. P. C.; Mackay, D.; Shiu, W. Y.; Lahittete, J. M.; Garofalo, G. A Novel Method for Measuring Membrane-Water Partition Coefficients of Hydrophobic Organic Chemicals: Comparison with 1-Octanol-Water Partitioning. J. Pharm. Sci. 1988, 77, 265–272. (7) Neale, C.; Bennett, W. F. D.; Tieleman, D. P.; Pom`es, R. Statistical Convergence of Equilibrium Properties in Simulations of Molecular Solutes Embedded in Lipid Bilayers. J. Chem. Theory Comput. 2011, 7, 4175–4188. (8) Bemporad, D.; Luttmann, C.; Essex, J. W. Computer Simulation of Small Molecule Permeation across a Lipid Bilayer: Dependence on Bilayer Properties and Solute Volume, Size, and Cross-Sectional Area. Biophys. J. 2004, 87, 1–13.

26 ACS Paragon Plus Environment

Page 26 of 52

Page 27 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(9) Wimley, W. C.; Creamer, T. P.; White, S. H. Solvation Energies of Amino Acid Side Chains and Backbone in a Family of Host-Guest Pentapeptides. Biochemistry 1996, 35, 5109–5124. (10) Wolfenden, R.; Andersson, L.; Cullis, P. M.; Southgate, C. C. B. Affinities of Amino Acid Side Chains for Solvent Water. Biochemistry 1981, 20, 849–855. (11) Radzicka, A.; Wolfenden, R. Comparing the Polarities of the Amino Acids: SideChain Distribution Coefficients Between the Vapor Phase, Cyclohexane, 1-Octanol and Neutral Aqueous Solution. Biochemistry 1988, 27, 1664–1670. (12) Wolfenden, R.; Lewis, C. A.; Yuan, Y.; Carter, C. W. Temperature Dependence of Amino Acid Hydrophobicities. Proc. Natl. Acad. Sci. 2015, 112, 7484–7488. (13) James, G.; Benoˆıt, R. Determination of Membrane-Insertion Free Energies by Molecular Dynamics Simulations. Biophys. J. 2012, 102, 795–801. (14) Michel, J.; Orsi, M.; Essex, J. W. Prediction of Partition Coefficients by Multiscale Hybrid Atomic-Level/Coarse-Grain Simulations. J. Phys. Chem. B 2008, 112, 657– 660. (15) Genheden, S. Predicting Partition Coefficients with a Simple All-Atom/CoarseGrained Hybrid Model. J. Chem. Theory Comput. 2016, 12, 297–304. (16) Eksterowicz, J. E.; Miller, J. L.; Kollman, P. A. Calculation of Chloroform/Water Partition Coefficients for the N-Methylated Nucleic Acid Bases. J. Phys. Chem. B 1997, 101, 10971–10975. (17) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. Extremely Precise Free Energy Calculations of Amino Acid Side Chain Analogs: Comparison of Common Molecular Mechanics Force Fields for Proteins. J. Chem. Phys. 2003, 119, 5740–5761.

27 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(18) Villa, A.; Mark, A. E. Calculation of the Free Energy of Solvation for Neutral Analogs of Amino Acid Side Chains. J. Comput. Chem. 2002, 23, 548–553. (19) MacCallum, J. L.; Tieleman, D. P. Calculation of the Water-Cyclohexane Transfer Free Energies of Neutral Amino Acid Side-Chain Analogs Using the All-Atom Force Field. J. Comput. Chem. 2003, 24, 1930–1935. (20) K¨onig, G.; Brucknerand, S.; Boresch, S. Absolute Hydration Free Energies of Blocked Amino Acids: Implications for Protein Solvation and Stability. Biophys. J. 2013, 104, 453–462. (21) Hess, B.; van der Vegt, N. F. A. Hydration Thermodynamic Properties of Amino Acid Analogues: A Systematic Comparison of Biomolecular Force Fields and Water Models. J. Phys. Chem. B 2006, 110, 17616–17626. (22) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300–313. (23) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174. (24) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; Mackerell, A. D. CHARMM General Force Field: A Force Field for Drug-like Molecules Compatible with the CHARMM All-atom Additive Biological Force Fields. J. Comput. Chem. 2010, 31, 671–690. (25) Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632–2647. (26) Mobley, D. L.; Bayly, C. I.; Cooper, M. D.; Shirts, M. R.; Dill, K. A. Small Molecule Hydration Free Energies in Explicit Solvent: An Extensive Test of Fixed-charge Atomistic Simulations. J. Chem. Theory Comput. 2009, 5, 350–358. 28 ACS Paragon Plus Environment

Page 28 of 52

Page 29 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(27) Mobley, D. L.; Liu, S.; Cerutti, D. S.; Swope, W. C.; Rice, J. E. Alchemical Prediction of Hydration Free Energies for SAMPL. J. Comput.-Aided Mol. Des. 2012, 26, 551– 562. (28) Knight, J. L.; Yesselman, J. D.; Brooks, C. L. Assessing the Quality of Absolute Hydration Free Energies among CHARMM-compatible Ligand Parameterization Schemes. J. Comput. Chem. 2013, 34, 893–903. (29) Nicholls, A.; Mobley, D. L.; Guthrie, J. P.; Chodera, J. D.; Bayly, C. I.; Cooper, M. D.; Pande, V. S. Predicting Small-molecule Solvation Free Energies: An Informal Blind Test for Computational Chemistry. J. Med. Chem. 2008, 51, 769–779. (30) Ren, P.; Ponder, J. Consistent Treatment of Inter- and Intramolecular Polarization in Molecular Mechanics Calculations. J. Comput. Chem. 2002, 23, 1497–1506. (31) Lamoureux, G.; MacKerell, A. D., Jr.; Roux, B. A Simple Polarizable Model of Water Based on Classical Drude Oscillators. J. Chem. Phys. 2003, 119, 5185–5197. (32) Mohamed, N. A.; Bradshaw, R. T.; Essex, J. W. Evaluation of Solvation Free Energies for Small Molecules with the AMOEBA Polarizable Force Field. J. Comput. Chem. 2016, 37, 2749–2758. (33) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105, 2999–3094. (34) Klamt, A.; Eckert, F.; Reinisch, J.; Wichmann, K. Prediction of Cyclohexane-water Distribution Coefficients with COSMO-RS on the SAMPL5 Data Set. J. Comput.Aided Mol. Des. 2016, 30, 959–967. (35) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the

29 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B. 2009, 113, 6378–6396. (36) Ribeiro, R. F.; Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Prediction of SAMPL2 Aqueous Solvation Free Energies and Tautomeric Ratios Using the SM8, SM8AD, and SMD Solvation Models. J. Comput.-Aided Mol. Des. 2010, 24, 317–333. (37) Marenich, C. J., Aleksandr V. Cramer; Truhlar, D. G. Performance of SM6, SM8, and SMD on the SAMPL1 Test Set for the Prediction of Small-Molecule Solvation Free Energies. J. Phys. Chem. B 2009, 113, 4538–4543. (38) Fedorov, D. G.; Kitaura, K.; Li, H.; Jensen, J. H.; Gordon, M. S. The Polarizable Continuum Model (PCM) Interfaced with the Fragment Molecular Orbital Method (FMO). J. Comput. Chem. 2006, 27, 976–985. (39) Mei, Y.; Ji, C.; Zhang, J. Z. H. A New Quantum Method for Electrostatic Solvation Energy of Protein. J. Chem. Phys. 2006, 125, 094906. (40) Luzhkov, V.; Warshel, A. Microscopic Models for Quantum Mechanical Calculations of Chemical Processes in Solutions: LD/AMPAC and SCAAS/AMPAC Calculations of Solvation Energies. J. Comput. Chem. 1992, 13, 199–213. (41) Gao, J. Absolute Free Energy of Solvation from Monte Carlo Simulations Using Combined Quantum and Molecular Mechanical Potentials. J. Phys. Chem. 1992, 96, 537– 540. (42) Wesolowski, T.; Warshel, A. Ab initio Free Energy Perturbation Calculations of Solvation Free Energy Using the Frozen Density Functional Approach. J. Phys. Chem. 1994, 98, 5183–5187. (43) Muller, R. P.; Warshel, A. Ab initio Calculations of Free Energy Barriers for Chemical Reactions in Solution. J. Phys. Chem. 1995, 99, 17516–17524. 30 ACS Paragon Plus Environment

Page 30 of 52

Page 31 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(44) Beierlein, F. R.; Michel, J.; Essex, J. W. A Simple QM/MM Approach for Capturing Polarization Effects in Protein-ligand Binding Free Energy Calculations. J. Phys. Chem. B 2011, 115, 4911–4926. (45) Cave-Ayland, C.; Skylaris, C.-K.; Essex, J. W. Direct Validation of the Single Step Classical to Quantum Free Energy Perturbation. J. Phys. Chem. B 2015, 119, 1017– 1025. (46) Olsson, M. A.; S¨oderhjelm, P.; Ryde, U. Converging Ligand-binding Free Energies Obtained with Free-energy Perturbations at the Quantum Mechanical Level. J. Comput. Chem. 2016, 37, 1589–1600. (47) Olsson, M. A.; Ryde, U. Comparison of QM/MM Methods To Obtain Ligand-Binding Free Energies. J.Chem. Theory Comput. 2017, 13, 2245–2253. (48) Zwanzig, R. W. High-temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420–1426. (49) K¨onig, G.; Pickard, F. C.; Mei, Y.; Brooks, B. R. Predicting Hydration Free Energies with a Hybrid QM/MM Approach: An Evaluation of Implicit and Explicit Solvation Models in SAMPL4. J. Comput.-Aided Mol. Des. 2014, 28, 245–257. (50) Jia, X.; Wang, M.; Shao, Y.; K¨onig, G.; Brooks, B. R.; Zhang, J. Z. H.; Mei, Y. Calculations of Solvation Free Energy through Energy Reweighting from Molecular Mechanics to Quantum Mechanics. J. Chem. Theory Comput. 2016, 12, 499–511. (51) Dybeck, E. C.; K¨onig, G.; Brooks, B. R.; Shirts, M. R. Comparison of Methods To Reweight from Classical Molecular Simulations to QM/MM Potentials. J. Chem. Theory Comput. 2016, 12, 1466–1480. (52) Mikulskis, P.; Cioloboc, D.; Andreji´c, M.; Khare, S.; Brorsson, J.; Genheden, S.; Mata, R. A.; S¨oderhjelm, P.; Ryde, U. Free-energy Perturbation and Quantum Me31 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

chanical Study of SAMPL4 Octa-acid Host–guest Binding Energies. J. Comput.-Aided Mol. Des. 2014, 28, 375–400. (53) Genheden, S.; Cabedo Martinez, A. I.; Criddle, M. P.; Essex, J. W. Extensive Allatom Monte Carlo Sampling and QM/MM Corrections in the SAMPL4 Hydration Free Energy Challenge. J. Comput.-Aided Mol. Des. 2014, 28, 187–200. (54) Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105. (55) Bennett, C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245–268. (56) Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods V: Modification of NDDO Approximations and Application to 70 Elements. J. Mol. Model. 2007, 13, 1173–1213. (57) Jarzynski, C. Nonequilibrium Equality for Free Energy Differences. Phys. Rev. Lett. 1997, 78, 2690. (58) Jarzynski, C. Equilibrium Free-Energy Differences from Nonequilibrium Measurements: A Master-Equation Approach. Phys. Rev. E 1997, 56, 5018. (59) Crooks, G. Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems. J. Statis. Phys. 1998, 90, 1481. (60) Crooks, G. Entropy Production Fluctuation Theorem and the Nonequilibrium Work Relation for Free Energy Differences. Phys. Rev. E 1999, 60, 2721. (61) Crooks, G. Path-Ensemble Averages in Systems Driven Far from Equilibrium. Phys. Rev. E 1999, 61, 2361.

32 ACS Paragon Plus Environment

Page 32 of 52

Page 33 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(62) Chelli, R.; Marsili, S.; Barducci, A.; Procacci, P. Generalization of the Jarzynski and Crooks Nonequilibrium Work Theorems in Molecular Dynamics Simulations. Phys. Rev. E 2007, 75, 050101. (63) Hudson, P. S.; Woodcock, H. L.; Boresch, S. Use of Nonequilibrium Work Methods to Compute Free Energy Differences Between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850. (64) Cossins, B. P.; Foucher, S.; Edge, C. M.; Essex, J. W. Assessment of Nonequilibrium Free Energy Methods. J. Phys. Chem. B 2009, 113, 5508. (65) Goette, M.; Grubm¨ uller, H. Accuracy and Convergence of Free Energy Differences Calculated from Nonequilibrium Switching Processes. J. Comput. Chem. 2009, 30, 447. (66) Hendrix, D. A.; Jarzynski, C. A Fast Growth Method of Computing Free Energy Differences. J. Chem. Phys. 2001, 114, 5974. (67) Hummer, G. Fast-Growth Thermodynamic Integration: Error and Efficiency Analysis. J. Chem. Phys. 2001, 114, 7330. (68) Ytreberg, F. M.; Zuckerman, D. M. Single-Ensemble Nonequilibrium Path-Sampling Estimates of Free Energy Differences. J. Chem. Phys. 2004, 120, 10876. (69) Lechner, W.; Oberhofer, H.; Dellago, C.; Geissler, P. L. Equilibrium Free Energies from Fast-Switching Trajectories with Large Time Steps. J. Chem. Phys. 2006, 124, 044113. (70) Nicolini, P.; Frezzato, D.; Chelli, R. Exploiting Configurational Freezing in Nonequilibrium Monte Carlo Simulations. J. Chem. Theory Comput. 2011, 7, 582.

33 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(71) Oberhofer, H.; Dellago, C.; Geissler, P. L. Biased Sampling of Nonequilibrium Trajectories: Can Fast Switching Simulations Outperform Conventional Free Energy Calculation Methods? J. Phys. Chem. B 2005, 109, 6902. (72) Oostenbrink, C.; van Gunsteren, W. F. Calculating Zeros: Non-Equilibrium Free Energy Calculations. Chem. Phys. 2006, 323, 102. (73) Spichty, M.; Cecchini, M.; Karplus, M. Conformational Free-Energy Difference of a Miniprotein from Nonequilibrium Simulations. J. Phys. Chem. Lett. 2010, 1, 1922. (74) Echeverria, I.; Amzel, L. M. Estimation of Free-Energy Differences from Computed Work Distributions: An Application of Jarzynski’s Equality. J. Phys. Chem. B 2012, 116, 10986. (75) Zuckerman, D. M.; Woolf, T. B. Overcoming Finite-Sampling Errors in Fast-Switching Free-Energy Estimates: Extrapolative Analysis of A Molecular System. Chem. Phys. Lett. 2002, 351, 445. (76) Ngo, V. A.; Kim, I.; Allen, T. W.; Noskov, S. Y. Estimation of Potentials of Mean Force from Nonequilibrium Pulling Simulations Using Both Minh-Adib Estimator and Weighted Histogram Analysis Method. J. Chem. Theory Comput. 2016, 12, 1000. (77) Chelli, R.; Marsili, S.; Procacci, P. Calculation of the Potential of Mean Force from Nonequilibrium Measurements via Maximum Likelihood Estimators. Phys. Rev. E 2008, 77, 031104. (78) Chelli, R.; Procacci, P. A Potential of Mean Force Estimator Based on Nonequilibrium Work Exponential Averages. Phys. Chem. Chem. Phys. 2009, 11, 1152. (79) Nicolini, P.; Procacci, P.; Chelli, R. Hummer and Szabo-like Potential of Mean Force Estimator for Bidirectional Nonequilibrium Pulling Experiments/Simulations. J. Phys. Chem. B 2010, 114, 9546. 34 ACS Paragon Plus Environment

Page 34 of 52

Page 35 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(80) Procacci, P.; Cardelli, C. Fast Switching Alchemical Transformations in Molecular Dynamics Simulations. J. Chem. Theory Comput. 2014, 10, 2813. (81) Sandberg, R. B.; Banchelli, M.; Guardiani, C.; Menichetti, S.; Caminati, G.; Procacci, P. Efficient Nonequilibrium Method for Binding Free Energy Calculations in Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 423. (82) Paliwal, H.; Shirts, M. R. A Benchmark Test Set for Alchemical Free Energy Transformations and Its Use to Quantify Error in Common Free Energy Methods. J. Chem. Theory Comput. 2011, 7, 4115–4134. (83) Lu, N.; Woolf, T. B. In Free Energy Calculations; Chipot, C., Pohorille, A., Eds.; Springer: Heidelberg, 2007; pp 199–247. (84) Heimdal, J.; Ryde, U. Convergence of QM/MM Free-energy Perturbations Based on Molecular-mechanics or Semiempirical Simulations. Phys. Chem. Chem. Phys. 2012, 14, 12592–12604. (85) Bentzien, J.; Muller, R. P.; Flori´an, J.; Warshel, A. Hybrid ab initio Quantum Mechanics/Molecular Mechanics Calculations of Free Energy Surfaces for Enzymatic Reactions: The Nucleophilic Attack in Subtilisin. J. Phys. Chem. B. 1998, 102, 2293–2301. (86) K¨onig, G.; Mei, Y.; Pickard, F. C., IV; Simmonett, A.; Miller, B. T.; Herbert, J.; Woodcock, H. L., III; Brooks, B. R.; Shao, Y. Computation of Hydration Free Energies Using the Multiple Environment Single System Quantum Mechanical/Molecular Mechanical Method. J. Chem. Theory Comput. 2016, 12, 332–344. (87) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.;

35 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Keith, T.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, O.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09, Revision B.01, Gaussian, Inc.: Wallingford, CT. 2010. (88) Cieplak, P.; Cornell, W. D.; Bayly, C.; Kollman, P. A. Application of the Multimolecule and Multiconformational RESP Methodology to Biopolymers: Charge Derivation for DNA, RNA, and Proteins. J. Comput. Chem. 1995, 16, 1357–1377. (89) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. J. Phys. Chem. 1993, 97, 10269–10280. (90) Fox, T.; Kollman, P. A. Application of the RESP Methodology in the Parametrization of Organic Solvents. J. Phys. Chem. B 1998, 102, 8070–8079. (91) Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, N.; Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Monard, G.; Needham, P.; Nguyen, H.; Nguyen, H. T.; Omelyan, I.; Onufriev, A.; Roe, D. R.; Roitberg, A.; SalomonFerrer, R.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; York, D. M.; Kollman, P. A. AMBER 2015, University of California, San Francisco. 2015. 36 ACS Paragon Plus Environment

Page 36 of 52

Page 37 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(92) Pastor, R. W.; Brooks, B. R.; Szabo, A. An Analysis of the Accuracy of Langevin and Molecular Dynamics Algorithms. Mol. Phys. 1988, 65, 1409–1419. (93) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684–3690. (94) Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald: An N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089–10092. (95) Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS ForceField Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656–1676. (96) Caleman, C.; van Maaren, P. J.; Hong, M.; Hub, J. S.; Costa, L. T.; van der Spoel, D. Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant. J. Chem. Theory Comput. 2012, 8, 61–74. (97) Jia, X.; Zhang, J. Z. H.; Mei, Y. Assessing the Accuracy of the General AMBER Force Field for 2, 2, 2-Trifluoroethanol as Solvent. J. Mol. Model. 2013, 19, 2355–2361. (98) Jorgensen, W. L.; Tirado-Rives, J. The OPLS Potential Functions for Proteins, Energy Minimizations for Crystals of Cyclic Peptides and Crambin. J. Am. Chem. Soc. 1988, 110, 1657–1666. (99) Martins, S. A.; Sousa, S. F.; Ramos, M. J.; Fernandes, P. A. Prediction of Solvation Free Energies with Thermodynamic Integration Using the General Amber Force Field. J. Chem. Theory Comput. 2014, 10, 3570–3577. (100) Zhang, J.; Tuguldur, B.; van der Spoel, D. Force Field Benchmark of Organic Liquids. 2. Gibbs Energy of Solvation. J. Chem. Inf. Model. 2015, 55, 1192–1201. 37 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(101) Mobley, D. L.; Bayly, C. I.; Cooper, M. D.; Dill, K. A. Predictions of Hydration Free Energies from All-Atom Molecular Dynamics Simulations. J. Phys. Chem. B 2009, 113, 4533–4537. (102) Burger, S. K.; Schofield, J.; Ayers, P. W. Quantum Mechanics/Molecular Mechanics Restrained Electrostatic Potential Fitting. J. Phys. Chem. B 2013, 117, 14960. (103) Beerepoot, M. T. P.; Steindal, A. H.; List, N. H.; Kongsted, J.; Olsen, J. M. H. Averaged Solvent Embedding Potential Parameters for Multiscale Modeling of Molecular Properties. J. Chem. Theory Comput. 2016, 12, 1684. (104) Johnson, E. R.; Becke, A. D. A Post-Hartree-Fock Model of Intermolecular Interactions. J. Chem. Phys. 2005, 123, 024101. (105) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate ab initio Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (106) Tkatchenko, A.; Scheffler, M. Accurate Molecular van der Waals Interactions from Ground-State Electron Density and Free-Atom Reference Data. Phys. Rev. Lett. 2009, 102, 073005. (107) Silvestrelli, P. L. Van der Waals Interactions in DFT Made Easy by Wannier Functions. Phys. Rev. Lett. 2008, 100, 053002. (108) Berland, K.; Cooper, V. R.; Lee, K.; Schr¨oder, E.; Thonhauser, T.; Hyldgaard, P.; Lundqvist, B. I. Van der Waals Forces in Density Functional Theory: A Review of the vdW-DF Method. Rep. Prog. Phys 2015, 78, 066501. (109) Giese, T. J.; York, D. M. Charge-dependent Model for Many-body Polarization, Exchange, and Dispersion Interactions in Hybrid Quantum Mechanical/Molecular Mechanical Calculations. J. Chem. Phys. 2007, 127, 194101. 38 ACS Paragon Plus Environment

Page 38 of 52

Page 39 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(110) Tkatchenko, A.; Rossi, M.; Blum, V.; Ireta, J.; Scheffler, M. Unraveling the Stability of Polypeptide Helices: Critical Role of van der Waals Interactions. Phys. Rev. Lett. 2011, 106, 118102. (111) Tkatchenko, A.; DiStasio, R. A.; Car, R.; Scheffler, M. Accurate and Efficient Method for Many-Body van der Waals Interactions. Phys. Rev. Lett. 2012, 108, 236402. (112) St¨ohr, M.; Michelitsch, G. S.; Tully, J. C.; Reuter, K.; Maurer, R. J. Communication: Charge-population Based Dispersion Interactions for Molecules and Materials. J. Chem. Phys. 2016, 144, 151101. (113) Mei, Y.; Simmonett, A. C.; Pickard, F. C., IV; DiStasio, J., Robert A.; Brooks, B. R.; Shao, Y. Numerical Study on the Partitioning of the Molecular Polarizability into Fluctuating Charge and Induced Atomic Dipole Contributions. J. Phys. Chem. A 2015, 119, 5865. (114) Shao, Y.; Gan, Z.; Epifanovsky, E.; Gilbert, A. T.; Wormit, M.; Kussmann, J.; Lange, A. W.; Behn, A.; Deng, J.; Feng, X.; Ghosh, D.; Goldey, M.; Horn, P. R.; Jacobson, L. D.; Kaliman, I.; Khaliullin, R. Z.; Ku´s, T.; Landau, A.; Liu, J.; Proynov, E. I.; Rhee, Y. M.; Richard, R. M.; Rohrdanz, M. A.; Steele, R. P.; Sundstrom, E. J.; Woodcock, H. L.; Zimmerman, P. M.; Zuev, D.; Albrecht, B.; Alguire, E.; Austin, B.; Beran, G. J. O.; Bernard, Y. A.; Berquist, E.; Brandhorst, K.; Bravaya, K. B.; Brown, S. T.; Casanova, D.; Chang, C.-M.; Chen, Y.; Chien, S. H.; Closser, K. D.; Crittenden, D. L.; Diedenhofen, M.; DiStasio, R. A.; Do, H.; Dutoi, A. D.; Edgar, R. G.; Fatehi, S.; Fusti-Molnar, L.; Ghysels, A.; GolubevaZadorozhnaya, A.; Gomes, J.; Hanson-Heine, M. W.; Harbach, P. H.; Hauser, A. W.; Hohenstein, E. G.; Holden, Z. C.; Jagau, T.-C.; Ji, H.; Kaduk, B.; Khistyaev, K.; Kim, J.; Kim, J.; King, R. A.; Klunzinger, P.; Kosenkov, D.; Kowalczyk, T.; Krauter, C. M.; Lao, K. U.; Laurent, A. D.; Lawler, K. V.; Levchenko, S. V.; Lin, C. Y.; Liu, F.; Livshits, E.; Lochan, R. C.; Luenser, A.; Manohar, P.; Manzer, S. F.; Mao, S.39 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

P.; Mardirossian, N.; Marenich, A. V.; Maurer, S. A.; Mayhall, N. J.; Neuscamman, E.; Oana, C. M.; Olivares-Amaya, R.; O’Neill, D. P.; Parkhill, J. A.; Perrine, T. M.; Peverati, R.; Prociuk, A.; Rehn, D. R.; Rosta, E.; Russ, N. J.; Sharada, S. M.; Sharma, S.; Small, D. W.; Sodt, A.; Stein, T.; St¨ uck, D.; Su, Y.-C.; Thom, A. J.; Tsuchimochi, T.; Vanovschi, V.; Vogt, L.; Vydrov, O.; Wang, T.; Watson, M. A.; Wenzel, J.; White, A.; Williams, C. F.; Yang, J.; Yeganeh, S.; Yost, S. R.; You, Z.-Q.; Zhang, I. Y.; Zhang, X.; Zhao, Y.; Brooks, B. R.; Chan, G. K.; Chipman, D. M.; Cramer, C. J.; Goddard, W. A.; Gordon, M. S.; Hehre, W. J.; Klamt, A.; Schaefer, H. F.; Schmidt, M. W.; Sherrill, C. D.; Truhlar, D. G.; Warshel, A.; Xu, X.; Aspuru-Guzik, A.; Baer, R.; Bell, A. T.; Besley, N. A.; Chai, J.-D.; Dreuw, A.; Dunietz, B. D.; Furlani, T. R.; Gwaltney, S. R.; Hsu, C.-P.; Jung, Y.; Kong, J.; Lambrecht, D. S.; Liang, W.; Ochsenfeld, C.; Rassolov, V. A.; Slipchenko, L. V.; Subotnik, J. E.; Van Voorhis, T.; Herbert, J. M.; Krylov, A. I.; Gill, P. M.; HeadGordon, M. Advances in Molecular Quantum Chemistry Contained in the Q-Chem 4 Program Package. Mol. Phys. 2015, 113, 184–215. (115) Cole, D. J.; Vilseck, J. Z.; Tirado-Rives, J.; Payne, M. C.; Jorgensen, W. L. Biomolecular Force Field Parameterization via Atoms-in-Molecule Electron Density Partitioning. J. Chem. Theory Comput. 2016, 12, 2312–2323. (116) Singh, U. C.; Kollman, P. A. An Approach to Computing Electrostatic Charges for Molecules. J. Comput. Chem. 1984, 5, 129–145. (117) Wennmohs, F.; Schindler, M. Development of a Multipoint Model for Sulfur in Proteins: A New Parametrization Scheme to Reproduce High-level ab initio Interaction Energies. J. Comput. Chem. 2005, 26, 283–293. (118) Olivet, A.; Vega, L. F. Optimized Molecular Force Field for Sulfur Hexafluoride Simulations. J. Chem. Phys. 2007, 126, 144502.

40 ACS Paragon Plus Environment

Page 40 of 52

Page 41 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(119) Zhang, X.; Gong, Z.; Li, J.; Lu, T. Intermolecular Sulfur· · · Oxygen Interactions: Theoretical and Statistical Investigations. J. Chem. Inf. Model. 2015, 55, 2138–2153. (120) Steinbrecher, T.; Mobley, D. L.; Case, D. A. Nonlinear Scaling Schemes for LennardJones Interactions in Free Energy Calculations. J. Chem. Phys. 2007, 127, 214108. (121) Vasilevskaya, T.; Thiel, W. Periodic Boundary Conditions in QM/MM Calculations: Implementation and Tests. J. Chem. Theory Comput. 2016, 12, 3561. (122) Dean, J. A. Lange’s Handbook of Chemistry; McGraw-Hill Professional Publishing: New York, 2005. (123) Weat, R. C. CRC Handbook of Chemistry and Physics; CRC Press: Boca Raton, Fla., 1988. (124) Kish, L. Survey Sampling, Revised Edition; Wiley-Interscience: New York, 1995. (125) Klimovich, P. V.; Shirts, M. R.; Mobley, D. L. Guidelines for the Analysis of Free Energy Calculations. J. Comput. Aided Mol. Des. 2015, 29, 397–411.

41 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 1: Simplified names for side chain analogs studied in this work. molecules parent amino acids acetamide Asn methanethiol Cys propionamide Gln methanol Ser ethanol Thr n-butane Ile iso-butane Leu methylthioethane Met toluene Phe skatole Trp 4-methylphenol Tyr propane Val methane Ala

42 ACS Paragon Plus Environment

Page 42 of 52

Page 43 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 2: Optimized force field parameters for the chloroform molecule. All the values are taking from GAFF, except for the vdW parameters of the chlorine atom, and the original GAFF values are listed in parentheses. Atom Type cw Cl hy Bond cw-hy cw-Cl Angle hy-cw-Cl Cl-cw-Cl Nonbonded cw Cl hy

e -0.470 0.037 0.359 RKa (kcal/mol/˚ A2 ) 333.4 279.0 TKc (kcal/mol/rad2 ) 40.330 54.230 σ 1.9080 1.9280 (1.948) 1.1870

REQb 1.0984 1.7860 TEQd

a

Force constants for the bonds of each type. Equilibrium bond lengths of the bonds for each type. c Force constants for the angles of each type. d Equilibrium angles for each type. b

43 ACS Paragon Plus Environment

107.650 111.03  0.1094 0.280 (0.265) 0.0157

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 52

Table 3: Calculated physical properties of liquid chloroform. cutoff (˚ A) 10 12 15 Exp/QM a b

ρ (g/cm3 ) 1.499±0.000 1.499±0.000 1.499±0.000 1.483a

µ (D) 1.61±0.00 1.61±0.00 1.61±0.00 1.60

ε 5.12±0.01 5.10±0.00 5.15±0.01 4.80a

Ref. 122. Ref. 123.

44 ACS Paragon Plus Environment

∆Hvap (kcal/mol) 7.31±0.03 7.31±0.03 7.32±0.03 7.48b

Page 45 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 4: Hydration free energies in kcal/mol. The results at MM level, denoted as “MM”, were analyzed with MBAR, while those after the QM/MM corrections at BLYP/6-31G(d) level were carried out using three schemes, respectively, namely QM/MM-D, QM/MM-B and QM/MM-NE. The data in the “QM/PCM” column are results from the SMD calculations at M06-2X/cc-pVTZ level. Molecule Ala Asn Cys Gln Ile Leu Met Phe Ser Thr Trp Tyr Val RMSD MSE a b c

QM/MM-D ∆G S 2.40±0.04 2.20±0.04 0.99 -9.59±0.07 -10.80±0.15 0.63 -1.37±0.02 -2.24±0.10 0.69 -8.54±0.08 -10.19±0.18 0.58 2.33±0.06 1.83±0.07 0.96 2.30±0.06 1.83±0.06 0.97 -1.05±0.07 -1.93±0.12 0.68 -0.70±0.07 -0.38±0.08 0.84 -4.71±0.06 -5.32±0.07 0.91 -4.21±0.06 -5.25±0.09 0.78 -5.63±0.09 -6.03±0.10 0.82 -4.74±0.08 -6.25±0.12 0.75 2.32±0.06 1.92±0.06 0.97 0.54±0.02 0.55±0.03 MM

0.38±0.02

-0.35±0.03

QM/MM-B ∆G S 2.24±0.04 0.99 -9.65±0.13 0.73 -2.26±0.04 0.63 -9.74±0.17 0.66 1.87±0.07 0.91 1.93±0.07 0.92 -2.11±0.17 0.59 -0.33±0.07 0.89 -5.39±0.07 0.54 -5.86±0.09 0.55 -6.00±0.17 0.66 -6.04±0.15 0.53 1.94±0.07 0.93 0.49 ± 0.03 0.40 ± 0.02b −0.25 ± 0.03 −0.22 ± 0.02b

QM/MM-NE QM/PCM Expt.a ∆G S 2.23±0.04 1.00 2.38 1.94 -10.16±0.07 0.99 -9.37 -9.68 -2.45±0.07 0.96 -0.39 -1.24 -9.93±0.08 0.98 -8.49 -9.38 1.87±0.06 1.00 1.80 2.15 1.89±0.06 1.00 1.91 2.28 -1.76±0.10 0.97 -0.62 -1.48 -0.32±0.08 0.99 -0.73 -0.76 -5.31±0.07 0.99 -4.19 -5.06 -5.33±0.07 0.99 -4.26 -4.88 -5.95±0.09 1.00 -4.60 -5.88 -6.38±0.10 0.98 -4.78 -6.11 1.96±0.06 1.00 1.66 1.99 0.47 ± 0.02 0.76 0.44 ± 0.02c −0.27 ± 0.02 0.49 −0.28 ± 0.02c

The experimental hydration free energies from Ref. 10. Using QM/MM-B results for Asn and Gln and QM/MM-D results for other molecules. Using QM/MM-NE results for Asn and Gln and QM/MM-D results for other molecules.

45 ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 5: Optimal wall time of the three methods for the computations of the QM/MM solvation free energy at the BLYP/6-31G* level for Asn. Assume 5 nodes were used and each node has 16 cores of Intel Xeon CPU E5-2660 2.20GHz. Methods QM/MM-D QM/MM-B QM/MM-NE QM/MM/MDa a

Optimal wall time 0.13 hours 6.4 hours 4.7 days >90 days

A 1-ns simulation for each of the 11 windows.

46 ACS Paragon Plus Environment

Page 46 of 52

Page 47 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 6: The scale factors for the Lennard-Jones -parameters of the solute atoms. solute molecule Asn Phe Ser Gln Trp Averagea Averageb a b

c ca c3 cc cd 0.88 / 0.87 / / / 0.82 0.86 / / / / 0.90 / / 0.87 / 0.87 / / / 0.81 0.89 0.84 0.82 0.88 0.82 0.89 0.84 0.82 0.86

scale factor h1 ha hc h4 / / 0.91 / / 0.97 0.89 / 0.92 / / / / / 0.89 / / 1.04 0.88 1.38 0.92 1.01 0.89 1.38 1.08

The average scale factor for each atom type. The average scale factor for each element.

47 ACS Paragon Plus Environment

hn 1.72 / / 1.26 1.07 1.35

oh o / 0.85 / / 0.88 / / 0.84 / / 0.88 0.85 0.86

n na 0.88 / / / / / 0.88 / / 0.85 0.88 0.85 0.87

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 52

Table 7: Solvation free energies in chloroform (kcal/mol). MM Asn -7.65±0.03 Gln -7.90±0.03 Phe -5.62±0.03 Ser -3.15±0.02 Trp -9.91±0.04 RMSD 1.04 MSE -0.85 a b

QM/MM MM (scaled vdW) ∆Aaparam -7.32±0.04 -7.26±0.03 0.39 -7.95±0.04 -7.07±0.03 0.83 -5.37±0.03 -4.32±0.03 1.30 -2.80±0.02 -3.00±0.02 0.15 -9.71±0.04 -8.40±0.04 1.51 0.89 0.37 -0.63 -0.01

QM/MM (scaled vdW) -6.94±0.03 -7.15±0.04 -4.09±0.04 -2.60±0.02 -8.21±0.04 0.40 0.20

S Wmax 0.95 0.01 0.93 0.02 0.97 0.01 0.97 0.01 0.95 0.01

The solvation free energy differences between GAFF and the scaled GAFF. The experimental solvation free energies in chloroform from Ref. 11.

48 ACS Paragon Plus Environment

Expt.b -6.87 -7.37 -3.80 -3.16 -8.80

Page 49 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Table 8: Partition coefficients between chloroform and water. MM QM/MM MM (scaled vdW) Asn 1.45±0.04 1.74±0.07 1.74±0.04 Gln 0.48±0.04 1.34±0.06 1.10±0.04 Phe -3.67±0.04 -3.73±0.04 -2.70±0.04 Ser 1.16±0.03 1.88±0.06 1.28±0.03 Trp -3.20±0.04 -2.75±0.07 -2.07±0.04 RMSD 0.90 0.75 0.27 MSE -0.78 -0.33 -0.16 a

QM/MM (scaled vdW) 2.02±0.06 1.93±0.06 -2.77±0.05 2.03±0.06 -1.63±0.07 0.55 0.29

The experimental data from Ref. 11.

49 ACS Paragon Plus Environment

Expt.a 2.00 1.40 -2.28 1.26 -2.24

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(a)

(b)

(c)

Figure 1: Schematic representations of the three schemes for the MM-to-QM/MM corrections. (a) The QM/MM-D scheme. (b) The QM/MM-B scheme. (c) The QM/MM-NE scheme.

50 ACS Paragon Plus Environment

Page 50 of 52

Page 51 of 52

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

Asn

Ile

Cys

Leu

Ser

Gln

Met

Phe

CH4

Tyr

Val

Ala

Figure 2: Side chain analogs studied in this work.

51 ACS Paragon Plus Environment

Thr

Trp

Journal of Chemical Information and Modeling

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Graphical TOC Entry

52 ACS Paragon Plus Environment

Page 52 of 52