Force Field Benchmark of Amino Acids. 2. Partition Coefficients


Computational Biochemistry

Force Field Benchmark of Amino Acids: II. Partition Coefficients between Water and Organic Solvents Haiyang Zhang, Yang Jiang, Ziheng Cui, and Chunhua Yin J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00493 • Publication Date (Web): 26 Jul 2018 Downloaded from http://pubs.acs.org on July 31, 2018


Journal of Chemical Information and Modeling

Force Field Benchmark of Amino Acids: II. Partition Coefficients between Water and Organic Solvents

Haiyang Zhang,† Yang Jiang,‡ Ziheng Cui,‡ and Chunhua Yin*,†

†Department of Biological Science and Engineering, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, 100083 Beijing, China

‡Beijing Key Lab of Bioprocess, College of Life Science and Technology, Beijing University of Chemical Technology, Box 53, 100029 Beijing, China

Corresponding Author
* [email protected]

ACS Paragon Plus Environment


ABSTRACT

The partitioning of amino acids between water and apolar environments is of vital importance in protein function and drug delivery. Here we present an extensive benchmark for octanol/water (log Poct), chloroform/water (log Pclf), and cyclohexane/water (log Pchx) partition coefficients of neutral amino acid side chain analogues (SCAs) with the Amber families of ff99SB-ILDN, ff03, ff14SB, fb15, and ff15ipq, as well as the CHARMM 27, GROMOS 53A6, and OPLS-AA/L force fields (FFs). A root-mean-square error (RMSE) of 0.4 to 1.3 log units from experiment is observed for the tested FFs, of which the Amber ff94 lineages ff99SB-ILDN, ff14SB, and fb15 perform best, with an RMSE and mean signed error (MSE) of about 0.5 and 0.2 log units, respectively, a performance comparable with quantum mechanical SMD calculations. This finding retains the possibility of modeling proteins in varied environments with one set of classical molecular mechanical force fields. All the FFs tend to overestimate log P, except for GROMOS 53A6, which underestimates log Pclf and log Pchx. These discrepancies arise mainly because the solvation free energies in water (∆Gwat) are overestimated more strongly than those in organic solvents (∆Goct, ∆Gclf, and ∆Gchx); for GROMOS 53A6, they are due to the underestimated ∆Gwat and ∆Goct. The latest water models of the “FB” and “OPC” families paired with the recent Amber fb15 do not show an obvious improvement for ∆Gwat and log P calculations. The van der Waals interaction between amino acids and cyclohexane is found to be systematically too strong (overestimated). Scaling protein-water interactions leads to more favorable ∆Gwat, thereby lowering log P and resulting in a better performance for Amber ff03ws, while such scaling appears excessive for Amber ff99SBws. This work, along with our previous work (Zhang et al. J. Chem. Inf. Model. 2018, 58, 1037-1052), may aid in the development and systematic improvement of classical force fields to model proteins in aqueous and nonaqueous phases accurately.


1. INTRODUCTION

Partitioning of a solute between aqueous and apolar environments is of fundamental importance in biotechnology applications like drug discovery.1-3 The binding of a drug to a protein receptor, for instance, involves a transfer of the drug ligand from water to a less polar surrounding. Due to the challenge of transporting drug molecules across cell membranes, drug targets are mostly membrane proteins whose transmembrane segments are generally hydrophobic.4 Also, enzyme catalysis for industrial uses in organic solvents has synthetic and processing advantages compared to that in water.5 As a crucial complement to experiments, computational methods facilitate the high-throughput virtual screening of biologically active components and help unveil the mechanism underlying these applications in detail.6-9 For accurate modeling, theoretical force field models for biomolecules like proteins are therefore required to reproduce not only structural details of the molecules of interest but also the partitioning (solvation) properties in, and between, varied environments.10 The commonly used protein force field (FF) sets include Amber,11 CHARMM,12 GROMOS,13 and OPLS-AA,14 for which experimental solvation free energies were not used explicitly for calibration during the parameterization of amino acids, except for the GROMOS FF sets, which were parameterized against the solvation free energies (SFEs) in water and cyclohexane.13, 15 The general Amber (GAFF),16 CHARMM general (CGenFF),17-19 and GROMOS-compatible 2016H66 (ref 20) force fields were developed subsequently, allowing for modeling drug-like molecules and organic liquids; again, only the GROMOS 2016H66 set targeted SFEs of small organic molecules in water and cyclohexane explicitly.20 The SFEs in aqueous and nonaqueous media have been used to benchmark atomistic force fields and inform changes for improvement.15, 21-30 Another useful quantity for the benchmark is the partition coefficient (log P)


that has a straightforward relationship with the SFEs in water and organic solvents and can be measured routinely and relatively easily from experiments compared to SFE determinations.31 However, most benchmarks of protein force fields were done in water,15, 27-29, 32-36 although a number of experimental log P values of amino acid side chain analogues (SCAs) between water and organic solvents such as 1-octanol, chloroform, and cyclohexane were available.37 SFEs of amino acid SCAs in chloroform have been evaluated with the Amber ff99 force field (relative SFE to Gly),38 GROMOS 43A2 (an old version of the GROMOS FFs),10 and GAFF force fields.26 Except for the GROMOS force fields,10, 13 as far as we know, cyclohexane/water partition coefficients (log Pchx) of amino acid SCAs were examined only with the OPLS-AA force field39-41 and a coarse-grained protein model.42 Despite the popularity and importance of 1-octanol/water partition coefficients (log Poct) in the assessment of drug-likeness in drug development,3 there are very few reports on the performance of protein FFs for predicting log Poct of amino acid SCAs. These observations indicate that our understanding of protein force field performance in organic media is largely limited. Moreover, current protein force fields have been argued in recent years to yield stronger protein-protein interactions than protein-water interactions, likely leading to overly compact disordered peptides and artificial protein aggregation,29, 43-47 as evidenced by the calculated less favorable hydration free energies of solute molecules.28, 29, 32, 36, 48 A strategy of scaling protein-water interactions (such as Amber ff99SBws and ff03ws) was then proposed in an attempt to strengthen the interactions between the solute and water molecules,28, 29, 46 and such scaling is expected to influence the partition behavior of amino acids. Recently, semi-automated schemes have been exploited for rapid development of new force fields; for instance, the force balance (FB)49 and


implicitly polarized charge (IPolQ)50 methods generated the latest Amber fb15 (ref 51) and ff15ipq (ref 52) protein force fields, respectively. How these strategies perform, however, is yet to be assessed. As a follow-up on our previous work on the benchmark of hydration and diffusion properties of amino acids,36 here we present an extensive assessment of partition coefficients of amino acid SCAs between water and organic solvents with ten protein force fields, namely, Amber lineages of ff99SB-ILDN,53 ff99SBws,29 ff03,54 ff03ws,29 ff14SB,55 fb15,51 and ff15ipq,52 CHARMM 27,12, 56, 57 GROMOS 53A6,13 and OPLS-AA/L.14 The performance of these FFs in the nonaqueous media of 1-octanol, chloroform, and cyclohexane is presented in terms of solvation free energies and partition coefficients. This work may be useful for further efforts in protein force field development regarding the partition behavior of biomolecules.

2. COMPUTATIONAL METHODS

2.1 Collection of Experimental Observations. Octanol/water (log Poct), chloroform/water (log Pclf), and cyclohexane/water (log Pchx) partition coefficients of neutral amino acid SCAs were collected from four compilations of experimental studies1, 37, 58, 59 and listed in Table 1. For the analogues of titratable amino acids such as Asp, Glu, and Lys, their solvation free energies (∆Gsolvation) in 1-octanol, chloroform, and cyclohexane were taken, if available, from the Minnesota solvation database (version 2012),59 and are related to log P by

\[
\log P = \frac{-\Delta G_{\mathrm{transfer}}}{RT\ln(10)} = \frac{\Delta G_{\mathrm{wat}} - \Delta G_{\mathrm{solvation}}}{RT\ln(10)} \qquad \text{(eq 1)}
\]

where ∆Gtransfer is the transfer free energy of solute moving from water to organic media, R is the gas constant, T is the absolute temperature, and ∆Gwat is the solvation free energy in water (i.e., hydration free energy).31 Note that according to the above equation the experimental compilation in Ref.37 is of opposite sign to that in Table 1.
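As a worked illustration of eq 1, the following minimal Python sketch converts between solvation free energies and log P. The function names are hypothetical and not part of the original work; the gas constant is taken in kcal/(mol K), and T = 298.15 K is assumed because it is the simulation temperature stated in Section 2.3.

```python
import math

# Gas constant in kcal/(mol K); the free energies in this paper are in kcal/mol.
R_KCAL = 1.987204e-3


def log_p(dg_wat, dg_org, temperature=298.15):
    """Partition coefficient from eq 1: (dG_wat - dG_org) / (R T ln 10)."""
    return (dg_wat - dg_org) / (R_KCAL * temperature * math.log(10))


def dg_org_from_log_p(dg_wat, logp, temperature=298.15):
    """Inverse of eq 1: solvation free energy in the organic phase."""
    return dg_wat - logp * R_KCAL * temperature * math.log(10)


if __name__ == "__main__":
    # Methane (Ala analogue): dG_wat = 1.94 and dG_oct = 0.45 kcal/mol.
    print(round(log_p(1.94, 0.45), 2))  # 1.09
```

At 298.15 K the factor RT ln(10) is about 1.364 kcal/mol, so one log unit of log P corresponds to roughly 1.4 kcal/mol of transfer free energy.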


Table 1. Experimental Observations for Partition Coefficients and Solvation Free Energies (kcal/mol) of Neutral Amino Acid Side Chain Analogues

| AA | side chain analogue | log Poct (a) | log Pclf (a) | log Pchx (a) | ∆Gwat (b) | ∆Goct (c) | ∆Gclf (c) | ∆Gchx (c) |
|---|---|---|---|---|---|---|---|---|
| Ala | methane | 1.09 | | 1.33 | 1.94 | 0.45 | | 0.13 |
| Arg | n-propylguanidine | | | | -10.92 | | | |
| Asn | acetamide | -1.26 (d) | -2.00 | -4.88 | -9.68 | -7.96 | -6.95 | -3.04 |
| Asp | acetic acid | -0.26 (e) | -1.44 (e) | -3.64 (e) | -6.70 | -6.35 (e) | -4.74 (e) | -1.73 (e) |
| Cys | methanethiol | 0.78 (d) | | 0.93 | -1.24 | -2.30 | | -2.52 |
| Gln | propionamide | | -1.40 | -4.07 | -9.38 | | -7.47 | -3.84 |
| Glu | propanoic acid | 0.29 (e) | -0.81 (e) | -1.97 (e) | -6.47 | -6.86 (e) | -5.37 (e) | -3.78 (e) |
| His | 4-methylimidazole | 0.23 (d) | | | -10.27 | -10.58 | | |
| Ile | n-butane | 2.89 | | 3.62 | 2.15 | -1.79 | | -2.77 |
| Leu | iso-butane | 2.76 | | 3.62 | 2.28 | -1.49 | | -2.64 |
| Lys | n-butylamine | 0.70 (e) | 0.99 (f) | -0.29 (f) | -4.38 | -5.33 (e) | -5.73 | -3.98 |
| Met | methyl ethyl sulfide | 1.54 (d) | | 1.73 | -1.48 | -3.58 | | -3.83 |
| Phe | toluene | 2.58 | 2.28 | 2.19 | -0.76 | -4.28 | -3.87 | -3.74 |
| Ser | methanol | -0.87 (e) | -1.26 (e) | -2.49 | -5.06 | -3.87 | -3.34 | -1.66 |
| Thr | ethanol | -0.38 (e) | -0.69 (e) | -1.89 | -4.88 | -4.36 (e) | -3.94 (e) | -2.31 |
| Trp | 3-methylindole | 2.60 | 2.24 | 1.71 | -5.88 | -9.43 | -8.94 | -8.21 |
| Tyr | p-cresol | 2.00 (e) | 1.08 (e) | -0.10 | -6.11 | -8.84 (e) | -7.58 (e) | -5.97 |
| Val | propane | 2.36 | | 2.97 | 1.99 | -1.23 | | -2.05 |

(a) Octanol/water (log Poct), chloroform/water (log Pclf), and cyclohexane/water (log Pchx) partition coefficients were taken from Ref.37 unless noted otherwise; (b) From Refs.60, 61; (c) The given log P was converted to solvation free energies by eq 1 unless noted otherwise; (d) From Ref.58; (e) Solvation free energies from Ref.59 were converted to log P by eq 1; (f) From Ref.1

2.2 Structural and Force Field Model. Ten protein force fields (FFs), namely, Amber families of ff99SB-ILDN,53 ff99SBws,29 ff03,54 ff03ws,29 ff14SB,55 fb15,51 and ff15ipq,52 CHARMM 27,12, 56, 57 GROMOS 53A6,13 and OPLS-AA/L,14 were used to model amino acid side chain analogues (SCAs). Construction of structural models for the analogues followed previous reports by adjusting the parameters of the β-carbon (Cβ).32, 35, 62 The Amber variants ff99SBws and ff03ws (ref 29) have identical parameters for amino acid SCAs to ff99SB-ILDN and ff03, respectively (thereby giving identical solvation free energies in 1-octanol, chloroform, and cyclohexane), but they use a scaled protein-water interaction, resulting in different hydration free energies. Partial atomic charges in the Amber force fields were mainly derived with three distinct concepts: (1) ff9x parameter sets such as ff94 and ff99, the parent of ff99SB-ILDN,


ff14SB, and fb15, were based on fitting the gas-phase electrostatic potential calculated at the HF/6-31G* level;11 (2) ff03 included a low-dielectric continuum model (corresponding to organic media or the protein interior) in the quantum mechanical (QM) calculations of electrostatic potentials;54 (3) polarized atomic charges in ff15ipq were assigned implicitly by the IPolQ scheme.52 Force field parameters of the side chain analogues in Amber ff99SB-ILDN are identical to those in ff94, ff99, ff99SB, etc., and here we use Amber ff9x to represent these FF variants, as in Ref.29 Amber ff14SB and fb15 share identical nonbonded parameters with ff9x, but differ in side chain dihedrals and bonded parameters, respectively. The GROMOS 53A6 force field13 was chosen instead of the latest version 54A8 (ref 15) because the compatible organic solvent molecules used (see below) were largely based on the 53A6 parameter set,20 and these two sets are identical for neutral amino acids. Due to the influence of charge reassignment on the hydration free energies of neutral side chains with Amber ff03, atomic charges obtained from the full electrostatic potential (ESP) QM calculations in our previous work36 (the resulting parameters are referred to as Amber ff03full in this work) were examined as well.

For aqueous simulations, the TIP3P water model63 was used with Amber ff99SB-ILDN,53 ff03,54 and ff14SB; modified TIP3P (TIPS3P) with Lennard-Jones interaction sites on the hydrogens64 with CHARMM 27;12, 56, 57 TIP4P/2005 (ref 65) with Amber ff99SBws and ff03ws;29 TIP3P-FB49 with Amber fb15;51 SPC/Eb66 with Amber ff15ipq;52 and the SPC67 and TIP4P63 models with GROMOS 53A6 (ref 13) and OPLS-AA/L,14 respectively. To examine the water model dependence, seven more models, OPC3,68 TIP3P,63 OPC,69 TIP4P-D,43 TIP4P-Ew,70 TIP4P-FB,49 and TIP5P-Ew,71 were used with the latest Amber fb15 force field as well. Unless noted otherwise, Amber fb15 is paired with the preferred TIP3P-FB model.51 Due to computational efficiency in the GROMACS suite,72-75 TIP3P is recommended for use with


CHARMM instead of the commonly used TIPS3P model,57 and we also examined the TIP3P performance with CHARMM 27. The FF/water combinations examined are listed in Table S1 in the Supporting Information (SI).

Solvent models in the GROMOS-compatible 2016H66 set were used for 1-octanol, chloroform, and cyclohexane.20 For the other three kinds of protein FFs, solvent models were mainly based on the GROMACS Molecule and Liquid Database,76-78 where organic molecules were modeled with the general Amber (GAFF),16 CHARMM general (CGenFF),17-19 and OPLS-AA (ref 79) force fields. Cyclohexane models for CHARMM and OPLS were built from the available cyclohexanone model by mutating the ketone oxygen to two hydrogen atoms with atomic charges identical to those of the other carbon-attached hydrogens. The CHARMM-compatible chloroform model used here was taken from the work by Noskov et al.80 FF parameters for other solvent models were taken from the GROMACS database directly (http://virtualchemistry.org/). Note that the RESP charge derivation81 in GAFF, which is compatible with Amber ff9x and ff03, differs from the IPolQ52 scheme used in Amber ff14ipq (ref 82) and ff15ipq,52 although a modified version of IPolQ (IPolQ-mod)83 was shown to perform similarly to the RESP method with GAFF in solvation free energy calculations.84 Here we used the same GAFF-based organic solvent models for all the tested Amber FFs for direct comparison.

2.3 Pure Liquid Simulations. The simulation protocol was almost identical to that of our previous benchmark on the hydration and diffusion of amino acids.36 Pure liquids of 1-octanol, chloroform, and cyclohexane were first placed separately in a cubic box with an image distance of 4 nm. Following energy minimization and 500 ps of NVT equilibration, each box was subjected to a 10 ns NPT simulation; the final coordinates were used as a solvent box for the subsequent solvation free energy calculations. In order to evaluate the solvent models used, the


equilibrated solvent box was also used as an initial snapshot for five separate NPT production simulations of 5 ns each, with different initial velocities generated randomly from a Maxwell distribution at 298.15 K. Each single solvent molecule was also simulated separately in vacuum without cutoff for 100 ns. These simulations were used for calculating density (ρ), dipole moment (µ), static dielectric constant (ε0), and enthalpy of vaporization (∆Hvap); see Ref.76 for the calculation details. All the simulations were performed at T = 298.15 K with the GROMACS package (version 5.1).72-75

2.4 Thermodynamic Integration. Solvation free energies (SFEs) of amino acid SCAs were calculated with thermodynamic integration (TI).85 A two-stage approach was conducted with λ = 0, 0.25, 0.5, 0.75, and 1 for decoupling Coulombic interactions, and then λ = 0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, and 1 for decoupling van der Waals interactions. For aqueous simulations, the protocol is the same as in Refs.36, 86 Hydration free energies for GROMOS 53A6 and OPLS-AA/L were computed in this work, and the results for CHARMM 27 and the Amber FF variants ff99SB-ILDN, ff99SBws, ff03, ff03ws, ff03full, ff14SB, fb15, and ff15ipq were taken from previous work.35, 36 For a further evaluation of the solvent models, solvation free energies of the examined solvent molecules (as solutes in this case) in water, 1-octanol, cyclohexane, and chloroform were calculated as well. Before production runs in organic solvents, the system was simulated at NVT for 100 ps, followed by NPT for 400 ps with the Berendsen barostat87 for a robust equilibration. Production simulations were then extended to 5 ns for each λ at NPT using the Parrinello−Rahman algorithm88, 89 for pressure coupling at 1 bar with a coupling constant of 5 ps and a compressibility of 5 × 10^-5 bar^-1 (refs 21, 76). Due to the possible slow equilibration in 1-octanol simulations observed in previous work21, 31 and in this work (see the Results section below), a simulated annealing (SA) technique was applied


attempting to accelerate the system equilibration. The temperature (T) was increased linearly from 298.15 to 600 K in the first 50 ps of simulation, kept unchanged at 600 K for the next 150 ps, and then dropped back to 298.15 K slowly from 200 to 400 ps. These processes were repeated five times cyclically, i.e., 2 ns for each SA simulation. Before TI calculations, the system with fully interacting solvents (λ = 0) was simulated at NPT for 2 ns with SA, followed by a normal NPT simulation for 3 ns without SA. This strategy was used for the cases where slow equilibration was detected in the 1-octanol TI calculations with Amber ff99SB-ILDN, Amber ff03, CHARMM 27, GROMOS 53A6, and OPLS-AA/L and for all the systems with the other FFs.

2.5 Quantum Mechanical Prediction. For comparison with classical molecular mechanical FFs, the quantum mechanical SMD continuum universal solvation model90 was also examined to compute the solvation free energies of amino acid SCAs in different media using the Gaussian 09 software.91 Three levels of theory, B3LYP,92-94 Hartree-Fock (HF), and M06-2X,95 and two basis sets, 6-31+G(d,p)96 and aug-cc-pVTZ,97, 98 were tested (six combinations in total). The amino acid SCAs were optimized in the gas phase and then solvated in SMD models of water, 1-octanol, chloroform, and cyclohexane separately.90 The solvation free energy was defined as the difference between the free energy of the SCAs in the liquid phase and that in the gas phase.86, 99, 100

2.6 Analysis. The GROMACS analysis tool “gmx bar”101 was used to calculate from the TI statistics the solvation free energies in water, 1-octanol, cyclohexane, and chloroform, from which partition coefficients between water and the three organic solvents (log Poct, log Pclf, and log Pchx) were computed by eq 1.
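To make the two-stage λ schedule of Section 2.4 concrete, the sketch below integrates per-window averages of ∂H/∂λ with the trapezoidal rule. This is only an illustration under stated assumptions: the function names are hypothetical, the per-window averages are assumed to have been extracted from the simulations beforehand, and the trapezoidal rule is a simple stand-in, not the estimator implemented in the “gmx bar” tool used in this work. The sign convention assumes λ = 0 is the fully interacting state, as stated in Section 2.4, so the solvation free energy is the negative of the decoupling free energy.

```python
# Lambda windows quoted in Section 2.4 (Coulomb stage first, then van der Waals).
COUL_LAMBDAS = [0.0, 0.25, 0.5, 0.75, 1.0]
VDW_LAMBDAS = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
               0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]


def trapezoid(lambdas, dhdl):
    """Trapezoidal integral of <dH/dlambda> over the given lambda windows."""
    if len(lambdas) != len(dhdl):
        raise ValueError("one <dH/dlambda> average is needed per lambda window")
    return sum(0.5 * (dhdl[i] + dhdl[i + 1]) * (lambdas[i + 1] - lambdas[i])
               for i in range(len(lambdas) - 1))


def solvation_free_energy(coul_dhdl, vdw_dhdl):
    """Negative of the total decoupling free energy (Coulomb + van der Waals).

    Assumes lambda = 0 is fully interacting and lambda = 1 is decoupled.
    """
    decoupling = trapezoid(COUL_LAMBDAS, coul_dhdl) + trapezoid(VDW_LAMBDAS, vdw_dhdl)
    return -decoupling
```

The denser spacing of the van der Waals windows between λ = 0.6 and 1 reflects where ∂H/∂λ typically changes most rapidly; the trapezoidal sum handles the non-uniform spacing because each interval is weighted by its own width.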
Similar to the work by Mobley et al.,31 a variety of quantities were obtained to assess the performance of different theoretical models via a comparison with experimental solvation free energies (∆G) and log P observations, including root-mean-square error (RMSE), mean signed error (MSE), Pearson’s correlation coefficient (R), Spearman's rank


correlation coefficient (Rs), and the percentage of the calculated results with correct signs. With the help of the R program,102 these error metrics were computed by bootstrapping with 1000 iterations. The cumulative average of density as a function of simulation time was used to monitor the convergence of simulated systems in organic media.

3. RESULTS

3.1 Solvent Model Evaluation. The simulated liquid density (ρ), molecular dipole moment (µ), static dielectric constant (ε0), enthalpy of vaporization (∆Hvap), and solvation free energies in water (∆Gwat), 1-octanol (∆Goct), chloroform (∆Gclf), and cyclohexane (∆Gchx) for the three organic solvent molecules used are listed in Table 2.

Table 2. Calculated Properties of Organic Solvent Models Used in This Work (a)

| solvent | FF (b) | ρ (g/L) | µ (D) | ε0 | ∆Hvap | ∆Gwat | ∆Goct | ∆Gclf | ∆Gchx |
|---|---|---|---|---|---|---|---|---|---|
| 1-octanol (oct) | GAFF | 829.5 ± 0.4 | 2.24 | 2.7 ± 0.4 | 19.35 ± 0.08 | -4.68 ± 0.07 | -9.75 ± 0.21 | -9.43 ± 0.06 | -7.03 ± 0.04 |
| | CGenFF | 837.4 ± 0.3 | 2.39 | 5.6 ± 0.3 | 17.69 ± 0.12 | -3.65 ± 0.07 | -8.96 ± 0.16 | -8.37 ± 0.03 | -6.72 ± 0.06 |
| | GROMOS | 822.0 ± 0.3 | 2.33 | 4.7 ± 0.9 | 17.86 ± 0.05 | -4.24 ± 0.07 | -9.48 ± 0.20 | -8.03 ± 0.03 | -6.86 ± 0.02 |
| | OPLS-AA | 827.8 ± 0.6 | 2.38 | 5.7 ± 0.9 | 17.13 ± 0.13 | -3.25 ± 0.07 | -8.43 ± 0.22 | -8.37 ± 0.02 | -6.21 ± 0.06 |
| | exp. (c) | 826.2 | 1.9 (d) | 10.3 | 16.96 | -4.09 | -8.13 | | |
| chloroform (clf) | GAFF | 1446.9 ± 0.1 | 1.57 | 4.4 ± 0.1 | 8.20 ± 0.01 | 0.15 ± 0.03 | -3.77 ± 0.10 | -3.85 ± 0.02 | -3.70 ± 0.02 |
| | CGenFF | 1404.6 ± 0.4 | 1.46 | 3.6 ± 0.1 | 7.27 ± 0.03 | 0.44 ± 0.06 | -4.00 ± 0.04 | -4.02 ± 0.02 | -4.27 ± 0.02 |
| | GROMOS | 1519.5 ± 0.1 | 1.10 | 2.3 ± 0.0 | 7.85 ± 0.01 | 0.33 ± 0.05 | -3.70 ± 0.05 | -4.36 ± 0.03 | -4.08 ± 0.01 |
| | OPLS-AA | 1496.4 ± 0.2 | 1.47 | 3.9 ± 0.1 | 7.47 ± 0.04 | 0.52 ± 0.03 | -3.95 ± 0.06 | -4.06 ± 0.02 | -4.11 ± 0.01 |
| | exp. (c) | 1483.2 | 1.1 (d) | 4.8 | 7.48 | -1.07 | -3.81 | -4.13 | |
| cyclohexane (chx) | GAFF | 763.5 ± 0.1 | 0.02 | 1.0 ± 0.0 | 6.55 ± 0.00 | 1.42 ± 0.05 | -3.74 ± 0.13 | -4.89 ± 0.02 | -4.63 ± 0.03 |
| | CGenFF | 765.2 ± 0.3 | 0.32 | 1.1 ± 0.0 | 6.82 ± 0.01 | 1.86 ± 0.04 | -2.95 ± 0.06 | -4.67 ± 0.02 | -4.06 ± 0.05 |
| | GROMOS | 786.7 ± 0.1 | 0.00 | 1.0 ± 0.0 | 7.20 ± 0.00 | 0.87 ± 0.05 | -4.30 ± 0.02 | -5.16 ± 0.05 | -5.06 ± 0.03 |
| | OPLS-AA | 765.0 ± 0.2 | 0.12 | 1.0 ± 0.0 | 6.89 ± 0.01 | 1.89 ± 0.05 | -3.32 ± 0.03 | -4.67 ± 0.02 | -4.19 ± 0.04 |
| | exp. (c) | 778.5 | 0.3 (d) | 2.0 | 7.89 | 1.23 | -3.46 | -4.45 | -4.43 |

(a) Enthalpy of vaporization (∆Hvap) and solvation free energies in water (∆Gwat), 1-octanol (∆Goct), chloroform (∆Gclf), and cyclohexane (∆Gchx) are in units of kcal/mol; (b) The general Amber (GAFF), CHARMM general (CGenFF), GROMOS 2016H66, and OPLS-AA force fields were used to model solvent molecules; (c) Experiments for solvation free energies were taken from the Minnesota Solvation Database59 and for other properties from Ref.103; (d) Dipole moment from Ref.104

Because the –CH2 united atom type bears no charges in the GROMOS force field, the dipole moment of cyclohexane amounts to zero. Compared to experimental measurements, most of the liquid properties are reproduced reasonably,


allowing a reliable solvation benchmark of amino acid SCAs in these organic solvents. The solvent permittivity (ε0), however, was underestimated systematically (for instance, by up to 74% for 1-octanol with the GAFF force field), and this deficiency appears to be common to most general force field models of organic liquids, as observed in GAFF, CGenFF, and OPLS-AA.76, 78 All the tested general FFs produce chloroform models that interact weakly with water molecules, as indicated by the positive hydration free energies that are of opposite sign to the experiment, although solvation free energies of these models in 1-octanol and chloroform can be reproduced with good accuracy.

3.2 Convergence of Solvation Free Energy Calculations. Simulation of organic solvents probably comes with a slow equilibration, and density (ρ) may help monitor the convergence of simulated systems.31 The relatively small amino acid SCAs are assumed not to impact the density convergence of the entire system and the final equilibrated density dramatically. The cumulative average of density as a function of simulation time detects a slow equilibration in the solvated amino acid SCAs in 1-octanol, as shown in Figure 1a. Density profiles in 1-octanol for all the SCAs with Amber ff9x, Amber ff03, CHARMM 27, GROMOS 53A6, and OPLS-AA/L are given in Figures S1-S5 in the SI, respectively, indicating that 5-ns normal simulations with the setup in this work are not enough for sufficient equilibration in some cases, as observed for the representative density curves in blue (Figures 1b-d). The densities of the Lys (Figure 1b) and Ala (Figure 1c) side chain systems tend to go up and down continuously, respectively, during the last 4 ns of simulation and appear to need considerable time for equilibration. The system of the Cys side chain in 1-octanol reaches a constant ρ of about 850 g/L (Figure 1d), much larger than that of pure 1-octanol (ρ = 827.8 g/L, Table 2), indicating a dramatic influence of the solute on the final density of the entire system. This is likely another case resulting from the slow


equilibration, and note that these convergence issues are not specific to certain amino acids and force fields.
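The density-based convergence monitoring used in this section can be sketched in a few lines of Python. The cumulative average follows the definition in Section 2.6; the convergence criterion (`tail_fraction` and `tolerance`) is a hypothetical choice for illustration only and is not a threshold taken from this work, where convergence was judged from the plotted profiles.

```python
def cumulative_average(values):
    """Running mean of a time series, e.g. density frames in g/L."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def is_converged(values, tail_fraction=0.5, tolerance=1.0):
    """Hypothetical check: the cumulative average must vary by less than
    `tolerance` (same units as `values`) over the last `tail_fraction` of frames."""
    running = cumulative_average(values)
    start = int(len(running) * (1.0 - tail_fraction))
    tail = running[start:]
    return max(tail) - min(tail) < tolerance


if __name__ == "__main__":
    flat = [827.0] * 100                              # stable density
    drifting = [800.0 + 0.5 * i for i in range(100)]  # steadily rising density
    print(is_converged(flat), is_converged(drifting))  # True False
```

A steadily drifting profile such as those described for the Lys and Ala systems would fail this kind of check, whereas a profile that plateaus at the wrong density (as for Cys) would pass it, so comparison with the pure-solvent density remains necessary.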

Figure 1. Representative cumulative average of density (ρ) as a function of simulation time showcasing the normal (a) and problematic (b-d) profiles (in blue) when using 1-octanol as a solvent. The systems with a pre-equilibration with simulated annealing (SA) before production simulations are presented in red and that without SA are in blue. The overall average density is indicated by a solid green line. The convergence issues (blue in b-d) are not specific to certain amino acids and force fields, even though the simulated systems are labeled.


A pre-equilibration with simulated annealing (SA) before production simulations appears to accelerate the system equilibration in 1-octanol and largely eliminate the solute influence on the system density, as indicated by the red curves in Figures 1 and S1-S5; all the density profiles converge within 2 ns to a constant value that is close to the simulated density of pure 1-octanol (Table 2). Without the SA pre-equilibration, the densities of 1-octanol simulations with CHARMM 27 (Figure S3) and GROMOS 53A6 (Figure S4) are well converged within the first 2 ns of simulation. It should be noted that each simulation system was prepared independently and the convergence issue in 1-octanol is likely to be observed randomly, that is, it is not specific to certain amino acids and force fields. Such issues probably result from the initial simulation conditions and reflect a somewhat slow rearrangement of the 1-octanol phase, as noted by Mobley and co-workers.31 This convergence issue is not detected when using chloroform (Figure S6) or cyclohexane (Figure S7) as a solvent. For all the organic solvents, the last 3 ns of the simulations, with sufficient equilibration, were used for data analysis. There are 13 simulated systems in 1-octanol that may have convergence problems, as detected by the cumulative average of density with Amber ff9x (Figure S1), Amber ff03 (Figure S2), and OPLS-AA/L (Figure S5). While such density problems could be fixed by the above simulated annealing (SA) technique, using SA does not seem to improve the calculation of solvation free energies in 1-octanol (∆Goct) significantly in most cases, as presented in Figure 2. Compared to the systems without SA, the SA method yields less negative ∆Goct for Asp with Amber ff9x and Hid with Amber ff03 (Figure 2), both in good agreement with the experimental observations (Table 1). For the other 11 systems, SA gives almost identical solvation predictions to those without SA (Figure 2).
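The annealing protocol of Section 2.4 can be written as a piecewise function of time, as in the minimal sketch below. The function name and parameters are hypothetical. The heating ramp is linear as stated in the text; the cooling ramp is assumed to be linear as well, since the text only says the temperature was dropped "slowly" from 200 to 400 ps.

```python
def annealing_temperature(t_ps, t_low=298.15, t_high=600.0,
                          period_ps=400.0, n_cycles=5):
    """Target temperature (K) at time t_ps for the cyclic SA protocol.

    Each 400 ps cycle: linear heating over 0-50 ps, hold at t_high over
    50-200 ps, then cooling over 200-400 ps (assumed linear here).
    Five cycles give the 2 ns SA run; afterwards the system stays at t_low.
    """
    if t_ps < 0:
        raise ValueError("time must be non-negative")
    if t_ps >= period_ps * n_cycles:
        return t_low
    t = t_ps % period_ps
    if t < 50.0:
        return t_low + (t_high - t_low) * t / 50.0
    if t < 200.0:
        return t_high
    return t_high - (t_high - t_low) * (t - 200.0) / 200.0


if __name__ == "__main__":
    for time_ps in (0, 25, 100, 300, 400, 2000):
        print(time_ps, round(annealing_temperature(time_ps), 3))
```

Writing the schedule this way makes it easy to check that the run returns to 298.15 K at the end of each cycle, so the subsequent 3 ns of normal NPT simulation starts from the production temperature.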


Figure 2. Solvation free energies in 1-octanol (∆Goct) computed with simulated annealing (SA) pre-equilibration before the TI calculations versus those computed without SA, for the simulated systems where problematic density profiles were detected.

3.3 Solvation Free Energies. The TI-calculated solvation free energies of neutral amino acid SCAs in water (∆Gwat), 1-octanol (∆Goct), chloroform (∆Gclf), and cyclohexane (∆Gchx) are presented in Figures 3a-d and Tables S2-S5, respectively, along with comparisons with experiment. The force field (FF) performance for predicting hydration free energies (∆Gwat) of amino acid SCAs has been well documented in refs 35 and 36; here the ∆Gwat results are stated briefly for a better understanding of the amino acid partitioning between water and organic solvents. Amber ff9x, CHARMM 27, GROMOS 53A6, and OPLS-AA/L yield a root-mean-square error (RMSE) of ~1 kcal/mol from the experimental hydration free energies, whereas Amber ff03 produces a larger RMSE of ~2 kcal/mol for ∆Gwat (Figure 4a). GROMOS 53A6 tends to give too favorable a hydration of amino acid SCAs (i.e., an underestimation) with a mean signed error (MSE) of -0.7 kcal/mol, while the ∆Gwat values computed by the other FFs are to some extent unfavorable relative to experiment (i.e., overestimated), with positive MSEs ranging from 0.5 (Amber ff9x) to 1.2 (Amber ff03) kcal/mol (Figures 3a and 4b and Table S2). As expected from their common parent, Amber ff14SB and fb15 show a performance for ∆Gwat similar to that of ff9x, whereas the latest Amber ff15ipq yields a large RMSE of 1.7 kcal/mol (Table S2).
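
The two error measures used throughout this section follow the sign convention stated above: a positive MSE means that the calculated free energy is less favorable than experiment (overestimated). A minimal Python sketch is given below; the function names and the four free energies are hypothetical and are not taken from Tables S2-S5.

```python
import math


def mean_signed_error(calculated, experimental):
    """Mean of (calculated - experimental); positive means overestimated."""
    diffs = [c - e for c, e in zip(calculated, experimental)]
    return sum(diffs) / len(diffs)


def root_mean_square_error(calculated, experimental):
    """Square root of the mean squared deviation from experiment."""
    diffs = [c - e for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical hydration free energies in kcal/mol (illustrative only).
experimental = [-5.0, -1.0, 2.0, -7.0]
calculated = [-4.0, -0.5, 2.5, -5.0]

mse = mean_signed_error(calculated, experimental)
rmse = root_mean_square_error(calculated, experimental)
print(f"MSE = {mse:.2f} kcal/mol, RMSE = {rmse:.2f} kcal/mol")
```

Because the RMSE squares each deviation, a single outlier (the last entry here) raises it above the MSE even when all errors have the same sign.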


Figure 3. Comparison of calculated solvation free energies of amino acid side chain analogues in water (a), 1-octanol (b), chloroform (c), and cyclohexane (d) with experiments.

For ∆Goct, GROMOS 53A6 produces the largest RMSE of 1.9 kcal/mol and a systematic underestimation with an MSE of -1.4 kcal/mol (Figures 3b and 4b and Table S3). Except for the overestimation of Asn, Hie, and Tyr side chains by 2~3 kcal/mol, Amber ff03 reproduces the solvation free energies of other SCAs in 1-octanol with good accuracy (Figure 3b and Table S3).


Figure 4. Root-mean-square error (a), mean signed error (b), Pearson's correlation coefficient (c), and Spearman's rank correlation coefficient (d) between calculated and experimental solvation free energies of amino acid side chain analogues in water (wat), 1-octanol (oct), chloroform (clf), and cyclohexane (chx).

Amber ff9x, ff14SB, fb15, ff15ipq, CHARMM 27, and OPLS-AA/L perform well in predicting ∆Goct (RMSE ~1 kcal/mol and MSE < 0.5 kcal/mol), of which Amber ff15ipq and OPLS-AA/L give the best MSE of approximately zero (Figure 4b and Table S3). The calculated ∆Gwat and ∆Goct with all the tested FFs correlate strongly with experiment (R > 0.92, Figure 4c) and reproduce well the relative solvation of all the neutral amino acid side chains, as indicated by high Spearman's rank-order correlation coefficients (Rs > 0.94, Figure 4d).

All the force fields tested reproduce well the solvation free energies of amino acid SCAs in chloroform (∆Gclf) and cyclohexane (∆Gchx), except CHARMM 27, which shows a slightly larger overestimation of ∆Gclf (RMSE = 1.2 kcal/mol and MSE = 0.8 kcal/mol), as shown in Figures 3c-d and 4a-b and Tables S4 and S5. The Amber FFs tend to underestimate ∆Gclf, whereas the other FFs give an overestimation; all the FFs tend to overestimate ∆Gchx slightly, with an MSE ranging from 0.1 to 0.4 kcal/mol (Tables S4-S6). The correlations with experimental ∆Gclf for Amber and GROMOS 53A6 (R and Rs ≥ 0.9, Figures 4c-d) are stronger (i.e., a better performance) than those for CHARMM 27 and OPLS-AA/L. Good correlations with experimental ∆Gchx are observed for the tested FFs, except GROMOS 53A6, which shows a relatively small Rs of 0.8. This means that GROMOS 53A6 does not reproduce the relative solvation of amino acid side chains in cyclohexane as well as the other FFs, despite its good performance (by design, due to its parameterization in part against solvation free energies in cyclohexane)13 for modeling absolute ∆Gchx (RMSE = 0.8 kcal/mol and MSE = 0.2 kcal/mol, Figure 4 and Table S5).

Compared to the method of Cβ modification32, 35, 62 with ff03, a full ESP charge reassignment (i.e., Amber ff03full) leads to an improvement for ∆Gwat and ∆Goct, as indicated by a reduced RMSE and MSE, but does not significantly influence the calculations of ∆Gclf and ∆Gchx (Tables S2-S6).

Computed ∆Gwat, ∆Goct, ∆Gclf, and ∆Gchx of amino acid SCAs by SMD solvation models90 are listed in Tables S7-S10, respectively.
The tested levels of theory, B3LYP, Hartree-Fock (HF), and M062X with the 6-31+G(d,p) and aug-cc-pVTZ basis sets, reproduce the solvation free energies in water (Table S7) and 1-octanol (Table S8) reasonably well, with an RMSE of 0.7~1.4 kcal/mol and an MSE of -0.8~0.7 kcal/mol, a performance similar to that of the tested protein FFs. All six levels of theory tend to underestimate ∆Gclf (Table S9). The HF theory underestimates the solvation free energies in all four solvents, as indicated by negative MSEs between -0.03 (∆Gchx with HF/aug-cc-pVTZ) and -1.24 (∆Gclf with HF/6-31+G(d,p)) kcal/mol, and appears not to be appropriate for solvation calculations in chloroform, as revealed by a large RMSE of ~1.6 kcal/mol and a weak correlation with experiment (R and Rs ≤ 0.8, Table S9). The SMD calculations in chloroform and cyclohexane depend strongly on the level of theory used, and B3LYP appears to outperform HF and M062X, despite the poor performance observed for ∆Gchx with B3LYP/6-31+G(d,p) (Tables S9 and S10). Overall, the B3LYP/aug-cc-pVTZ level of theory performs best for solvation free energy calculations of amino acid SCAs in all four solvents.
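
The solvation free energies discussed in this section are the inputs for the partition coefficients of section 3.4. A minimal Python sketch of that conversion is given below. It assumes that eq 1 has the standard form log P = (∆Gwat - ∆Gorg)/(RT ln 10); the temperature of 298.15 K, the function name, and the example free energies are assumptions for illustration and are not taken from the tables.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol K)


def log_p(dg_water, dg_organic, temperature=298.15):
    """Partition coefficient (log10) from solvation free energies in kcal/mol.

    Assumed relation: log P = (dG_water - dG_organic) / (R T ln 10), so a solute
    that is better solvated in the organic phase has a positive log P.
    """
    return (dg_water - dg_organic) / (GAS_CONSTANT * temperature * math.log(10.0))


# Hypothetical solute: dG_water = -2.0 and dG_organic = -4.0 kcal/mol.
value = log_p(-2.0, -4.0)
print(f"log P = {value:.2f}")
```

At this temperature one log unit corresponds to about 1.36 kcal/mol of transfer free energy, so a more negative ∆Gwat at fixed ∆Gorg lowers log P.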

3.4 Partition Coefficients. A comparison of the experimental partition coefficients (log P) of amino acid SCAs between water and organic solvents with the predictions by the Amber (ff9x, ff99SBws, ff03, and ff03ws) and GROMOS 53A6 force fields is presented in Figure 5. As noted in Methods section 2.2, because the FF parameters for neutral amino acid SCAs are identical, Amber ff99SBws and ff03ws give the same solvation free energies of the SCAs in the organic solvents as Amber ff9x and ff03, respectively. Compared to Amber ff9x and ff03, the scaled protein-water interactions in Amber ff99SBws and ff03ws produce more negative hydration free energies (∆Gwat) for the SCAs, leading to reduced MSEs of -0.3 and 0.4 kcal/mol,29, 36 respectively. Such scaling is therefore expected to lower the calculated log P values by increasing the transfer free energy of the SCAs from water to organic media (eq 1).

Out of the ten FFs tested, Amber ff9x performs best and accurately reproduces the partitioning of amino acid SCAs between water and the organic solvents 1-octanol (log Poct), chloroform (log Pclf), and cyclohexane (log Pchx), as indicated by a small RMSE (0.4~0.6 log units) and MSE (0~0.4 log units), strong correlations with experiment (R and Rs ≥ 0.98), and a high percentage (≥93%) of calculations with the correct sign (Figure 5 and Table 3). This good performance matches the accuracy of the quantum mechanical SMD calculations and is even slightly better for log Pclf (Tables 3 and S11). Amber (ff9x, ff03, ff14SB, fb15, and ff15ipq), CHARMM 27, and OPLS-AA/L tend to overestimate log P (Figure 5 and Table 3), very likely due to the larger overestimation of ∆Gwat relative to the solvation free energies in organic solvents (Figure 4b). A significantly overestimated log P is observed for Amber ff03 (Figure 5 and Table 3). Owing to the improvement in ∆Gwat and ∆Goct, Amber ff03full performs somewhat better than ff03 for log P predictions (Table 3).

As expected, scaling (more precisely, strengthening) the protein-water interactions in Amber ff99SBws and ff03ws leads to smaller log P values (Figure 5) than without scaling, as revealed by the reduced MSEs in Table 3. Such scaling results in a better performance for Amber ff03ws than for ff03, yielding a smaller MSE and a higher percentage of correct signs, whereas for Amber ff99SBws it produces a large underestimation of log Poct and log Pchx (which is worse, Table 3).

The log Poct values of amino acid SCAs are overestimated by GROMOS 53A6 with an MSE of 0.5 log units (Figure 5a and Table 3), owing to the larger underestimation of ∆Goct relative to ∆Gwat on average (Figures 3b and 4b). In contrast, GROMOS 53A6 underestimates the chloroform/water (log Pclf, Figure 5b) and cyclohexane/water (log Pchx, Figure 5c) partition coefficients of the SCAs (MSEs of -0.7 and -0.5 log units, Table 3), which is largely attributed to the underestimation of ∆Gwat (Figures 3a and 4b). These results indicate that inappropriate interactions of neutral amino acids with water and 1-octanol (Figures 3 and 4) in GROMOS 53A6 are a likely source of the deviation in log P from experiment, in particular the interaction with 1-octanol, which has not been included in the parameterization of the protein force field models.

Figure 5. Comparison of calculated 1-octanol/water (log Poct, a), chloroform/water (log Pclf, b), and cyclohexane/water (log Pchx, c) partition coefficients of amino acid side chain analogues with experiments.

Table 3. Performance of Different Models for Predicting Partition Coefficients of Amino Acid Side Chain Analogues Compared to Experimental Observations^a

model             RMSE        MSE          R             Rs            correct sign

log Poct
Amber ff9x        0.4 ± 0.1   0.1 ± 0.1    0.98 ± 0.01   0.98 ± 0.02   100 ± 0%
Amber ff99SBws    0.6 ± 0.1   -0.5 ± 0.1   0.95 ± 0.02   0.97 ± 0.03   76 ± 10%
Amber ff03        0.8 ± 0.1   0.4 ± 0.2    0.88 ± 0.07   0.90 ± 0.07   82 ± 9%
Amber ff03ws      0.7 ± 0.3   -0.2 ± 0.2   0.83 ± 0.10   0.91 ± 0.06   88 ± 7%
Amber ff03full^b  0.6 ± 0.1   0.4 ± 0.1    0.91 ± 0.04   0.95 ± 0.04   82 ± 9%
Amber ff14SB      0.5 ± 0.1   0.2 ± 0.1    0.92 ± 0.04   0.93 ± 0.07   76 ± 10%
Amber fb15        0.5 ± 0.1   0.3 ± 0.1    0.96 ± 0.03   0.96 ± 0.06   88 ± 7%
Amber ff15ipq     0.7 ± 0.1   0.6 ± 0.1    0.97 ± 0.01   0.95 ± 0.05   88 ± 7%
CHARMM 27         0.6 ± 0.1   0.1 ± 0.1    0.89 ± 0.05   0.89 ± 0.08   88 ± 7%
GROMOS 53A6       0.9 ± 0.2   0.5 ± 0.2    0.81 ± 0.10   0.86 ± 0.09   76 ± 10%
OPLS-AA/L         0.9 ± 0.2   0.6 ± 0.1    0.87 ± 0.06   0.93 ± 0.07   88 ± 7%
SMD^c             0.4 ± 0.1   0.2 ± 0.1    0.97 ± 0.02   0.96 ± 0.04   94 ± 5%

log Pclf
Amber ff9x        0.6 ± 0.2   0.4 ± 0.2    0.98 ± 0.01   0.99 ± 0.04   100 ± 0%
Amber ff99SBws    0.6 ± 0.2   -0.3 ± 0.2   0.96 ± 0.04   0.99 ± 0.04   90 ± 9%
Amber ff03        1.3 ± 0.2   1.0 ± 0.2    0.97 ± 0.07   0.89 ± 0.15   90 ± 9%
Amber ff03ws      0.8 ± 0.1   0.3 ± 0.2    0.96 ± 0.05   0.89 ± 0.17   100 ± 0%
Amber ff03full^b  0.9 ± 0.2   0.7 ± 0.1    0.99 ± 0.01   1.00 ± 0.00   100 ± 0%
Amber ff14SB      0.6 ± 0.2   0.4 ± 0.2    0.99 ± 0.01   0.99 ± 0.05   100 ± 0%
Amber fb15        0.8 ± 0.2   0.4 ± 0.2    0.97 ± 0.03   0.96 ± 0.08   100 ± 0%
Amber ff15ipq     1.2 ± 0.1   0.8 ± 0.3    0.94 ± 0.04   0.95 ± 0.08   80 ± 12%
CHARMM 27         0.6 ± 0.2   0.1 ± 0.2    0.98 ± 0.03   0.94 ± 0.10   100 ± 0%
GROMOS 53A6       1.0 ± 0.1   -0.7 ± 0.2   0.97 ± 0.02   0.95 ± 0.09   90 ± 9%
OPLS-AA/L         0.8 ± 0.1   0.4 ± 0.2    0.96 ± 0.04   0.89 ± 0.14   90 ± 9%
SMD^c             0.9 ± 0.2   0.7 ± 0.2    0.98 ± 0.01   0.98 ± 0.05   100 ± 0%

log Pchx
Amber ff9x        0.4 ± 0.1   0.0 ± 0.1    0.99 ± 0.01   1.00 ± 0.01   93 ± 5%
Amber ff99SBws    0.8 ± 0.1   -0.5 ± 0.1   0.98 ± 0.01   0.99 ± 0.02   100 ± 0%
Amber ff03        1.1 ± 0.2   0.5 ± 0.2    0.94 ± 0.03   0.95 ± 0.05   87 ± 8%
Amber ff03ws      0.9 ± 0.2   -0.1 ± 0.2   0.94 ± 0.02   0.97 ± 0.04   93 ± 6%
Amber ff03full^b  0.5 ± 0.1   0.2 ± 0.1    0.98 ± 0.01   0.99 ± 0.02   93 ± 6%
Amber ff14SB      0.4 ± 0.1   0.0 ± 0.1    0.99 ± 0.00   1.00 ± 0.01   93 ± 5%
Amber fb15        0.6 ± 0.1   0.1 ± 0.1    0.98 ± 0.01   1.00 ± 0.01   93 ± 6%
Amber ff15ipq     1.1 ± 0.2   0.3 ± 0.2    0.93 ± 0.02   0.97 ± 0.05   93 ± 6%
CHARMM 27         0.8 ± 0.1   0.5 ± 0.2    0.97 ± 0.01   0.96 ± 0.04   87 ± 8%
GROMOS 53A6       0.8 ± 0.1   -0.5 ± 0.2   0.98 ± 0.01   0.98 ± 0.03   100 ± 0%
OPLS-AA/L         0.7 ± 0.1   0.5 ± 0.1    0.98 ± 0.01   0.98 ± 0.03   87 ± 8%
SMD^c             0.6 ± 0.1   0.3 ± 0.1    0.98 ± 0.01   0.98 ± 0.03   93 ± 5%

a Root-mean-square error (RMSE, log units), mean signed error (MSE, log units), Pearson's correlation coefficient (R), Spearman's rank correlation coefficient (Rs), and the percentage of the calculated results with correct signs. b A full ESP charge reassignment by QM calculations was used for constructing the side chain analogues using ff03. c Computed with SMD solvation models at the B3LYP/aug-cc-pVTZ level of theory.

3.5 Water Model Dependence. There is no significant difference in the calculated hydration free energies (∆Gwat) of the amino acid SCAs among the three-site (OPC3, TIP3P, and TIP3P-FB) and four-site (OPC, TIP4P-D, TIP4P-Ew, and TIP4P-FB) water models paired with the most recent Amber fb15 force field (Figure 6 and Table S12).

These water models tend to overestimate ∆Gwat, thereby overestimating the partitioning (log P) of amino acids, except TIP4P-D, which shows a slight underestimation (Table S13). TIP5P-Ew yields underestimated ∆Gwat and, as a result, systematic underestimations of log P on average (Table S13). Out of the eight water models tested, TIP3P, the native (commonly used) model for Amber FFs, performs best for the amino acid partitioning with Amber fb15, although it was reported to perform poorly for the diffusivity of amino acids.36 Compared to TIP3P, TIPS3P leads to a better ∆Gwat with CHARMM 27 (Table S2) and thus to a slightly better performance for log P (Table S13).

Figure 6. Root-mean-square error (RMSE) and mean signed error (MSE) from experiment for the calculated hydration free energies (∆Gwat) of amino acid side chain analogues using Amber fb15 with different water models.

4. DISCUSSION

The partitioning of amino acids between water and apolar environments is of vital interest in biology, for example for the stability and function of membrane proteins and for protein folding.10, 105-107

Here we presented an extensive benchmark for predicting partition coefficients of amino acid side chain analogues (SCAs) between water and organic solvents with the popular classical protein force fields Amber (ff99SB-ILDN,53 ff99SBws,29 ff03,54 ff03ws,29 ff14SB,55 fb15,51 and ff15ipq52), CHARMM 27,12, 56, 57 GROMOS 53A6,13 and OPLS-AA/L.14 It has been well documented that polarization plays a role in the transfer of molecules between different surroundings and that one set of force field (FF) parameters may not be appropriate for all kinds of solvent environments.10, 21, 22, 26, 108, 109 Explicit polarization is not included in the protein FFs tested here, although Amber ff15ipq used the IPolQ scheme to assign polarized atomic charges implicitly. That is, our calculations neglected the electric response of the media (ε0 = 10.4 for 1-octanol, 4.8 for chloroform, and 2 for cyclohexane)103 and introduced the risk of incorrect interactions (Coulomb and van der Waals) resulting from the variable dielectric media.

Out of the ten FFs, the Amber ff94 lineages ff99SB-ILDN, ff14SB, and fb15 performed best (but only by a moderate amount) and reproduced simultaneously and accurately the 1-octanol/water (log Poct), chloroform/water (log Pclf), and cyclohexane/water (log Pchx) partition coefficients of amino acid side chain analogues (SCAs) with an RMSE of ~0.5 log units and an MSE of ~0.2 log units, a performance comparable with that of the quantum mechanical SMD calculations. Atomic charges in Amber ff9x were derived from RESP fitting of the ESP calculated at HF/6-31G* in the gas phase,11 which was argued to overestimate the gas-phase dipole, thereby being somewhat condensed-phase-like.54 The benchmark results show an advantage of this charge scheme for the varied phases of water, 1-octanol, chloroform, and cyclohexane (ε0 = 2~78), which retains the possibility of modeling proteins in different environments with one set of classical molecular mechanical force field parameters.
Due to the complexity of the apolar environments of interest, such as the interiors of folded proteins and the transmembrane segments of membrane proteins, much attention has been paid to simple representative phases, for instance lipid bilayer vesicles and water-immiscible organic media, which were used to quantify the water-leaving tendencies (partitioning/distribution) of molecules.105-107, 110, 111 The choice of 1-octanol as a reference phase, despite its popularity in drug development,3 has been argued to overestimate the hydrophobicity in some cases, probably because of the substantial amount of water in 1-octanol at equilibrium and because polar solutes may drag extra water molecules across the solvent interface when entering solvents of moderate polarity.1, 31, 105 In addition, the experimental transfer free energy of amino acid SCAs from vapor to water correlates closely (R = 0.92) with that from vapor to 1-octanol, as does the transfer free energy from cyclohexane to water with that from cyclohexane to 1-octanol (R = 0.93), as given in Table S14. Wet octanol therefore resembles water closely,37, 105 as evidenced by the similar performance of the different protein force fields in predicting solvation free energies of the SCAs in both solvents (Figures 3 and 4), none of which was parameterized against the interaction with 1-octanol. These findings indicate that 1-octanol does not seem to be an optimal reference phase for practical use and force field benchmarks.

Cyclohexane, which contains only traces of water at saturation, appears preferable as such a reference phase and is relatively free of the complications that arise with solvents of moderate polarity such as 1-octanol, as stated above.105 A computational study by Tieleman and co-workers found that the transfer free energies of amino acid side chains from water to the center of a DOPC (dioleoylphosphatidylcholine) membrane bilayer correlated very strongly (R = 0.99) with the experimental water-to-cyclohexane transfer free energies.106, 107 In this sense, cyclohexane could be used as a reliable phase to benchmark the protein force fields. Surprisingly, all the tested force fields reproduce the absolute solvation free energies (∆Gchx) of neutral amino acid SCAs in cyclohexane with good accuracy (RMSE = 0.6~0.8 kcal/mol and MSE = 0.1~0.4 kcal/mol), despite a slightly worse prediction of relative ∆Gchx by GROMOS 53A6 than by the others (although GROMOS 53A6 was parameterized against the ∆Gchx of some neutral amino acid SCAs).13

The larger overestimation of ∆Gwat relative to the solvation free energies in organic solvents is mainly responsible for the overestimated log P with the Amber (ff9x, ff03, ff14SB, fb15, and ff15ipq), CHARMM 27, and OPLS-AA/L force fields.
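
The link between the systematic errors in the solvation free energies and the systematic error in log P can be made explicit with a short worked estimate. It assumes the standard relation between transfer free energy and partition coefficient (taken here as the form of eq 1) and a temperature of 298.15 K, which is an assumption; the approximation sign reflects that the two MSEs may be averaged over slightly different sets of SCAs.

```latex
\log P = \frac{\Delta G_{\mathrm{wat}} - \Delta G_{\mathrm{org}}}{RT\ln 10}
\quad\Longrightarrow\quad
\mathrm{MSE}(\log P) \approx
\frac{\mathrm{MSE}(\Delta G_{\mathrm{wat}}) - \mathrm{MSE}(\Delta G_{\mathrm{org}})}{RT\ln 10},
\qquad RT\ln 10 \approx 1.36\ \mathrm{kcal/mol}

% GROMOS 53A6, 1-octanol/water, using the MSEs reported in section 3.3:
\mathrm{MSE}(\log P_{\mathrm{oct}}) \approx \frac{-0.7 - (-1.4)}{1.36} \approx 0.5
```

The estimate of about 0.5 log units agrees with the MSE listed for GROMOS 53A6 and log Poct in Table 3. The same expression shows why a positive MSE for ∆Gwat that exceeds the MSE in the organic solvent translates into an overestimated log P.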
The overestimated (unfavorable) ∆Gwat has been suggested to lead to more compact disordered peptides and artificial protein aggregation.29, 43-47 Scaling the short-range van der Waals interactions between protein and water appears to be a partial solution to this issue by strengthening protein-water interactions,28, 29, 46 although such scaling in Amber ff99SBws yields a slightly worse performance for the 1-octanol/water and cyclohexane/water partition coefficients of the amino acid SCAs than that without scaling (Table 3). The GROMOS 53A6 set, which underestimates ∆Gwat, may need a similar scaling that weakens protein-water interactions for further improvement.
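
The effect of such a scaling on a single solute-water pair interaction can be illustrated with a short Python sketch. It is not the implementation used in ff99SBws or ff03ws: it assumes Lorentz-Berthelot combining rules and a scale factor applied to the pair well depth, and the atom parameters and the factor of 1.1 are illustrative assumptions.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j, scale=1.0):
    """Combine LJ parameters; `scale` multiplies the pair well depth (assumed scheme)."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = scale * math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij


def lennard_jones(r, sigma, eps):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


# Illustrative parameters (sigma in angstrom, epsilon in kcal/mol); assumed values.
carbon = (3.40, 0.1094)
water_oxygen = (3.15, 0.1521)

for scale in (1.0, 1.1):  # 1.1 is a hypothetical scaling factor
    sigma, eps = lorentz_berthelot(*carbon, *water_oxygen, scale=scale)
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # separation at the energy minimum
    energy = lennard_jones(r_min, sigma, eps)
    print(f"scale = {scale}: minimum energy = {energy:.4f} kcal/mol")
```

A scale factor above 1 deepens the minimum and thus strengthens the solute-water attraction, which makes ∆Gwat more negative; a factor below 1 would weaken it, as suggested above for GROMOS 53A6.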

Figure 7. Comparison of the dipole moments (µ) of neutral amino acid side chain analogues from quantum mechanical (QM) calculations at HF/6-31G* and molecular mechanical (MM) calculations with the Amber force fields (a) ff99SB-ILDN and ff03full and (b) ff14ipq and ff15ipq in the gas phase.

Hydration free energies (∆Gwat) of the SCAs are expected to be determined overwhelmingly by the overall dipole moment. Compared to the QM calculations at HF/6-31G* in the gas phase, Amber ff99SB-ILDN and ff03full overestimate the dipole moment of the neutral SCAs by about 10%, in line with the work by Duan et al.,54 whereas ff14ipq and ff15ipq give overestimations of 15% and 18%, respectively (Figure 7 and Table S16). Considering the good performance of Cornell's charge set (i.e., Amber ff9x) based on HF/6-31G* in the gas phase,75 the poor performance of Amber ff15ipq may be ascribed to its problematic prediction of the dipole moments of the SCAs. Despite the overestimated dipole, Amber ff14ipq gave improved hydration free energies (HFEs) of the SCAs because it targeted this quantity,50 but solvation free energies were no longer targeted in the latest version, ff15ipq.

For better partitioning, one could also optimize the organic solvent models and/or the interactions of the solute with them. For instance, an extensive benchmark of the properties of organic liquids suggested a reduction in the Lennard-Jones (LJ) dispersion (C6) term, in particular for GAFF and CGenFF.78 The van der Waals parameters from GAFF and OPLS were parameterized mainly for polar media and need to be reoptimized when used in an apolar environment, as proposed for the calculation of the effective C6 coefficient112 or the ε term in the σ-ε form of the LJ potential.26 Most of the organic solvent models were taken from an automatic construction76-78 or optimized against pure liquid properties,20 and are assumed to be compatible with the corresponding protein force fields. No further modification was attempted in this work to optimize the models and the interactions with them. For all the tested FFs, however, the performance for the solvation free energy calculations in organic media appears to be comparable with that in water. This yields confidence in the current classical protein force fields for condensed-phase simulations in aqueous and nonaqueous environments, although further improvements are indeed required, as mentioned above.

When cyclohexane was used as the solvent, the solute-solvent electrostatic interaction was turned off almost completely owing to the tiny charge on each atom of the solvent, and the van der Waals interactions then dominate the solvation (Tables S5 and S15).
A systematic overestimation of ∆Gchx is observed (Table S5), indicating that the van der Waals interactions between protein and organic solvents are too weak and need to be strengthened somehow.

One might speculate that the current classical molecular mechanical models have reached a performance plateau and must leap to a new degree of complexity, such as a shift to high-quality water models or the inclusion of explicit polarization. Despite their good performance for a comprehensive set of liquid properties, the latest water models of the "FB" and "OPC" families do not appear to fix the issues of overestimated hydration free energies (for instance with the most recent Amber fb15 FF, Table S12) and diffusion properties of amino acids.36 Building an accurate water model that is simultaneously good for proteins therefore largely remains a challenge. An important step forward in terms of solvent-induced polarization has been made in Amber ff15ipq,52 which applied the IPolQ scheme50 to assign polarized atomic charges implicitly in a nonpolarizable model. However, the charge model in Amber ff15ipq seems worse for the hydration free energies of the SCAs than the RESP scheme81 in the Amber ff9x and ff03 lineages,36 suggesting room for further improvement (see discussion above). The workflows for the Amber fb15 and ff15ipq developments are (semi)automated and comprehensive, however, and some of their aspects are indeed worth preserving. Polarizable models in the form of, for instance, induced dipoles113 or Drude particles114 hold promise for a more physically realistic modeling of electrostatic polarization in biomolecules.

ASSOCIATED CONTENT

Supporting Information

FF/water combinations (Table S1), calculated ∆G (Tables S2-S10), SMD performance for log P (Table S11), calculated ∆Gwat with Amber fb15 in different water models (Table S12), water model dependence of log P predictions (Table S13), correlations between experimental transfer free energies (Table S14), van der Waals part of ∆Gchx (Table S15), calculated dipole moment from QM and MD (Table S16), and the cumulative average of density (Figures S1-S7). The Supporting Information is available free of charge on the ACS Publications website.

AUTHOR INFORMATION

Corresponding Author

[email protected] (C.Y.)

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENT

This work was supported by the Beijing Natural Science Foundation (5174036), National Natural Science Foundation of China (21606016), and the Fundamental Research Funds for the Central Universities (FRF-TP-17-009A2).

REFERENCES

(1) Leo, A.; Hansch, C.; Elkins, D. Partition Coefficients and Their Uses. Chem. Rev. 1971, 71, 525-616.
(2) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2012, 64, 4-17.
(3) Avdeef, A. Absorption and Drug Development: Solubility, Permeability, and Charge State. John Wiley & Sons: 2012.
(4) Yıldırım, M. A.; Goh, K.-I.; Cusick, M. E.; Barabási, A.-L.; Vidal, M. Drug-Target Network. Nat. Biotechnol. 2007, 25, 1119.
(5) Serdakowski, A. L.; Dordick, J. S. Enzyme Activation for Organic Solvents Made Easy. Trends Biotechnol. 2008, 26, 48-54.
(6) Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications. Nat. Rev. Drug Discov. 2004, 3, 935.


(7) Li, C.; Tan, T. W.; Zhang, H. Y.; Feng, W. Analysis of the Conformational Stability and Activity of Candida Antarctica Lipase B in Organic Solvents Insight from Molecular Dynamics and Quantum Mechanics/Simulations. J. Biol. Chem. 2010, 285, 28434-28441.
(8) Dutta Banik, S.; Nordblad, M.; Woodley, J. M.; Peters, G. n. H. A Correlation between the Activity of Candida Antarctica Lipase B and Differences in Binding Free Energies of Organic Solvent and Substrate. ACS Catalysis 2016, 6, 6350-6361.
(9) Jorgensen, W. L. The Many Roles of Computation in Drug Discovery. Science 2004, 303, 1813-1818.
(10) Villa, A.; Mark, A. E. Calculation of the Free Energy of Solvation for Neutral Analogs of Amino Acid Side Chains. J. Comput. Chem. 2002, 23, 548-553.
(11) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179-5197.
(12) MacKerell, A. D.; Feig, M.; Brooks, C. L. Extending the Treatment of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase Quantum Mechanics in Reproducing Protein Conformational Distributions in Molecular Dynamics Simulations. J. Comput. Chem. 2004, 25, 1400-1415.
(13) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656-1676.
(14) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins Via Comparison with Accurate Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 2001, 105, 6474-6487.
(15) Reif, M. M.; Hünenberger, P. H.; Oostenbrink, C. New Interaction Parameters for Charged Amino Acid Side Chains in the Gromos Force Field. J. Chem. Theory Comput. 2012, 8, 3705-3723.
(16) Wang, J. M.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157-1174.
(17) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I. Charmm General Force Field: A Force Field for Drug-Like Molecules Compatible with the Charmm All-Atom Additive Biological Force Fields. J. Comput. Chem. 2010, 31, 671-690.
(18) Vanommeslaeghe, K.; Raman, E. P.; MacKerell Jr, A. D. Automation of the Charmm General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Inf. Model. 2012, 52, 3155-3168.
(19) Vanommeslaeghe, K.; MacKerell Jr, A. D. Automation of the Charmm General Force Field (CGenFF) I: Bond Perception and Atom Typing. J. Chem. Inf. Model. 2012, 52, 3144-3154.
(20) Horta, B. A.; Merz, P. T.; Fuchs, P. F.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. J. Chem. Theory Comput. 2016, 12, 3825-3850.
(21) Zhang, J.; Tuguldur, B.; van der Spoel, D. Force Field Benchmark of Organic Liquids. 2. Gibbs Energy of Solvation. J. Chem. Inf. Model. 2015, 55, 1192-1201.
(22) Zhang, J.; Tuguldur, B.; van der Spoel, D. Correction to Force Field Benchmark of Organic Liquids. 2. Gibbs Energy of Solvation. J. Chem. Inf. Model. 2016, 56, 819-820.

ACS Paragon Plus Environment

30

Page 31 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(23) Fennell, C. J.; Wymer, K. L.; Mobley, D. L. A Fixed-Charge Model for Alcohol Polarization in the Condensed Phase, and Its Role in Small Molecule Hydration. J. Phys. Chem. B 2014, 118, 6438-6446. (24) Garrido, N. M.; Jorge, M.; Queimada, A. J.; Gomes, J. R.; Economou, I. G.; Macedo, E. A. Predicting Hydration Gibbs Energies of Alkyl-Aromatics Using Molecular Simulation: A Comparison of Current Force Fields and the Development of a New Parameter Set for Accurate Solvation Data. Phys. Chem. Chem. Phys. 2011, 13, 17384-17394. (25) Knight, J. L.; Yesselman, J. D.; Brooks, C. L. Assessing the Quality of Absolute Hydration Free Energies among Charmm-Compatible Ligand Parameterization Schemes. J. Comput. Chem. 2013, 34, 893-903. (26) Wang, M.; Li, P.; Jia, X.; Liu, W.; Shao, Y.; Hu, W.; Zheng, J.; Brooks, B. R.; Mei, Y. Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level. J. Chem. Inf. Model. 2017, 57, 2476-2489. (27) Jensen, K. P. Improved Interaction Potentials for Charged Residues in Proteins. J. Phys. Chem. B 2008, 112, 1820-1827. (28) Nerenberg, P. S.; Jo, B.; So, C.; Tripathy, A.; Head-Gordon, T. Optimizing Solute–Water Van Der Waals Interactions to Reproduce Solvation Free Energies. J. Phys. Chem. B 2012, 116, 4524-4534. (29) Best, R. B.; Zheng, W.; Mittal, J. Balanced Protein–Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J. Chem. Theory Comput. 2014, 10, 5113-5124. (30) Wang, J.; Hou, T. Application of Molecular Dynamics Simulations in Molecular Property Prediction. 1. Density and Heat of Vaporization. J. Chem. Theory Comput. 2011, 7, 2151-2165. (31) Bannan, C. C.; Calabró, G.; Kyu, D. Y.; Mobley, D. L. Calculating Partition Coefficients of Small Molecules in Octanol/Water and Cyclohexane/Water. J. Chem. Theory Comput. 2016, 12, 4015-4024. (32) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. 
Extremely Precise Free Energy Calculations of Amino Acid Side Chain Analogs: Comparison of Common Molecular Mechanics Force Fields for Proteins. J. Chem. Phys. 2003, 119, 5740-5761. (33) Deng, Y.; Roux, B. Hydration of Amino Acid Side Chains: Nonpolar and Electrostatic Contributions Calculated from Staged Molecular Dynamics Free Energy Simulations with Explicit Water Molecules. J. Phys. Chem. B 2004, 108, 16567-16576. (34) Hess, B.; van der Vegt, N. F. Hydration Thermodynamic Properties of Amino Acid Analogues: A Systematic Comparison of Biomolecular Force Fields and Water Models. J. Phys. Chem. B 2006, 110, 17616-17626. (35) Zhang, H.; Jiang, Y.; Yan, H.; Yin, C.; Tan, T.; van der Spoel, D. Free-Energy Calculations of Ionic Hydration Consistent with the Experimental Hydration Free Energy of the Proton. J. Phys. Chem. Lett. 2017, 8, 2705-2712. (36) Zhang, H.; Yin, C.; Jiang, Y.; van der Spoel, D. Force Field Benchmark of Amino Acids: I. Hydration and Diffusion in Different Water Models. J. Chem. Inf. Model. 2018, 58, 10371052. (37) Radzicka, A.; Wolfenden, R. Comparing the Polarities of the Amino Acids: Side-Chain Distribution Coefficients between the Vapor Phase, Cyclohexane, 1-Octanol, and Neutral Aqueous Solution. Biochemistry 1988, 27, 1664-1670.

ACS Paragon Plus Environment

31

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 45

(38) Gu, W.; Rahi, S. J.; Helms, V. Solvation Free Energies and Transfer Free Energies for Amino Acids from Hydrophobic Solution to Water Solution from a Very Simple Residue Model. J. Phys. Chem. B 2004, 108, 5806-5814. (39) Tieleman, D. P.; MacCallum, J. L.; Ash, W. L.; Kandt, C.; Xu, Z.; Monticelli, L. Membrane Protein Simulations with a United-Atom Lipid and All-Atom Protein Model: Lipid– Protein Interactions, Side Chain Transfer Free Energies and Model Proteins. J. Phys.: Condens. Matter 2006, 18, S1221. (40) MacCallum, J. L.; Tieleman, D. P. Calculation of the Water–Cyclohexane Transfer Free Energies of Neutral Amino Acid Side-Chain Analogs Using the Opls All-Atom Force Field. J. Comput. Chem. 2003, 24, 1930-1935. (41) Chang, J.; Lenhoff, A. M.; Sandler, S. I. Solvation Free Energy of Amino Acids and SideChain Analogues. J. Phys. Chem. B 2007, 111, 2098-2106. (42) Han, W.; Wan, C.-K.; Wu, Y.-D. Toward a Coarse-Grained Protein Model Coupled with a Coarse-Grained Solvent Model: Solvation Free Energies of Amino Acid Side Chains. J. Chem. Theory Comput. 2008, 4, 1891-1901. (43) Piana, S.; Donchev, A. G.; Robustelli, P.; Shaw, D. E. Water Dispersion Interactions Strongly Influence Simulated Structural Properties of Disordered Protein States. J. Phys. Chem. B 2015, 119, 5113-5123. (44) Rauscher, S.; Gapsys, V.; Gajda, M. J.; Zweckstetter, M.; de Groot, B. L.; Grubmüller, H. Structural Ensembles of Intrinsically Disordered Proteins Depend Strongly on Force Field: A Comparison to Experiment. J. Chem. Theory Comput. 2015, 11, 5513-5524. (45) Abriata, L. A.; Dal Peraro, M. Assessing the Potential of Atomistic Molecular Dynamics Simulations to Probe Reversible Protein-Protein Recognition and Binding. Sci. Rep. 2015, 5. (46) Nawrocki, G.; Wang, P.-h.; Yu, I.; Sugita, Y.; Feig, M. Slow-Down in Diffusion in Crowded Protein Solutions Correlates with Transient Cluster Formation. J. Phys. Chem. B 2017, 121, 11072-11084. (47) Petrov, D.; Zagrovic, B. 
Are Current Atomistic Force Fields Accurate Enough to Study Proteins in Crowded Environments? PLoS comput. biol. 2014, 10, e1003638. (48) Shirts, M. R.; Pande, V. S. Solvation Free Energies of Amino Acid Side Chain Analogs for Common Molecular Mechanics Water Models. J. Chem. Phys. 2005, 122, 134508. (49) Wang, L.-P.; Martinez, T. J.; Pande, V. S. Building Force Fields: An Automatic, Systematic, and Reproducible Approach. J. Phys. Chem. Lett. 2014, 5, 1885-1891. (50) Cerutti, D. S.; Rice, J. E.; Swope, W. C.; Case, D. A. Derivation of Fixed Partial Charges for Amino Acids Accommodating a Specific Water Model and Implicit Polarization. J. Phys. Chem. B 2013, 117, 2328-2338. (51) Wang, L.-P.; McKiernan, K. A.; Gomes, J.; Beauchamp, K. A.; Head-Gordon, T.; Rice, J. E.; Swope, W. C.; Martínez, T. J.; Pande, V. S. Building a More Predictive Protein Force Field: A Systematic and Reproducible Route to AMBER-FB15. J. Phys. Chem. B 2017, 121, 40234039. (52) Debiec, K. T.; Cerutti, D. S.; Baker, L. R.; Gronenborn, A. M.; Case, D. A.; Chong, L. T. Further Along the Road Less Traveled: Amber ff15ipq, an Original Protein Force Field Built on a Self-Consistent Physical Model. J. Chem. Theory Comput. 2016, 12, 3926-3947. (53) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved Side-Chain Torsion Potentials for the Amber ff99SB Protein Force Field. Proteins: Struct., Funct., Genet. 2010, 78, 1950-1958.

ACS Paragon Plus Environment

32

Page 33 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(54) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J.; Kollman, P. A Point-Charge Force Field for Molecular Mechanics Simulations of Proteins Based on Condensed-Phase Quantum Mechanical Calculations. J. Comput. Chem. 2003, 24, 1999-2012. (55) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. Ff14sb: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696-3713. (56) MacKerell Jr, A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586-3616. (57) Bjelkmar, P.; Larsson, P.; Cuendet, M. A.; Hess, B.; Lindahl, E. Implementation of the Charmm Force Field in Gromacs: Analysis of Protein Stability Effects from Correction Maps, Virtual Interaction Sites, and Water Models. J. Chem. Theory Comput. 2010, 6, 459-466. (58) Kim, S.; Thiessen, P. A.; Bolton, E. E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B. A.; Wang, J.; Yu, B.; Zhang, J.; Bryant, S. H. Pubchem Substance and Compound Databases. Nucleic Acids Res. 2016, 44, D1202-D1213. (59) Marenich, A.; Kelly, C.; Thompson, J.; Hawkins, G.; Chambers, C.; Giesen, D.; Winget, P.; Cramer, C.; Truhlar, D. Minnesota Solvation Database, Version 2012 University of Minnesota: Minneapolis 2012. (60) Wolfenden, R.; Andersson, L.; Cullis, P. M.; Southgate, C. C. B. Affinities of Amino Acid Side Chains for Solvent Water. Biochemistry 1981, 20, 849-855. (61) Sitkoff, D.; Sharp, K. A.; Honig, B. Accurate Calculation of Hydration Free Energies Using Macroscopic Solvent Models. J. Phys. Chem. 1994, 98, 1978-1988. (62) Zhang, H.; Lv, Y.; Tan, T.; van der Spoel, D. Atomistic Simulation of Protein Encapsulation in Metal–Organic Frameworks. J. Phys. 
Chem. B 2016, 120, 477-484. (63) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926-935. (64) Neria, E.; Fischer, S.; Karplus, M. Simulation of Activation Free Energies in Molecular Systems. J. Chem. Phys. 1996, 105, 1902-1921. (65) Abascal, J. L. F.; Vega, C. A General Purpose Model for the Condensed Phases of Water: TIP4P/2005. J. Chem. Phys. 2005, 123, 234505. (66) Takemura, K.; Kitao, A. Water Model Tuning for Improved Reproduction of Rotational Diffusion and NMR Spectral Density. J. Phys. Chem. B 2012, 116, 6279-6287. (67) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. Interaction Models for Water in Relation to Protein Hydration. In Intermolecular Forces, Pullman, B., Ed.; Springer Netherlands: Dordrecht, 1981, pp 331-342. (68) Izadi, S.; Onufriev, A. V. Accuracy Limit of Rigid 3-Point Water Models. J. Chem. Phys. 2016, 145, 074501. (69) Izadi, S.; Anandakrishnan, R.; Onufriev, A. V. Building Water Models: A Different Approach. J. Phys. Chem. Lett. 2014, 5, 3863-3871. (70) Horn, H. W.; Swope, W. C.; Pitera, J. W.; Madura, J. D.; Dick, T. J.; Hura, G. L.; HeadGordon, T. Development of an Improved Four-Site Water Model for Biomolecular Simulations: TIP4P-Ew. J. Chem. Phys. 2004, 120, 9665-9678. (71) Rick, S. W. A Reoptimization of the Five-Site Water Potential (TIP5p) for Use with Ewald Sums. J. Chem. Phys. 2004, 120, 6085-6093.

ACS Paragon Plus Environment

33

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 45

(72) van der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. C. GROMACS: Fast, Flexible, and Free. J. Comput. Chem. 2005, 26, 1701-1718. (73) Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A HighThroughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845-854. (74) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435-447. (75) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High Performance Molecular Simulations through Multi-Level Parallelism from Laptops to Supercomputers. SoftwareX 2015, 1, 19-25. (76) Caleman, C.; van Maaren, P. J.; Hong, M.; Hub, J. S.; Costa, L. T.; van der Spoel, D. Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant. J. Chem. Theory Comput. 2011, 8, 61-74. (77) van der Spoel, D.; van Maaren, P. J.; Caleman, C. GROMACS Molecule & Liquid Database. Bioinformatics 2012, 28, 752-753. (78) Fischer, N. M.; van Maaren, P. J.; Ditz, J. C.; Yildirim, A.; van der Spoel, D. Properties of Organic Liquids When Simulated with Long-Range Lennard-Jones Interactions. J. Chem. Theory Comput. 2015, 11, 2938-2944. (79) Jorgensen, W. L.; Tirado-Rives, J. Potential Energy Functions for Atomic-Level Simulations of Water and Organic and Biomolecular Systems. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6665-6670. (80) Vorobyov, I.; Bennett, W. F. D.; Tieleman, D. P.; Allen, T. W.; Noskov, S. The Role of Atomic Polarization in the Thermodynamics of Chloroform Partitioning to Lipid Bilayers. J. Chem. Theory Comput. 2012, 8, 618-628. (81) Bayly, C. 
I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: The RESP Model. J. Phys. Chem. 1993, 97, 10269-10280. (82) Cerutti, D. S.; Swope, W. C.; Rice, J. E.; Case, D. A. ff14ipq: A Self-Consistent Force Field for Condensed-Phase Simulations of Proteins. J. Chem. Theory Comput. 2014, 10, 45154534. (83) Muddana, H. S.; Sapra, N. V.; Fenley, A. T.; Gilson, M. K. The SAMPL4 Hydration Challenge: Evaluation of Partial Charge Sets with Explicit-Water Molecular Dynamics Simulations. J. Comput. Aided Mol. Des. 2014, 28, 277-287. (84) Mecklenfeld, A.; Raabe, G. Comparison of RESP and IPolQ-Mod Partial Charges for Solvation Free Energy Calculations of Various Solute/Solvent Pairs. J. Chem. Theory Comput. 2017, 13, 6266-6274. (85) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300313. (86) Zhang, H.; Jiang, Y.; Yan, H.; Cui, Z.; Yin, C. Comparative Assessment of Computational Methods for Free Energy Calculations of Ionic Hydration. J. Chem. Inf. Model. 2017, 57, 27632775. (87) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684-3690.

ACS Paragon Plus Environment

34

Page 35 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

(88) Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182-7190. (89) Nose, S.; Klein, M. L. Constant Pressure Molecular Dynamics for Molecular Systems. Mol. Phys. 1983, 50, 1055-1076. (90) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378-6396. (91) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, O.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09, revision D.01; Gaussian, Inc.: Wallingford, CT, 2009. (92) Becke, A. D. Density-Functional Exchange-Energy Approximation with Correct Asymptotic Behavior. Phys.Rev. A 1988, 38, 3098. (93) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785. (94) Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. 
Ab Initio Calculation of Vibrational Absorption and Circular Dichroism Spectra Using Density Functional Force Fields. J. Phys. Chem. 1994, 98, 11623-11627. (95) Zhao, Y.; Truhlar, D. G. The M06 Suite of Density Functionals for Main Group Thermochemistry, Thermochemical Kinetics, Noncovalent Interactions, Excited States, and Transition Elements: Two New Functionals and Systematic Testing of Four M06-Class Functionals and 12 Other Functionals. Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 2008, 120, 215-241. (96) Hehre, W.; Radom, L.; Schleyer, P. v. R.; Pople, J. Ab Initio Molecular Theory. New York 1986. (97) Dunning Jr, T. H. Gaussian Basis Sets for Use in Correlated Molecular Calculations. I. The Atoms Boron through Neon and Hydrogen. J. Chem. Phys. 1989, 90, 1007-1023. (98) Kendall, R. A.; Dunning Jr, T. H.; Harrison, R. J. Electron Affinities of the First-Row Atoms Revisited. Systematic Basis Sets and Wave Functions. J. Chem. Phys. 1992, 96, 67966806. (99) Martins, S. A.; Sousa, S. F. Comparative Assessment of Computational Methods for the Determination of Solvation Free Energies in Alcohol-Based Molecules. J. Comput. Chem. 2013, 34, 1354-1362. (100) Zhang, J.; Zhang, H.; Wu, T.; Wang, Q.; van der Spoel, D. Comparison of Implicit and Explicit Solvent Models for the Calculation of Solvation Free Energy in Organic Solvents. J. Chem. Theory Comput. 2017, 13, 1034-1043. (101) Bennett, C. H. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J. Comput. Phys. 1976, 22, 245-268.

ACS Paragon Plus Environment

35

Journal of Chemical Information and Modeling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 45

(102) R Core Team. R: A Language and Environment for Statistical Computing, version 3.2.3; R Foundation for Statistical Computing: Vienna, Austria, 2015. (103) Lide, D. R., In; CRC: Boca Raton, FL: 2010. (104) Smallwood, I., Handbook of Organic Solvent Properties. Butterworth-Heinemann: 2012. (105) Wolfenden, R. Experimental Measures of Amino Acid Hydrophobicity and the Topology of Transmembrane and Globular Proteins. The Journal of general physiology 2007, 129, 357362. (106) MacCallum, J. L.; Bennett, W. D.; Tieleman, D. P. Partitioning of Amino Acid Side Chains into Lipid Bilayers: Results from Computer Simulations and Comparison to Experiment. J. Gen. Physiol. 2007, 129, 371-377. (107) MacCallum, J. L.; Bennett, W. F. D.; Tieleman, D. P. Distribution of Amino Acids in a Lipid Bilayer from Computer Simulations. Biophys. J. 2008, 94, 3393-3404. (108) Yu, H.; Van Gunsteren, W. F. Accounting for Polarization in Molecular Simulation. Comput. Phys. Commun. 2005, 172, 69-85. (109) Beerepoot, M. T.; Steindal, A. H.; List, N. H.; Kongsted, J.; Olsen, J. g. M. H. Averaged Solvent Embedding Potential Parameters for Multiscale Modeling of Molecular Properties. J. Chem. Theory Comput. 2016, 12, 1684-1695. (110) Klamt, A.; Huniar, U.; Spycher, S.; Keldenich, J. r. Cosmomic: A Mechanistic Approach to the Calculation of Membrane-Water Partition Coefficients and Internal Distributions within Membranes and Micelles. J. Phys. Chem. B 2008, 112, 12148-12157. (111) Bemporad, D.; Luttmann, C.; Essex, J. Computer Simulation of Small Molecule Permeation across a Lipid Bilayer: Dependence on Bilayer Properties and Solute Volume, Size, and Cross-Sectional Area. Biophys. J. 2004, 87, 1-13. (112) Tkatchenko, A.; Scheffler, M. Accurate Molecular Van Der Waals Interactions from Ground-State Electron Density and Free-Atom Reference Data. Phys. Rev. Lett. 2009, 102, 073005. (113) Ponder, J. W.; Wu, C.; Ren, P.; Pande, V. S.; Chodera, J. D.; Schnieders, M. J.; Haque, I.; Mobley, D. 
L.; Lambrecht, D. S.; DiStasio, R. A.; Head-Gordon, M.; Clark, G. N. I.; Johnson, M. E.; Head-Gordon, T. Current Status of the Amoeba Polarizable Force Field. J. Phys. Chem. B 2010, 114, 2549-2564. (114) Lemkul, J. A.; Huang, J.; Roux, B.; MacKerell Jr, A. D. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983-5013.

Figure 1. Representative cumulative averages of the density (ρ) as a function of simulation time, showing normal (a) and problematic (b-d) profiles with 1-octanol as the solvent. Systems pre-equilibrated with simulated annealing (SA) before the production simulations are shown in red, and those without SA are shown in blue. The overall average density is indicated by a solid green line. The convergence issues (blue in b-d) are not specific to particular amino acids or force fields, even though the simulated systems are labeled.

Figure 2. Solvation free energies in 1-octanol (∆Goct) computed with simulated annealing (SA) pre-equilibration before the TI calculations versus those computed without SA, for the simulated systems in which problematic density profiles were detected.

Figure 3. Comparison of calculated solvation free energies of amino acid side chain analogues in water (a), 1-octanol (b), chloroform (c), and cyclohexane (d) with experiment.

Figure 4. Root-mean-square error (a), mean signed error (b), Pearson's correlation coefficient (c), and Spearman's rank correlation coefficient (d) between calculated and experimental solvation free energies of amino acid side chain analogues in water (wat), 1-octanol (oct), chloroform (clf), and cyclohexane (chx).
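The four statistics named in the Figure 4 caption can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the sign convention for the mean signed error (calculated minus experimental) is an assumption, and the numbers in the demo are made up for illustration rather than taken from the paper.

```python
import math


def rmse(calc, expt):
    """Root-mean-square error between calculated and experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def mse(calc, expt):
    """Mean signed error; ASSUMED convention: calculated minus experimental."""
    return sum(c - e for c, e in zip(calc, expt)) / len(calc)


def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values only (NOT data from the paper).
calc = [-1.0, -4.0, 2.5, -6.0]
expt = [-1.5, -5.0, 2.0, -7.0]
print(rmse(calc, expt), mse(calc, expt), pearson(calc, expt), spearman(calc, expt))
```

Reporting both correlation coefficients is useful because Pearson's coefficient is sensitive to the linearity of the relationship, whereas Spearman's depends only on whether the ranking of the analogues is reproduced.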

Figure 5. Comparison of calculated 1-octanol/water (log Poct, a), chloroform/water (log Pclf, b), and cyclohexane/water (log Pchx, c) partition coefficients of amino acid side chain analogues with experiment.
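Partition coefficients such as those in Figure 5 are commonly obtained from the difference between the solvation free energies in water and in the organic solvent, via log P = (∆Gwat − ∆Gorg)/(RT ln 10). The sketch below assumes that standard relation, free energies in kJ/mol, and a temperature of 298.15 K; the function name and the default temperature are assumptions made for illustration, and the input values are not taken from the paper.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def log_p(dg_wat, dg_org, temperature=298.15):
    """Water/organic-solvent partition coefficient (log10 scale).

    dg_wat, dg_org: solvation free energies in kJ/mol (ASSUMED units).
    temperature: in kelvin (ASSUMED default of 298.15 K).
    A positive result means the solute prefers the organic phase.
    """
    return (dg_wat - dg_org) / (R_KJ_PER_MOL_K * temperature * math.log(10))


# Illustrative values only: a solute solvated 10 kJ/mol more favourably
# in the organic solvent than in water.
print(log_p(dg_wat=-10.0, dg_org=-20.0))
```

Because RT ln 10 is about 5.7 kJ/mol at room temperature, an error of that size in either solvation free energy shifts the predicted log P by roughly one unit.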

Figure 6. Root-mean-square error (RMSE) and mean signed error (MSE) from experiment for the calculated hydration free energies (∆Gwat) of amino acid side chain analogues using Amber fb15 with different water models.

Figure 7. Comparison of gas-phase dipole moments (µ) of neutral amino acid side chain analogues from quantum mechanical (QM) calculations at the HF/6-31G* level and molecular mechanical (MM) calculations with the Amber force fields (a) ff99SB-ILDN and ff03full and (b) ff14ipq and ff15ipq.
