Small Molecule Thermochemistry: A Tool for Empirical Force Field

Oct 26, 2018 - RMSD from Thermochemistry Results of Force Field Methods Using ...... A. D., Jr. Automation of the CHARMM general force field (CGenFF) ...
1 downloads 0 Views 1MB Size
Subscriber access provided by UNIV OF LOUISIANA

A: New Tools and Methods in Experiment and Theory

Small Molecule Thermochemistry: A Tool For Empirical Force Field Development David van der Spoel, Mohammad Mehdi Ghahremanpour, and Justin A Lemkul J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.8b09867 • Publication Date (Web): 26 Oct 2018 Downloaded from http://pubs.acs.org on October 26, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Small Molecule Thermochemistry: A Tool For Empirical Force Field Development David van der Spoel,∗,† Mohammad Mehdi Ghahremanpour,‡ and Justin A. Lemkul¶ †Uppsala Center for Computational Chemistry, Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, SE-75124 Uppsala, Sweden ‡Uppsala Center for Computational Chemistry, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, SE-75124 Uppsala, Sweden ¶Virginia Tech Department of Biochemistry, 303 Engel Hall, 340 West Campus Dr. Blacksburg, VA 24061, U.S.A. E-mail: [email protected]

1

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract Spectroscopic analysis of compounds is typically combined with density functional theory, for instance for assigning vibrational frequencies, limiting application to relatively small compounds. Accurate classical force fields could, in principle, complement these quantumchemical tools. A relatively simple way to validate vibrational frequencies is by computing thermochemical properties. We present such a validation for over 1800 small molecules using the harmonic approximation. Two popular empirical force fields (GAFF and CGenFF) are compared to experimental data and results from Gaussian-4 quantum-chemical calculations. Frequency scaling factors of 1.035 (CGenFF) and 1.018 (GAFF) are derived from the zeropoint energies. The force field calculations have larger deviation from experiment than the G4 method for standard entropy, but for heat capacity the results are comparable. For internal thermal energy and zero-point energy the deviations from G4 are relatively small. The work suggests that, with some tuning, force fields could indeed complement DFT in spectroscopical applications.

2

ACS Paragon Plus Environment

Page 2 of 22

Page 3 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Introduction Prediction of molecular thermochemistry and spectroscopic analysis of compounds arguably is one of the most fundamental tasks in computational chemistry. 1 However, it has proved difficult to obtain accurate thermochemistry values even using quantum chemistry at high levels of theory. 2 Therefore, algorithms combining different methods have been introduced. Some of these methods use empirical corrections to obtain accurate results, e.g. the Gaussian-N series, 3–5 the complete basis set model chemistry 6,7 or the correlation consistent composite approach. 8 Another series of methods, dubbed “Weizmann N-series”, initially used empirical corrections too, but abandoned those in later versions 9,10 at the expense of more computer time. It should be noted that most methods are tuned to reproduce experimental enthalpies of formation, and less effort has been spent on standard entropies or heat capacities. Independent benchmarks of these methods have been published 11,12 and the G4 method 5 was found to be a reasonable compromise between accuracy and computational requirements if all thermochemical properties are of interest. Given the difficulties for high-level quantum chemistry calculations to predict thermochemistry properties, classical force field methods are expected to produce even larger deviations due to their approximate nature. Somewhat surprisingly, this question has not been evaluated at great length. In classical simulations, entropies of biomolecules are sometimes estimated based on a normal mode analysis using the quasi-harmonic method 13,14 or based on the covariance of molecular coordinates obtained from a molecular dynamics simulation. 15 Since it is difficult to obtain converged results for biomolecules, 16–18 interest in absolute entropy calculations in the classical simulation community has remained limited. Such neglect is unfortunate, since the experimentally determined standard entropies of molecules in the gas phase contain important information on the vibrational frequencies of compounds, data that can be used to validate force fields. In fact, the molecular mechanics (MM) family of force fields was tuned to reproduce energy barriers and vibrational frequencies, 19,20 however, due to its complexity it is not used as widely as the most popular biomolecular force fields. In contrast to the MM family, most other classical force fields are optimized to reproduce intermolecular interactions. This strategy is understandable, since there 3

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 22

is widespread interest in condensed-phase systems, such as molecular liquids, or biomolecules in solution, for instance in the context of drug-design. However, to predict the entropy of binding of a ligand to a receptor or enzyme, the standard entropy of the ligand needs to be modeled relatively accuratelyl. 21 In addition, if temperature dependence is of interest, the heat capacity of the ligand within the force field needs to be sufficiently accurate, as well. To this end, it is important to evaluate the accuracy of thermochemical properties in force fields. Previous benchmarks on force fields for organic liquids have shown that there are systematic issues with these empirical models, and some compound categories showed larger deviations from liquid properties than others. 22–25 Here, we focus on intramolecular components of force fields by computing the standard entropy (S 0 ), the heat capacity at constant volume (CV ), the internal thermal energy (E) and the zero-point energy (ZP E) in the gas phase using the popular General AMBER Force Field (GAFF) 26 and the CHARMM General Force Field (CGenFF). 27–29 GAFF is combined with charges determined by either restrained electrostatics potential fitting 30 (GAFFESP) or the AM1-BCC method with bond-charge corrections 31,32 (GAFF-BCC), see Methods. The results are compared to both experimental data and quantum chemistry calculations, and the accuracy of prediction for different compound categories is evaluated.

Theory The molecular internal thermal energy, the standard entropy and the heat capacity at constant volume can be derived from the partition function of the canonical ensemble Q(N, V, T ) as follows:

E = RT

 CV = 2RT

∂lnQ ∂T

2



∂lnQ ∂T



 + RT N,V

4

(1) N,V

2



∂ 2 lnQ ∂T 2

ACS Paragon Plus Environment

 (2) N,V

Page 5 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry



0

S = RlnQ + RT

∂lnQ ∂T

 (3) N,V

where R is the ideal gas constant and T is the absolute temperature. For an ideal gas, the partition function can be decomposed into partition functions of different degrees of freedom, namely electronic (el), translational (tr), rotational (rot), and vibrational (vib) motions using: 33,34

Q = Qel Qtr Qrot Qvib

(4)

The contribution of the rotational and vibrational motions can be derived from the rigid rotor and the quantum harmonic-oscillator approximation, respectively. 35 The partition function of a rigid rotor is defined as: Qrot



1 = σ

T Θrot

 (5)

where σ is the symmetry number and

Θrot =

h2 8π 2 IkB

(6)

where I are the moment of inertia, h is Planck’s constant, and kB is Boltzmann’s constant. For a harmonic oscillator, the partition function is defined as

Qvib =

e−hν/kB T 1 − e−hν/kB T

(7)

where ν is the frequency of vibration. Considering these approximations, the internal thermal energy E can be calculated by applying Eqn. 1 to Eqn. 4:

E = Etr + Erot + Evib

(8)

3 where Etr is RT for both linear and nonlinear polyatomic ideal gases, while Erot is RT 2 3 for linear and RT for nonlinear molecules. The contribution of vibrational modes into internal 2 5

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 22

thermal energy is given by  3n−f  X hνi ehνi /kB T f + + hνi /k T = RT B 2 2k T e −1 B i "

Evib

# (9)

where f is 5 for a linear and 6 for a nonlinear polyatomic molecule. Applying Eqn. 2 to Eqn. 4 gives the heat capacity at constant volume CV :

CV = Ctr + Crot + Cvib

(10)

3 3 where Ctr is R for all molcules, while Crot is R for linear and R for nonlinear molecules. 2 2 The contribution of vibrational modes into heat capacity is given by 3n−f  X hνi 2 f e−hνi /kB T =R + 2 2 kB T (1 − e−hνi /kB T ) i

"

Cvib

# (11)

Similarly, by applying Eqn. 3 to Eqn. 4, the standard entropy S 0 will be given by

S 0 = Str + Srot + Svib

"

5 + ln Str = R 2



Srot



kB T P

3 =R + ln 2

"

 + ln

2πM kB T h2

(12)

3/2 ##

√   3/2  π T + ln √ σ Θrot

(13)

(14)

where P is the pressure and M is the mass of the molecule. The vibrational entropy is given by 3n−f

Svib

 X  ehνi /kB T  −hνi /kB T =R − ln 1 − e hνi /kB T − 1 e i

(15)

The vibrational frequencies can also be used to compute the zero-point energy (ZPE) of a molecule, given by: 6

ACS Paragon Plus Environment

Page 7 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

ZP E =

1X hνi . 2 i

(16)

Methods Compounds were taken from the Alexandria Library, 36 which can be downloaded from Zenodo. 37 GAFF topologies were derived using the Antechamber package in AmberTools 16 38 and subsequent conversion to GROMACS format using the ACPYPE script. 39 Two different charge models were used. The first, GAFF-ESP, was based on restrained fitting the charges to the electrostatic potential. 30 For this parametrization, the B3LYP/aug-cc-pVTZ calculations in the Alexandria Library were used. Although this is a considerably higher level of theory than the standard for GAFF (HF/6-31G*), we found in a previous analysis 36 that accurate reproduction of electrostatic properties (such as dipoles and polarizabilities) requires the use of a higher level of theory. A total of 2101 compounds were used in the analysis with GAFF-ESP. We also used the AM1-BCC method 31,32 available in Antechamber to derive atomic charges. This 2060-compound set is termed GAFF-BCC. A correction was made to Antechamber output for compounds with linear angles like nitriles or alkynes. Due to numerical issues computing the forces and angle potential for 180o angles, Antechamber assigns angles that are ≈179o . This discrepancy was found to have a very large effect on the vibrational frequencies and a correction to 180o produces results that are much more accurate. CHARMM General Force Field (CGenFF) topologies were created with the CGenFF program, distributed as part of the SilcsBio software suite, version 2017.1 (SilcsBio LLC). Molecules were parametrized using version 4.0 of CGenFF, 27–29 which includes the latest representation of halogens that have positively charged virtual sites to describe halogen bonding. 40 CHARMM-formatted topologies were automatically converted to GROMACS format within the SilcsBio software. In total, 1886 compounds were successfully parametrized and used in the analysis. The remaining compounds in the Alexandria Library were not covered by CGenFF chemical space.

7

ACS Paragon Plus Environment

The Journal of Physical Chemistry

The structures of all compounds were energy-minimized using the GROMACS software, version 2018.1 41 compiled in double precision. Subsequently, a normal mode analysis was performed using GROMACS that is implemented to produce the Hessian matrix numerically. The Hessian was diagonalized and eigenvalues were computed. Finally, translational, rotational and vibrational components of standard entropy, heat capacity at constant volume and internal thermal energy were computed (See Theory) using new code in the gmx nmeig module that will be part of the GROMACS 2019 release. All force field calculations were performed on a laptop computer. Thermochemistry calculations at the G4 level of theory 5 performed previously 12,36 were used for comparison. Experimental reference data for S 0 were taken from a number of handbooks 42,43 and for CV from a combination of the experimental isentropic expansion factor and the heat capacity at constant pressure. 12 Full disclosure of individual references used can be found at http://virtualchemistry.org. 44

Results ∆ZPE (kJ/mol)

40

a CGenFF GAFF-ESP GAFF-BCC

20 0 -20 -40 -60 0

200

400

200

400

600

800

1000

1200

600

800

1000

1200

40

∆ZPE (kJ/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 22

20

b

0 -20 -40 -60 0

ZPEG4 (kJ/mol)

Figure 1: Residual plot (force field - G4) for the zero point energy a) without and b) with scaled frequencies.

8

ACS Paragon Plus Environment

Page 9 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Zero-point energy The ZP E computed using the force field methods are compared to the G4 results 36 in Fig. 1a. When applying quantum chemistry or density functional theory it is customary to scale computed frequencies in order to improve correspondence to experiment. 45 Here we do this using the quantum mechanical ZP E from the G4 calculations as reference and minimize the residual

∆=

N  X

λZP EiF F − ZP EiQM

2

(17)

i

with respect to λ, where N is the number of molecules. From this calculation we obtain λCGenF F = 1.035 and λGAF F = 1.018, independent of charge type. In contrast to quantum chemistry, where frequencies typically are overestimated 46 the force fields underestimate them, but after correction there is no systematic deviation in the ZP E (Fig. 1b). Table S8 gives all the ZP E for all theoretical methods. To compare the impact of the scaling on other thermochemical properties, the root-meansquare deviation of force field-generated properties from the G4 values are listed in Table 1. The properties in Table 1 are discussed below. Table 1: RMSD from thermochemistry results of force field methods using the G4 method as reference and with or without scaled frequencies. Property (Unit) Scaled S 0 (J/mol K) No Yes CV (J/mol K) No Yes E (kJ/mol) No Yes ZPE (kJ/mol) No Yes

N CGenFF GAFF-ESP GAFF-BCC 1493 17.7 18.0 19.3 1493 17.5 18.0 19.3 1493 4.6 4.2 4.1 1493 3.7 4.9 4.9 1495 13.6 9.5 9.4 1495 3.6 6.4 6.3 1643 14.2 9.7 9.5 1643 4.1 6.7 6.7

Vibrational frequencies A careful analysis of the frequencies from the force field calculations highlights potential force field “issues”. Some molecular topologies were therefore corrected or rejected due to incorrect 9

ACS Paragon Plus Environment

200 0 -200 -400 -600 0

400 200 0

GAFF-ESP

200 0 -200 -400 -600

b CGenFF

600

Page 10 of 22

600 400 200 0

GAFF-BCC

a -1

200 0 -200 -400 -600

Root mean square deviation (cm )

-1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Mean signed error (cm )

The Journal of Physical Chemistry

600 400 200

1000

2000

3000

4000

-1

0 0

1000

2000

3000

4000

-1

Wavenumber (cm )

Wavenumber (cm )

Figure 2: a) MSE of (scaled) force field frequencies and b) RMSD for the three force fields, as a function of frequency. Values on x-axis are from the G4 calculations, scaled by the G4 scaling factor of 0.9854 in order to best match experimental frequencies. 5 topologies (see details in Table S1) and to aid in further validation and improvement of force fields, Table S2 gives the root mean square deviation (RMSD) from G4 frequencies for each compound separately. Fig. 2a allows to check whether there are systematic deviations for different frequencies after scaling the frequencies. In all force fields the regions around 1900-2300 cm−1 are underestimated, whereas frequencies in the range from 2600 to 2800 cm−1 are overstimated. High frequencies are severely underestimated in the GAFF force fields but slightly overestimated in CGenFF. Since bonds involving hydrogen atoms are usually constrained in molecular dynamics simulations they could be considered less important. In the CGenFF protocol however, force constants are tuned to reproduce the frequencies, leading to a better match. Fig. 2b displays the RMSD per frequency. Since the RMSD are considerably higher than the mean signed error (MSE) in Fig. 2a, there is compensation of errors going on as well, complicating the analysis. However, by carefully inspecting Table S2 it should be possible to detect specific deviations related to certain chemical groups and, likely, to tune the force constants to get better correspondence with G4.

10

ACS Paragon Plus Environment

Page 11 of 22

a

CGenFF GAFF-ESP GAFF-BCC

100

0

∆S (J/mol K)

200

0

200

b

100

0

∆S (J/mol K)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

0 100

200

300

400 0 G4

S

500

600

700

(J/mol K)

Figure 3: Residual plot (force field - G4) for standard entropy S 0 a) without and b) with frequency scaling.

Standard entropy Fig. 3a shows the correlation between G4 and force field methods for S 0 . It is clear that the force field methods have a larger spread than the quantum-chemical G4 5 method. As detailed previously, 12 molecular flexibility is not treated adequately in theoretical methods that analyze a single conformation independent of the theory used. A simple analytical correction of approximately ∆S 0 = RNrot , where R is the gas constant and Nrot is the number of rotational bonds, can alleviate this fundamental omission 12 (although more elaborate methods are under development 47–49 ). Since we only compare the force field results to G4 in this work, this is not an issue. Scaled force field frequencies (Fig. 3b) do not change the result notably (see also Table 1). A statistical analysis for different categories of compounds when comparing to experimental data, shows large differences in accuracy for S 0 between categories (Table S3). Interestingly for some compound classes, such as amides, arylfluorides, heterocycles, and nitro, all force fields show a systematic deviation (mean signed error > 0). Overall, all force fields reproduce the experimental S 0 with similar accuracy (RMSD of about 25 J/mol K) and force fields have considerably larger RMSD than the G4 method (12 J/mol K). Table S5 gives all the S 0 for all theoretical methods.

11

ACS Paragon Plus Environment

The Journal of Physical Chemistry

Heat capacity at constant volume For CV , we previously noted a systematic deviation from experiment for larger compounds in theoretical methods 12 that we were not able to correct analytically. Since the methods used here study a single conformation, contributions due to e.g. hindered rotations around bonds are not taken into account. 47–49 The CV from the force fields underestimate the G4 values somewhat (Fig. 4a) and scaling the frequencies makes the correspondence somewhat worse (Fig. 4b). Nevertheless, the deviation is less than 10 J/mol K for most compounds, irrespective of size. This result suggests that, even though there are random errors in CV derived from force fields, the heat capacity due to internal degrees of freedom is reproduced well. RMSD values for the force fields, when comparing to experimental data, are only slightly higher than that for the G4 method (Table S4) which is remarkable given that the difference between force fields and G4 for S 0 is significant (Table S3). Table S6 gives all the CV for all theoretical methods.

∆CV (J/mol K)

20

a

10 0 -10

CGenFF GAFF-ESP GAFF-BCC

-20 20

∆CV (J/mol K)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 22

b

10 0 -10 -20 0

100

200

300

400

cV,G4 (J/mol K)

Figure 4: Residual plot (force field - G4) for heat capacity CV a) without and b) with frequency scaling.

Internal thermal energy The internal thermal energy E (Fig. 5) predicted by force fields is close to the G4 methods (Table 1), but only when using scaled frequencies (Fig. 5b). This can be understood from Eqn. 9,

12

ACS Paragon Plus Environment

Page 13 of 22

which shows that E is almost linearly dependent on the frequencies ν. Table S7 gives all the E for all theoretical methods. a

CGenFF GAFF-ESP GAFF-BCC

∆E (kJ/mol)

20 0 -20 -40

b 20

∆E (kJ/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

0 -20 -40 0

200

400

600

800

1000

1200

1400

EG4 (kJ/mol)

Figure 5: Residual plot (force field - quantum chemistry) between G4 and force fields for the internal thermal energy E for a) unscaled and b) scaled frequencies.

Discussion In this work the vibrations in compounds are treated as uncoupled harmonic oscillators according to theory described by e.g. McQuarrie. 33 In doing so the coupling between rotations and vibrations and also the anharmonicity of the vibrations are neglected for the sake of managable computational cost and it should be noted that the electronic motion is neglected in force field calculations as well. Some studies combined the rotational and vibrational degrees of freedom into one partition function to consider the coupling between rotations, vibrations and also conformational changes of compounds. 47–49 However, this approach is severely hampered by computational cost and in all practical thermochemistry applications the harmonic approximation is used, in particular if large amounts of compounds are to be studied. 11,12,36,37 Only a few reports of force field-based entropy calculations are available. Muddana and Gilson computed vibrational entropies using several force fields for 12 compounds in the context of the SAMPL3 benchmark and compared the results to semi-empirical calculations. 50 They found the

13

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

AMBER force field 51 to perform the best, but with a RMSD of almost 60 J/mol K. Our results (Table S3) are a factor of two better, likely due to the use of more modern force fields, but potentially also since we used experiments and more accurate quantum chemistry as a reference. Wlodek et al. computed configurational (vibrational) entropies for 255 compounds 52 and obtained good correspondence between the MMF94 force field 53–55 and experimental data. Wlodek et al. attempted to overcome the flexibility problem by using a frequency scaling factor of 0.85, and while doing so compensates to some extent for underestimated entropies of flexible molecules, it made the correspondence worse for other compounds. 52 This observation strengthens our earlier analysis of the contribution of flexibility to entropy. 12 In many quantum chemical methods, vibrational frequencies are overestimated due to the lack of anharmonicity and thus empirical frequency scaling factors are used to circumvent this shortcoming. An overview of different scaling factors in use is given elsewhere 45,46 but in the case of G4, the factor is 0.9854. 5 In contrast to this, we herein obtained scaling factors of 1.035 and 1.018 for the CGenFF and GAFF force fields, repectively, because the force fields underestimate the frequencies. Nowadays, CHARMM and CGenFF use quantum-chemical data for these force constants. A full vibrational analysis is performed and force constants are tuned to reproduce those frequencies. It is likely that force fields that use quantum-chemical target data like CGenFF could be influenced by the choice of model chemistry, 27 however the strategy may have been the reason for the somewhat better performance compared to GAFF for ZP E and E. The evaluations presented here show first and foremost that simple classical force fields are competitive for predicting gas phase heat capacity CV in comparison to the G4 method (note a slightly higher RMSD, but lower MSE for CGenFF in Table S4). Force fields are sufficiently accurate for internal thermal energy E (Table 1) but less so for the standard entropy S 0 (Table S3, Fig. 3). Scaling the frequencies improves E and ZP E, but for CV and S 0 the results are affected only little. The partial charges have very little impact on thermochemistry, only a small difference is found for the standard entropy S 0 between GAFF-ESP and GAFF-BCC (Table 1). Further testing of force fields for accurate reproduction of thermochemical properties, as done in

14

ACS Paragon Plus Environment

Page 14 of 22

Page 15 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

this work, is a simple yet highly relevant approach for further improvements. The potential for empirical force fields to predict molecular structures has been put into question recently, in a test comparing force field energies to density functional theory (DFT). 56 It may be clear that some effort is needed, for both CGenFF and GAFF to improve to a point where force fields can be used in conjunction with DFT in spectroscopical applications. Here we provide tools to aid in these improvements. For instance, using Table S2, the compounds that have frequencies that are very different from the G4 frequencies are highlighted. Chlorine-Fluoride (ClF) is very poorly reproduced by GAFF (RMSD 763 cm−1 ) because the force constant is artificially low (83.68 kJ/mol nm−2 , possibly corresponding to a default value of 20 kcal/mol nm−2 ). By changing it to a more reasonable 273000 kJ/mol nm−2 the G4 frequency can be matched within 1 cm−1 . Further information is provided in Tables S3 and S4 that show which compound classes are systematically off for reproducing S 0 and CV . Selected parameters for these compounds can then be tuned to improve the accuracy of force fields, using quantum-chemical frequencies for tuning the force field parameters and experimental data for validation.

Acknowledgments The Swedish research council is acknowledged for financial support to DvdS (grant 2013-5947). JAL is supported by startup funding from the Virginia Tech Office of the Provost, College of Agriculture and Life Sciences, and Department of Biochemistry. The authors thank Dr. S. Kaushik Lakkaraju at SilcsBio LLC for providing a custom script to automatically parametrize each of the compounds with the CGenFF program.

Supporting Information Available Supporting information containing statistics and tables of thermochemistry values are available free of charge at http://pubs.acs.org.

15

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References (1) Licari, D.; Fus´e, M.; Salvadori, A.; Mendolicchio, M.; Tasinato, N.; Mancini, G.; Barone, V. Towards the SMART workflow for computational spectroscopy. Phys. Chem. Chem. Phys. 2018, (2) Peterson, K. A.; Feller, D.; Dixon, D. A. Chemical accuracy in ab initio thermochemistry and spectroscopy: current strategies and future challenges. Theor. Chem. Acc. 2012, 131, 1079. (3) Curtiss, L. A.; Raghavachari, K.; Trucks, G. W.; Pople, J. A. Gaussian-2 theory for molecular energies of first- and second-row compounds. J. Chem. Phys. 1991, 94, 7221–7230. (4) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Rassolov, V.; Pople, J. A. Gaussian-3 (G3) theory for molecules containing first and second-row atoms. J. Chem. Phys. 1998, 109, 7764– 7776. (5) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-4 theory. J. Chem. Phys. 2007, 126, 84108. (6) Montgomery Jr., J. A.; Frisch, M. J.; Ochterski, J. W.; Petersson, G. A. A complete basis set model chemistry. VI. Use of density functional geometries and frequencies. J. Chem. Phys. 1999, 110, 2822–2827. (7) Montgomery Jr., J. A.; Frisch, M. J.; Ochterski, J. W.; Petersson, G. A. A complete basis set model chemistry. VII. Use of the minimum population localization method. J. Chem. Phys. 2000, 112, 6532–6542. (8) Deyonker, N. J.; Cundari, T. R.; Wilson, A. K. The correlation consistent composite approach (ccCA): An alternative to the Gaussian-n methods. J. Chem. Phys. 2006, 124, 114104. (9) Parthiban, S.; Martin, J. M. L. Fully ab initio atomization energy of benzene via Weizmann-2 theory. J. Chem. Phys. 2001, 115, 2051–2054.

16

ACS Paragon Plus Environment

Page 16 of 22

Page 17 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(10) Karton, A.; Rabinovich, E.; Martin, J. M. L.; Ruscic, B. W4 theory for computational thermochemistry: In pursuit of confident sub-kJ/mol predictions. J. Chem. Phys. 2006, 125, 144108. (11) Simmie, J. M.; Somers, K. P. Benchmarking compound methods (CBS-QB3, CBS-APNO, G3, G4, W1BD) against the active thermochemical tables: A litmus test for cost-effective molecular formation enthalpies. J. Phys. Chem. A 2015, 119, 7234–7246. (12) Ghahremanpour, M. M.; van Maaren, P. J.; Ditz, J.; Lindh, R.; van der Spoel, D. Large-scale calculations of gas phase thermochemistry: Enthalpy of formation, standard entropy and heat capacity. J. Chem. Phys. 2016, 145, 114305. (13) Karplus, M.; Kushick, J. N. Method for estimating the configurational entropy of macromolecules. Macromolecules 1981, 14, 325–332. (14) Levy, R. M.; Karplus, M.; Kushick, J.; Perahia, D. Evaluation of the configurational entropy for proteins: application to molecular dynamics simulations of an α-helix. Macromolecules 1984, 17, 1370–1374. (15) Schlitter, J. Estimation of absolute and relative entropies of macromolecules using the covariance-matrix. Chem. Phys. Lett. 1993, 215, 617–621. ˚ (16) Carlsson, J.; Aqvist, J. Absolute and relative entropies from computer simulation with applications to ligand binding. J. Phys. Chem. B 2005, 109, 6448–6456. (17) Chang, C. E.; Chen, W.; Gilson, M. K. Evaluating the accuracy of the quasiharmonic approximation. J. Chem. Theory Comput. 2005, 1, 1017–1028. (18) Genheden, S.; Ryde, U. Will molecular dynamics simulations of proteins ever reach equilibrium? Phys. Chem. Chem. Phys. 2012, 14, 8662–8677. (19) Nevins, N.; Allinger, N. L. Molecular mechanics (MM4) vibrational frequency calculations for alkenes and conjugated hydrocarbons. J. Comput. Chem. 1996, 17, 730–746.

17

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(20) Nevins, N.; Chen, K. S.; Allinger, N. L. Molecular mechanics (MM4) calculations on alkenes. J. Comput. Chem. 1996, 17, 669–694. ˚ (21) Aqvist, J.; Kazemi, M.; Isaksen, G. V.; Brandsdal, B. O. Entropy and enzyme catalysis. Accounts Chem. Res. 2017, 50, 190–207. (22) Caleman, C.; van Maaren, P. J.; Hong, M.; Hub, J. S.; Costa, L. T.; van der Spoel, D. Force field benchmark of organic liquids: Density, enthalpy of vaporization, heat capacities, surface tension, compressibility, expansion coefficient and dielectric constant. J. Chem. Theory Comput. 2012, 8, 61–74. (23) Fischer, N. M.; van Maaren, P. J.; Ditz, J. C.; Yildirim, A.; van der Spoel, D. Properties of liquids in molecular dynamics simulations with explicit long-range Lennard-Jones interactions. J. Chem. Theory Comput. 2015, 11, 2938–2944. (24) Zhang, J.; Tuguldur, B.; van der Spoel, D. Force field benchmark II: Gibbs energy of solvation of organic molecules in organic liquids. J. Chem. Inf. Model. 2015, 55, 1192–1201. (25) Zhang, J.; Tuguldur, B.; van der Spoel, D. Correction to force field benchmark II: Gibbs energy of solvation of organic molecules in organic liquids. J. Chem. Inf. Model. 2016, 56, 819–820. (26) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general AMBER force field. J. Comput. Chem. 2004, 25, 1157–1174. (27) Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I. et al. Charmm general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J. Comput. Chem. 2010, 31, 671–690. (28) Vanommeslaeghe, K.; MacKerell Jr., A. D. Automation of the CHARMM general force field (CGenFF) I: Bond perception and atom typing. J. Chem. Inf. Model. 2012, 52, 3144–3154. 18

ACS Paragon Plus Environment

Page 18 of 22

Page 19 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(29) Vanommeslaeghe, K.; Raman, E. P.; MacKerell Jr., A. Automation of the CHARMM general force field (CGenFF) II: Assignment of bonded parameters and partial charges. J. Chem. Inf. Model. 2012, 52, 3155–3168. (30) Besler, B. H.; Merz Jr., K. M.; Kollman, P. A. Atomic charges derived from semiempirical methods. J. Comput. Chem. 1990, 11, 431–439. (31) Jakalian, A.; Bush, B. L.; Jack, D. B.; Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: I. Method. J. Comput. Chem. 2000, 21, 132–146. (32) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 2002, 23, 1623– 1641. (33) McQuarrie, D. A. Statistical Mechanics; Harper & Row: New York, 1976. (34) Ochterski, J. W. Thermochemistry in gaussian. Gaussian, Inc.: Pitssburg PA, 2000. (35) Berens, P. H.; Mackay, D. H. J.; White, G. M.; Wilson, K. R. Thermodynamic and quantum corrections from molecular dynamics for liquid water. J. Chem. Phys. 1983, 79, 2375–2389. (36) Ghahremanpour, M. M.; van Maaren, P. J.; van der Spoel, D. The Alexandria library: A quantum chemical database of molecular properties for force field development. Sci. Data 2018, 5, 180062. (37) Ghahremanpour, M. M.; van Maaren, P. J.; van der Spoel, D. Alexandria library [data set]. Zenodo. 2017; http://doi.org/10.5281/zenodo.1004711. (38) Case, D.; Cerutti, D.; Cheatham III, T.; Darden, T.; Duke, R.; Giese, T.; Gohlke, H.; Goetz, A.; Greene, D.; Homeyer, N. et al. Amber 2017. University of California: San Francisco, 2017. (39) da Silva, A. W. S.; Vranken, W. F. ACPYPE-antechamber python parser interface. BMC research notes 2012, 5, 367. 19

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(40) Guti´errez, I. S.; Lin, F.-Y.; Vanommeslaeghe, K.; Lemkul, J. A.; Armacost, K. A.; Brooks III, C. L.; MacKerell Jr., A. D. Parametrization of halogen bonds in the CHARMM general force field: Improved treatment of ligand-protein interactions. Bioorg. & Medic. Chem. 2016, 24, 4812–4825. (41) Abraham, M. J.; Murtola, T.; Schulz, R.; P´all, S.; Smith, J. C.; Hess, B.; Lindahl, E. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1-2, 19–25. (42) Yaws, C. L. Yaws’ Handbook of Thermodynamic Properties for Hydrocarbons and Chemicals; Knovel: http://www.knovel.com, 2009. (43) Rowley, R. L.; Wilding, W. V.; Oscarson, J. L.; Yang, Y.; Giles, N. F. Data Compilation of Pure Chemical Properties (Design Institute for Physical Properties; American Institute for Chemical Engineering: New York, 2012. (44) van der Spoel, D.; van Maaren, P. J.; Caleman, C. GROMACS molecule & liquid database. Bioinformatics 2012, 28, 752–753. (45) Scott, A. P.; Radom, L. Harmonic vibrational frequencies: An evaluation of hartree-fock, møller-plesset, quadratic configuration interaction, density functional theory, and semiempirical scale factors. J. Chem. Phys. 1996, 100, 16502–16513. (46) Laury, M. L.; Boesch, S. E.; Haken, I.; Sinha, P.; Wheeler, R. A.; Wilson, A. K. Harmonic vibrational frequencies: Scale factors for pure, hybrid, hybrid meta, and double-hybrid functionals in conjunction with correlation consistent basis sets. J. Comput. Chem. 2011, 32, 2339–2347. (47) Zheng, J.; Yu, T.; Papajak, E.; Alecu, I. M.; Mielke, S. L.; Truhlar, D. G. Practical methods for including torsional anharmonicity in thermochemical calculations on complex molecules: The internal-coordinate multi-structural approximation. Phys. Chem. Chem. Phys. 2011, 13, 10885–10907. 20

ACS Paragon Plus Environment

Page 20 of 22

Page 21 of 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(48) Zheng, J.; Mielke, S. L.; Clarkson, K. L.; Truhlar, D. G. Mstor: A program for calculating partition functions, free energies, enthalpies, entropies, and heat capacities of complex molecules including torsional anharmonicity. Comput. Phys. Commun. 2012, 183, 1803 – 1812. (49) Zheng, J.; Truhlar, D. G. Quantum thermochemistry: Multistructural method with torsional anharmonicity based on a coupled torsional potential. J. Chem. Theory Comput. 2013, 9, 1356–1367. (50) Muddana, H. S.; Gilson, M. K. Prediction of sampl3 host-guest binding affinities: evaluating the accuracy of generalized force-fields. J. Comput. Aid. Mol. Des. 2012, 26, 517–525. (51) Weiner, S. J.; Kollman, P. A.; Nguyen, D. T.; Case, D. A. An all atom force field for simulation of proteins and nucleic acids. J. Comput. Chem. 1986, 7, 230–252. (52) Wlodek, S.; Skillman, A. G.; Nicholls, A. Ligand entropy in gas-phase, upon solvation and protein complexation. Fast estimation with quasi-Newton Hessian. J. Chem. Theory Comput. 2010, 6, 2140–2152. (53) Halgren, T. A. Merck molecular force field. 1. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490–519. (54) Halgren, T. A. Merck molecular force field. 2. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. J. Comput. Chem. 1996, 17, 520–552. (55) Halgren, T. A. Merck molecular force field. 3. Molecular geometries and vibrational frequencies for MMFF94. J. Comput. Chem. 1996, 17, 553–586. (56) Kanal, I. Y.; Keith, J. A.; Hutchison, G. R. A sobering assessment of small-molecule force field methods for low energy conformer predictions. Int. J. Quant. Chem. 2018, 5, e25512.

21

ACS Paragon Plus Environment

The Journal of Physical Chemistry

TOC Graphic

(a.u.)

G4

CGenFF (a.u.)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

0

1000

2000

3000

4000

-1

Wavenumber (cm )

22

ACS Paragon Plus Environment

Page 22 of 22