A universal method for electrostatic interaction energies estimation

2 days ago - ... for development of new non-bonded terms in force fields or a high success rate scoring function. View: PDF | PDF w/ Links. Related Co...
0 downloads 0 Views 15MB Size
Subscriber access provided by The University of Texas at El Paso (UTEP)

Molecular Mechanics

A universal method for electrostatic interaction energies estimation with charge penetration and easily attainable point charges S#awomir A. Bojarowski, Prashant Kumar, Claudia M. Wandtke, Birger Dittrich, and Paulina Maria Dominiak J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00781 • Publication Date (Web): 25 Oct 2018 Downloaded from http://pubs.acs.org on October 28, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A universal method for electrostatic interaction energies estimation with charge penetration and easily attainable point charges Sławomir A. Bojarowski,† Prashant Kumar,† Claudia M. Wandtke,‡ Birger Dittrich,¶ and Paulina M. Dominiak∗,† †Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, ul. Żwirki i Wigury 101, 02-089, Warszawa, Poland ‡Institut fur Anorganische Chemie, Georg-August-Universitat, Tammannstr. 4, 37077 Gőttingen (Germany) ¶Anorganische Chemie und Strukturchemie, Heinrich-Heine Universitat Dűsseldorf, Universitatsstrasse 1, G˝ebaude 26.42.01.21, 40225 Dűsseldorf (Germany) E-mail: [email protected] Phone: +48 22 55 26 714

Keywords: electrostatic interaction energy, charge penetration, molecular modeling, point charges, simplified model of electron density Abstract Our new model of electron density augmented by point charges (aug-PROmol) provides an estimation of electrostatic interaction energies including penetration effects

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(ChemPhysChem, 2016, 17, 2455-2460). In this paper we prove that it can be applied using sources of point charges other than those from direct restrained fitting to electrostatic potential (RESP). We used a newly established databank of tabulated invariom point charges and a widely known semiempirical method. Both sources perform equivalently to the basic aug-PROmol method as well as to reference energies at the DFT-SAPT/aug-cc-pVTZ level of theory. This is possible due to universal character of penetration model included in the aug-PROmol. Aug-PROmol may become a basis for development of new non-bonded terms in force fields or a high success rate scoring function.

1

Introduction

In the classical force fields intermolecular interactions are estimated with so called non-bonded terms. The simplest force fields contain: Ees - electrostatic and Evdw - van der Waals terms. The latter term combines attractive dispersion forces with exchange-repulsion, whereas the former one relies on simple Coulomb equation with point charges. Such simplification makes the classical force fields very fast. However, in many situations, such computations are encumbered with large errors. For more accurate estimations of total interaction energy (Eint ), there is a need to take into account such effects as charge anisotropy, polarization of the charge due to the external electric field (a so-called polarization term - Epol ), etc. Moreover, even estimation of just Ees in reasonable time and with accuracy comparable to quantum methods is still a challenging task. The addition of charge anisotropy makes the Ees estimation more accurate. However, it not only causes several problems with assignment of the higher multipole moments to atom types, but primarily, it increases the computation time and complexity. The other problem, much more significant, occurs when the interacting molecules are so close that interaction energies cannot be estimated by multipole moments interactions. 1 At the distances close to equilibrium geometry multipole expansion starts to diverge due to the so-called penetration effects. 2,3 In such cases the electrostatic interaction 2

ACS Paragon Plus Environment

Page 2 of 32

Page 3 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

energies must be described using two components:

Ees = Emtp + Epen

(1)

where Emtp is the energy from multipole moments interactions (between atoms a and b): Emtp = T qa qb + T α (qa µαb − qb µαa ) 1 1 +T αβ ( qa θbαβ + qb θaαβ − µαa µαb ) + . . . 3 3

(2)

The q, µ and θ are permanent multipole moments (monopoles, dipoles, quadrupoles) of unperturbed charge densities and parameters T α , T αβ are symmetrical interaction tensors. The multipolar expansion can be truncated at any level. When multipoles higher than just monopoles are included, then the above mentioned charge anisotropy is taken into account. The second term in the equation 1 is the penetration energy (Epen ). It is to be noted that penetration effects are not new phenomena. The Epen is just the error of the multipolar expansion at the short distances (by definition: Epen = Ees − Emtp ). It does not occur when the quantum mechanics is applied. There, the electrostatic interaction energies are calculated applying the Coulomb law to the pre-calculated total charge distributions of interacting A and B systems: Z Z Ees = A

B

ρtot,A (r1 )ρtot,B (r2 ) dr1 dr2 |r1 − r2 |

(3)

For obvious reason eq. 3 cannot be adapted directly in the force fields, but still it is used in method development stage as a referential method. Discussion about the role and contribution of penetration and charge anisotropy effects in the electrostatic interaction energy estimation is included in our previous paper. 1 The main conclusion was that penetration effects dominate over charge anisotropy, at least in the case of studied aspherical density model. The accurate estimation of Ees , is so important not only due to the dominating contribution to the interactions in biomacromolecular complexes, as described in many recent 3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 32

papers. 4–9 Electrostatic term has a predictive power and can be used alone to support such studies as, for example, ligand docking. An accurate electrostatic model can be a crucial indicator of molecular dimer stability. 10 Furthermore, the development of the non-bonded terms while building a new force field, usually starts with the Ees term. The other terms (Evdw , Epol , etc.) must be compatible with the model of electrostatics. When the Ees is estimated by the simple Coulomb law with point charges, all the deficiencies of Ees are covered by the Evdw term. There are many situations where this approach may lead to false conclusions. 11 Many models have recently been introduced to cope with penetration problem. 11–26 Most of them focus on the description of spherical approximation of penetration effects. Some go even further and include the penetration effects associated with higher multipole moments. 26 In our previous publication we proposed the aug-PROmol model, which is universal and highly transferable. 27 The main idea of the model is based on a simplified representation of electron density (a promolecule) 28 augmented by point charges (PC) from Restrained fitting to ElectroStatic Potential (RESP). 29 We rearranged the Hansen-Coppens Multipole 30 (HCM) model of electron density into the form:

ρat (r) = Ncore ρcore (r) + (Nvalence − qRESP )κ3 ρvalence (κr)

(4)

where ρcore and ρvalence are free atom Hartree-Fock core and valence atomic densities normalized to one electron. Ncore and Nvalence are integer number of core and valence electron, respectively. The parameter κ which responds for the expansion/contraction of ρvalence for all heavy atoms was turned off by setting it to 1.00, for hydrogen atoms it was set to 1.16. 27 The qRESP are RESP point charges. The restrained fitting to electrostatic potential, which requires initial quantum mechanical calculations, has been so far the bottleneck of the augPROmol model. Main motivation of introducing the aug-PROmol model was inclusion of penetration ef-

4

ACS Paragon Plus Environment

Page 5 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

fects. Following the definition, 2,3 the penetration energy from our model is equal to difference between exact electrostatic interaction energy from aug-PROmol electron density computed following the equation (eq. 3) and sum of charge-charge interactions using the same set of point charges which were used to build aug-PROmol density. In our previous paper we showed the great improvement in Ees estimation when the penetration effects are accounted, thanks to aug-PROmol. 27 The model was positively verified on datasets probing both: the dissociation curve (S66×8) 31 and the angular dependence (S66a8). 32 In addition to our benchmark calculations, we have introduced an analytical function of Ees energy computed from the aug-PROmol model: Ees (rab ) =

1 [Na Nb − Na (Nb − qRESP,b )χb − Nb (Na − qRESP,a )χa rab +(Na − qRESP,a )(Nb − qRESP,b )(−2χab +

(5)

χ2ab )]

where the qRESP are the RESP point charges, N is number of valence electrons and χ are in general exponential functions (ce−αr ), with c and α coefficients specific to chemical elements.. The indexes a and b refer to a pair of interacting atoms from molecules A and B. The form of this function follows the way how Ees is computed from aug-PROmol model by the use of exact integration (see supp. info 27 ), but it is much simpler and faster in terms of computation time. The usage of analytical function (eq. 5) increase calculation time slightly, by around 10%, when compared to usage of Empt (eq. 2) applied to point charges. 23 The approach was successfully verified on a benzene-benzene π − π molecular dimer. The (eq. 5) function can be further rewritten, to more clearly separate penetration effects from pure point charge contributions:  1 Ees (rab ) = qa qb + (Na − qa )Nb χa + (Nb − qb )Na χb rab  2 +(Na − qa )(Nb − qb )(−2χab + χab )

(6)

the indices are same as used in eq. 5, only the qRESP were replaced by simple q to denote

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

that charges from other methods may be used. The first part of the equation represents the point charge-charge interactions (Emtp ), whereas, the other three refers to penetration effects (Epen ). It is to be stressed that for heavy elements (N − q) ≈ N . Particularly, it means that for organic molecules the penetration from our model may be, in practice, only dependent on charges assigned to hydrogen atoms. Before we derive coefficients of the analytical function specific to all chemical elements, and thus create a new non-bonded model, we decided to investigate further the strength and weaknesses of aug-PROmol model. With this article we demonstrate that aug-PROmol can be successfully constructed with other sources of point charges such as databases or semiempirical methods. This is possible due to the versatility of penetration model included in aug-PROmol. Accurate description of penetration effects enables an accurate estimation of electrostatic interaction energy.

2 2.1

Methods and materials Charge methods

In the following article we used point charges from three different approaches, abbreviated as charge methods: RESP, IPC and AM1-BCC. For the RESP procedure, the molecular electrostatic potential was pre-calculated at the HF/6-31G* 33 level of theory. Values for RESP point charges were taken from our previous publication. 34 Tabulated point charges were taken from the Invariom Point Charges (IPC) database, 35 which is closely related to the Generalized Invariom Database. 36 For the IPC database construction, charges were obtained by restrained fitting to electrostatic potential at the B3LYP/aug-cc-pVTZ 37–40 level of theory. For further information please see the publication Wandtke et al. 2016. 35 For point charges from a semiempirical model we applied Austin Model 1 with Bond Charge Correction (AM1BCC). 41 AM1-BCC point charges were calculated with the Amber 14 package 42 using the same procedure as in previous publication. 34,43 6

ACS Paragon Plus Environment

Page 6 of 32

Page 7 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Originally the aug-PROmol model was constructed with RESP charges, so there was no need to mention the charge method in the model name. To differentiate the charge methods used in our model we add a charge method name in brackets. For example aug-PROmol(IPC) stands for aug-PROmol model with the IPC point charges. However, we will frequently be indicating the charge method only, since no models other than the aug-PROmol were tested.

2.2

S66×8 dataset

As a benchmark we used molecular dimers from the S66×8 dataset. 31,32 The basic S66 dataset was arranged to reproduce the most common non-covalent interactions in biomolecules. The S66×8 contains molecular dimers from original S66 dataset displaced at the eight fractions of distance around the equilibrium geometry (fraction 1.00 equals equilibrium geometry). 32 Originally the dataset was divided into three subgroups due to the type of dominant interactions: electrostatic, dispersion and mixed. Here, in addition, we follow our own division which was introduced in our previous publication, 1 to understand better the chemical reasons behind observed results. The S66 was categorised into nine subgroups due to dimer interaction types: SHB: single hydrogen-bonded (17 dimers), DHB: double hydrogen-bonded (5 dimers), WHB: weak hydrogen-bonded (4 dimers), PP: pi-pi stacking (5 dimers), PPU: pi-pi stacking with at least one molecule of uracil (5 dimers), PAA: pi-aliphatic or aliphatic-aliphatic (13 dimers), TS: T-shape (6 dimers), HP: hydrogen-pi (6 dimers), PNP: polar-non-polar dimers (5 dimers) (see supp. info Tab. S1). This kind of grouping gives a deeper insight into the performance of a particular method. However, in consequence of having more subgroups some of them are little populated. Statistical descriptors for such subgroup may not be reliable.

2.3

Electrostatic potential properties

To find the origin of the difference in quality, of the Ees estimation, molecular electrostatic potential V (ri ) were calculated. The calculations were performed with ≈ 0.15 bohr grid 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 32

size and 3.0 bohr margin for particular single molecules from the S66×8 dataset. VMoPro module of MoPro 44 package and Multiwfn 45 program were used for aug-PROmol model and point charges respectively. For a reference, the potential grids were calculated using Gaussian G09 46 at the B3LYP/aug-cc-pVTZ level of theory. The electrostatic potentials were visualized using the MolIso program, a part of the MoleCoolQT molecule viewer program. 47,48

2.4

Calculation of electrostatic interaction energies

Ultimately, the application of analytical function, mentioned in the introduction part, allows to avoid integration procedure. However, this feature is still in preparation stage. For the purpose of this article, electrostatic interaction energies (Ees ) were calculated using the Exact Potential Multipole Moments method (EPMM). 49 It combines two approaches: a direct Coulomb integration in the inner region (< 5Å) and a simple sum of multipole moments in the outer region. For the EP part, the Nrad 99 and Nang 590 grid was chosen. The Ees calculations were performed with the XDPROP module of the XD2006 program. 50 For reference energy calculations, the symmetry-adapted perturbation theory based on the density functional theory (DFT-SAPT) method was chosen. 51,52 Since there were no DFT-SAPT data available for the entire S66×8 in literature we did the calculations ourselves using the following procedure. The DFT-SAPT is a powerful method for decomposing the total intermolecular interaction energy to its physically meaningful components. In DFTSAPT the interaction energy, Eint was calculated, as the sum of the first(E (1) ) and secondorder (E (2) ) perturbation energy terms and δEHF energy terms, specifically electrostatic (1)

(2)

(2)

(Ees ), induction (Eind ) and dispersion (Edisp ) energy terms together with exchange-repulsion (1)

(2)

(2)

terms (Eexch , Eexch−ind and Eexch−disp ). The δEHF corrections were applied in all cases to estimate polarization effects beyond the second order, (1)

(2)

(2)

(2)

(2)

(1) Eint = Ees + Eexch + Eind + Eexch−ind + Edisp + Eexch−disp + δEHF

8

ACS Paragon Plus Environment

(7)

Page 9 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Asymptotic behavior of the PBE0 functional (PBE0AC) in monomers was corrected by a shift parameter (∆xc ) which was calculated as the difference between the HOMO energy ( HOMO) of each monomer and the first experimental ionization energy (Eion ). All SAPT and HOMO energy calculations were carried out in the MOLPRO2012.1 53 program with (1)

the aug-cc-pVTZ Dunning basis set. For analysis in the present article, we used Ees as a (1)

reference. Values of Ees obtained by our DFT-SAPT calculations are similar (overall MAE = 0.31 kcal mol−1 ) to the values of SAPT2+/aug-cc-pVTZ from Wang et al.. 12 However, this comparison is limited to the 59 dimers of S66 and fractions of equilibrium distance in the range from 0.90 to 1.10. For statistical descriptions we applied a Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Signed Error (MSE) and others. For definitions see our previous publication. 34 For all detailed results see supp. info. It is worth to note that for RMSE, which is less frequently used statistical descriptor, the values are always larger or equal than for MAE. This is because RMSE gives a relatively high weight to larger errors, especially for a probe with a low number of elements.

3 3.1

Results and discussion Penetration effects accounted by aug-PROmol

To verify how penetration energies differentiate depending on used point charge method, we have computed mean of Epen and Emtp energies for each subgroup of S66×8 dataset and each fraction of distance. As expected, the penetration energies from all three versions of our model (RESP, IPC, and AM1-BCC) are all strongly negative and stabilize the interacting molecular dimers. Mean values of the Epen do not differ much among different charge methods (Fig. 1a). The largest differences occur in the DHB subgroup, where the mean energies are on average the highest among all dataset. At the equilibrium distance, the maximum difference between average energies from aug-PROmol with RESP and IPC or AM1-BCC 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

charge methods equals 0.20 and 0.40 kcal mol−1 respectively. Both values are observed in the DHB subgroup. Referring these numbers to corresponding Epen values show that these differences are relatively very low. It confirms the ideas presented in the introduction that penetration effects from our model do not rely much on applied charge method. On the other hand, the mean energies from the sum of point charges interactions (Emtp ) differs much among different charge methods (Fig. 1b). At the equilibrium distance the differences between averaged values are up to ≈ 3.0 kcal mol−1 . Epen(RESP)

EpenS66x8 (IPC)

Epen(AM1 − BCC)

Emtp(RESP)

0

Emtp(AM1 − BCC)

−4

SHB

−2

SHB

−2

−4 −6

DHB

−5 −10 −15 −20 0

1.5

−4

PPU

−4

0.0 0 −1 −2 −3 0.000 −0.025

PAA

PAA

−2

0.5

PPU

Mean/kcal mol−1

1.0

−7.5 0.0 −2.5 −5.0 −7.5 −10.0 0

PP

PP

−5.0

WHB

−2.5

WHB

0.0

0 −1 −2 −3 −4 2.0

−2

DHB

−8 0 −5 −10 −15 −20

−6 0

Mean/kcal mol−1

EmtpS66x8 (IPC)

0

−0.050 −0.075

−6 0

0.0 −0.5

TS

TS

−2

−1.0 −1.5

−4

HP

−2 −4 0 −1 −2 −3 −4 −5

HP

−2.0 0 −1 −2 −3 −4

0

0.0

PNP

PNP

−0.5 −1.0 −1.5

2

25

5

1.

1.

05

1

1.

95

1.

1

9

5

Fractions of equilibrium distance

0.

0.

2

1.

25

1

1.

1.

05

1.

1

95

9

0.

0.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 32

Fractions of equilibrium distance

(a)

(b)

Figure 1: Visualization of the mean penetration energies (kcal mol−1 ) (a) and mean electrostatic interaction energies computed by sum of charge-charge interactions (kcal mol−1 ) (b) using different charge methods at different intermolecular distances in nine subgroups of the S66×8 datasets.

10

ACS Paragon Plus Environment

Page 11 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3.2

Total electrostatic interaction energy

When total Ees energies, (Emtp + Epen ), are compared to referential values there is high similarity among trends in errors of aug-PROmol constructed with RESP and with IPC or AM1-BCC, see Tab. 1, Fig. Fig. S3, S4 and Tab. Tab. S5, S6 from supp. info.. The errors for IPC and AM1-BCC still seems to be satisfactory small, regardless of some discrepancies observed among particular subgroups. Global MAE, at equilibrium distance, for IPC equals 1.05 kcal mol−1 and for AM1-BCC 0.85 kcal mol−1 (Tab. 1). These are only slightly higher than MAE for RESP (GLOBAL MAE at equilibrium = 0.78 kcal mol−1 ), which is very satisfactory. All the methods produce very low errors for dispersion subgroup of S66×8, MAE ≈ 0.65 kcal mol−1 and still acceptable errors for mixed group, RESP MAE = 1.00 kcal mol−1 , IPC MAE = 0.86 kcal mol−1 and AM1-BCC MAE =0.95 kcal mol−1 . Only for electrostatic subgroup IPC and AM1-BCC give larger errors than RESP. The MAE for AM1-BCC is only slightly higher whereas for IPC is substantially larger. Nevertheless all the errors are much smaller when compared to errors generated by the same set of point charges when the penetration effect is not taken into account (see supp. info. Fig. S5). Once again the numbers emphasize how important penetration effects are. Errors generated by the lack of penetration effects are by far larger than errors associated with deficiencies of methods used to deliver point charges to build aug-PROmol model.

11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 1: Mean Absolute Errors (kcal mol−1 ) of electrostatic interaction energies computed with aug-PROmol model with different charge methods for particular subgroups in the S66×8 dataset at the equilibrium distance. The reference energies were computed at DFTSAPT/aug-cc-pVTZ level of theory. subgroup SHB DHB WHB PP PPU PAA TS HP PNP Electrostatic Mixed Dispersion Global

RESP 0.92 0.45 0.44 0.26 0.99 0.62 1.20 1.38 0.59 0.81 1.00 0.66 0.78

IPC 1.19 3.08 0.67 0.61 0.44 0.55 1.00 1.59 0.58 1.66 0.86 0.64 1.05

AM1BCC 0.81 1.33 0.82 0.25 0.95 0.58 1.06 1.60 0.55 1.05 0.95 0.61 0.85

Following the division of S66×8 due to the chemical character, it can be easily noted that the highest errors are observed for all three hydrogen-bonded subgroups (Tab. 1 and see RMSE, MSE in supp. info. Fig. Fig. S4, S6). Especially in the case of the DHB subgroup, the magnitude of errors for IPC and AM1-BCC is conspicuous.

12

ACS Paragon Plus Environment

Page 12 of 32

Page 13 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3.3

The quality of the point charges

Table 2: Point charges generated by different charge methods on atoms in molecules from S66×8 dataset and electrostatic interaction energies (Ees ), penetration energies (Epen ) and energies from sum of point charge-charge interactions (Emtp ) (kcal mol−1 ) calculated with aug-PROmol model for chosen molecular dimers from S66×8 dataset at equilibrium distance. Atomic labels used for uracil atom types are consistent with standard numeration for nucleobases. atoms / energies

AM1 RESP IPC -BCC Peptide C(met) -0.43 -0.36 -0.18 H(met) 0.11 0.09 0.06 C(carb) 0.76 0.76 0.66 O(carb) -0.61 -0.60 -0.61 N -0.54 -0.39 -0.58 H(n) 0.33 0.31 0.31 C(met) -0.23 -0.30 0.08 H(met) 0.12 0.10 0.04 Peptide-Peptide Ees (REF) -10.74 Emtp -6.67 -6.72 -6.29 Epen -3.84 -3.88 -3.95 Ees -10.51 -10.60 -10.24 AcOH H(hyd) 0.45 0.39 0.44 O(hyd) -0.64 -0.60 -0.61 O(carb) -0.63 -0.55 -0.55 C(carb) 0.85 0.75 0.64 C(met) -0.42 -0.34 -0.15 H(met) 0.13 0.12 0.08 AcOH-AcOH Ees (REF) -31.98 Emtp -19.18 -13.88 -17.50 Epen -12.22 -13.45 -12.13 Ees -31.40 -27.33 -29.63 Benzene H 0.13 0.14 0.13 C -0.13 -0.14 -0.13 Benzene-Benzene_pi-pi Ees (REF) -2.17 Emtp 1.36 1.61 1.34 Epen -3.54 -3.54 -3.53 Ees -2.18 -1.93 -2.19

AcNH2 C(carb) 0.94 0.83 0.66 O(carb) -0.66 -0.60 -0.61 N -1.07 -1.03 -0.68 H(n) 0.44 0.44 0.31 C(met) -0.53 -0.36 -0.17 H(met) 0.14 0.10 0.07 AcNH2-AcNH2 Ees (REF) -24.88 Emtp -16.72 -15.24 -13.08 Epen -7.17 -7.18 -8.82 Ees -23.89 -22.42 -21.90 Uracil N3 -0.59 -0.66 -0.59 H3 0.36 0.36 0.35 C4 0.83 0.86 0.71 O4 -0.62 -0.57 -0.61 C5 -0.50 -0.50 -0.32 H5 0.21 0.23 0.18 C6 0.07 0.07 0.05 H6 0.18 0.15 0.16 N1 -0.44 -0.49 -0.48 H1 0.34 0.36 0.35 C2 0.77 0.74 0.80 O2 -0.61 -0.57 -0.64 Uracil-Uracil_BP Ees (REF) -25.81 Emtp -16.05 -13.32 -15.91 Epen -9.95 -9.65 -9.86 Ees -26.00 -22.97 -25.77 Uracil-Uracil pi_pi Ees (REF) -8.66 Emtp -5.18 -4.06 -5.10 Epen -5.27 -5.33 -5.29 Ees -10.45 -9.39 -10.39

Comparison of charge values from all three methods done for selected molecules gives insight to the possible reasons of high errors for the DHB subgroup. The point charges obtained with the IPC or the AM1-BCC methods are, in many cases, much closer to zero than RESP charges (Tab. 2). It is clearly visible in the exemplary AcOH and AcNH2 molecules, two 13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

out of three molecules present in DHB. The third one is uracil. RESP charges for AcOH and AcNH2 seem to be perfectly fitted since the calculated energies are very close to the references. IPC and AM1-BCC charges for the same molecules, when referred to RESP charges, are always closer to neutrality. The most significant discrepancy is the case of the N atom type generated by AM1-BCC. RESP charge method suggests strongly negative value of -1.07 e, whereas AM1-BCC gives charge smaller (in absolute value) by almost 0.4 e. It must be the main reason of high discrepancy in estimated Ees . This discrepancy (2.98 kcal mol−1 ) is the largest observed for the aug-PROmol(AM1-BCC) method applied to all S66×8 dimers at equilibrium distance. Similar situation is in AcOH-AcOH molecular dimer. It is clear that large differences between RESP and IPC or AM1-BCC charges in the carboxylic group (AcOH) produce high errors at equilibrium geometry. However, the problem of IPC is more serious. Error at the level of 4.65 kcal mol−1 is exceptionally high and suggest the need of re-parametrization of this particular chemical group.

14

ACS Paragon Plus Environment

Page 14 of 32

Page 15 of 32

Peptide−Peptide

Uracil−Uracil_BP ● ● ●

−8

● ● ●

−10

● ●

−25

−35 ● ● ●

● ● ● ● ●

AcOH−AcOH

AcNH2−AcNH2 ● ● ● ●

● ● ● ●

● ● ● ●

−15 −20

● ● ● ●

−25

● ● ● ● ● ● ●

−35

Uracil−Uracil_pi−pi

−5

● ● ●

● ● ● ●

● ● ●

−10

● ●

10 1.

05 1.

90

● ●

0.

10 1.

05 1.

00 1.

0.

95

−20

● ●

0.

● ● ●

90

● ●



95

−15

−6

0.

● ● ●



00

−4

● ● ● ●

● ● ● ●

Benzene−Benzene_pi−pi

0 −2

● ● ● ●

● ● ●

−30

1.

−30

−8

● ●



−20

−40

● ● ●



−30

● ● ●

−14 −16

−15 −20

● ● ●

−12

E es /kcal mol−1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Fractions of equilibrium distance ●

REF



RESP



IPC



AM1−BCC

Figure 2: Electrostatic interaction energies (kcal mol−1 ) in chosen molecular dimers from S66×8 dataset computed from aug-PROmol model with RESP, IPC, AM1-BCC charge methods at five fractions of equilibrium distance. The reference energies (REF) are at DFT-SAPT/aug-cc-pVTZ level of theory. Vertical dotted line corresponds to equilibrium distance.

To complement this discussion, we want to comment on the results for the uracil dimers present in the S66×8 dataset. Uracil is an important molecule from a biological point of view. In this benchmark it occurs in two different homodimers. The first one is, strongly interacting, double hydrogen bonded, Watson-Crick base pair (Uracil-Uracil_BP). In the second one, the homodimer interacts by the π − π stacking (Uracil-Uracil_pi-pi). Point charges obtained by all the three charge methods are diverse, but there is no large difference in charges magnitudes, as it was in the case of AcOH or AcNH2 molecules (Tab. 2). Nevertheless, the Emtp energies are very similar for the RESP and AM1-BCC methods. Further, the Ees energies from these two methods estimate quite well the reference value for hydrogen bonding dimer, but not as well in the case of π − π stacking. The opposite is observed for Ees 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

from IPC. The Ees energy is better estimated for π − π stacking than for hydrogen bonding. All this is only due to problems with proper Emtp energy estimation, since Epen energies from all three methods are almost the same. It shows that it is not easy to fine tune charge values which will give truly well balanced energies. Four out of five molecular dimers in DHB subgroup contain at least one of AcOH or AcNH2 molecule. The two most problematic molecules are grouped in one subgroup. Moreover the errors are magnified due to the strength of interaction and large values of energies (in absolute value). All above may induce the misrepresentation of the charge methods performance. In other cases, for example single-H-bonded Peptide-Peptide molecular dimer, the quite large differences in charges on particular atom types do not causes much discrepancies in energies. Among all methods the Emtp , Epen and, the most important, Ees energies are almost identical. In another example, Benzene-Benzene_pi-pi dimer (PP subgroup) the point charges of all methods are similar (Tab. 2). It is to be noted that Emtp energies from all three charge methods are positive for Benzene-Benzene_pi-pi dimer and falsely imply dimer destabilization. Penetration energy complements Emtp calculations and, when accounted, the Ees estimated with all methods coincides with the reference energy. To analyze the aug-PROmol performance with different charge methods at the distance around the equilibrium geometry please see the Fig. 2. In almost all cases the trends regarding which charge method provides better estimation of Ees observed at equilibrium distance are consistent with the other fractions of distance. Only in the case of the UracilUracil_BP dimer at the distances shorter than equilibrium errors of RESP and AM1-BCC methods seems to become as big as of IPC. Further, when the AcOH and AcNH2 molecules are removed from the benchmark dataset the errors of all methods are satisfactory (Fig. 3). At equilibrium distance only the IPC in the electrostatic subgroup produces error larger than 1.00 kcal mol−1 . What is more important, at the same distance, the AM1-BCC produces slightly lower error than RESP

16

ACS Paragon Plus Environment

Page 16 of 32

Page 17 of 32

charge method. It is to be noted, that DHB without AcOH and AcNH2 molecules consist of only one molecular dimer and this kind of grouping should not be taken into consideration for further statistical analysis. RESP

S66x8 IPC AM1−BCC

1.5

Electrostatic

1.0

0.0 2.0 1.5

Mixed

MAE /kcal mol−1

0.5

1.0 0.5 0.0 1.5

Dispersion

1.0 0.5 0.0

2

5

1.

25

1.

1

1.

05

1.

1

95

0.

9

0.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Fractions of equilibrium distance

Figure 3: Visualization of the Mean Absolute Errors (kcal mol−1 ) of the electrostatic interaction energies computed with the aug-PROmol model using different charge methods at different intermolecular distances in three subgroups of the S66×8 datasets. The AcOH and AcNH2 molecules were removed from the benchmark dataset as explained in the text. Corresponding reference energies were obtained at DFT-SAPT/aug-cc-pVTZ level of theory. The dashed horizontal lines along the x axis correspond to the value of 1.0 kcal mol−1 Regarding the statement given in the introduction, it is hard to judge whether the penetration effects in aug-PROmol is sensitive only to the charges on the hydrogen atoms. All the calculated penetration energies are very much comparable among all three methods. It seems that penetration effects are not sensitive to observed variations in charges. It is to be noted that among three methods variations in hydrogen charges are very small, much smaller than for non-hydrogen atoms. Again, the results for analyzed dimers illustrates that Epen from aug-PROmol is independent from the studied charge methods. It means that the errors of the estimation of the Ees relies mostly on calculated Emtp . Hence, the proper estimation of the Ees within aug-PROmol method is largely dependent on the quality of the used point 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

charges. The other components of the aug-PROmol method are well designed.

3.4

Molecular electrostatic potentials

To show the origin of the difference in quality between pure point charges and aug-PROmol model we present the visualizations of the molecular electrostatic potentials for molecules discussed in the previous sections. Indeed, the difference is highly visible (Fig. 4). The molecular potentials obtained with just point charges (Fig. 4e) always consists of single well-separated components centered at atom positions. Depending on the charge assigned to particular atom, huge positive or huge negative values of potential are observed at atom positions. The potentials surrounding the atoms deviate from the full sphericity due to the contribution from other atoms, but obviously, they cannot be considered as anisotropic. At the longer distances, far from atom positions, point charges reproduce the potential with reasonable accuracy when compared to the reference. At the outer regions of the molecules the iso-surfaces from the RESP charges seem to coincide with the reference iso-surfaces. At the shorter distances the differences are significant. Above trends are in line with the results from electrostatic energy estimations. In the case of the aug-PROmol models (Fig. Fig. 4: b, c and d), the appearance of molecular electrostatic potentials is completely different than it was for point charges, and resembles very much the true potential. Here description of the potential can be divided into two features. First: the huge and positive potential at atom position, were nuclei potential dominate over potential from electron density. Second: limited regions of negative potential outside of the molecular interior, which are located at free electron pairs or π electrons, i.e. at the most crucial regions for particular intermolecular interactions. All three charge methods, when combined with the aug-PROmol lead to the shapes of the potential iso-surfaces similar to each other and close to these from the referential method. Above observations explain the aug-PROmol accurate estimation of the Ees , which is far better than this from just pure point charges, particularly at the shorter distances. 18

ACS Paragon Plus Environment

Page 18 of 32

Page 19 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The aug-PROmol model follows the reference trends and reproduce all, even tiny, regions of negative potential. The surrounding of the hydroxyl oxygen in the AcOH molecule (Fig. Fig. 4 - bottom) gives a good example. All three aug-PROmol models reproduce well the slightly negative potential region of this oxygen. Its shape is far from the spherical, and far from the one produced by the pure RESP point charges. Indeed, it is very similar to the aspherical potential from the reference method. Similar trend can be observed in other examples. In the case of benzene (Fig. 4 - top) the aug-PROmol attempts to reconstruct the aspherical potential from π electrons. Again, the shape is close to this observed in the reference. Moreover, the shape similarity of reference potentials and these estimated from aug-PROmol, lead us to conclusion that aug-PROmol should estimate the Ees with accuracy at least partially angularly independent. The good performance of aug-PROmol is only possible due to continuous electron charge distribution and results from the separation of electron density into the core and valence parts (eq. 4), were only valence part is repopulated according to the assigned charge. Electrostatic potential of such electron density interfere with nuclei potential allowing to reproduce quite well the true molecular potential. The differences among aug-PROmol with tested charge methods occurs in values of the potentials. Mostly, they appear as a slight and impactless distinctions. However, sometimes the discrepancies can be significant as for example aug-PROmol(IPC) for AcOH molecule. The visible weakness of the aug-PROmol model is the slight overestimation of the negative potentials. In every case, the estimated minimal values are more negative than those from quantum mechanics calculations. For example, the negative potential around the oxygen atoms in uracil (Fig. 4 - middle) and AcOH (Fig. 4 - bottom) molecules seems to be little overestimated. However, we notice, that the potentials are overestimated, to compensate for lack of true anisotropy. The potentials, placed at the electron pairs, are big enough to properly simulate electrostatic interactions. In absence of such a compensation, as it is in the case of AcOH molecule, where the IPC does not overestimate minimum, larger errors in

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 32

Ees estimation occur.

(a)

(b)

(c)

(d)

(e)

Figure 4: Visualization of the Benzene (top), Uracil (middle) and AcOH (bottom) electrostatic potentials (e bohr−1 ) for five diferent methods: (a) - reference, (b) - augPROmol(RESP), (c) - aug-PROmol(IPC), (d) - aug-PROmol(AM1BCC) and (e) - RESP point charges. Contours every 0.02 (e bohr−1 ) over the range of potential values displayed at the legend on the left specific for each molecule. For benzene an extra contour 0.01 (e bohr−1 ) was added.

3.5

aug-PROmol vs other methods

Our approach to account for penetration effects in Ees estimation differs from the other published ones. 11–26 Most of the authors established their models by fitting parameters to energies taken from a reference method. The fitting procedure was performed over a set of chosen molecules and later on the statistical analysis was done for the same set of molecules. This approach produce the smallest possible errors by definition. In our approach we try to follow the opposite way, which is to build a general model with as few fitted parameters as possible. This approach should be more resistant to errors associated with transfer of the

20

ACS Paragon Plus Environment

Page 21 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

model to the chemical systems other than it has been parametrized on. Nevertheless the errors of our method are comparable to the other methods. For direct comparison see supp. info. Fig. S7. To summarize, the aug-PROmol method seems to be applicable to any source of point charges, the only condition is the charges describe molecular electrostatic potential very well outside of the van der Waals surface. Any other than studied set of such point charges, for example tailored for particular type of molecules, should also allow aug-PROmol method for fast and accurate estimation of the exact Ees . It proves the concept introduced in our previous paper, 27 that the way how the penetration is accounted in our method is universal. Hence, it can be approximated by the analytical function with coefficient specific to chemical elements and further integration procedure can be omitted. This feature allows the aug-PROmol to be applicable in the force fields. To achieve this, the parametrization of analytical function must be finished. 27 Remaining coefficients specific to all chemical elements need to be derived. This step will finalize the aug-PROmol development. To obtain the accurate Ees term, a user would need only to provide a set of point charges. Further, to create a new force field all the other non-bonded terms should be re-parametrized. Since point charges commonly used in currently existing force fields are embedded, we are expecting that the non-bonded terms used in these force fields should be quite easily adapted to be used with our model. However, we notice that compatibility of our electrostatic term (Ees ) with the eventual polarization term (Epol ) require deeper investigations.

4

Conclusions

With this paper we demonstrate that our aug-PROmol method is highly universal and can be easily applied in molecular modeling. The replacement of RESP point charges does not have to increase the error of the method. We used two new sources of point charges: one tabulated

21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

in the databank (IPC) and the other from a semiempirical method (AM1-BCC). The latter charges are much easier to obtain, than the RESP, since there is no need to use quantum mechanics calculations of electrostatic potential. The former charges, deposited in the IPC database, can be available at once, with the same ease as any other force field parameter. Both charge methods perform comparably to the basic aug-PROmol(RESP) method and at the same time to the referential method. Their overall mean errors are quite acceptable: 0.78 kcal mol−1 , 1.05 kcal mol−1 and 0.85 kcal mol−1 for RESP, IPC and AM1-BCC, respectively. It might be noted that the IPC based method is slightly worse. This is understandable. The IPC charges come from a databank, are of general use and thus are burned with transferability error. The other two types of charges are computed on demand and are tailored to each molecule separately. On the other hand, IPC charges all accessible almost instantaneously, where the time of computations for AM1-BCC charges is still noticeable and rise quite fast with the size of a molecule. We may conclude that in the case of IPC, the price paid for the speed is acceptable. In addition, there is still a room for improvement for the IPC method, the databank may be built from larger set of model molecules and more diverse atom types, especially for carboxylic oxygens, may be introduced. This always reduce the transferability error. Straight comparison between energies from sum of point charge-charge interactions (Emtp ) and from penetration effects (Epen ) emphasizes that only for estimation of the Emtp component, the quality of utilized point charges plays important role. The values of energies of Emtp are quite dependent on the type of charges used, whereas Epen accounted in the augPROmol method seem not to be sensitive to it. The latter energies are almost identical for all studied methods, even in the case of problematic DHB subgroup of dimers. All applied methodologies of obtaining charges were designed to provide charges which reconstruct true molecular potential at the van der Waals surface and further outside of a molecule. This may mean, that all of them require the same model of penetration to supplement description of electrostatic potential inside of a molecule. The aug-PROmol model does it very well.

22

ACS Paragon Plus Environment

Page 22 of 32

Page 23 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

We see two main possible applications of our model. First one is the usage of aug-PROmol to build a scoring function. The accurate model of electrostatics seems to be the best molecular dimer stability indicator. 10 Langner et al. sugests that at equilibrium geometry and at shorter distances full electrostatics (i.e. with penetration term) has a higher succes rate than other tested approaches. Here, the aug-PROmol can be applied immediately after its finalization. Second one, widely commented in this paper, is the usage of aug-PROmol as a model for accurate Ees term. It may become a basis for new non-bonded terms in force fields. In the recent years charge penetration has become a hot topic among the force fields designers. There have been created a number of models to compensate for the lack of penetration effects in the electrostatic term. From our side, we propose a universal and simple model which estimates the Emtp and Epen terms in a balanced way. The development of aug-PROmol follows our main postulate to make the model general and to use as few empirical parameters as possible. This allows aug-PROmol to become a support for already existing force fields.

Acknowledgement Support of this work by the National Centre of Science through grant MAESTRO No. UMO/2012/04/A/ST5/00609 is gratefully acknowledged. SAB acknowledges the Department of Chemistry, University of Warsaw, for financial support from grant 120000-501/86DSM-112 700. PK acknowledges the Department of Chemistry, University of Warsaw, for financial support from grant 120000-501/86-DSM-110200. The authors also thank PL-Grid Infrastructure and Interdisciplinary Centre for Mathematical and Computational Modeling of University of Warsaw (Grant No. ELESAPT2017, UBDB2017) for providing computation facilities. The authors declare no competing financial interest.

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Supporting Information Available Table S1 shows list of dimers in the S66 dataset indicating dominant components (electrostatic or dispersion) to total interaction energy and subcategories mentioned in the introduction. Figure S1 shows the geometries of six molecular dimers described separately in this publication. Figure S2 visualize the electrostatic potentials of rest two molecules, which the electrostatic potentials are not mentioned in the main body of this publication. Table S2 shows the mean values of reference energies. Tables S3 and S4 show the numerical values introduced in Figures 1a and 1b respectively. Tables S5, S6, S7, S8, S9, S10 and figures S3, S4, S5, S6 show detailed statistics of aug-PROmol results for particular subgroups in the S66×8 dataset at eight fractions of equilibrium distance. UBDB data from Acta Cryst., 2017, B73, 598 are given for comparison. Figure S7 presents comparison between results from aug-PROmol model and those taken from Acta Cryst., 2017, B73, 598 and JCTC 2015, 11, 2609. Table S11 contains the reference energies for molecular dimers from S66×8 dataset.

References (1) Bojarowski, S. A.; Kumar, P.; Dominiak, P. M. Interplay of point multipole moments and charge penetration for intermolecular electrostatic interaction energies from the University at Buffalo pseudoatom databank model of electron density. Acta Crystallogr. 2017, B73, 598–609. (2) Piela, L. Ideas of Quantum Chemistry.; Elsevier: Amsterdam, The Netherlands, 2006. (3) Stone, A. The Theory of Intermolecular Forces, 2nd Edition; Oxford University Press: Oxford, United Kingdom, 2013. (4) Dominiak, P. M.; Volkov, A.; Dominiak, A. P.; Jarzembska, K. N.; Coppens, P. Combining crystallographic information and an aspherical-atom data bank in the evaluation

24

ACS Paragon Plus Environment

Page 24 of 32

Page 25 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

of the electrostatic interaction energy in an enzyme–substrate complex: influenza neuraminidase inhibition. Acta Cryst. 2009, D65, 485–499. (5) Dominiak, P. M.; Li, X.; Coppens, P.; Messerschmidt, M.; Volkov, A. A Theoretical Databank of Transferable Aspherical Atoms and Its Application to Electrostatic Interaction Energy Calculations of Macromolecules. J. Chem. Theory Comput. 2007, 3, 232–247. (6) Li, X.; Volkov, A. V.; Szalewicz, K.; Coppens, P. Interaction energies between glycopeptide antibiotics and substrates in complexes determined by X-ray crystallography: application of a theoretical databank of aspherical atoms and a symmetry-adapted perturbation theory-based set of interatomic potentials. Acta Cryst. 2006, D62, 639–647. (7) Kulik, M.; Goral, A. M.; Jasiński, M.; Dominiak, P. M.; Trylska, J. Electrostatic Interactions in Aminoglycoside-RNA Complexes. Biophys. J. 2015, 108, 655–665. (8) Malińska, M.; Jarzembska, K. N.; Goral, A. M.; Kutner, A.; Woźniak, K.; Dominiak, P. M. Sunitinib: from charge-density studies to interaction with proteins. Acta Cryst. 2014, D70, 1257–1270. (9) Malinska, M.; Kutner, A.; Woźniak, K. Predicted structures of new Vitamin D Receptor agonists based on available X-ray structures. Steroids 2015, 104, 220–229. (10) Langner, K. M.; Beker, W.; Sokalski, W. A. Robust Predictive Power of the Electrostatic Term at Shortened Intermolecular Distances. J. Phys. Chem. Lett. 2012, 3, 2785–2789. (11) Tafipolsky, M.; Engels, B. Accurate Intermolecular Potentials with Physically Grounded Electrostatics. J. Chem. Theory Comput. 2011, 7, 1791–1803. (12) Wang, Q.; Rackers, J. A.; He, C.; Qi, R.; Narth, C.; Lagardere, L.; Gresh, N.; Ponder, J. W.; Piquemal, J. P.; Ren, P. General Model for Treating Short-Range Electro-

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

static Penetration in a lar Mechanics Force Field. J. Chem. Theory Comput. 2015, 11, 2609–2618. (13) Wang, B.; Truhlar, D. G. Screened Electrostatic Interactions in lar Mechanics. J. Chem. Theory Comput. 2014, 10, 4480–4487. (14) Cisneros, G. A.; Piquemal, J. P.; Darden, T. A. Generalization of the Gaussian electrostatic model: Extension to arbitrary angular momentum, distributed multipoles, and speedup with reciprocal space methods. J. Chem. Phys. 2006, 125, 1–16. (15) Elking, D. M.; Perera, L.; Pedersen, L. G. HPAM: Hirshfeld partitioned atomic multipoles. Comp. Phys. Commun. 2012, 183, 390–397. (16) Cardamone, S.; Hughes, T. J.; Popelier, P. L. A. Multipolar electrostatics. Phys. Chem. Chem. Phys. 2014, 16, 10367–10387. (17) Cisneros, G. A.; Karttunen, M.; Ren, P.; Sagui, C. Classical Electrostatics for Biomolecular Simulations. Chem. Rev. 2014, 114, 779–814. (18) Cisneros, G. A. Application of Gaussian Electrostatic Model (GEM) Distributed Multipoles in the AMOEBA Force Field. J. Chem. Theory Comput. 2012, 8, 5072–5080. (19) Duke, R. E.; Starovoytov, O. N.; Piquemal, J. P.; Cisneros, G. A. GEM*: A lar Electronic Density-Based Force Field for lar Dynamics Simulations. J. Chem. Theory Comput. 2012, 10, 1361–1365. (20) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J. W.; Ren, P. Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins. J. Chem. Theory Comput. 2013, 9, 4046–4063. (21) Bertoni, C.; Slipchenko, L. V.; Misquitta, A. J.; Gordon, M. S. Multipole Moments in the Effective Fragment Potential Method. J. Phys. Chem. A 2017, 121, 2056–2067.

26

ACS Paragon Plus Environment

Page 26 of 32

Page 27 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(22) Devereux, M.; Gresh, N.; Piquemal, J.-P.; Meuwly, M. A supervised fitting approach to force field parametrization with application to the SIBFA polarizable force field. J. Comput. Chem. 2014, 35, 1577–1591. (23) Christophe, N.; Lagardere, L.; Polack, E.; Gresh, N.; Wang, Q.; Bell, D. R.; Rackers, J. A.; Ponder, J. W.; Ren, P. Y.; Piquemal, J.-P. Scalable improvement of SPME multipolar electrostatics in anisotropic polarizable molecular mechanics using a general short-range penetration correction up to quadrupoles. J. Comput. Chem. 2016, 37, 494–506. (24) Öhrn, A.; Hermida-Ramon, J. M.; Karlström, G. Method for Slater-Type Density Fitting for Intermolecular Electrostatic Interactions with Charge Overlap. I. The Model. J. Chem. Theory Comput. 2016, 12, 2298–2311. (25) Grynova, G.; Corminboeuf, C. Steric attraction: not by dispersion alone. Beilstein Journal of Organic Chemistry 2018, 14, 1482–1490. (26) Piquemal, J.-P.; Chevreau, H.; Gresh, N. Toward a Separate Reproduction of the Contributions to the Hartree-Fock and DFT Intermolecular Interaction Energies by Polarizable Molecular Mechanics with the SIBFA Potential. Journal of Chemical Theory and Computation 2007, 3, 824–837. (27) Bojarowski, S. A.; Kumar, P.; Dominiak, P. M. A Universal and Straightforward Approach to Include Penetration Effects in Electrostatic Interaction Energy Estimation. ChemPhysChem 2016, 17, 2455–2460. (28) Hirshfeld, F.; Rzotkiewicz, S. Electrostatic binding in the first-row AH and A2 diatomic molecules. Mol. Phys. 1974, 27, 1319–1343. (29) Bayly, C.; Cieplak, P.; Cornell, W.; Kollman, P. A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints For Determining Atom-Centered Charges: The RESP Model. J. Phys. Chem. 1993, 97, 10269–10280. 27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(30) Hansen, N. K.; Coppens, P. Testing aspherical atom refinements on small-molecule data sets. Acta Crystallogr. 1978, A34, 909–921. (31) Řezáč, J.; Riley, K. E.; Hobza, P. S66: A Well-balanced Database of Benchmark Interaction Energies Relevant to Biomolecular Structures. J Chem Theory Comput. 2011, 7, 2427–2438. (32) Řezáč, J.; Riley, K. E.; Hobza, P. Extensions of the S66 Data Set: More Accurate Interaction Energies and Angular-Displaced Nonequilibrium Geometries. J. Chem. Theory Comput. 2011, 7, 3466–3470. (33) Krishnan, R.; Binkley, J. S.; Seeger, R.; Pople, J. A. Self-Consistent lar Orbital Methods. XX. A Basis Set for Correlated Wavefunctions. J. Chem. Phys. 1980, 72, 650–654. (34) Kumar, P.; Bojarowski, S. A.; Jarzembska, K. N.; Domagała, S.; Vanommeslaeghe, K.; MacKerell Jr, A. D.; Dominiak, P. M. A comparative study of transferable aspherical pseudoatom databank and classical force fields for predicting electrostatic interactions in molecular dimers. J. Chem. Theory Comput. 2014, 10, 1652–1664. (35) Wandtke, C. M.; Lübben, J.; Dittrich, B. Molecular Electrostatic Potentials from Invariom Point Charges. ChemPhysChem 2016, 17, 2238–2246. (36) Dittrich, B.; Hübschle, C. B.; Pröpper, K.; Dietrich, F.; Stolper, T.; Holstein, J. J. The generalized invariom database (GID). Acta Cryst. 2013, B69, 91–104. (37) Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098–3100. (38) Perdew, J. P. Density-functional approximation for the correlation-energy of the inhomogenous electron gas. Phys. Rev. B 1986, 33, 8822–8824. (39) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 1988, 37, 785–789. 28

ACS Paragon Plus Environment

Page 28 of 32

Page 29 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(40) Kendall, R. A.; Dunning, J. T. H.; Harrison, R. J. Electron Affinities of the First-Row Atoms Revisited. Systematic Basis Sets and Wave Functions. J. Chem. Phys. 1992, 96, 6796–6806. (41) Jakalian, A.; Jack, D. B.; Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 2002, 23, 1623–41. (42) Case, D. A.; Babin, V.; Berryman, J. T.; Betz, R.; Cai, Q.; Cerutti, D. S.; Cheatham, T. E.; Darden, I. T. A.; Duke, R. E.; Gohlke, H.; Goetz, A. W.; Gusarov, S.; Homeyer, N.; Janowski, P.; Kaus, J.; Kolossváry, I.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Paesani, F.; Roe, D. R.; Roitberg, A.; Sagui, C.; Salomon-Ferrer, R.; Seabra, G.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Kollman, P. A. AMBER 14, University of California, San Francisco. 2014. (43) Wang, J.; Wang, W.; Kollman, P. A.; Case, D. A. Automatic atom type and bond type perception in molecular mechanical calculations. J Mol Graph Model 2006, 25, 247–260. (44) Jelsch, C.; Guillot, B.; Lagoutte, A.; Lecomte, C. Advances in Proteins and Small Molecules Charge Density Refinement Methods using software MoPro. 2005. (45) Lu, T.; Chen, F. Multiwfn: A multifunctional wavefunction analyzer. Journal of Computational Chemistry 33, 580–592. (46) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.; Marenich, A.; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.; Ortiz, J. V.; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini, F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; 29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Ranasinghe, D.; Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Throssell, K.; Montgomery, J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Keith, T.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian 09, Revision B.01. 2016. (47) Hübschle, C. B.; Luger, P. MolIso – a program for colour-mapped iso-surfaces. Journal of Applied Crystallography 2006, 39, 901–904. (48) Hübschle, C. B.; Dittrich, B. MoleCoolQt – a molecule viewer for charge-density research. Journal of Applied Crystallography 2011, 44, 238–240. (49) Volkov, A.; Koritsánszky, T.; Coppens, P. Combination of the exact potential and multipole methods (EP/MM) for evaluation of intermolecular electrostatic interaction energies with pseudoatom representation of molecular electron densities. Chem. Phys. Lett. 2004, 391, 170–175. (50) Volkov, A.; Macchi, P.; Farrugia, L. J.; Gatti, C.; Mallinson, P.; Richter, T.; Koritsánszky, T. XD2006 - A Computer Program for Multipole Refinement, Topological Analysis of Charge Densities and Evaluation of Intermolecular Energies from Experimental or Theoretical Structure Factors. 2006. (51) Jeziorski, B.; Moszynski, R.; Szalewicz, K. Perturbation Theory Approach to Intermolecular Potential Energy Surfaces of van der Waals Complexes. Chem. Rev. 1994, 94, 1887–1930. (52) Heßelmann, A.; Jansen, G. First-order intermolecular interaction energies from KohnSham orbitals. Chem. Phys. Lett. 2002, 357, 464–470. 30

ACS Paragon Plus Environment

Page 30 of 32

Page 31 of 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(53) Werner, H.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schütz, M.; and others, MOLPRO version 2015.1 a package of ab initio programs. 2015.

31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Graphical TOC Entry

32

ACS Paragon Plus Environment

Page 32 of 32