The hpCADD NDDO Hamiltonian: Parametrization - Journal of

Jul 12, 2017 - Comparison of the resulting solvation free energies for the training and test sets to atomic charges derived from standard procedures, ...
0 downloads 8 Views 2MB Size
Subscriber access provided by Imperial College London | Library

Article

The hpCADD NDDO-Hamiltonian: Parameterization Heike B. Thomas, Matthias Hennemann, Patrick Kibies, Franziska Hoffgaard, Stefan Gussregen, Gerhard Hessler, Stefan M. Kast, and Timothy Clark J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00080 • Publication Date (Web): 12 Jul 2017 Downloaded from http://pubs.acs.org on July 24, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 39

Journal of Chemical Information and Modeling

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The hpCADD NDDO-Hamiltonian: Parameterization Heike B. Thomas,a Matthias Hennemann,a Patrick Kibies,b Franziska Hoffgaard,b Stefan Güssregen,c Gerhard Hessler,c Stefan M. Kast,b and Timothy Clarka,* a

Computer-Chemie-Centrum, Department Chemie und Pharmazie, Friedrich-

Alexander-Universität Erlangen-Nürnberg, Nägelsbachstr. 25, 91052 Erlangen, Germany. b

Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a,

44227 Dortmund, Germany c

R&D, IDD, Structural Design and Informatics, Sanofi-Aventis Deutschland GmbH,

Industriepark Höchst, 65926 Frankfurt am Main, Germany.

*Corresponding author ([email protected])

Abstract: A Neglect of Diatomic Differential Overlap (NDDO) Hamiltonian has been parameterized as an electronic component of a polarizable force field. Coulomb and exchange potentials derived directly from the NDDO Hamiltonian in principle can be used with classical potentials, thus forming the basis for a new generation of efficiently applicable multipolar polarizable force fields. The new hpCADD Hamiltonian uses force-field like atom types and reproduces the electrostatic properties (dipole moment, molecular electrostatic potential) and Koopmans’ Theorem ionization potentials closely, as demonstrated for a large training set and an independent test set of small molecules. The Hamiltonian is not intended to reproduce geometries or total energies well as these will be controlled by the classical force-field potentials. In order to establish the hpCADD Hamiltonian as electronic component in force field-based calculations, we test its performance in combination with the 3D Reference Interaction Site Model (3D RISM) for aqueous solutions. Comparison of the resulting solvation free energies for training and test sets to atomic charges derived from standard procedures, exact solute-solvent electrostatics based on high-level quantum chemical reference data and established

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 2 of 39

2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

semiempirical

Hamiltonians

demonstrates

the

advantages

parametrization.

ACS Paragon Plus Environment

of

the

hpCADD

Page 3 of 39

Journal of Chemical Information and Modeling

3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Introduction Molecular mechanics force fields1 have proven to be far more successful in reproducing molecular structures and energies than might be expected from the sometimes very simple physical model on which they are based. However, they do have systematic weakness, one of which is that most are not polarizable (i.e. their electrostatic parameters remain constant under an applied electric field). As polarization is now increasingly being recognized to be an important factor in noncovalent interactions,2,3 this can be a serious disadvantage because specific electrostatic effects that depend on polarization are only reproduced in an averaged fashion. Although multipole-based force fields do exist,4,5,6 most use a simple atomic monopole (net atomic charge) electrostatic model, which cannot reproduce many important features of the molecular electrostatic potential (MEP) around the molecule, including the σ-hole.7 Many of these problems can be traced back to the coarse graining of all electronic degrees of freedom inherent in most force fields. Polarizable force fields attempt to alleviate this problem, but no single implementation has yet achieved general acceptance.8 Although much progress has been made in developing polarizable force fields,9 modifying the electrostatics to reproduce the σhole,10,11,12,13,14 or both together,15 we decided to use the fact that NDDO wavefunctions are polarizable and can reproduce anisotropic electrostatics around atoms as the electrostatic component of a new force field. We here report a Neglect of Diatomic Differential Overlap (NDDO) Hamiltonian designed to reproduce the electronic properties of unperturbed molecules as closely as possible, confining ourselves in this pilot study to isolated species. Later work will focus on environment-induced polarization and classical force-field potentials for completing the model description in a way that is compatible with force field-based molecular simulation. Note that the only function that the NDDO Hamiltonian needs to perform in such a hybrid force field is to reproduce the molecular electrostatics (Coulomb and Exchange interactions) correctly. It need not reproduce either geometries or energies well as these will be governed by the classical force-field potentials being developed. The concept of developing the force field is first to parameterize the Coulomb (NDDO) Hamiltonian independently and then to leave it fixed and parameterize the classical potentials “on top” of the existing electrostatics. ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 4 of 39

4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

In this way, we can use high-level ab initio standards to parameterize the electrostatics, which are thus removed from the plethora of non-independent twocenter interactions in the classical potentials. In contrast to standard force fields, computationally economical quantum mechanical techniques such as NDDO-based16 molecular-orbital theory,17 reproduce the MEP around molecules (and other one-electron properties) very well18 and are polarizable, but often have difficulties reproducing structures and energies (heats of formation or reaction energies) correctly if other parameters (e.g. ionization potentials or dipole moments) are also to be reproduced accurately. Thus, there are often conflicts in the goals of parameter optimization. Stewart, for instance, used a relatively low weight for dipole moments in developing PM619 and has discussed the problems encountered and strategies used in parameterizing new, generalist Hamiltonians (i.e. those intended to reproduce a wide range of properties).20 These observations are the basis for the hpCADD (high-Performance Computer-Aided Drug Design) force field, which we introduce here. We have chosen to use NDDO, rather than, for instance DFTB3,21

or

other

self-consistent

charge

density-functional

tight-binding

(SCCDFTB)22 methods because MNDO-like NDDO methods offer the advantage of multipole electrostatics and reproduce electrostatic properties well.18 As highperformance semiempirical molecular orbital programs such as EMPIRE23,24 are of comparable speed and scaling to classical polarizable force fields and are able to calculate tens of thousands of molecules in aggregates or single molecules of perhaps 100,000 atoms on highly parallel hardware, the NDDO formalism represents an attractive alternative to classical polarization models for force fields. One weakness is that NDDO wavefunctions with the traditional minimal Slater basis sets underestimate polarizability. Truhlar and Gao recently observed25 that the polarizabilities of organic molecules can be reproduced well in ab initio and Density Functional Theory (DFT) calculations simply by adding diffuse p-functions to the basis set. We pointed out2 that this effect can be traced back to the facile shift of electron density in bonds to hydrogen atoms that is also responsible for the electrostatic component of the directionality of hydrogen bonds. This means that even techniques that use minimal basis sets, such as traditional NDDO-based Modified Neglect of Diatomic Overlap (MNDO-) like formalisms, can be made sufficiently polarizable by adding p-orbitals on hydrogen. Interestingly, adding pfunctions to hydrogen was found to enable SINDO/1 (Symmetrically Orthogonalized ACS Paragon Plus Environment

Page 5 of 39

Journal of Chemical Information and Modeling

5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Intermediate Neglect of Differential Overlap) to reproduce hydrogen bonds correctly.26 Thus, an MNDO-like ansatz should be able to reproduce the electrostatics of molecules very well. An additional advantage of this approach is that the one- and two-electron integrals in MNDO-like theories are approximated by multipole interactions up to the quadrupole-quadrupole level,27,28,29 so that they deliver an atom-centered multipole model for molecular electrostatics directly.18 Note that the use of atom-centered multipole expansion up to quadrupoles is an approximation introduced with MNDO,28,29 rather than an analysis method such as that of Stone.30 In our case, however, the multipole approximation can be translated directly into an atom-centered multipole representation.18 We have therefore decided to construct a force field, initially intended for drug design and biomolecular modeling, which uses a re-parameterized MNDO Hamiltonian for the Coulomb and exchange interactions and classical force-field potentials for all others. In this way, we can use the strengths of semiempirical molecular-orbital theory in reproducing one-electron properties and those of classical force fields for structures and energies. We now present first proof-of-principle results for the parameterization of the electrostatic component in the absence of environmental polarization. Our strategy is to separate the Coulomb (and exchange) interaction from other two-center potentials by parameterizing the quantum mechanical Hamiltonian only with respect to electrostatic and electronic properties. These are the MEP close to the van der Waals surface, where it is most important for intermolecular interactions, both the magnitude and direction of dipole moments and the molecular ionization potential, which guarantees that other local properties such as the local ionization energy31 and by implication the local electron affinity32 are also reproduced reliably. This has the advantage that conflicting goals in the parameterization process (such as, for instance, the necessity to reproduce structures and energies well and, at the same time, to give accurate dipole moments)33 are alleviated or eliminated, so that the parameterization process is normally easier and more successful. Once the semiempirical Hamiltonian has been parameterized to reproduce these properties as accurately as possible, we test its performance by computing solvation free energies from three-dimensional Reference Interaction Site Model (3D RISM) integral equation theory for aqueous solutions34,35,36 in comparison to high-level MEP ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

data used as reference for the parameterization and to other common MEP models. Note that we do not intend to reproduce the experimental solvation thermodynamics with absolute accuracy since RISM models are known to have systematic deficiencies that can be corrected using a suitable approach based on the partial molar volume (PMV).37,38 Here, we only test the consistency between results using the MEP from reference calculations and from the hpCADD model. In the next stage to be described elsewhere, the optimized NDDO-Hamiltonian (and thus the molecular electrostatics) will be fixed and the remaining force-field potentials can be parameterized “on top” of it. Similarly, a suitable PMV correction for RISM-based solvation free energies can be parametrized once the final model becomes available. In its final form, the hpCADD force field should represent a transferable and universal atom-type based polarizable multipole force field that will not require expensive reparameterization calculations in order to treat chemical entities not considered in the training set. By applying the final parameters to an independent test set of small molecules, we will demonstrate that hpCADD provides this desirable transferability feature. By construction, the hpCADD parametrization strategy is easily extensible to situations where new chemical types are thought to be essential for enhancing predictive capabilities. It is thus in principle possible to cover large and diverse chemical spaces by this approach. We now describe the parameterization process for the semiempirical Hamiltonian, present the parameters for a series of common elements, and outline the solution state calculations. Note that we are able to use different parameters for different types of atoms of the same element (e.g. one set for sp3- and one for sp2- and spcarbon atoms) because these atom types will be carried over into the classical components of the force field.

Methods Parameterization All semiempirical molecular orbital calculations used the EMPIRE program.23,24 The reference geometries used for the parameterization were taken from second order Møller-Plesset perturbation theory MP2/aug−cc−pVDZ39,40,41 geometry optimizations

ACS Paragon Plus Environment

Page 6 of 39

Page 7 of 39

Journal of Chemical Information and Modeling

7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

that were performed without symmetry constraints using Gaussian09.42 The target properties for the parameterization were: 1. The three Cartesian components of the molecular dipole moment calculated at MP2/aug−cc−pVDZ. 2. The standard deviation, mean signed and unsigned deviations, most positive and most negative deviations between the MP2/aug−cc−pVDZ-calculated MEP at points on the ParaSurf43 standard AM1 (Austin Model 1)44,45 isodensity surface and those given by the atomic multipole technique18 at the same points with the Hamiltonian being parameterized. 3. Vertical experimental ionization potentials (IP) taken from the literature and given in the Supporting Information (Table S4). Initial parameters were usually based on those taken from the MNDO27 Hamiltonian with the exception that the one-electron ionization potential for the hydrogen 1sorbital (USS in the original terminology used in Reference 25) was fixed at its experimental value, as suggested previously46 in order to anchor the ionization energies for the entire parameterization. In contrast to what has become standard practice since the parameterization of PM3,47,48 (Parameterized Model 3) the derived parameters for the multipole integral calculations were not treated as parameterizable variables, but calculated as in the original MNDO technique.27 This reduces the maximum number of parameters per atom type to twelve. Parameterization was performed using an in-house standalone version of Baker’s Eigenvector-Following (EF) algorithm49 using numerical gradients with EMPIRE as the semiempirical compute engine. The Hamiltonians parameterized do not contain d-orbitals. The magnitudes of the parameters were monitored continuously during the parameterization runs in order to avoid overtraining or optimization into spurious minima. The training set, and the initial set of atoms types, which are described in Table 1, do not contain phosphorus, which would most likely need d-orbitals for good results. There is, however, no reason other than computational cost that d-orbitals should not be included in a future hpCADD NDDO Hamiltonian.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 8 of 39

8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Atom Types and Parameterization Datasets The atom types defined for the initial parameterization are shown in Table 1. Table 1: Initial atom types for the hpCADD force field. Atom Symbol

Type

Definition

3

Csp3

sp -hybridized Carbon

Csp2

sp -hybridized Carbon

Csp1

sp-hybridized Carbon

HCsp3

H bound to sp3-hybridized Carbon

HCsp2

H bound to sp2-hybridized Carbon

HCsp1

H bound to sp-hybridized Carbon

OC

Carbonyl oxygen

CO

Carbonyl carbon

HCO

Aldehyde hydrogen

SC

Thioketone sulfur

HAS

Thioaldehyde hydrogen

2

O

ACS Paragon Plus Environment

C

Page 9 of 39

Journal of Chemical Information and Modeling

9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

OHc

Carboxylic acid OH oxygen

HOc

Carboxylic acid proton

ORc

Ester OR oxygen

Namd

Amide nitrogen

Hamd

Amide NH hydrogen

OH

Alcohol oxygen

HO

Alcohol hydrogen

SH

Thiol sulfur

HS

Thiol hydrogen

N3

Amine nitrogen

HN

Amine hydrogen

NC

Nitrile nitrogen

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

OR

Ether oxygen

SR

Thioether sulfur

F

Fluoride

Cl

Chloride

Br

Bromide

I

Iodide

SO

Sulfone sulfur

OS

Sulfone oxygen

HSO

HSO2 hydrogen

Nim

Divalent nitrogen as part of a π-system

Ccon

Carbon as part of a π-system

Hcon

Hydrogen bonded to a Carbon as part of a π-system

Ncon

Trivalent nitrogen as part of a π-system

HNcon

Hydrogen bonded to conjugated nitrogen

ACS Paragon Plus Environment

Page 10 of 39

Page 11 of 39

Journal of Chemical Information and Modeling

11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Ocon

Oxygen as part of a π-system

Scon

Sulfur as part of a π-system

The parameterization data for 188 compounds (training set) are shown in Tables S1S4 of the Supporting Information. They include many common organic groups without assuming complete coverage of chemical space. The parameterization was performed incrementally in the order in which the atom types appear in Table 1. This means that, for instance, the Csp3, Csp2, Csp1, HCsp3, HCsp2 and HCsp1 atom types were parameterized for the hydrocarbons and then used unchanged for other groups unless they led to poor performance. Thus, for instance, HCsp2 is not used for aldehyde hydrogens, but rather a dedicated atom type HCO. In many cases, the existing parameters were found to perform well and no new atom type was introduced. The ordering of chemical compound classes used for this sequential parametrization strategy is indicated behind the class names in Tables 2 and 5 and in the tables in the Supporting Information. The raw performance of the hpCADD model for approximating electronic structure was characterized by computing various electrostatic and quantum mechanical descriptors in comparison with reference MP2 results. In particular, we report below the overall root mean square deviations (RMSD) for dipole moments, ionization potentials and MEP together with the largest MEP errors with mean signed errors (MSE). We furthermore compare details of the MEP topographies.

3D RISM calculations A technique that calculates the electron density, and therefore the MEP around compounds well should also be appropriate for reproducing solvation free energies deduced from reference MEPs used for the parameterization process. We therefore tested the hpCADD Hamiltonian in conjunction with the 3D RISM solvation model34,35 for aqueous solutions. This model yields analytical expressions for the excess chemical potential, µex, within certain approximations.36 Since we are not interested in ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

12 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

absolute data here but only in measuring the performance of hpCADD electrostatics in comparison with high-level calculations and other established electrostatics models, we do not consider the solvent-induced polarization of the solute, which is in principle possible.50 The electrostatic component of the solute-solvent interaction is described by computing the energy of solvent point charges in the MEP field. As we have demonstrated earlier,51 it is necessary for numerical reasons to split the total electrostatic interaction into an auxiliary long-range part arising from atomic solute point charges and a residual short ranged part such that the sum is identical with the exact MEP. After subsequent iterative solution to the 3D RISM integral equations on a 3D grid, µex is obtained from the set of correlation functions between solute and solvent sites. As a stringent test of the hpCADD model we compared its performance for solution thermodynamics not only with reference MP2 MEPs, but also with common semiempirical Hamiltonians (AM1, PM3, MNDO) and with a standard approach to fixed charge force field models following the Generalized Amber Force Field (GAFF) strategy to derive atomic point charges for small non-peptide molecules.52 In this case, the RESP (“Restrained Electrostatic Potential”) procedure was employed using antechamber from the Amber tools53 to represent the HF/6-31G(d) Hartree-Fock MEP as accurately as possible with topologically symmetrized atomic site charges.54 MEP fitting points were selected according to the Merz-Singh-Kollman scheme55,56 on four concentric layers around the atoms with an increased number of six points per unit area. Since the Amber-GAFF model is atom-centered, the resulting MEP was used directly for 3D RISM calculations, not accounting for the residual difference to the exact HF MEP. The reference MP2 MEP was calculated using the Gaussian auxiliary program cubegen using the MP2 density. Auxiliary charges for semiempirical Hamiltonians including hpCADD were obtained by fitting VESPA charges57 to the MEP calculated with the hpCADD Hamiltonian. In all cases, optimized geometries from the MP2/aug-cc-pvDZ calculations were used for charge and MEP calculations. As input for 3D RISM iterations, the MEP for all methods was computed on a grid with a minimum distance between box edges and any solute atom of 15 Å and a grid spacing of 0.3 Å in each direction. Following closely our earlier work,51 a modified SPC/E (Extended Simple Point Charge) water representation58 was used to model

ACS Paragon Plus Environment

Page 12 of 39

Page 13 of 39

Journal of Chemical Information and Modeling

13 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the solvent. The Lennard-Jones parameters for the dispersion-cavitation solutesolvent description were taken from the GAFF force field in all cases. The PSE-3 closure was used throughout;36 the convergence criterion was set to 10-6 for the maximum residual deviation of the direct correlation functions between subsequent iteration cycles. The results for training and test data sets were compared by computing statistical descriptors for the deviation between the approximate (hpCADD, Amber, AM1, PM3, MNDO) solvation free energies measured by µex and the MP2-based reference results. To this end, the RMSD, the mean absolute deviation (MAD) along with the MSE, and the Pearson correlation coefficient were computed. Additionally, slope and intercept of a linear regression model were derived together with the resulting coefficient of determination R2 and the estimated standard deviation σ.

Results and discussion Performance of electronic structure descriptors

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

14 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2 shows the parameterization results for each of the target values and for the entire training set divided into classes of molecules. The results table is organized in the order in which the molecule classes were parameterized (i.e. lower classes of molecules use parameters from those above them in addition to their own specific parameterization). The “inapplicable” test-set compounds belong to structural classes not represented in the training set and therefore outside the applicability domain of the current parameterization. They are included to illustrate the locality of the hpCADD Hamiltonian and to illustrate likely errors for “unknown” compound types (see below).

ACS Paragon Plus Environment

Page 14 of 39

Page 15 of 39

Journal of Chemical Information and Modeling

15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2: Performance of the optimized parameters for the target variables organized in compound groups. The data for the individual compounds are given in the Supporting Information. Dipole moments are reported in Debye, ionization potentials in eV and the MEP in kcal mol−1. The number in parentheses after the class name indicates the parametrization ordering. The column “MEP-RMSD” gives the root-mean-square deviation of the RMSDs of the individual compounds for the MEP, whereas the “MEP” column gives the RMSD of the MEP for all surface points for all molecules in the dataset. Comparison of the two values indicates how evenly the molecules in the dataset are treated. The values in parentheses give the range of the target value for the dataset. See the text above for the definition of “Inapplicable” compounds. Atom Types

Csp3 Csp2 Csp1 HCsp3 HCsp2 HCsp1

Overall RMSD µ

IP

0.08 (-0.63 to 0.76)

0.23 (8.09 – 11.51)

OH HO

0.11 (-1.50 to 1.67)

0.37 (10.2 to 10.96)

Namd Hamd

0.40 (-4.14 to 4.06)

0.49 (9.14 to 10.5))

N3 HN

0.24 (-1.41 to 1.15)

0.56 (8.54 to 9.65)

Br

0.17 (-3.56 to 2.54)

0.35 (9.39 to 10.76)

OHc HOc

0.39 (-2.96 to 3.92)

Cl

Largest MEP errors

MEPMEP RMSD 1. Hydrocarbons (1)

0.85 (-23.68 to 32.31)

2. Alcohols (7) 0.85 (-41.02 to 44.02) 3. Amides (6) 2.22 (-51.23 to 48.76) 4. Amines (9) 0.94 (-52.86 to 39.70) 5. Bromides (15)

3.96

-5,87

0.58

0.98

4.93

-5.57

0.72

2.84

6.95

-19.37

0.88

1.12

6.51

-7.67

0.67

7.23

-9.15

0.68

11.23

-15.72

1.04

7.10

-10.79

0.46

1.12

2.43

-7.80

0.22

1.53

7.98

-10.55

0.89

1.54

6.07

-10.96

1.13

2.17

11.17

-12.05

0.63

0.32 0.87 0.16 1.27 (-2.50 to 2.25) (10.18 to 11.29) (-19.87 to 34.97) 8. Conjugated Systems (18)

0.20 (-3.22 to 4.86)

0.64 (7.86 to 10.15)

ORc

0.20 (-2.02 to 3.43)

0.52 (8.54 to 10.75)

OR

0.18 (-1.44 to 1.90)

0.43 (9.32 to 10.02)

F

0.24 0.43 (-1.97 to 2.51) (10.63 to 13.27)

0.79 (-55.52 to 311.97)

9. Esters (5) 1.31 (-43.81 to 43.64) 10. Ethers (11) 1.19 (-44.34 to 30.26) 11. Fluorides (13) 1.08 (-30.43 to 34.56) 12. Iodides (16)

MSE

1.13

1.01 1.41 (-23.06 to 33.61) 6. Carboxylic acids (4) 1.37 1.82 2.50 (10.3 to 10.84) (-43.06 to 55.02) 7. Chlorides (14)

Ccon HCcon Nim Ncon HNcon Ocon Scon

Positive Negative

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 16 of 39

16 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

I

0.27 (-2.12 to 2.47)

OC CO HCO

0.34 (-3.11 to 3.11)

0.57 1.49 2.08 (9.00 to 9.9) (-14.31 to 33.61) 13. Ketones and Aldehydes (2)

7.73

-8.99

0.97

1.95

10.56

-6.21

1.38

NC

0.07 0.57 (-4.02 to 3.85) (11.03 to 12.46)

14. Nitriles (10) 0.71 0.75 (-44.91 to 34.51) 15. Sulfones (17)

3.51

-5.32

0.27

SO OS HSO

0.15 (-4.72 to 5.54)

0.83 (-43.81 to 54.61)

7.14

-7.32

0.36

-10.76

1.10

-7.49

0.71

9.21

-4.91

1.41

2.84

11.23

-19.37

1.41

3.36 (-69.60 to 64.08)

13.05

36.28

-37.75

5.11

7.60 (-68.79 to 64.08)

21.82

119.84

-55.33

1.45

1.55 (9.29 to 10.24)

0.42 (9.96 to 10.81)

1.52 (-43.10 to 33.78)

1.20

16. Thioethers (12) SR

0.20 (-2.17 to 1.70)

SH HS

0.26 (-1.70 to 1.70)

SC HSA CS

0.47 (-2.64 to 1.69)

0.65 (8.07 to 8.81)

1.46 1.82 7.95 (-31.71 to 31.13) 17. Thiols (8) 0.59 1.01 1.36 7.96 (8.90 to 9.51) (-27.64 to 31.59) 18. Thioketones and Thioaldehydes (3) 0.30 (8.18 to 9.37)

1.80 (-29.21 to 29.04)

1.95

Complete Training Set All types

0.23 (-4.72 to 5.54)

0.57 (7.86 to 13.27)

1.15 (-55.52 to 311.97) Test sets

All types

Inapplicable

Experimental 0.90 values not (-7.70 to 4.74) available 1.45 (-7.81 to 12.94)

Experimental values not available

Parameters Table 3 lists the parameters obtained for the individual atom types. These parameters were used in the precision given in the table in the EMPIRE parameters file used for all subsequent hpCADD calculations. The symbols and units used for the parameters correspond to references 27 and 47 and their roles are defined in the Supporting Information.

ACS Paragon Plus Environment

Page 17 of 39

Journal of Chemical Information and Modeling

17 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 3: Optimized parameters for the atom types listed in Table 1. The Uss value for all types of hydrogen was fixed at the experimental value of -13.59844 eV and is indicated in bold in the table. Nomenclature for the parameters and units are taken from reference 27 and are defined in the Supporting Information. Parameters Ussa a Upp

βs a βp a ξs b ξp b a

Gss Gspa Gppa a Gp2

ρcore b Parameters Ussa a Upp

βs a βp a ξs b ξp b Gssa Gspa Gppa a Gp2

ρcore b Parameters a Uss a Upp

βs a βp a ξs b ξp b Gssa a Gsp a Gpp a Gp2

ρcore b Parameters Ussa Uppa

βs a βp a ξs b ξp b Gssa a Gsp

Csp1 -57.134540 -31.550208 -20.576034 -11.556630 2.651424 1.350009 11.385277 11.367651 7.317063 7.193675 1.203862

Csp2 -56.004533 -32.771727 -19.937626 -4.392654 1.823864 1.232033 10.373712 8.595365 9.095091 9.094891 1.350157

OC -99.290252 -82.324401 -33.087822 -30.608878 2.441799 2.53864 28.527808 7.026354 22.464162 17.820685 0.319475

-51.051834 -34.019691 -23.242293 -7.193323 1.845027 1.720874 10.008872 10.539312 8.029731 7.929731 1.291188

13.598440 -0.111755 -7.279856 -0.347596 1.133597 1.378217 14.311001 10.831106 4.089672 3.989672 0.732684

-70.525034 -52.846144 -8.159588 4.887574 1.748760 2.023847 14.714087 5.506948 15.744037 11.022401 0.544163

-50.667713 -37.016841 -19.336368 -10.944181 1.332529 3.268046 8.697626 11.297338 18.208000 10.165255 1.687183

-13.598440 -4.296529 -7.385536 -2.365866 0.977236 1.694048 16.471460 6.335711 0.952519 0.766008 0.467238

OHc -98.773392 -73.959228 -30.285285 -39.393993 1.812872 1.390249 16.682234 19.098621 4.429111 3.630789 0.805882

HOc -13.598440 -9.792256 -2.518255 -5.178289 1.402333 6.690095 11.745390 10.993214 11.359259 10.134907 0.425108

ORc -101.397130 -73.497761 -31.105996 -32.086173 1.168803 4.643811 12.006680 13.387283 13.796043 13.012912 1.148969

Namd -71.594154 -57.995725 -20.724598 -24.181100 2.264992 0.883725 22.923393 16.675275 0.392244 0.284953 0.876562

Hamd -13.598440 -4.507666 -5.932420 -7.518052 2.456534 1.421647 10.208671 10.381695 4.611840 2.925733 0.935030

NC -68.982170 -71.527271 -20.781192 -29.734675 4.286493 1.064035 28.502824 8.867105 17.346125 17.246125 0.550223

CN OH -54.214780 -110.644199 -61.018598 -62.779013 -18.599950 -48.049193 -4.884952 -6.068511 1.905249 2.025751 2.538566 1.544880 10.001897 6.447996 15.288876 11.438543

HO -13.598440 -6.345576 -5.188109 -30.289408 3.741170 4.300671 15.921467 23.187677

SH -69.404196 -53.329494 -20.797253 -0.477919 2.187098 1.393829 6.432390 7.225023

HS -13.598440 -6.828391 -3.524485 -0.732089 1.456146 2.160097 18.802828 16.774856

OR -93.131866 -57.266407 -56.385721 -8.437938 2.211918 1.576284 7.393121 9.332506

CO

Csp3 -56.866067 -32.873021 -18.879427 -13.704305 2.053747 2.325245 15.913764 9.351601 10.084143 9.901188 0.872597 HCO

HCsp1 -13.598440 4.651429 -6.781701 -0.660783 0.840477 3.117138 22.504821 1.812203 1.034360 0.615950 0.490854 SC

ACS Paragon Plus Environment

HCsp2 -13.598440 5.936670 -3.015955 0.437893 3.656390 0.301103 18.219326 8.755309 3.297240 3.297040 0.542826 CS

HCsp3 -13.598440 1.106749 -12.598789 -0.934851 1.370618 1.694617 15.522633 7.340549 5.429447 5.311533 0.678804 HAS

Journal of Chemical Information and Modeling

Page 18 of 39

18 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Gppa a Gp2

27.464126 16.226985 1.324776

10.437636 10.153486 2.174147

8.720638 8.129336 0.669973

10.264111 9.936538 2.075309

12.439347 6.937294 0.662903

10.041088 9.685367 1.880738

SR -81.441743 -51.075098 -26.434130 -1.725599 1.970969 1.358496 6.277973 11.108872 7.621889 7.151393 2.100699

NH -48.552277 -43.524106 -9.463622 -3.842305 1.752407 1.205681 9.520222 8.195266 10.116551 10.016551 1.552650

HN -13.598440 -2.829472 -3.994636 -1.449631 0.844586 1.605460 14.582207 11.878712 8.245272 -0.007277 0.308455

SO -67.374544 -56.231516 -10.736591 -17.849105 2.129794 2.195676 12.286556 11.450864 8.416305 8.316251 1.154768

OS -98.359575 -73.238164 -33.959484 -21.324128 11.982624 1.684346 12.951124 16.664765 9.698725 9.598725 1.050488

HSO -13.598440 3.017358 -5.890047 0.928311 1.141711 1.496751 11.820608 21.994001 8.058844 1.210303 0.915851

Parameters F a -145.121500 Uss a -121.815460 Upp a -34.145166 βs -12.371271 βp a 2.334286 ξs b 2.190831 ξp b 14.038205 Gssa a 14.903272 Gsp a 19.703140 Gpp a 19.551353 Gp2 b 0.989531 ρcore

Cl -93.675395 -69.792916 -62.799677 -3.995695 2.756296 1.565311 13.913884 1.222128 14.324101 14.084516 0.960562

Br -97.497102 -75.241413 -44.314455 0.305045 2.185464 1.814781 9.574625 4.056305 14.360031 14.020251 1.365639

I -78.669148 -83.328898 -29.207027 0.518614 2.339867 1.977634 8.364136 4.636808 15.828207 15.427902 1.504931

Nim -70.183426 -50.981265 -16.290831 -13.112620 2.867447 1.271439 14.318383 13.672876 8.380220 8.136448 0.834358

Ncon -69.894881 -60.974364 -16.942307 -17.327652 1.192675 2.200598 14.527863 11.562961 18.841732 14.985688 0.951010

Parameters Ussa Uppa

Scon -63.926189 -52.527459 -30.025352 -7.921570 2.842908 1.318051 8.550021 8.450424 9.472057 9.009453 1.494701

Ocon -101.058827 -72.283708 -38.365078 -21.418025 0.938976 2.772824 6.074053 10.621859 21.388329 13.010110 2.275529

Ccon -52.690492 -52.690492 -33.376010 -20.694394 1.684054 1.068103 8.361445 10.574547 8.857099 8.550273 1.670025

HCcon -13.598440 4.914856 -12.787739 -0.432002 1.917542 1.793264 17.156526 4.526758 8.133185 -2.022178 0.075378

ρcore b Parameters Ussa a Upp

βs a βp a ξs b ξp b a

Gss a Gsp Gppa a Gp2

ρcore b

βs a βp a ξs b ξp b Gssa a Gsp Gppa a Gp2

ρcore b a

eV

b

HNcon -13.598440 -5.249159 0.000000 -1.491020 1.197927 0.664124 21.655184 14.406163 6.031612 5.535249 0.079740

Bohr−1

As expected, it is straightforward to parameterize a standard MNDO-like Hamiltonian to reproduce our limited set of electrostatic and electronic structure-related target properties quite accurately without needing to consider structures or energies. The different parameter sets for hydrogen illustrate the effect of using force field like atom types for the semiempirical Hamiltonian. Table 4 shows the mean values and

ACS Paragon Plus Environment

Page 19 of 39

Journal of Chemical Information and Modeling

19 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

standard deviations of the nine variable hydrogen parameters (Uss is always fixed) for the 12 different hydrogen atom types. Table 4: Statistical distribution of the optimized parameters for hydrogen (HCsp1, HCsp2, HCsp3, HCO, HAS, HOc, Hamd, HO, HS, HN, HSO, HCcon and HNcon). Nomenclature for the parameters and units are defined in the Supporting Information. Parameter a

-1.671 -5.592 -4.031 1.690 2.253 16.218 11.563 6.496 2.911 0.521

Upp

a

βs βp a ξs b ξp b a

Gss a Gsp a Gpp Gp2a

ρcore b a

eV

b

Mean

Standard Deviation Absolute % 5.119 406 3.723 167 8.216 304 0.999 41 1.830 19 3.807 77 6.594 43 3.623 44 3.050 5 0.277 47

Bohr−1

The parameters, in particular those associated with the p-polarization functions, exhibit considerable scatter. Although much of this results from the pragmatic statistical parameterization and thus has little physical meaning, some of the variation can be interpreted as a surrogate for the effects that would normally be accounted for by a split-valence basis set. Upp, βs, βp and ξp, for instance, all correlate roughly with the acidity (in this case, the pKa value in water) of the proton, in accord with the polarization effect observed previously. The trends can be understood in terms of the 1s-orbital becoming more compact and the polarization effect made possible by the p-orbitals becoming more important as the hydrogen becomes more positive; Upp is most negative and ξp largest for the most acidic protons.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1: Visual comparison of the optimized parameters for the three differently hybridized carbon atom types. Nomenclature for the parameters and units are defined in the Supporting Information.

Figure 1 shows the optimized parameters for the three differently hybridized carbon atom types. The parameters show consistent trends, although the Csp2 ξs, ξp and βs parameters are lower than those for both Csp1 and Csp3. This is likely because the Csp2 parameterization is strongly adapted to reproduce the electrostatics of π-clouds accurately.

ACS Paragon Plus Environment

Page 20 of 39

Page 21 of 39

Journal of Chemical Information and Modeling

21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2: Visual comparison of the optimized parameters for the halogens F, Cl, Br and I. Nomenclature for the parameters and units are defined in the Supporting Information.

Figure 2 shows a similar comparison for the halogens. Fluorine generally forms an exception, but the trends are quite smooth and without large exceptions.

Molecular electrostatic potential topography Atom-centered monopole electrostatic models often fail to capture the correct topography of the MEP projected onto the van der Waals surface, especially for strongly anisotropic atoms such as the heavier elements.59 This failure has strong consequences for intermolecular interactions because features that correspond to the H-bond accepting faces of aromatic systems60 or the σ-hole7,61 are often not reproduced using atom-centered monopoles, which tend towards isotropic MEP distributions around single atoms or homogeneous rings. We have therefore selected some molecules that exhibit such binding features to investigate the behavior of the newly parameterized hpCADD NDDO Hamiltonian with respect to the accurate ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 22 of 39

22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

reproduction of electrostatic properties. As for the parameterization, we have used the MP2/aug-cc-pVDZ-calculated data and optimized geometries as reference. p-Chloro benzoic acid, 1

Figure 3 shows the hpCADD MEP projected onto the standard isodensity surface of p-chloro benzoic acid, 1. Red arrows in the above scheme indicate favorable interactions with H-bond (or halogen–bond) acceptors, blue ones with H-bond donors. This molecule shows several important features that must be reproduced by a satisfactory electrostatic model. It can enter into classical hydrogen bonds through the OH-donor and =O-acceptor functions of the carboxylic acid, function as a halogen-bond donor via the σ-hole on chlorine or as an H-bond acceptor through the negative equatorial belt of the halogen. The aromatic ring can function as an H-bond acceptor on its faces and as a CH hydrogen-bond donor in the ring plane. Figure 3a) shows the MEP calculated with the hpCADD Hamiltonian and Figure 3b) the difference between hpCADD and MP2/aug-cc-pVDZ MEPs projected onto the same surface.

kcal mol−1

kcal mol−1

Figure 3: a) MEP for p-chloro benzoic acid in kcal mol−1 calculated at the 0.001 a.u. (atomic units) isodensity surface with the hpCADD Hamiltonian. The color scale has been compressed to 0 ± 20 kcal mol−1 to emphasize the features of the MEP map. A corresponding figure in which the color code extends over the entire range (-41 – +37 kcal mol−1) is given in the Supporting Information (Table S5); b) the difference between the hpCADD and the MP2/aug-cc-pVDZ calculated MEP (i.e. the hpCADD error) projected onto the same surface.

ACS Paragon Plus Environment

Page 23 of 39

Journal of Chemical Information and Modeling

23 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The hpCADD Hamiltonian gives the largest deviation (-15.7 kcal mol−1) for the σ-hole at the chlorine atom in p-chloro benzoic acid with a most positive MEP value in this region of -7.2 kcal mol−1. The largest positive error (+11.2 kcal mol-1) is situated in the negatively polarized area of the hydrogen atoms around the benzene ring. The overall

RMSE

for

the

MEP

is

1.7 kcal mol-1.

The

hpCADD

Hamiltonian

underestimates the polarity of these two important binding features. Most likely, this depends on the electronic properties of the chlorine atom and may mean that dorbitals are needed. 2-(4-pyridyl)-pyrrole, 2

Pyridine is a very electron-poor aromatic system and pyrrole very electron-rich. Therefore, linking the two at the pyridine 4-position and the 2-position of pyrrole should result in a bis-aromatic system, 2, with two very different aromatic ring systems and different MEPs on the π-faces of the rings. Figure 4 shows hpCADD results and a comparison with MP2/aug-cc-pVDZ analogously to that shown in

Figure 3. Figure 4: The difference between the hpCADD and the MP2/aug-cc-pVDZ calculated MEP (i.e. the hpCADD error, VhpCADD(r) – VMP2(r)) projected onto the 0.001 a.u. isodensity surface. The right-hand structure is that optimized fully at MP2/aug-cc-pVDZ. The three left-hand structures are based on partially optimized geometries in which the twist angle around the central bond was constrained to 30°, 60° and 90°.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 24 of 39

24 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The range of the MEP values given by the hpCADD Hamiltonian is similar to that found for MP2/aug-cc-pVDZ. The most positive deviation (8.87 kcal mol-1, 9%) of the overall MEP range of the MP2/aug-cc-pVDZ calculations corresponds to the positive area around the pyridine ortho-hydrogen atoms and the most negative for their metaneighbors. The hpCADD errors for the pyridine moiety are approximately 50% larger than for unsubstituted pyridine, whereas the pyrrole ring is reproduced very similarly to the unsubstituted parent. hpCADD underestimates the induced dipole of the pyridine ring slightly. Twisting the rings relative to each other demonstrates that the hpCADD Hamiltonian can adapt to conformational changes in conjugated systems; the most positive error remains approximately constant in the twisted structures and the most negative one decreases with increasing twist angle. Carbohydrate modeling is traditionally regarded as being difficult, so that a good description of the electrostatic properties of glucose is an important quality criterion for any new force field for natural systems. Figure 5 shows the hpCADD results for glucose, 3, analogously to Figure 4. Glucose, 3

kcal mol−1

kcal mol−1

Figure 5: a) MEP for glucose calculated at the 0.001 a.u. isodensity surface with the hpCADD Hamiltonian. The color scale has been compressed to 0 ± 20 kcal mol−1 to emphasize the features of the MEP map. A corresponding figure in which the color code extends over the entire range (-49 – +52 kcal mol−1) is given in the Supporting Information (Table S5); b) the difference between the hpCADD and the MP2/aug-cc-pVDZ calculated MEP (i.e. the hpCADD error, V(r)hpCADD – V(r)MP2) projected onto the same surface.

ACS Paragon Plus Environment

Page 25 of 39

Journal of Chemical Information and Modeling

25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The overall RMSE for the MEP from the hpCADD Hamiltonian relative MP2/aug-ccpVDZ to is 3.40 kcal mol-1. The largest deviation is -12.04 kcal mol-1, 12% of the MEP range from MP2/aug-cc-pVDZ calculations, is found around the oxygen of an equatorial OH. The MP2 value here is -14 kcal mol−1, compared with the hpCADD value of -26 kcal mol-1. The most positive error (10 kcal mol−1) is found on the side of C6 opposite the axial hydrogen atom.

3D RISM solvation free energy results Tables 5 and 6 (raw data are shown in Fig. S1 and Table S6 of the Supporting Information) summarize the statistical descriptors for a comparison to the MP2 reference of the 3D RISM excess chemical potentials calculated using MEPs calculated with hpCADD, Amber-GAFF, and three semiempirical methods. The results for are shown for the complete training data set, for the individual chemical compound classes and for two independent test sets. The “inapplicable” test set consists of compound types not represented in the training set (predominantly guanidines, ureas, anhydrides and aromatics with two or more adjacent nitrogen atoms). These compounds are included to test and illustrate the applicability of the parameterized Hamiltonian. Their poor performance indicates that additional atom types are needed. Figure 6 shows RMSD and MSE for distinct classes and all methods and Figure 7 in addition the deviations for each model for both training and test sets. More details including raw data for all compounds and separate scatter plots for all models/classes are given in the Supporting Information.

-1

Table 5: Statistical results of aqueous solvation free energies (in kcal mol ) from 3D RISM calculations for the training data set for hpCADD and other charge models measured in relation to reference MP2 calculations. R2 and σ that were derived from a linear model fit with respect to the MP2based results. The statistics were obtained for the following numbers of data points within each chemical compound class; all (188), hydrocarbons (21), alcohols (8), amides (9), amines (6), bromides (12), carboxylic acids (6), chlorides (11), conjugated systems (17), esters (22), ethers (11), fluorides (10), iodides (10), ketones and aldehydes (8), nitriles (5), sulfones (6), thioethers (11), thiols (8) as well as thiones and thioaldehydes (7). The number in parentheses after the class name indicates the parametrization ordering. For the raw data, see the Supporting Information. All hpCADD Amber AM1 PM3

RMSD 1.052 1.891 2.676 2.608

MAD 0.756 0.978 1.795 1.561

MSE -0.270 -1.356 -1.266 -0.725

Pearson Slope Intercept R2 σ 0.990 0.923 1.182 0.979 0.887 0.979 0.930 2.109 0.959 1.245 0.925 0.943 1.884 0.855 2.346 0.921 0.880 2.089 0.848 2.397

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

Page 26 of 39

26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

MNDO 1. Hydrocarbons (1) hpCADD Amber AM1 PM3 MNDO 2. Alcohols (7) hpCADD Amber AM1 PM3 MNDO 3. Amides (6) hpCADD Amber AM1 PM3 MNDO 4. Amines (9) hpCADD Amber AM1 PM3 MNDO 5. Bromides (15) hpCADD Amber AM1 PM3 MNDO 6. Carboxylic acids (4) hpCADD Amber AM1 PM3 MNDO 7. Chlorides (14) hpCADD Amber AM1 PM3 MNDO 8. Conjugated systems (18) hpCADD Amber AM1 PM3 MNDO

2.328 RMSD 0.539 1.285 0.512 0.780 1.250 RMSD 0.502 1.352 1.272 1.819 1.681 RMSD 1.512 2.537 0.706 0.811 1.174 RMSD 0.610 1.235 1.022 1.399 0.860 RMSD 0.911 0.585 0.529 0.367 0.336 RMSD 1.099 2.632 1.136 1.133 0.959 RMSD 0.491 0.609 0.351 0.755 0.200 RMSD 1.982 3.163 2.676 2.608 2.316

1.479 MAD 0.215 0.678 0.398 0.410 0.566 MAD 0.212 0.245 0.143 0.202 0.297 MAD 0.564 0.240 0.664 0.746 0.504 MAD 0.362 0.693 0.458 0.636 0.570 MAD 0.487 0.163 0.224 0.206 0.153 MAD 0.800 0.474 0.537 0.511 0.491 MAD 0.345 0.201 0.171 0.307 0.156 MAD 1.263 1.279 1.795 1.561 1.444

-0.635 MSE 0.471 -1.019 0.189 0.589 1.050 MSE -0.433 -1.311 1.260 1.800 1.635 MSE -1.358 -2.524 0.176 0.181 0.963 MSE -0.436 -0.877 -0.792 1.061 0.234 MSE -0.566 -0.544 -0.455 -0.267 -0.273 MSE -0.071 -2.568 0.864 0.884 0.653 MSE -0.247 -0.563 0.255 0.623 -0.028 MSE -0.649 -2.125 -1.266 -0.725 -0.692

0.935 Pearson 0.998 0.985 0.997 0.995 0.989 Pearson 0.999 0.997 0.999 0.998 0.996 Pearson 0.988 0.998 0.988 0.985 0.989 Pearson 0.996 0.982 0.995 0.988 0.990 Pearson 0.986 0.999 0.998 0.998 0.999 Pearson 0.946 0.992 0.980 0.981 0.981 Pearson 0.996 0.999 0.999 0.996 0.999 Pearson 0.946 0.858 0.925 0.921 0.937

ACS Paragon Plus Environment

0.914 Slope 1.014 1.038 1.083 1.048 1.003 Slope 0.949 1.047 1.027 1.045 1.048 Slope 0.953 0.981 1.061 1.102 0.939 Slope 0.962 0.934 1.125 1.168 1.149 Slope 1.006 0.989 1.018 1.006 1.025 Slope 1.047 1.140 1.110 1.100 1.085 Slope 0.956 1.034 0.993 0.970 1.021 Slope 0.718 0.716 0.943 0.880 0.912

1.628 Intercept -0.690 0.489 -1.463 -1.348 1.097 Intercept 0.785 1.034 -1.492 -2.204 -2.060 Intercept 1.499 2.558 -0.451 -0.642 -0.641 Intercept 0.685 1.280 -0.025 -2.403 -1.306 Intercept 0.493 0.687 0.222 0.195 -0.046 Intercept -0.238 2.007 -1.682 -1.627 -1.262 Intercept 0.811 0.141 -0.163 -0.219 -0.238 Intercept 2.815 3.890 1.884 2.089 1.695

0.874 R2 0.997 0.971 0.995 0.989 0.977 R2 0.999 0.994 0.999 0.997 0.992 R2 0.976 0.997 0.975 0.971 0.977 R2 0.992 0.965 0.990 0.976 0.980 R2 0.972 0.998 0.996 0.997 0.998 R2 0.894 0.985 0.961 0.963 0.962 R2 0.992 0.998 0.997 0.991 0.998 R2 0.895 0.736 0.855 0.848 0.878

2.185 σ 0.268 0.805 0.345 0.492 0.712 σ 0.172 0.330 0.167 0.245 0.401 σ 0.718 0.272 0.729 0.791 0.700 σ 0.478 0.998 0.531 0.823 0.748 σ 0.781 0.230 0.283 0.275 0.184 σ 1.331 0.504 0.811 0.788 0.802 σ 0.414 0.203 0.264 0.448 0.196 σ 1.310 2.078 2.346 2.397 2.151

Page 27 of 39

Journal of Chemical Information and Modeling

27 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

9. Esters (5) hpCADD Amber AM1 PM3 MNDO 10. Ethers (11) hpCADD Amber AM1 PM3 MNDO 11. Fluorides (13) hpCADD Amber AM1 PM3 MNDO 12. Iodides (16) hpCADD Amber AM1 PM3 MNDO 13. Ketones/aldehydes (2) hpCADD Amber AM1 PM3 MNDO 14. Nitriles (10) hpCADD Amber AM1 PM3 MNDO 15. Sulfones (17) hpCADD Amber AM1 PM3 MNDO 16. Thioethers (12) hpCADD Amber AM1 PM3 MNDO 17. Thiols (8)

RMSD 1.004 2.374 4.775 0.879 3.576 RMSD 0.739 0.439 0.622 0.797 0.384 RMSD 0.763 0.626 0.611 0.594 0.854 RMSD 0.221 1.269 1.279 1.214 1.226 RMSD 0.696 2.148 0.464 0.532 0.547 RMSD 1.682 1.503 1.075 0.698 1.592 RMSD 1.339 4.597 9.285 11.760 9.514 RMSD 1.401 0.501 2.197 2.861 0.507 RMSD

MAD MSE 0.506 0.769 0.209 -2.357 0.784 -4.669 0.315 -0.783 0.547 -3.500 MAD MSE 0.470 -0.436 0.265 -0.269 0.376 -0.383 0.412 0.553 0.264 -0.064 MAD MSE 0.590 -0.300 0.144 -0.604 0.164 0.563 0.219 0.514 0.310 -0.741 MAD MSE 0.171 -0.040 0.555 -1.109 0.295 -1.213 0.283 -1.160 0.290 -1.156 MAD MSE 0.562 -0.194 0.276 -2.121 0.164 -0.378 0.188 -0.459 0.248 -0.448 MAD MSE 0.765 -1.370 0.181 -1.487 0.180 1.055 0.091 -0.688 0.462 1.468 MAD MSE 1.075 -0.537 0.332 -4.577 0.441 -9.265 0.637 -11.728 0.666 -9.484 MAD MSE 0.789 -1.043 0.341 0.237 0.458 -2.093 0.377 -2.803 0.346 -0.156 MAD MSE

Pearson 0.991 0.998 0.986 0.998 0.992 Pearson 0.998 0.998 0.999 0.998 0.999 Pearson 0.994 0.999 0.998 0.997 0.994 Pearson 0.999 0.991 0.996 0.997 0.995 Pearson 0.998 1.000 1.000 1.000 0.999 Pearson 0.955 0.998 0.998 0.999 0.981 Pearson 0.961 0.993 0.981 0.965 0.980 Pearson 0.992 0.997 0.996 0.996 0.996 Pearson

ACS Paragon Plus Environment

Slope 0.969 1.010 1.148 1.060 1.090 Slope 0.919 1.0011 1.082 1.085 1.053 Slope 0.870 1.023 0.974 0.981 1.003 Slope 0.983 0.926 1.043 1.039 1.024 Slope 0.960 1.022 1.022 1.017 1.003 Slope 0.928 1.015 1.027 1.005 0.995 Slope 0.758 0.918 1.069 0.882 1.199 Slope 0.880 1.042 1.102 1.075 1.028 Slope

Intercept -0.242 2.223 2.998 -0.124 2.383 Intercept 1.488 0.254 -0.681 -1.739 -0.645 Intercept 1.696 0.365 -0.260 -0.290 0.713 Intercept 0.270 2.024 0.687 0.679 0.859 Intercept 0.666 1.899 0.126 0.260 0.417 Intercept 1.896 1.384 -1.315 0.646 -1.413 Intercept 2.019 4.748 9.445 11.129 10.046 Intercept 2.812 -0.915 0.689 1.825 -0.277 Intercept

R2 0.983 0.997 0.972 0.996 0.983 R2 0.996 0.996 0.998 0.995 0.998 R2 0.987 0.999 0.997 0.994 0.987 R2 0.997 0.983 0.992 0.994 0.990 R2 0.996 0.999 0.999 0.999 0.999 R2 0.911 0.996 0.997 0.999 0.962 R2 0.923 0.987 0.961 0.931 0.961 2 R 0.984 0.994 0.991 0.992 0.992 2 R

σ 0.657 0.293 0.835 0.310 0.650 σ 0.384 0.384 0.289 0.422 0.285 σ 0.476 0.160 0.241 0.322 0.476 σ 0.230 0.588 0.411 0.364 0.445 σ 0.656 0.324 0.233 0.262 0.360 σ 1.223 0.275 0.239 0.148 0.795 σ 1.009 0.419 0.714 0.958 0.721 σ 0.704 0.432 0.524 0.497 0.512 σ

Journal of Chemical Information and Modeling

Page 28 of 39

28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

hpCADD Amber AM1 PM3 MNDO 18. Thiones/thioaldehydes (3) hpCADD Amber AM1 PM3 MNDO

0.640 0.306 2.456 2.719 0.361 RMSD 0.600 1.287 3.981 4.252 1.813

0.249 0.199 0.189 0.191 0.249 MAD 0.418 0.411 0.365 0.375 0.409

-0.577 -0.162 -2.443 -2.705 0.086 MSE -0.338 -1.199 -3.961 -4.229 -1.736

0.999 0.940 1.252 0.999 0.168 0.999 1.055 -0.476 0.998 0.200 0.999 1.043 2.036 0.997 0.237 0.998 1.032 2.413 0.996 0.286 0.996 1.028 -0.416 0.992 0.388 2 Pearson Slope Intercept R σ 0.999 0.968 0.997 0.997 0.499 0.999 1.036 0.491 0.998 0.449 0.999 1.002 3.931 0.998 0.474 0.999 0.995 4.310 0.997 0.520 0.998 0.994 1.858 0.996 0.615

Table 6: Statistical results of aqueous solvation free energies (in kcal mol-1) from 3D RISM calculations for the independent test data set comprising a total of 55 compounds and for hpCADD 2 and other charge models measured in relation to reference MP2 calculations. R and σ that were derived from a linear model fit with respect to the MP2-based results. For raw data, see Supporting Information. Test set (all compounds) hpCADD Amber AM1 PM3 MNDO

RMSD

MAD

MSE

Pearson

Slope

Intercept

2.158 3.533 2.730 2.964 2.547

1.594 1.282 1.731 1.779 1.648

-0.678 -3.078 -0.413 0.182 0.412

0.985 0.989 0.972 0.966 0.976

0.961 1.033 1.000 0.985 0.984

1.402 2.554 0.415 0.102 -0:010

2

R

0.970 0.978 0.945 0.934 0.952

σ 2.036 1.731 2.752 3.012 2.556

-1

Table 7: Statistical results of aqueous solvation free energies (in kcal mol ) from 3D RISM calculations for the 18 “inapplicable” substances excluded from the independent test set for hpCADD and other charge models measured in relation to reference MP2 calculations. R2 and σ that were derived from a linear model fit with respect to the MP2-based results. For raw data, see Supporting Information. Test set (“inapplicable compounds”) hpCADD Amber AM1 PM3 MNDO

RMSD

MAD

MSE

Pearson

Slope

8.709 5.807 5.543 6.062 5.259

4.039 1.549 4.018 4.189 3.257

-6.715 -5.502 -2.868 -3.511 -3.262

0.909 0.990 0.926 0.919 0.944

0.822 1.067 1.139 1.143 1.111

Intercept 8.458 4.768 0.978 1.662 1.796

R2 0.826 0.980 0.857 0.844 0.891

σ 5.316 1.802 4.821 5.033 4.207

For the complete training data set (55 compounds), we find that the electrostatics of the hpCADD Hamiltonian best reproduce the excess chemical potentials of all the methods investigated with respect to the reference calculations. This trend is confirmed by the consistency of all statistical metrics applied and becomes particularly obvious from both the resulting deviations and from the regression analysis, for which the slope of the hpCADD model is closer to one than for all others.

ACS Paragon Plus Environment

Page 29 of 39

Journal of Chemical Information and Modeling

29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Clearly and not unexpectedly, MNDO is the runner-up among semiempirical Hamiltonians. The overall good performance, however, varies among different compound classes, but, most notably, hpCADD performs significantly better in cases where AM1, PM3, and MNDO fail dramatically, particularly for compounds containing sulfur. The MSE data and the values of the regression intercept indicate that hpCADD electrostatics lead to a better balance and less systematic shifts than other methods. Interestingly, Amber-GAFF point charge electrostatics, although worse than hpCADD, also show good overall agreement with the reference data as measured by the RMSD, although the MSE and Fig. 7 indicate a strong and systematic shift with respect to the MP2 reference. This property of the RMSD for a simple fixed charge Hamiltonian could be among the reasons why established force fields such as Amber still prevail. However, several compounds classes, mostly containing oxo and thiocarbonyl groups, amines, iodides, and conjugated systems exhibit stronger deviations. This is related to the difference between exact electrostatics and the point charge approximation, which neglects diffuse electron distribution components. To verify the predictive and thus transferability capability of the hpCADD electrostatics, 3D RISM calculations were performed for two independent test sets (a 55-compound test set and 18 “inapplicable” compounds) and evaluated equivalently as has been done for the training data set. The resulting statistics (see Tables 6 and 7 and Fig. S1 and Table S6 in the Supporting Information) reveal both overall good correlation with reference data and superior predictive quality of the hpCADD model measured by the RMSD and MAD, although the MSE of other semiempirical models show their somewhat better balance, at the price of considerably larger spread of absolute values. A clear advantage of the hpCADD approach is further revealed by regression analysis. Here, we find that both slope and intercept are optimal, indicating a well-balanced model that could be interpreted in the sense that the physical basis is captured best among the competing approaches. The hpCADD quality for solvation properties also agrees with the statistics of quantum mechanical observables (see Table 2), where the performance of hpCADD is optimal for the training set, while the test set shows only a small increase of error. The set of 18 “inapplicable” compounds illustrates the limits of our approach. Compound types not represented in the training set are treated less well by the hpCADD Hamiltonian than by the more generally applicable MNDO, AM1 and PM3

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Hamiltonians and by Amber. This deficit is expected for a highly parameterized Hamiltonian with specific atom types. It illustrates the need to consider all relevant atom types in the parameterization.

Figure 6: Bar chart for the root mean square deviation (RMSD) and the mean signed error (MSE) of aqueous solvation free energies computed with the electrostatics described by hpCADD, Amber, AM1, PM3 and MNDO with respect to MP2 calculations. The data is shown for the complete training data set and for the chemical compound classes hydrocarbons 1, alcohols 2, amides 3, amines 4, bromides 5, carboxylic acids 6, chlorides 7, conjugated systems 8, esters 9, ethers 10, fluorides 11, iodides 12, ketones and aldehydes 13, nitriles 14, sulfones 15, thioethers 16 thiols 17, thiones and thioaldehydes 18.

However, the present parametrization does not assume completeness for the entire range of small molecule chemistries. For instance, it is necessary to introduce special nitrogen parameters e.g. for urea-type molecular fragments in order to cover a wider range of pharmaceutically relevant compounds or nucleobases. The “inapplicable test set was assembled in order to characterize these systematic deficiencies more clearly. Remarkably, not only the hpCADD parameterization fails for these species, but all Hamiltonians used in this work show rather large RMSD values in comparison with high level MP2 calculations (see Table 7 and Supporting Information for details). This finding is actually an advantage of the hpCADD approach since the enhanced parametrization flexibility will most likely allow for improvements in future work. The parametrization strategy allows for a simple complementation of atom types, should it

ACS Paragon Plus Environment

Page 30 of 39

Page 31 of 39

Journal of Chemical Information and Modeling

31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

turn out to be necessary, since these new parameters additively supplement the existing set.

Figure 7: Scatter plot of the aqueous solvation free energies derived by MP2 reference calculations and the solute models under investigation shown for both the training (left) and the test (right) data set (not accounting for the “inapplicable” compounds discussed in the text).

Conclusions We have shown that it is possible to parameterize an NDDO (MNDO-like) Hamiltonian to reproduce molecular electrostatic potentials around molecules, dipole moments, ionization potentials, and aqueous solvation free energies well. The agreement between the MEP from our first hpCADD parameterization reported here at the standard isodensity surface and that calculated at the MP2/aug-cc-pVDZ level is good. The structure of the MEP hypersurface is reproduced far better using the hpCADD Hamiltonian than by fitting MEP-derived charges to the MP2/aug-cc-pVDZ results. Important effects such as the σ-hole and the H-bond accepting properties of aromatic rings are reproduced well without having to resort to additional charges etc. Of all the semiempirical Hamiltonians tested, the hpCADD model reproduces MP2/aug-cc-pVDZ excess chemical potentials best within the 3D RISM approach. This holds for both training set (188 compounds) and independent test data sets (55 compounds, excluding 18 compounds that belong to “inapplicable” compound types not covered by the parametrization). Amber-GAFF fixed charged chemical potentials also show good correlation and relatively small errors in comparison with established ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

semiempirical models. However, a fixed charge model exhibits systematic shifts compared to reference data while hpCADD turns out to improve fixed charge thermodynamics considerably and in a well-balanced way at a reasonable cost. In ongoing work, we will adapt the classical potentials used in the complete hpCADD force field. Further developments will concentrate on reproducing polarizability for self-consistent applications. Finally, the recently introduced NDDO one-electron integrals based on CCSD(T) calculations may provide the basis for a later generation hpCADD NDDO-Hamiltonian.62

Acknowledgments This work was supported by the Bundesministerium für Bildung und Forschung (BMBF) as part of the hpCADD project. We thank Nicolas Tielker for supporting the selection of test set compounds.

ACS Paragon Plus Environment

Page 32 of 39

Page 33 of 39

Journal of Chemical Information and Modeling

33 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References 1

Allinger, N. L. Molecular Structure: Understanding Steric and Electronic Effects from

Molecular Mechanics, John Wiley and Sons, Hoboken, NJ, 2010. 2

Hennemann, M.; Murray, J. S.; Politzer, P.; Riley, K. E.; Clark, T. Polarization-

Induced σ-Holes and Hydrogen Bonding, J. Mol. Model., 2012, 18, 2461-2469. 3

Clark, T.; Murray, J. S.; Politzer, P. The Role of Polarization in a Halogen Bond, Aus

J. Chem., 2014, 67, 451-456. 4

Devereux, M.; Plattner, N.; Meuwly, M. Application of Multipolar Charge Models and

Molecular Dynamics Simulations to Study Stark Shifts in Inhomogeneous Electric Fields, J. Phys. Chem. A 2009, 113, 13199-13209. 5

Plattner, N.; Bandi, T.; Doll, J. D.; Freeman, D. L.; Meuwly, M. MD Simulations

Using Distributed Multipole Electrostatics: Structural and Spectroscopic Properties of CO- and Methane-Containing Clathrates, Mol. Phys. 2008, 106, 1675-1684. 6

Devereux, M.; Gresh, N.; Piquemal J.-P.; Meuwly, M. A Supervised Fitting

Approach to Force Field Parametrization with Application to the SIBFA Polarizable Force Field, J. Comput. Chem. 2014, 35, 1577-1591. 7

Clark, T. σ-Holes, WIREs Comput. Mol. Sci., 2013, 3, 13-20.

8

Warshel, A.; Kato, M.; Pisliakov, A. V. Polarizable Force Fields: History, Test

Cases, and Prospects. J. Chem. Theor. Comput. 2007, 3, 2034-2045. 9

Lopes, P. E. M.; Huang, J.; Shim, J.; Luo, Y.; Li, H.; Roux, B.; MacKerell, A. D. jr.

Polarizable Force Field for Peptides and Proteins Based on the Classical Drude Oscillator. J. Chem. Theory Comput., 2013, 9, 5430-5449. 10

Jorgensen, W. L.; Schyman, P. Treatment of Halogen Bonding in the OPLS-AA

Force Field: Application to Potent Anti-HIV Agents. J. Chem. Theory Comput., 2012, 8, 3895-3901. 11

Wolters, L. P.; Schyman, P.; Pavan, M. J.; Jorgensen, W. L.; Bickelhaupt, F. M.;

Kozuch, S. WIRES Comput Mol. Sci., 2014, 4, 523-540. 12

El Hage, K.; Bereau, T.; Jakobsen, S.; Meuwly, M. Impact of Quadrupolar

Electrostatics on Atoms Adjacent to the Sigma-Hole in Condensed-Phase Simulations. J. Chem. Theory Comput., 2016, 12, 3008-3019.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

34 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

13

Hédin, F.; El Hage, K.; Meuwly, M. A Toolkit to Fit Nonbonded Parameters from

and for Condensed Phase Simulations. J. Chem. Inf. Model., 2016, 56, 1479-1489. 14

Kolář, M. H.; Hobza, P. Computer Modeling of Halogen Bonds and Other σ-Hole

Interactions. Chem. Rev., 2016, 116, 5155-5187. 15

Mu, X.; Wang, Q.; Wang, L.-P.; Fried, S. D.; Piquemal, J.-P.; Dalby, K. N.; Ren, P.

Modeling Organochlorine Compounds and the σ-Hole Effect Using a Polarizable Multipole Force Field. J Phys Chem B. 2014,118, 6456-6465. 16

Pople, J. A.; Santry, D. P.; Segal, G. A. Approximate Self-Consistent Molecular

Orbital Theory I. Invariant Procedures, J. Chem. Phys. 1965, 43, S129-S135. 17

Clark, T.; Stewart, J. J. P.; MNDO-like Semiempirical Molecular Orbital Theory and

its Application to Large Systems, in, Computational Methods for Large Systems, Reimers, J. J. (ed.), Wiley, Chichester, 2011, Chapter 8 (ISBN: 978-0-470-48788-4). 18

Horn, A. H. C.; Lin J.-H.; Clark, T. Multipole Electrostatic Model for MNDO-Like

Techniques with Minimal Valence spd-Basis Sets, Theor. Chem. Acc. 2005, 114, 159-168; Erratum: Theor. Chem. Acc., 2007, 117, 461−465. 19

Stewart, J. J. P.; Optimization of Parameters for Semiempirical Methods V:

Modification of NDDO Approximations and Application to 70 Elements. J. Mol. Model. 2007, 13, 1173-1213. 20

Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods VI: More

Modifications to the NDDO Approximations and Re-optimization of Parameters. J. Mol. Model. 2013, 19, 1-32. 21

Gaus, M.; Cui, Q.; Elstner, M. DFTB3: Extension of the Self-Consistent-Charge

Density-Functional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput. 2011, 7, 931-948. 22

Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.;

Suhai, S.; Seifert, G. Self-consistent-charge density-functional tight-binding method for simulations of complex materials properties. Phys. Rev. B 1998, 58, 7260-7268 23

Hennemann M.; Clark, T. EMPIRE: A Highly Parallel Semiempirical Molecular

Orbital Program: 1: Self-Consistent Field Calculations, J. Mol. Model. 2014, 20, 2331;

ACS Paragon Plus Environment

Page 34 of 39

Page 35 of 39

Journal of Chemical Information and Modeling

35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

see also http://www.ceposinsilico.de/products/empire.htm, accessed on November 7th, 2016. 24

Margraf, J. T.; Hennemann, M.; Meyer B.; Clark, T. EMPIRE: A Highly Parallel

Semiempirical Molecular Orbital Program: 2: Periodic Boundary Conditions, J. Mol. Model., 2015, 21, 144. 25

Fielder, L.; Gao, J.; Truhlar, D. G.; Polarized Molecular Orbital Model Chemistry. 1.

Ab Initio Foundations, J. Chem. Theory Comput. 2011, 7, 852-856. 26

Jug K.; Geudtner, G. Treatment of Hydrogen Bonding in SINDO1, J. Comput.

Chem., 1993, 14, 639-646. 27

Dewar, M. J. S.; Thiel, W. Ground States of Molecules, 38. The MNDO Method.

Approximations and Parameters, J. Am. Chem. Soc., 1977, 99, 4899-4907. 28

Dewar, M. J. S.; Thiel, W. Ground States of Molecules. 39. MNDO Results for

Molecules Containing Hydrogen, J. Am. Chem. Soc., 1977, 99, 4907-4917. 29

Thiel, W. MNDO, in Encyclopedia of Computational Chemistry, Vol. 3, Schleyer, P.

v. R.; Allinger, N. L.; Clark, T.; Gasteiger, J.; Kollman, P. A.; Schaefer III, H. F.; Schreiner, P. R. (Eds.), Wiley, Chichester, 1998, vol 3, pp 1599-1604. 30

Stone, A. J. Distributed Multipole Analysis, or how to Describe a Molecular Charge

Distribution. Chem. Phys. Lett. 1981, 83, 233-239. 31

Sjoberg, P.; Murray, J. S.; Brinck, T.; Politzer, P. A. Average Local Ionization

Energies on the Molecular Surfaces of Aromatic Systems as Guides to Chemical Reactivity, Can. J. Chem. 1990, 68, 1440-1443. 32

Ehresmann, B.; Martin, B.; Horn A. H. C.; Clark, T. Local Molecular Properties and

their Use in Predicting Reactivity, J. Mol. Model. 2003, 9, 342-347. 33

Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods VI: More

Modifications to the NDDO Approximations and Re-Optimization of Parameters. J. Mol. Model. 2013, 19, 1-32. 34

Beglov, D.; Roux, B. An Integral Equation to Describe Solvation of Polar Molecules

in Liquid Water, J. Phys. Chem. 1997, 101, 7821-7826.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

36 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

35

Kovalenko, A.; Hirata, F. Three-Dimensional Density Profiles of Water in Contact

with a Solute of Arbitrary Shape: A RISM Approach, Chem. Phys. Lett. 1998, 290, 237-244. 36

Kast, S. M.; Kloss, T. Closed-Form Expressions of the Chemical Potential for

Integral Equation Closures with Certain Bridge Functions. J. Chem. Phys. 2008, 129, 236101. 37

Ratkova, E. L.; Palmer, D. S.; Fedorov, M. V. Solvation Thermodynamics of

Organic Molecules by the Molecular Integral Equation Theory: Approaching Chemical Accuracy. Chem. Rev. 2015, 115, 6312-6356. 38

Tielker, N.; Tomazic, D.; Heil, J.; Kloss, T.; Ehrhart, S.; Güssregen, S.; Schmidt, K.

F. Kast, S. M. The SAMPL5 Challenge for Embedded-Cluster Integral Equation Theory: Solvation Free Energies, Aqueous pKa, and Cyclohexane–Water log D. J. Comput. Aided Mol. Des. 2016, 30, 1035-1044. 39

Pople, J. A.; Binkley, J. S.; Seeger, R. Theoretical Models Incorporating Electron

Correlation, Int. J. Quantum Chem. Symp. 1976, 10, 1-19. 40

Dunning, T. H. Gaussian-Basis Sets for Use in Correlated Molecular

Calculations.1. The Atoms Boron through Neon and Hydrogen. J. Chem. Phys. 1989, 90, 1007-1023. 41

Kendall, R. A.; Dunning, T. H.; Harrison, R. J. Electron-Affinities of the 1st-Row

Atoms Revisited - Systematic Basis-Sets and Wave-Functions. J. Chem. Phys. 1992, 96, 6796-6806. 42

Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.;

Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J. A., Jr.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, N. J.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.;

ACS Paragon Plus Environment

Page 36 of 39

Page 37 of 39

Journal of Chemical Information and Modeling

37 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09, Gaussian, Inc., Wallingford CT, 2009. 43

Hennemann, M.; Lin, J.-H., T. Clark, ParaSurf’12, Cepos Insilico Ltd, Kempston,

UK, 2013: http://www.ceposinsilico.de/products/parasurf.htm, accessed on November 7th, 2016. 44

Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. AM1 - A New

General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc. 1985, 107, 3902-3909. 45

Holder, A. J. AM1 in Encyclopedia of Computational Chemistry, Schleyer, P. v. R.;

Allinger, N. L.; Clark, ;T. Gasteiger, J.; Kollman, P. A.; Schaefer III, H. F.; Schreiner, P. R. (Eds.), Wiley, Chichester, 1998, Vol. 1, pp. 8-11. 46

Winget, P.; Selçuki, C.; Horn, A. H. C.; Martin B.; Clark, T. Towards a "Next

Generation" NDDO-Based Semiempirical Molecular Orbital Technique. Theor. Chem. Acc. 2003, 110, 254-266. 47

Stewart, J. J. P. Optimization of Parameters for Semi-Empirical Methods I-Method,

J. Comput. Chem. 1989, 10, 209-220; 221-264. 48

Stewart, J. J. P. Encyclopedia of Computational Chemistry, Vol. 3, Schleyer, ;

Allinger, N. L.; Clark, T.; Gasteiger, J.; Kollman, P. A.; Schaefer III, H. F.; Schreiner, P. R. (Eds.), Wiley, Chichester, 1998, pp. 2080-2086. 49

Baker, J. An Algorithm for the Location of Transition-States, J. Comput. Chem.

1986, 7, 385-395. 50

Kloss, T.; Heil, J.; Kast, S. M. Quantum Chemistry in Solution by Combining 3D

Integral Equation Theory with a Cluster Embedding Strategy. J. Phys. Chem. B 2008, 112, 4337-4343. 51

Hoffgaard, F.; Heil, J.; Kast S. M. Three-Dimensional RISM Integral Equation

Theory for Polarizable Solute Models. J. Chem. Theory Comput. 2013, 9, 4718-4726. 52

Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case D. A. Development

and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 11571174.

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling

38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

53

Wang, J., Wang, W., Kollman P. A.; Case, D. A. J. Mol. Graph. Model. 2006, 25,

247-260. 54

Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D.A. Development

and Testing of a General Amber Force Field. J. Comput. Chem. 2003, 25, 9, 11571174. 55

Singh, U. C.; Kollman, P. A. An Approach to Computing Electrostatic Charges for

Molecules, J. Comput. Chem. 1984, 5, 129-145. 56

Besler, B. H.; Merz Jr., K. M.; Kollman, P. A. Atomic Charges Derived from

Semiempirical Methods, J. Comput. Chem. 1990, 11, 431-439. 57

Beck, B.; Clark, T.; Glen, R. C. VESPA: A New, Fast Approach to Electrostatic

Potential-Derived Atomic Charges from Semiempirical Methods, J. Comput. Chem. 1997, 18, 744-756. 58

Maw, S.; Sato, H.; Ten-no, S.; Hirata F. Ab Initio Study of Water: Self-Consistent

Determination of Electronic Structure and Liquid State Properties, Chem. Phys. Lett. 1997, 276, 20-25. 59

Politzer, P.; Murray, J. S.; Concha, M. C.; σ-Hole Bonding Between Like Atoms; A

Fallacy of Atomic Charges. J. Mol. Model. 2008, 14, 659-665. 60

Suzuki, S.; Green, P. G.; Bumgarner, R. E.; Dasgupta, S.; Goddard, W. A.; Blake,

G. A. Benzene Forms Hydrogen Bonds with Water, Science, 1992, 257, 942-945. 61

Clark, T.; Hennemann, M.; Murray, J. S.; Politzer, P. Halogen Bonding: The σ-

Hole, J. Mol. Model. 2007, 13, 291-296. 62

Margraf, J. T.; Claudino, D.; Bartlett, R. J. Determination of Consistent

Semiempirical One-Centre Integrals Based on Coupled Cluster Theory, Mol. Phys., 2016, published online, DOI: 10.1080/00268976.2016.1200755

ACS Paragon Plus Environment

Page 38 of 39

Page 39 of 39

Journal of Chemical Information and Modeling

39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ToC Graphic:

ACS Paragon Plus Environment