Efficient Implicit Solvation Method for Full Potential DFT - American

Sep 14, 2017 - MPE method presents a particularly cheap way of solving the ... implementation of the MPE model in the FHI-aims electronic structure th...
0 downloads 0 Views 3MB Size
Subscriber access provided by Purdue University Libraries

Article

An efficient implicit solvation method for full potential DFT Markus Sinstein, Christoph Scheurer, Sebastian Matera, Volker Blum, Karsten Reuter, and Harald Oberhofer J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00297 • Publication Date (Web): 14 Sep 2017 Downloaded from http://pubs.acs.org on September 14, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

An efficient implicit solvation method for full potential DFT Markus Sinstein,† Christoph Scheurer,† Sebastian Matera,‡ Volker Blum,¶ Karsten Reuter,† and Harald Oberhofer∗,† †Chair for Theoretical Chemistry and Catalysis Research Center, Technische Universit¨ at M¨ unchen, Lichtenbergstr. 4, D-85747 Garching, Germany ‡Institut f¨ ur Mathematik, Freie Universit¨ at Berlin, Arnimallee 9, D-14195 Berlin, Germany ¶Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, USA E-mail: [email protected] Abstract With the advent of efficient electronic structure methods, effective continuum solvation methods have emerged as a way to, at least partially, include solvent effects into simulations without the need for expensive sampling over solvent degrees of freedom. The multipole moment expansion (MPE) model, while based on ideas initially put forward almost 100 years ago, has recently been updated for the needs of modern electronic structure calculations. Indeed, for an all-electron code relying on localized basis sets and—more importantly—a multipole moment expansion of the electrostatic potential, the MPE method presents a particularly cheap way of solving the macroscopic Poisson equation to determine the electrostatic response of a medium surrounding a solute. In addition to our implementation of the MPE model in the FHI-aims electronic structure theory code, 1 we describe novel algorithms for determining equi-distributed

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 75

points on the solvation cavity—defined as a charge density iso-surface—and the determination of cavity surface and volume from just this collection of points and their local density gradients. We demonstrate the efficacy of our model on an analytically solvable test-case, against high-accuracy finite element calculations for a set of ≈ 140000 2D test-cases, and finally against experimental solvation free energies of a number of neutral and singly charged molecular test-sets. 2,3 In all test-cases we find that our MPE approach compares very well with given references at computational overheads < 20% and sometimes much smaller compared to a plain self-consistency cycle.

1

Introduction

In the field of atomistic simulation, ranging from the study of large bio-molecules 4–8 to reactions on catalyst surfaces 9–13 the accurate treatment of environmental effects especially with regards to solvation is an important and often discussed issue. It usually requires large simulation boxes and generally also a sampling over solvent degrees of freedom, which typically are of only minor interest for a given chemical or biological problem. 14 Particularly for those scientific questions—such as any form of catalysis—that require a quantum mechanical description, e.g. density functional theory (DFT), 15 these size and sampling requirements can be a major obstacle. Quantum mechanical simulations are most generally based on an iterative self-consistent solution of the Schr¨odinger equation 16 of the electrons for an electrostatic potential Φ, which in turn depends on the density ̺ of all charges in the system. In vacuum the necessary relationship between ̺ and Φ that enters the Hamilton operator is described by the Poisson equation (Gaussian units are used throughout this paper unless stated otherwise)

∇ (ε0 ∇Φ(r)) = −4π̺(r)

(1)

where ε0 is the dielectric permittivity of vacuum, r the spatial coordinate vector, and ∇ the

2

ACS Paragon Plus Environment

Page 3 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

gradient operator. The same equation holds in the case of fully atomistic descriptions in dielectric media, yet, as pointed out above, these tend to be prohibitively expensive. Therefore, in order to still capture environmental effects in e.g. a solvated system, approximate methods that implicitly include the most important electrostatic response of the environment have been put forward. 14 These approaches are frequently realized as modifications of eq 1. Among the most important such models are the Born model, 17 the polarizable continuum model (PCM), 14,18 and the conductor-like screening model (COSMO). 19 Implicit solvation methods typically offer computationally inexpensive corrections on top of the total energy or the electrostatic potential, capturing some of the electrostatic effects of the environment on the electronic structure and the free energy—in the form of the solvation free energy ∆Gsol —otherwise neglected in vacuum simulations. Thereby, the solvation free energy is simply defined as the difference in free energies of the solute in a solvent and in vacuum. In the framework of implicit solvation methods the former includes electrostatic—and potentially also non-electrostatic (cf. Section 4.2 below)—contributions due to all degrees of freedom of the solvent in the total energy of the solute in the solvent response-modified potential, thereby effectively reducing the problem to finding a free energy of the solute alone which is beyond the scope of this work. The price, of course, is that specific interactions with any structural details of the environment, such as the all-important hydrogen bonding in aqueous solutions, are coarse-grained out of the description. Considering an all-electron, full potential electronic structure code such as FHI-aims, 1 which relies on localized, numeric atom-centered basis sets and non-uniform, overlapping integration grids, the above methods may not be efficiently applicable. 20 Instead we note that in FHI-aims already the vacuum electrostatic potential is resolved in a multi-centre multipole expansion. In the following we will therefore, based on earlier ideas by Kirkwood 21 and Onsager 22 as well as Rivail and Rinaldi 23 , Dillet et al. 24 , and Rinaldi et al. 25 , outline our implicit solvation scheme based on a multipole moment expansion (MPE) model to efficiently unlock the potential of FHI-aims for solvated systems, with regards to DFT 26 as

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

well as higher level methods. 27,28 In this work, we first describe the theoretical background of the generalized Poisson equation, as well as its solution via a multipole expansion. We go on to sketch our implementation of the MPE in FHI-aims before outlining the parametrisation procedure employed here. We also present a novel and efficient way of determining discrete and equi-distributed points on the interface between the explicit solute and the solvent medium, which are a necessary prerequisite of the MPE approach. We close with a number of benchmarks showing the excellent agreement of our implicit solvation results with that from other established methods and experiments.

2

Theoretical Background

In essence, implicit solvation methods amount to a coarse graining of the system by effectively integrating out solvent degrees of freedom. 14 Based on a macroscopic view, the solvent is modeled as a homogeneous, polarizable continuum with a bulk (relative) dielectric permittivity of εb while the solute is treated explicitly at an atomistic/quantum mechanical level of detail. Considering furthermore that the solute creates an exclusion zone for the solvent molecules, one can separate space into a region containing the explicit solute and a region with non-vanishing dielectric response from the solvent. This leads to the concept of a solvation cavity in which the solute resides (as illustrated in Figure 1). 29 Physically, the transition between the two regions, expressed as the variation in the dielectric function ε(r), is not unambiguously defined, but usually rather constitutes a set of adjustable effective parameters of the model. 14 Lacking an analytical solution to the problem of continuum solvation these parameters are generally set by fitting an experimental observable such as the free energy of solvation over a given test-set. Our own approach is described in Section 5.3 below, where we show the efficacy of our model for a set of small organic molecules.

4

ACS Paragon Plus Environment

Page 4 of 75

Page 5 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The freedom in defining the shape of the cavity has given rise to a number of solvation models, outlined for example in more detail in Tomasi et al. 14 . In the present work, we focus on cavities with a sharp, step-like transition of the permittivity between interior and exterior regions, as well as a single, contiguous cavity. In this case, space within the cavity has a relative permittivity of εint = 1, while in all space outside of the cavity the dielectric response of the solvent leads to εext = εb . This is obviously an effective model, since in nature there is no “hard boundary” that is not crossed by solvent molecules. The cavity therefore has to be viewed as an effective parameter to be adapted for the model to optimally agree with explicitly simulated or measured solvation results. In the macroscopic picture introduced by coarse graining the solvent degrees of freedom, the relation between a charge distribution and its electrostatic potential is no longer given by eq 1 but rather, in isotropic media and assuming a linear response, by a generalized form of Poisson’s equation

∇ (ε0 ε(r)∇Φ) = −4π̺(r)

(2)

In eq 2 all information on the shape of the cavity, especially the width of the transition region, is contained in the relative dielectric function ε(r). As stated above, in the present work we focus on sharp transitions, where the permittivity function is simply given as a sum of Heaviside step functions Θ:

ε(r) = εb Θ [C(r)] + (1 − Θ [C(r)])

(3)

where C(r) is a function defining the shape of the cavity, which is < 0 within and > 0 on the outside—as illustrated in Figure 1. The exact choice of this shape function obviously depends on the geometry of the solute. A straightforward way to define it will be discussed below. It is important to note that both the original and the generalized Poisson equations, eqs 1 5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 75

solvent 𝐩

𝐧𝐩

𝐭𝐩,1

solute 0 Φext , 𝜀 = 𝜀b

Figure 1: Schematic picture of the cavity function C. For a point p on the cavity surface, the local coordinate system with the normal vector, np , and one tangent vector, tp,1 , is sketched in gray. and 2, are linear differential equations. As such, they obey a superposition principle which greatly simplifies their solution.

2.1

Solution Ansatz

A discontinuous, step-like effective permittivity function as given in eq 3 leads to two distinct branches of the generalized Poisson equation, eq 2.



   △Φ,

C(r) < 0 4π̺(r) =  ε0  εb △Φ, C(r) > 0

(4)

Therefore, based on the superposition principle, a first straight-forward step in the ansatz for the electrostatic potential is the separation of internal, Φint , and external, Φext , contributions— as shown in Figure 1. As already noted by Kirkwood in the 1930s, it is not necessary to actually integrate the Poisson equation with the piece-wise ansatz of the electrostatic potential at hand. Rather, one can make use of boundary and continuity conditions on the total electrostatic potential

6

ACS Paragon Plus Environment

Page 7 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Φ, the electric field E, and the displacement field D, which are connected by the following equation:

D(r) = ε(r)E(r) = −ε(r)∇Φ(r)

(5)

This is again made possible by the linearity of the Poisson equation, which implies that the solution is uniquely determined by imposing a suitable set of boundary conditions. 30 If one defines the cavity dividing surface as the manifold Ω = {p|C(p) = 0} of all points on the interface, then at all such points p in Ω internal and external components of the electrostatic potential, electric field and displacement field are respectively coupled via the following continuity relations due to the basic Maxwell equations:

Φint (p) = Φext (p)

(6a)

np · Dint (p) = np · Dext (p)

(6b)

tp,i · Eint (p) = tp,i · Eext (p)

(6c)

Note that the above equations couple the components of the displacement field that are perpendicular to the interface by projecting it on the normalized perpendicular vector np at point p, while for the electric field tangential components couple via the respective vectors tp,i , i ∈ {1, 2}, tangential to the interface (as illustrated in Figure 1). This is due the dielectric permittivity by definition showing a jump in the direction perpendicular to the interface while staying constant along the others. It should also be noted here, that eq 6c follows from eqs 6a and 6b and has thus been neglected in preceding works. 23–25 However, since the continuity conditions above are only evaluated at a finite number of points p, eq 6c

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 75

can be used as a (higher order) correction of the standard discretization using only eqs 6a and 6b. Thus and for the sake of completeness, we consider all continuity equations in the following derivations, leaving it optional for later works whether to include eq 6c or not. Finally, the electrostatic potential is required to vanish at infinity. Assuming a closed cavity surface, this Dirichlet type boundary condition applies only to the external potential

lim Φext (r) = 0

(7)

r→∞

while the internal potential is already fully defined by the continuity conditions, eqs 6, above. One can write both Φint and Φext as a sum of two distinct terms, respectively: First, a Hartree term ΦH that would arise due to the charge distribution of the solute in vacuum, but is scaled with the inverse of the relative dielectric permittivity function ε. The second term, ΦP , results from the polarization of the surrounding dielectric medium. An ansatz for the total solvent potential can thus be written as

Φ(r) = ε(r)−1 ΦH (r) + ΦP (r)    Φint (r) = ΦH (r) + ΦR (r), C(r) < 0 =   Φext (r) = ε−1 b ΦH (r) + ΦO (r), C(r) > 0

(8)

as exemplified in Figure 2 for a simple model case.

By definition, ΦH is just the solution of the Poisson equation, eq 1, of the solute in vacuum, i.e. in the case of DFT the Hartree potential as found in a gas-phase calculation. Essentially, this leads to a regularization of the solvation potential with respect to the gasphase electrostatic potential where all irregularities, such as the Coulomb singularities at the positions of the nuclei, are already contained in ΦH . Inserting eq 8 into the branched

8

ACS Paragon Plus Environment

Page 9 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Φ

𝜀 𝜀b

𝜀−1 ΦH Φint

1 Φext 𝑟c

0

ΦO

𝑟

ΦR

Figure 2: Electrostatic potentials (cf. eq 8) for a point charge located in the center of a spherical cavity of radius rc . At the cavity radius, the dielectric function transitions abruptly from a value ε = 1 inside the cavity to a value εb outside, giving rise to a dielectric response potential ΦP , labeled ΦR inside and ΦO outside the cavity (see text). generalized Poisson equation, eq 4, thus yields

∆ΦR = 0,

C(r) < 0

(9a)

and

εb ∆ΦO = 0,

C(r) > 0

(9b)

Of the remaining two unknown functions, ΦO represents the polarization within the solvent due to the Hartree potential, and ΦR —commonly referred to as the reaction field— its feedback on the cavity interior. We point out that the MPE model specifically does not suffer from the so-called outlying charge error other models are subject to, 18,31 because the multipole moments expansions are determined from the complete density inside and outside the cavity. 32 Additionally, our formulation of the MPE method employs with eq 8 a different regularisation compared to earlier versions of the model. 23–25 This leads to both branches of the central equation 9, inside and outside, taking the form of a Laplace equation, the solution to which is given by solid harmonic functions. Relying on a highly accurate representation

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 75

of the Hartree potential ΦH in FHI-aims, 1,33 we thus do not introduce any additional sources of error by expanding the potentials in multipole moments, as outlined below.

2.2

Electrostatic Potentials and the Multipole Moment Expansion Model

In order to make the calculation of the two unknown potential contributions ΦR and ΦO computationally tractable one needs to express all electrostatic potentials in terms of suitable basis functions. As noted by earlier authors, 23–25 regular solid harmonics µ

R (x) =

r

4π (||x||2 )l Yml (x) 2l + 1

(10a)

4π (||x||2 )−l−1 Yml (x) 2l + 1

(10b)

and irregular solid harmonics µ

I (x) =

r

are a natural choice for such a basis as both are solutions to Laplace’s equation, cf. eqs 9a and 9b. Note that for the sake of legibility, the index pair (l, m)—being degree and order of the spherical harmonic function Yml —is represented, here and henceforth, by the combined index µ where µ1 = (0, 0), µ2 = (1, −1), µ3 = (1, 0), etc. Due to the linearity of the Laplace equation, any linear combination of solid harmonics forms a solution of the regularized central equations 9a and 9b. However, as pointed out by others, 1,33 a series expansion of the Hartree potential ΦH as solution to Poisson’s equation, eq 1, in this basis is infeasible for problems with spatially extended charge distributions. In DFT this would lead to very large basis set sizes and eventually numerical problems due to large polynomial exponents. To alleviate this problem, the electronic structure code FHI-aims used in this work employs a multi-center multipole expansion of the Hartree potential ΦH with radial splines ν as expansion coefficients, thereby dramatically reducing the

10

ACS Paragon Plus Environment

Page 11 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

necessary expansion order lmax,H

ΦH (r) =

l N lmax,H X X X I=1

l=0 m=−l

νIµ (||r − rI ||2 ) Yml (r − rI )

(11)

where rI are the coordinates of expansion center I as illustrated in Figure 3. For the sake

𝐼1 𝐽1

𝐼3 𝐽3

Φint = ΦH + ΦR 𝐼4 𝐽4

Φext = 𝜀−1 ΦH + ΦO b

𝐼2 𝐾 𝐽2

𝐼5 𝐽5

𝐼6 𝐽6

Figure 3: Schematic picture of the cavity surface together with the expansion centers used to express the potentials ΦH , ΦO , and ΦR . of simplicity and unlike in the original formulation of FHI-aims, 1 we don’t distinguish here between different contributions to ΦH , e.g. from the nuclei, electrons or external sources. In eq 11, the free atom contributions are for example simply included in the monopole terms νIµ1 . An equivalent expansion is made for ΦO in order to guarantee that the potential vanishes at infinity, eq 7. Based eq 9b taking the form of a Laplace equation, its solution can fully ′

be described as an expansion in solid harmonic functions with constant coefficients OJµ . In principle, any center(s) J within the cavity can be used here since the expansion only needs to accurately describe the potential outside of the cavity, far away from the nuclear singularities. For convenience, however, the same expansion centers are used as for the

11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 75

Hartree potential up to an expansion order lmax,O . l N lmax,O X X X ′

ΦO (r) =

J=1

l′ =0

m′ =−l′



OJµ I µ (r − rJ ) ′

(12a)

In contrast to ΦH and ΦO which describe potentials that originate directly at or near the expansion center, ΦR represents the smooth potential due to the dielectric response of the medium outside of the cavity. This response can be described by an image charge density in the medium 30 which vanishes at low distances r to the center of expansion. The expansion of the reaction field ΦR within the cavity is thus described in the basis of regular solid harmonics around a single center K—typically placed at the geometric center of the solute, i.e. the arithmetic mean of its nuclei’s positions, as indicated in Figure 3—up to an µ expansion order of lmax,R 25 with coefficients RK lmax,R

ΦR (r) =

l X X

l=0 m=−l

µ Rµ (r − rK ) RK

(12b)

From a numerical point of view, convergence of the potential expansions with finite expansion orders lmax,H , lmax,O , and lmax,R has to be carefully tested. Note that for ΦH and ΦO , the respective expansion orders necessary for an accurate representation of the potentials are usually significantly reduced by introducing multiple expansion centers as done above (cf. also Section 5.3 below). As can be seen from the translational properties of regular solid harmonics (see e.g. Steinborn and Ruedenberg 34 ), the expansion origin of ΦR within the cavity is arbitrary. Only due to numerical reasons—i.e. to avoid having to resolve two points in close proximity from a remote expansion center—it should be placed somewhere near the center of the cavity. Consequently, an expansion of ΦR at multiple centers as employed for ΦH and ΦO would only yield a set of largely linearly dependent basis functions and thus not contribute additional degrees of freedom to the expansion. Due to the completeness of the regular solid harmonic

12

ACS Paragon Plus Environment

Page 13 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

basis, this holds for arbitrary cavity shapes beyond simple convex or star domain topologies. Yet, very complex or very large cavities could necessitate infeasibly large expansion orders for an accurate description of the reaction potential according to eq 12b. Improvement of the basis functions for ΦR to also cover such cases will be the subject of further research.

2.3

System of Linear Equations

With the above formalism, we have set the stage for an MPE based solution to the macroscopic generalized Poisson problem. By truncating the expansions of the unknown potentials in the solution ansatz (eq 8), namely ΦR , eq 12b, and ΦO , eq 12a, we have transformed the ′

µ and OJµ , respectively. Makproblem to determining finite sets of expansion coefficients RK

ing use of the continuity conditions, eqs 6, this ultimately leads to an algebraic root finding problem. 2.3.1

Direct Approach

In order to determine the unknown reaction fields one first inserts the expansion formulae for the potentials, eq 12, into the ansatz for internal and external potential components, eq 8. The resulting relations are in turn inserted into the continuity conditions, eq 6, which are evaluated at a given point p at the interface. xI , xJ , and xK denote the relative position of p with respect to the expansion centers I, J, and K, respectively, as illustrated in Figure 3.

xI/J/K = p − rI/J/K

(13)

Exploiting furthermore the properties of linear, homogeneous dielectric media, eq 5, one obtains the following equations: X µ

µ Rµ (xK ) − RK

X J,µ′

 ′ ′ OJµ I µ (xJ ) = ε−1 b − 1 ΦH (p)

13

ACS Paragon Plus Environment

(14a)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

X µ

X µ

µ np · ∇Rµ (xK ) − RK

µ tp,i · ∇Rµ (xK ) − RK

X J,µ′



X J,µ′



Page 14 of 75

OJµ np · εb ∇I µ (xJ ) = 0 ′

(14b)

OJµ tp,i · ∇I µ (xJ ) ′

= tp,i · (ε−1 b − 1)∇ΦH (p) (14c) Rewriting eqs 14 in an inner product representation yields (note the matrix transpose): 

µ1

µ1

µ1

µ1

T 



µ1 RK  

R (xK ) np · ∇R (xK ) tp,1 · ∇R (xK ) tp,2 · ∇R (xK )     µ2   µ µ2 µ2 µ2   RK   R 2 (xK ) n · ∇R (x ) t · ∇R (x ) t · ∇R (x ) p K p,1 K p,2 K        .  . . . .    ..  .. .. .. ..       ′    µ  µ′1 ′ ′ ′ −I (xJ1 ) −np · εb ∇I µ1 (xJ1 ) −tp,1 · ∇I µ1 (xJ1 ) −tp,2 · ∇I µ1 (xJ1 ) OJ11     ·    µ′ ′  ′ ′ ′ µ −I 2 (xJ1 ) −np · εb ∇I µ2 (xJ1 ) −tp,1 · ∇I µ2 (xJ1 ) −tp,2 · ∇I µ2 (xJ1 ) OJ 2    1    .   .. .. .. ..   ..   . . . .       ′    µ  µ′1 ′ ′ ′ −I (xJ2 ) −np · εb ∇I µ1 (xJ2 ) −tp,1 · ∇I µ1 (xJ2 ) −tp,2 · ∇I µ1 (xJ2 ) OJ21      .. .. .. .. .. . . . . .   −1 (εb − 1)ΦH (p)       0   =  (15) t · (ε−1 − 1)∇Φ (p)  p,1  H b   −1 tp,2 · (εb − 1)∇ΦH (p) Equations 14 and 15 can both be expressed in the simplified form

Ac = b

(16)

with an (mA × nA ) coefficient matrix A, the vector of unknowns c, and a right hand side b.

14

ACS Paragon Plus Environment

Page 15 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Since the number of (unknown) expansion coefficients in ΦR and ΦO , cf. eqs 12a and 12b,

2

nA = (lmax,R + 1) +

J NJ X

(lmax,O + 1)2

(17)

J=J1

is much larger than the number of boundary conditions at a single point at the interface— that is two or four, depending on the applied continuity conditions, see eqs 6—the system of linear equations (SLE) would severely be under-determined for realistically sized systems. Nevertheless, the equations can still be solved observing that eq 15 has to hold for any point on the interface manifold, allowing us to simply evaluate the boundary conditions at a number of Np points pi ∈ Ω. Obviously, it is desirable to distribute these points homogeneously across the interface such that a representative sampling is achieved faster and numerical noise is reduced. An efficient method for finding such a set of interface points will be introduced in Section 4.1 below. Each of these additional points gives rise to two (or four) new equations, cf. eqs 14 or eq 15, which directly translates to two (or four) new rows in the coefficient matrix A and the right-hand side vector b. Disregarding the tangential continuity equation, eq 6c, the final number of rows in A is thus

mA = 2Np

(18)

The number of sampling points is chosen such that the SLE is over-determined and that it can be solved in a least squares manner. The reason for choosing the number of boundary points in this manner is easily seen considering the rank of the coefficient matrix A. Due to the large number of unknowns, it is easily possible to arrive at hard to control linearly dependent rows of A, effectively leading to an under-determined problem. This means that if the coefficient matrix was chosen to be square or only very mildly over-determined no unique solution may exist according to the Rouch´e-Capelli theorem. 35 Therefore, to yield optimal accuracy (which reaches a plateau as soon as A reaches full rank) at minimal computational 15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

cost we typically choose the number of boundary points such that A is three to eightfold over-determined when only the first two continuity conditions in eqs 6 are considered. An illustrative example of the resulting convergence behavior of the solvation free energy (cf. eqs 38 below) of several molecules is given in Figure 4.

−100

1.5

−100

0.0 0

10 20 dod

−50

(b)

−100

0.8

0.4

−150 −50

(d)

0.0 0.8

−100

0.4

−150

0.0 0

Time (s)

0.8

Energy (meV)

0.0 1.6

(c)

Time (s)

Energy (meV)

−150

Energy (meV)

(a)

−50

tsolv

3.0

Time (s)

Energy (meV)

∆Gsol

−50

Time (s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 75

10 20 dod

Figure 4: Convergence behaviour of the root finding problem with respect to the degree of determination, dod , of the MPE equations, i.e. the ratio of number of rows (conditions) and columns (variables) in the coefficient matrix A, for four test molecules (a) biphenyl (C12 H10 ), (b) butanal (C4 H8 O), (c) diethyl disulfide (C4 H10 S2 ), and (d) 1,3-dioxolane (C3 H6 O2 ) (see also Section 5.3). Overall solvation energies ∆Gsol are marked as red crosses. Orange circles depict the time tsolv necessary to set up and solve the MPE equations using four parallel MPI tasks running on an Intel i5-4670 CPU. Technical details on how to solve eq 15 are given in Section 3 below. 2.3.2

Probe Charge Approach and Comparison to Direct Approach

In the literature, 25 the direct approach has so far not received proper consideration—most probably due to the fact that the Hartree potential ΦH directly enters the MPE equation. This has the disadvantage that eq 15 has to be set up and solved in each step during the SCF cycle until the electron density and thus ΦH do not change anymore. By explicitly representing the reaction field ΦR as a linear response to ΦH —using the method of so-called probe charges—one can transfer eq 15 into an equation that no longer

16

ACS Paragon Plus Environment

Page 17 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

contains ΦH but only linear coupling terms instead. 25 The latter then only depend on the shape of the cavity and the solvent’s dielectric permittivity. However, this re-formulation requires to express ΦH in the basis of irregular solid harmonics with constant expansion coefficients—in contrast to the radial dependence of the coefficients in eq 11—which is equivalent to the assumption that there is no charge density outside of the cavity—an approximation not required by the direct approach above. Tackling the allegedly higher computational cost of the direct compared to the probe charge approach, we will in the following outline an efficient way of solving the system of linear equations arising from eq 15 which allows to compute the MPE equations of both approaches at almost identical computational cost—even in cases where the dielectric function and thus the cavity does not change.

3

Methodological Details

We demonstrate the efficient solution of the direct approach to the MPE model formulated above in a localized basis set electronic structure code such as FHI-aims, where we implemented this method. Although we here concentrate on our implementation in FHI-aims and DFT, all algorithms can easily be adapted to other electronic structure methods and codes, as long as they include direct access to the Hartree potential and, potentially, its gradient. As outlined above, the MPE method transforms a problem of solving the generalized Poisson equation for the solvent reaction potential into a problem of finding the root of a system of coupled linear equations with an (mA × nA ) coefficient matrix A. Generally, A cannot be chosen as a square matrix, but is rather derived from an over-determined problem with mA > nA . Thus, both sets of linear equations are not generally amenable to a simple direct inversion of A. Instead, an optimal solution c, which minimizes the residual F for a

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 75

respective right-hand side b has to be found:

min F = min ||Ac − b||2 x

x

(19)

Here, the residual is defined as the l2 -norm of the difference between left and right hand side for a given solution vector c. Using a Gauss transform this minimization problem can formally be rewritten as a system of normal equations AT Ac = AT b

(20)

where AT A is now a positive definite (nA × nA ) square matrix. Note that the set of normal equations never needs to be explicitly calculated, but rather serves as a stepping-stone towards numerically much more stable approaches. In order to solve a general set of linear equations it has long been accepted practice to factorize the (mA × nA ) coefficient matrix A as follows A = QR

(21)

Here, Q is an orthogonal (mA × mA ) square matrix and R an (mA × nA ) matrix, where an (nA × nA ) sub-matrix is of upper triangular form and all other elements are zero. Inserting this into eq 20, the matrix Q cancels, reducing the equations to RT Rc = RT QT b

(22)

Since multiplication with an orthogonal matrix is a norm preserving operation, the conditioning of the system of equations remains unchanged. Due to the triangular shape of R the least-squares solution c to the over-determined system of equations can then simply be found by forward and backward substitution—provided that R and thus also A have full rank. Us-

18

ACS Paragon Plus Environment

Page 19 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

ing a given factorization algorithm—such as e.g. the Gram-Schmidt, Givens or Householder methods 36 —QR-decomposition can thus be a straightforward and exact approach to solving linear least-squares problems as occurring in the MPE solvation model. However, especially for very large expansion orders of the solid harmonic functions that enter A we have found that R can be rank-deficient. A numerically more robust way of solving eq 22 is thus to further factorize R using a singular value decomposition (SVD) into the orthogonal matrices U and V, and the diagonal matrix Σ containing the singular values. R = UΣVT

(23)

While the (thin) SVD can also be performed directly with the coefficient matrix A, the additional cost of the QR-factorization can pay off by reducing the size of the matrix operated on from (mA × nA ) to (nA × nA ) and thus reducing the cost for the SVD. According to the LAPACK reference implementation, 37 this crossover point is reached for current algorithms already at mA ≥ 1.6nA and thus typically exceeded by the coefficient matrices occuring in MPE models (cf. Figure 4). Inserting eq 23 into eq 22 leads to VΣT ΣVT c = VΣT QU

T

b

(24)

Since the orthogonal matrix V can trivially be inverted, the problem reduces to finding the inverse of the diagonal matrix Σ which is only defined if the diagonal of Σ contains only non-zero values, meaning that A has full rank. Nonetheless, also for rank-deficient problems it is possible to find the unique solution

c = Vӆ QU

19

T

b

ACS Paragon Plus Environment

(25)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 75

by defining the pseudoinverse Σ† of Σ as diagonal matrix with the entries Σ†ii . 36

Σ†ii =

   1/Σii , Σii 6= 0   0,

(26)

Σii = 0

For the required QR- and SVD-factorization, our implementation relies on standard LAPACK 37 routines for serial and the corresponding ScaLAPACK 38 routines for parallel execution. Note that in cases where the cavity shape—and thus A—does not change we can almost completely eliminate the computational overhead spent on solving the MPE equations in subsequent SCF cycles by simply storing the factorized form of the coefficient matrix after the first step. The reduced computational cost of the direct approach thus makes the more complicated and less general probe charge approach equations more or less obsolete.

4

Iso-Density Cavity

As already alluded to above, there is no unique way of defining a solvation cavity. The standard approach is to pick a deterministic generation function and fit its parameters to a given observable over a sufficiently large test-set (see e.g. ref. 2). In the case of the MPE model, these cavity functions should readily yield a chosen number of points, preferably equi-distributed over a smooth, closed surface. Furthermore, the generation function needs to yield a—sufficiently efficient—way to determine surface tangents and normal vectors for each of the given points. Due to several extensions 23,25 to the early examples of MPE methods, 21,22 this surface is not limited to any particular shape. For these reasons, we base our current MPE solver on solvation cavities that are iso-surfaces of the DFT electronic density. These are, by definition, smooth and—from the level of generalized gradient density functionals upwards—the normal vectors are trivially obtainable from the density gradient. On top of that, density-based cavity generation has the added advantage of only depending 20

ACS Paragon Plus Environment

Page 21 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

on few free parameters, in case of a sharp transition as discussed here just a single one, the density iso-value ̺iso . It is important to note here that the overall achievable accuracy of an implicit solvation model, how faithfully it reproduces explicit or experimental solvation results, is to a very large degree determined through the choice of the solvation cavity. To this end, earlier studies 2,20,29,39–42 already demonstrated the applicability of charge density based cavities, reproducing solvation free energies for a large molecular test-set—even in the case of charged molecular ions. 43 In the following, before demonstrating the performance of our model, we present an efficient way of determining an evenly spaced set of points for use in MPE solvation calculations.

4.1

Density Iso-Surface Sampling

Given an interface Ω between the internal and external dielectric medium that is defined through a certain charge-density iso-value, the task now is to find enough points p ∈ Ω to set up the MPE equation systems, cf. Sections 2.3.1. This task is complicated by the fact that the density is not known analytically and root finding of e.g. a 3D spline function is complex and would also not guarantee an even spacing of points in the interface. Yet, as mentioned above, in order to minimize the number of linearly dependent rows in the coefficient matrix, spreading the interface points as evenly as possible over the entire cavity surface is clearly desirable. To facilitate cavity point generation and ensure homogeneous sampling of the interface, we therefore present a deterministic algorithm based on a constrained molecular dynamics of a set of fictitious “density-walkers”. The total number of points to generate on the interface, Np , is given by the required degree of determination, dod , of the SLEs—i.e. the ratio of the number of columns nA to rows mA in the coefficient matrix A. As already shown in eqs 17 and 18, the truncation order of the reaction field and the external potential as well as the number of expansion centers for the external potential determine nA while mA is simply 2Np . The total number 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 75

of interface points necessary is thus

Np =

dod nA 2

(27)

In the present algorithm, each of these interface points is represented as a pseudo-particle in space which we term a density-walker. The set of density-walkers is initialized from a superposition of spherical Lebedev-Laikov grids 44 around all atomic centers with P1 grid points each and a radius of Rspec . The grid radius here is an atom-type specific value obtained by inverting the radial electron density function ̺spec (r) of a free atom of kind “spec” and evaluating the inverted function rspec (̺) at the iso-density value. Grid points of a given center that would reside within, or on the border of one or several spheres placed around other centers are discarded. Thus, during initialization a certain number of grid points proportional to the overlap between Lebedev-Laikov spheres is lost. In order to still obtain the desired number of points P1 on the interface this loss needs to be compensated for, ideally through an analytic correction without a need for iterative refinement. To this end we approximate the lower limit of the ratio between the surface area of the structure and 2/3

the sum of surface areas of all spherical grids by Ncenters , where Ncenters is the total number of centers. The lower limit for P1 to achieve the desired degree of determination is then given by

P1 ≥

&

Np 2/3

Ncenters

'

(28)

as derived in Section B in the appendix. After the exclusion of overlapping points, a molecular dynamics simulation is conducted with the remaining walkers moving on a fictitious potential energy hyper-surface designed to yield a faithful representation of the density iso-surface, even spacing of walkers and a fast convergence. To that end, the following forces act on each walker: Fd (“density force”) 22

ACS Paragon Plus Environment

Page 23 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

acts along the electron density gradient and draws the walkers towards the desired iso-density value Fd (ri ) = −δi̺,rel kd fd

∇̺(ri ) ||∇̺(ri )||2

(29)

with the relative deviation from the desired iso-density value δi̺,rel δi̺,rel =

̺(ri ) −1 ̺iso

(30)

a specified force constant kd , and an enhancement factor 

fd = max 1,



f0  ̺,rel max δi

(31)

i

which increases the density force as soon as the walker ensemble approaches the desired iso-density value such that this force has (at least) a constant norm of kd f0 for the walker with the largest absolute relative deviation. Fg (“gravitation force”) accelerates density-walkers towards the center of mass of the system, R0 , if the density gradient is too close to zero, to avoid numerical problems at low charge densities

Fg (ri ) = −kg

r i − R0 ||ri − R0 ||2

(32)

where kg is a force constant. Fr (“repulsive force”) creates pairwise repulsive interaction between the walkers and leads to a more even

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 75

point distribution on the iso-density surface

Fr (ri , rj ) = kr

qij 1 qij 2 s2ij

(33)

The direction of the repulsive force, qij is determined by the relative distance vector between the involved sampling points, rij = rj − ri , where the component along the density gradient is projected out such that it is perpendicular to the density force

qij =

(∇̺(ri )) · (∇̺(ri ))T 1− ||∇̺(ri )||22

!

rij

(34)

The effective interaction distance sij is obtained by scaling of the Cartesian distance between the interacting points accounting approximately for the local curvature of the cavity s2ij

=

||rij ||22



1 − cos θij (1 − cos θij )2 + 1+ 6 22.5



(35)

with

cos θij =

(∇̺(ri )) · (∇̺(rj )) ||∇̺(ri )||2 · ||∇̺(rj )||2

(36)

A derivation of eq 35 is given in the appendix, Section E. Initially, the momenta of all walkers are set to zero. During time propagation in steps of ∆t handled by a Velocity-Verlet algorithm, walkers accrue kinetic energy, which is undesirable. Therefore, in the spirit of simulated annealing, at each step all momenta are scaled by a factor 0 ≤ η < 1 to reduce the energy in the system. After each nupdate steps, a certain ratio rkill of walkers furthest away from the iso-density value is discarded. This serves to eliminate walkers trapped in the electronic density hyper-surface’s potential local minima above (or local maxima below) the iso-density value. For small values of rkill , this has little to no

24

ACS Paragon Plus Environment

Page 25 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

influence on the distribution of walkers. This pseudo-dynamics is terminated as soon as the absolute relative deviation of the electron density values for each walker from the desired iso-density value, cf. eq 30, is below a certain threshold, δ ̺,rel,thresh . ̺,rel δ i < δ ̺,rel,thresh ∀ i

(37)

A second termination criterion is given by a maximum number of allowed steps, nmax , after which the program’s execution is aborted in order to avoid a possible infinite loop. This condition usually indicates a bad choice of parameters which can lead to extremely slow convergence or to walkers getting “stuck” during the dynamics. It is important to note here that the parameters of this fictitious dynamics exert no influence over the final shape of the cavity—as it is entirely determined by the density— and have only minor impact on the final distribution of walkers. Rather, the choice of parameters governs the rate of convergence of walkers towards the optimal point distribution. A parameter set yielding fast convergence for the systems we considered is given in Table 1. Efficiency and accuracy of this method of cavity generation will be discussed in more detail Table 1: Exemplary Set of Parameters for the Cavity Sampling Algorithma

a

parameter value kd 1 f0 0.1 ̺,rel,thresh kg 5 δ 0.01 500 kr 0.01 nmax 0.001 ∆t 0.1 rkill 50 η 0.1 nupdate as employed in this work for the molecular dynamics simulation with density walkers for small organic molecules (see Section 5.3).

below. An example of such a cavity for morpholine, an organic molecule taken from the test-set T1 used in Section 5.3.3 below, is illustrated in Figure 5. Solvation cavities based on electron densities inherently allow for different approaches, depending on which density is being used. Thereby, our current implementation in FHIaims can make use of either the “fixed” initial guess ̺init for the density—possibly from 25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5: Schematic picture of a small organic molecule (morpholine) surrounded by evenly distributed points (light-green dots) representing the iso-density cavity. restart information—or the total electron density ̺tot being converged during the SCF cycle. The main difference between these two options is that in the “fixed” case the cavity is sampled only once from ̺init and kept fixed through the rest of the calculation. With the “self-consistent” cavity option, on the other hand, the cavity is adapted to the current total electron density ̺tot in every SCF step. Considering that the MPE equations only have to be solved fully once when the cavity does not change during the SCF cycle (cf. Section 3), the “fixed” cavity option yields even greater savings in computational cost than the “self-consistent” one. Keeping in mind that the shape of the solvation cavity is one of the parameters of the model, it is not obvious which of the two options is the optimal choice. Thus, in the interest of computational efficiency, we performed all subsequent parametrizations (cf. e.g. Section 5.3) with the “fixed” option. Finally, as an outlook on dynamical simulations or geometry optimizations which we do not cover in this work, it is important to note that the cavity adapts to changes of the electronic density as it reacts to changes of the nuclear coordinates of the solute. In such situations care will have to be taken to ensure a smooth variation of the solvation potential to avoid discontinuous forces on the solute. First experiences with the variation of the potential

26

ACS Paragon Plus Environment

Page 26 of 75

Page 27 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

in the case of the “self-consistent” cavity option indicate that this is not a problem for the algorithm presented here, as it ensures similar relative cavity point placement across different iso-surfaces.

4.2

Calculation of Surface Area and Volume

Electrostatic mean field models such as the MPE, and indeed most other implicit solvation models, 18,19 only approximate electrostatic contributions ∆Gel of the solvent to the solvation free energy. As pointed out by Kirkwood 21 , ∆Gel is simply the difference of the total energy of the solute embedded in the continuum, Ucont , and in vacuum, Uvac ,

∆Gel = Ucont − Uvac

(38a)

As already alluded to above, within the MPE framework Ucont is simply given by the total energy of the solute with a solvent-modified electrostatic potential which already includes all enthalpic and entropic contributions due to the solvent degrees of freedom integrated out in the derivation of the generalized Poisson equation (cf. Section 2). Non-electrostatic contributions, ∆Gne , such as dispersion interactions between solvent and solute molecules as well as solvent excluded volume effects have to be accounted for separately in order to obtain the total solvation free energy ∆Gsol

∆Gsol = ∆Gel + ∆Gne

(38b)

Andreussi et al. 2 have shown that non-electrostatic contributions to the free energy can, to a good degree of accuracy, be approximated as simple, linear functions of surface area O and volume V of the solvation cavity

∆Gne = α O + β V

27

ACS Paragon Plus Environment

(38c)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 75

where α and β are parameters, determined for example by fitting the solvation free energy to suitable reference data (see Section 5.3). While, for continuous cavity functions, volume and surface area are easily obtainable by spatial integration, 2,39 they are not as accessible in the case of step like functions represented as a set of points at the dielectric interface p ∈ Ω. We therefore describe here a straightforward way to determine O and V relying only on the collection of points and their respective local coordinate systems. Our algorithm is based on a Voronoi-tessellation of the cavity, in order to determine the local area of the cavity attributed to each point p. It is thus by construction highly parallel with respect to the cavity points, as it only requires knowledge of the local neighborhood of each processed point, e.g. all points that are closer than a certain distance threshold. 45 Currently, we have implemented a simple linear search to find a desired number of nearest neighbors (usually around 30) with a scaling of O(NP2 ) where NP is the number of points at the interface. While the cost of this neighbor search is negligible for the systems investigated in this work, it might become a bottleneck when going to very large point clouds. In such cases, one could consider using faster algorithms like space partitioning or k-d tree schemes with a typical average complexity of O(NP log NP ). In the first step, the neighboring points qi are projected into the plane that is spanned by the tangent vectors tp,1 and tp,2 of point p     q˜i,1  tp,1 · (qi − p) q ˜i =   =   tp,2 · (qi − p) q˜i,2

(39)

and collected in the set Q = {˜ qi }. The definition of a local coordinate system is given in the appendix in Section C. The perpendicular bisector ˜li between the (local) origin and q ˜i in this projection is given

28

ACS Paragon Plus Environment

Page 29 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

𝐪̃ 6

𝐳̃ 56 𝐳̃ 16

𝑙̃2

𝐪̃ 2

𝐳̃ 26

𝐳̃ 12

𝐱̃ 2

𝑙̃6

𝐱̃ 1

𝐪̃ 1 𝐳̃ 46

𝐳̃ 23

𝑙̃4

𝐩̃ = 𝟎 𝐱̃ 5

𝐳̃ 14 𝑙̃1

𝐪̃ 5

𝐳̃ 25

𝐳̃ 35

𝐱̃ 3 𝐪̃ 4

𝐱̃ 4

𝑙̃5

𝐳̃ 34 𝐪̃ 3

𝑙̃3

Figure 6: Schematic picture of the Voronoi-tessellation of point p with origin p ˜ (gray dot). q ˜i (blue dots) are the relative coordinates of neighboring points qi with respect to p projected into the plane spanned by the tangent vectors of point p (see text). The perpendicular bisectors (light blue lines) between each point q ˜i and p ˜ are denoted ˜li . A subset of the bisecting lines’ intersection points ˜ zij (yellow crosses), denoted x ˜m (red circles), forms the Voronoi cell (bordered by the red line). This cell is further divided (indicated by dashed lines) into non-overlapping triangles each formed by two successive points x ˜m/m′ and point p ˜. by ˜li :

˜l(λi ) = 1 q ˜ + λi v ˜i 2 i

(40)

where v ˜i is the (not normalized) direction vector perpendicular to the connecting line 







qi,2  −˜ 0 −1 ˜i =  v ˜i =  q  q˜i,1 1 0

(41)

Then, for all pairs of lines, ˜li and ˜lj , that are not collinear, i.e. q˜i,1 q˜j,2 6= q˜j,1 q˜i,2 , the intersection

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 75

point, ˜ zij , is calculated

˜ zij =





1 q˜i,1    2 q˜ i,2

+





qi,2  qj,2 − q˜i,2 ) + q˜j,1 (˜ qj,1 − q˜i,1 ) −˜ 1 q˜j,2 (˜   (42) 2 q˜i,1 q˜j,2 − q˜j,1 q˜i,2 q˜ i,1

A derivation of eq 42 is shown in Section D of the appendix. From the set of intersection points Z = {˜ zij }, all points are discarded which are separated from the (local) origin by any bisecting line ˜lk . This can equivalently be expressed in terms of projections on the corresponding position vector q ˜k     1 ˜k ≤ 0 ∀ q ˜k ∈ Q ˜ ·q x ˜m ∈ ˜ zij ˜ zij ∈ Z, ˜ zij − q 2 k

(43)

The remaining points x ˜m span the Voronoi cell and are sorted with respect to increasing angles between their position vector and one axis of the local coordinate system. The cell is now divided into triangles each formed by two (in this sorting) successive points—including the pair of last and first point—and the (local) origin. The procedure described above is illustrated in Figure 6. Calculating the area, o˜m , of these triangles is straightforward, for example

o˜m =

1 |˜ xm,1 x˜m+1,2 − x˜m+1,1 x˜m,2 | 2

(44)

for spanning point x ˜m and its successor x ˜m+1 . The total area element Op assigned to point p is simply the sum of the triangle areas

Op =

X

o˜m

m

30

ACS Paragon Plus Environment

(45)

Page 31 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Finally, given the Voronoi-tessellation of the surface, the cavity volume can trivially be determined from the signed volume increments 1 V p = Op h 3

(46)

of the pyramids formed by the Voronoi cells and an arbitrary (global) origin s. The height h of these are calculated as the signed distance between the origin and the plane of the Voronoi cell defined by point p and its normal vector np

h = (s − p) · np

(47)

The total surface area O and the total volume V of the cavity are then simply given by the sum of all points’ increments Op and Vp . The choice of the origin is thereby mostly arbitrary, as volume elements outside the cavity would simply cancel out. Yet, for reasons of numerical accuracy it can be advantageous to pick an s near the center of the solute. As illustrated in Figure 7, the method presented here has been verified by comparing obtained point cloud measures for small organic molecules (see Section 5.3) with reference calculations with the open source tool MeshLab’s 46 ball pivoting and geometric measures tools. This comparison indicates an excellent agreement of both methods for the systems under investigation (see Section 5.3) even for the lowest point densities employed here.

5

Benchmark Systems

In the following we apply our implementation of the MPE model in the DFT code FHI-aims to a number of benchmark systems. In order to gauge the accuracy of the dielectric response of the model independently of other, non-electrostatic effects we thereby first apply it directly to the analytically solvable spherical Born model. Then we compare the MPE potential for a number of 2D systems to a high-accuracy finite-element solution of the generalized

31

ACS Paragon Plus Environment

300

400 ˚ ) VpVoronoi (A

R2 = 0.999 ˚2 σ = 2.29 A

R2 = 1.000 ˚2 σ = 1.12 A

3

R2 = 0.999 ˚2 σ = 3.70 A

˚ ) VpVoronoi (A

(a-1)

˚ ) VpVoronoi (A

200 100 0 300

3

(b-1)

200 100 0 300

(c-1)

3

˚ 2) OpVoronoi (A ˚ 2) OpVoronoi (A

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

˚ 2) OpVoronoi (A

Journal of Chemical Theory and Computation

200 100 0

0

(a-2)

300 200 R2 = 0.991 ˚3 σ = 6.45 A

100 0 400

(b-2)

300 200 R2 = 1.000 ˚3 σ = 1.64 A

100 0 400

(c-2)

300 200 100 0

R2 = 0.982 ˚3 σ = 8.55 A 0 100 200 300 400

100 200 300

3

2

˚ ) Omeshlab (A

˚ ) Vmeshlab (A

Figure 7: Correlation between surface areas (left column) and volumes (right column) obtained by applying the MeshLab 46 tool and the method presented in this work to cavities generated for small molecules at various iso-density values (cf. Section 5.3). Results are shown for point densities of the point clouds that yield a degree of determination, dod , of the MPE equations of approximately (a) two, (b) four, and (c) eight for the respective systems at expansion orders of lmax,R = 6 and lmax,O = 6.

32

ACS Paragon Plus Environment

Page 32 of 75

Page 33 of 75

Poisson equation. The interplay of MPE electrostatics and non-electrostatic effects computed from volume and surface of the solvation cavity is finally explored for a test-set of 240 experimentally measured molecular solvation free energies.

5.1

Born Equation

The first benchmark model discussed here is the (electrostatic) free energy of solvation, ∆Gel , of a spherical, atomic ion with radius a, as calculated by Born 17 in 1920 1 ∆Gel = − 2



1 1− εb



q2 a

(48)

where q is the charge of the ion and εb the relative dielectric permittivity of the solvent. As already observed by Kirkwood 21 in the 1930s the multipole moment expansion model trivially agrees with the Born equation in case of spherically symmetric problems. Then, the multipole expansions of the involved electrostatic potentials, eqs 11 and 12, reduce to monopoles centered at the origin. When evaluated at any one point on the interface the symmetry adapted boundary conditions, eq 15, then lead to a square (2 × 2) system of equations, which can be solved to yield eq 48. Due to the existence of this analytical solution, 2.0

(a) a = 2 a0

1.5

−∆Gel (Eh )

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(b) a = 3 a0

q = ±3 e

1.0 0.5

q = ±2 e

0.0

q = ±1 e 0.0

0.5

1.0 0.0 1 − 1/εb

q = ±3 e q = ±2 e q = ±1 e 0.5

1.0

Figure 8: Comparison of the free energy of solvation ∆Gel for ions calculated with charges q = ±1 e (blue), q = ±2 e (yellow), and q = ±3 e (red) in spherical cavities with two different radii: (a) a = 2 a0 and (b) a = 3 a0 . Shown are results from the MPE model (dots) and analytical results from the Born equation (lines), see text. spherical problems are an ideal first test-case for any numerical implementation. In Figure 8 33

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

we compare our results for the Born model with the analytical reference for 3 charges q and can thereby rule out implementation errors and significant numerical inaccuracies.

5.2

2D Model Systems

As a second benchmark we next address more complex charge distributions, while, for now, keeping the simple radially symmetric geometry of the cavity. As there exists no analytical solution to such a problem we compare here to a direct solution of the generalized Poisson equation, eq 2, calculated by KARDOS, 47 an adaptive finite element (FEM) solver for nonlinear parabolic systems of partial differential equations. It should be noted that in principle, a solution to eq 2 or similar equations can be obtained efficiently with finite element methods, also for realistic systems. 10 Most of such established approaches, however, do rely on regularly spaced integration grids whereas FHI-aims or other electronic structure codes based on localized basis sets use atom-centered, non-uniform spherical grids. For reasons of efficiency and accuracy we therefore avoid interpolating between the two grids, favoring the multipole moment expansion method for the all electron solvation problem. A more in-depth discussion of the finite-element approach is given in Section F in the appendix. 5.2.1

Benchmark Setup

In order to efficiently achieve highly accurate FEM solutions, the (inherently 3D) model systems in this section are constructed such that they exhibit rotational symmetry around the Cartesian z-axis allowing for a projection into a 2D cylindrical coordinate system. It is spanned by the polar axis ρ and the longitudinal axis ζ, which coincides with the Cartesian z-axis. This allows to employ significantly refined FEM grids with a minimum mesh size of 3 × 10−3 a0 . Due to symmetry, the multipole expansion basis of such a system’s Hartree potential Φ0 can be reduced to irregular solid harmonics I , as defined in eq 10b, positioned

34

ACS Paragon Plus Environment

Page 34 of 75

Page 35 of 75

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 75

For all of those model systems, two calculations are performed: First, a discrete grid representation of the reaction field Φ′ , cf. eq 89 in the appendix, is calculated with the FEM solver KARDOS. Second, the corresponding polarization potential ΦP , cf. eq 8, obtained with the MPE method is calculated with the expansion center K of the reaction field ΦR being placed at the origin of the coordinate system, cf. eq 12b, whereas the multiple centers of the potential ΦO are identical to the positions of the multipole charges in ΦH = Φ0 . In all MPE calculations, ΦO is expanded up to an order of lmax,O = 6. The expansion order of the reaction field ΦR , lmax,R , will be subject to a convergence study. The subsequent discrete evaluation of ΦP is performed on the same grid used for the representation of Φ′ to avoid interpolation errors. 5.2.2

Integration

In order to assess the agreement between FEM and MPE results, it is convenient to derive scalar quantities that can easily be compared. To this end, a good observable is the interaction energy E of a charge distribution κ with the electrostatic potential Φ.

E=

Z

κ(r)Φ(r)dr

(50)

As the benchmark systems consists only of classical point multipoles, eq 50 would only probe the potential at the position of the expansion centers. Therefore, we introduce the fictitious density κ in two different “flavors”: κhom is equivalent to a homogeneous charge distribution of a single elementary charge in the cavity and κGauss to a Gaussian shaped distribution of one elementary charge with a width of τ 2 = 0.1 a0 2 , i.e. 2 % of the density is located outside of the cavity.   1e   , ||r||2 ≤ rc κhom (r) = Vcav   0, otherwise 36

ACS Paragon Plus Environment

(51a)

Page 37 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

κGauss (r) = √

1e 2πτ 2

3

r2 exp − 2 2τ 



(51b)

While the homogeneous fictitious charge distribution is a good probe for the overall agreement between MPE and FEM, κGauss represents a weighting of the integration towards the physically relevant regions close to the interaction centers. The integration of eq 50 is performed numerically on the adaptive triangular grid created by the finite element solver using integration weights dVi at grid points ri as described in Section F.4 in the appendix

E=

X i

5.2.3

κ(ri ) · Φ(ri ) · dVi

(52)

Benchmark Results

For each of the N = 139500 geometries in the 2D test-set, the fictitious energy expression in eq 52 is evaluated for the four possible combinations of Φ ∈ {Φ′ , ΦP } and κ ∈ {κhom , κGauss }. Figure 10 depicts the correlation of EFEM,Gauss and EMPE,Gauss . While the coefficient of determination R2 already hints at an excellent correlation for the lowest expansion order of lmax,R = 2 for ΦP , the root-mean-square error σ v u N u1 X t (EFEM,Gauss − EMPE,Gauss )2 σ= N i=1

(53)

improves visibly when the expansion order is increased. At an expansion order of lmax,R = 4, the seemingly large value of σ = 40 × 10−3 Eh is already less than 1 % of the spanned energy range. Considering the simple, highly symmetric dielectric function in this synthetic benchmark, these results suggest a minimal expansion order of 4 for the application on realistic (molecular) geometries. Yet, we point out that the expansion order of the potentials is a convergence parameter which in principle needs to be checked for each system. The corresponding correlation for the homogeneous charge density is not shown here

37

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 38 of 75

Page 39 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

parameters, describing surface tension as well as dispersion and excluded volume effects of the solvent. In order to parametrize and benchmark the MPE model for real systems, we thus compare calculated free energies of solvation with experimentally measured values for different test sets (cf. Section 5.3.3). 5.3.1

Fitting Procedure

In order to minimize the computational effort connected with DFT calculations, we used the following parametrization scheme for a first search of the parameter space: 1. For a given molecule M in the fitting set, ∆Gel,M (̺iso ), OM (̺iso ), and VM (̺iso ) depend on ̺iso (which defines the cavity shape), but not on the other fitting parameters, α and β. For a small number (< 10) of different iso-density values picked from the range −3 −3 from ̺iso,min = 1 me ˚ A to ̺iso,max = 100 me ˚ A where the optimal value is expected

to be found, DFT calculations are performed. The obtained ∆Gel,M (̺iso ), OM (̺iso ), and VM (̺iso ) are in the following called nodes. 2. Based on all (explicitly calculated) nodes, the analytical spline representations ∆Gspl el,M (̺iso ), spl (̺iso ), and VMspl (̺iso ) are constructed using univariate cubic splines OM

1

as provided

by the open-source software package SciPy. 48 3. The deviation from the experimentally measured solvation free energy ∆Gsol,ref,M for each individual molecule can then analytically be expressed by

spl spl dspl M (̺iso , α, β) = ∆Gel,M (̺iso ) + α OM (̺iso )

+ β VMspl (̺iso ) − ∆Gexp sol,M 1

We use a smoothing factor of s = 1 for O and V (due to a limited accuracy of our 2D-projected Voronoi algorithm) and a strict interpolation, i.e. s = 0, for ∆Gel

39

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Note that this is just an approximation of the actual deviation, dM , unless the chosen value of ̺iso falls directly on a node. 4. We perform a minimization of the sum of squared deviations for all molecules Dspl (̺iso , α, β) =

X

dspl M (̺iso , α, β)

M

2

using the L-BFGS-B minimization method 49,50 as provided by the open-source software package SciPy 48 under the boundary condition that

̺iso,min ≤ ̺iso ≤ ̺iso,max to avoid extrapolation errors for the splined functions. Due to the low dimensionality of the optimization problem, we can ensure to find the global minimum by using 9 × 9 × 9 different sets of starting parameters for the optimization in the ranges

̺iso ∈ [̺iso,min , ̺iso,max ] i h −2 −2 α ∈ −100 meV ˚ A , 100 meV ˚ A i h −3 −3 β ∈ −100 meV ˚ A , 100 meV ˚ A

Note that instead of (costly) DFT calculations only spline evaluations have to be performed during the numerical optimizations, each yielding a set of optimal parameters ˜ If—for any molecule in the fitting set—˜ (˜ ̺iso , α ˜ , β). ̺iso is too far away from the closest node (i.e. the absolute difference to the iso-density value of this node is larger than −3 a given threshold, here e.g. 0.1 me ˚ A ), then new nodes are obtained by performing

additional DFT calculations for all molecules for ̺iso = ̺˜iso . 5. If any new nodes have been created in the last step, we repeat from step 2 to minimize approximation errors due to spline interpolation. 40

ACS Paragon Plus Environment

Page 40 of 75

Page 41 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

˜ with the lowest corresponding value for D 6. The set of obtained parameters (˜ ̺iso , α ˜ , β) is the desired set of optimized parameters. 5.3.2

Computational Details

All DFT calculations are performed with the FHI-aims package employing collinear spin and an “atomic ZORA” scalar-relativistic correction. 1 Calculations are performed for different exchange-correlation functionals—PBE, 51 RPBE, 52 PBE0, 53 and HSE06 54,55 —and numerical settings. In FHI-aims the latter concern the employed numeric atomic orbital (NAO) basis sets—which are organized in levels, or “tiers”—as well as the density and cutoff-radius of the employed radial integration grids that can be chosen from “light” to “tight” and “really tight” settings. These internal settings and the basis set are described in more detail by Blum et al. 1 . For the dielectric constant of water we use a value of εb = 78.3553. We restrict the evaluation of continuity conditions to eqs 14a and 14b. The external potential ΦO (eq 12a) is expanded at the position of all atoms and the reaction field ΦR (eq 12b) is expanded at the geometric center of the molecule, i.e. the arithmetic mean of all atoms’ coordinates. Suitable expansion orders for ΦO and ΦR are derived from a convergence study on the basis of four randomly selected molecules from the test-set summarized in Figure 11. The expansion order for ΦO , lmax,O , is chosen equal to the expansion order lmax,H of the Hartree potential ΦH (eq 11), i.e. 4 for light, 6 for tight, and 8 for really tight integration grid settings. However, the flat dashed lines in Figure 11 indicate that a lower value may already be sufficient. ΦR is expanded up to an order of lmax,R = 8 which is necessary to observe convergence within 10 % of the final result indicated by the shaded areas in Figure 11. Note, however, that the observed convergence behavior is expected to be heavily influenced by the cavity’s size and shape and a much higher expansion order lmax,R might be necessary for more complex systems than the small organic molecules investigated here.

41

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

The cavity is constructed once based on the converged electron density of the vacuum calculation with parameters listed in Table 1, targeting an eightfold over-determination of the MPE equations (cf. eq 27), and its shape is fixed throughout the SCF cycle. 0.6

1 − ∆Gel /∆Gconv el

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(a)

lmax,O = 4 lmax,R = 8 lmax,R = lmax,O

(b)

lmax,O = 6 lmax,R = 8 lmax,R = lmax,O

0.4

0.2

0.0

4

8

12

4

8

12

lmax

Figure 11: Relative deviation of the calculated electrostatic solvation free energy, ∆Gel , with respect to the value obtained for the largest employed expansion orders, ∆Gconv el , i.e. lmax,R = 14 for ΦR and lmax,O = 14 for ΦO , as a function of electrostatic potential expansion order lmax for three different cases: variation of lmax = lmax,R at a fixed expansion order for ΦO (solid lines), variation of lmax = lmax,O at a fixed expansion order for ΦR (dashed lines), and variation of expansion order of ΦR and ΦO at the same time, lmax = lmax,R = lmax,O (dash-dotted lines). Results are calculated with the PBE functional at (a) light and (b) tight integration grid settings for four randomly selected molecules out of the test-set T1 presented in Section 5.3.3: biphenyl (chemical formula C12 H10 , drawn in blue), morpholine (C4 H9 NO, green), diethyl disulfide (C4 H10 S2 , yellow), and 1,3-dioxolane (C3 H6 O2 , purple). In all of these cases, ∆Gconv is in the range of −0.32 eV to −0.23 eV. The shaded areas el indicate an absolute deviation from ∆Gconv of 10 % or less. el

5.3.3

Parametrization Results

For the parametrization of our model, we utilize different test-sets with experimentally measured free energies of hydration for (T1) 239 organic molecules 56 and water, 57 (T2) 274 neutral solutes including the water dimer, 42

ACS Paragon Plus Environment

Page 42 of 75

Page 43 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(T3) 52 singly-charged (unclustered) cations, (T3c) 52 singly-charged, selectively clustered cations, (T4) 60 singly-charged (unclustered) anions, and (T4c) 60 singly-charged, selectively clustered anions. Molecular geometries for T1 have been provided by Andreussi et al. as used in their earlier work. 2 The other test-sets (T2, T3, T3c, T4 and T4c) are part of the 2012 version of the Minnesota solvation database. 3 T2 thereby corresponds to subset [a] of the full database while T3 and T4 form subset [i ], both of which are described in detail in the Minnesota database manual. 3 T3c is almost identical to T3 except for 8 molecules that are selectively clustered with a single water molecule in set T3c. The same relationship holds for T4 and T4c with 23 selectively clustered molecules in T4c. 58 The results of the fitting for neutral and cationic components of the Minnesota solvation database are summarized in Table 2 which is a compilation of our most general parameter sets of Table 5 in the appendix optimized for neutral and unclustered cationic solutes (testsets T2 and T3). These parameters should thus be suitable for the study of most neutral and cationic molecular solutes. For reference, full results of the optimization procedure for all test-sets, different DFT exchange-correlation functionals as well as different basis set and integration grid settings are given in Tables 5 and 6 in the appendix. We observe a convergence of the optimization procedure after ≤ 12 DFT calculations for each molecule, nine of which are the starting nodes, indicating the high efficiency of this approach. As an observable we compare the calculated solvation free energy values to the experimental references of the test-set and calculate the difference dM of the two for each molecule M . From these differences we calculate the mean absolute error (MAE)

MAE =

1 nmolecules

43

nmolecules X M =1

|dM |

ACS Paragon Plus Environment

(55)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 75

Table 2: Proposed Parameter Sets for Neutral and Cationic Solutesa name of parameter set

xc

 ̺iso  −3 me ˚ A



α  −2 ˚ meV A



β  −3 ˚ meV A

PBE 12.5 2.80 −2.23 PBE 12.0 0.761 — RPBE 12.2 2.65 −2.11 RPBE 11.8 0.732 — PBE0 12.1 2.88 −2.21 PBE0 11.6 0.848 — HSE06 12.1 2.91 −2.25 SPANC HSE06 11.7 0.851 — SPANC-surf a For each DFT exchange-correlation functional (xc), two parameter sets are given: one with a volume-dependent non-electrostatic contribution (determined by parameter β), the other without. and the root-mean-square deviation (RMSD) v u u RMSD = t

1 nmolecules

nmolecules X

dM 2

(56)

M =1

of the whole test-set. Table 3: Deviations in Hydration Free Energies from Experiment for an Optimized Set of MPE Parametersa charge (e) MAEb (meV) RMSDc (meV) 0 50.8 69.2 0 44.4 60.0 1 108.6 159.7 1 85.4 118.0 −1 493.1 560.2 −1 419.5 459.5 −3 −2 −3 a ˚ , β = −2.25 meV ˚ ̺iso = 12.1 me ˚ A , α = 2.91 meV A A (as listed in Table 2 for the b HSE06 hybrid functional, printed in bold); Mean absolute error; c Root-mean-square deviation. test-set T1 T2 T3 T3c T4 T4c

We find that the choice of density functionals and basis sets used in the fitting procedure exerts considerable influence on the solvation parameters, partially due to the fitted solvation model compensating for errors in the functional (cf. Table 5 in the appendix). Aiming for a generally applicable solvation model this is per se not ideal as it is often not possible to re-fit 44

ACS Paragon Plus Environment

exp

∆Gsol (kcal/mol) 0.2

−12

−8

−4

4

0

4

∆Gsol (eV)

0.0

0

−4

−0.2

−8

∆Gsol (kcal/mol)

−0.4 −12 −0.6 −0.6

−0.4

−0.2

0.0

0.2

exp

∆Gsol (eV)

Figure 12: Correlation between calculated (∆Gsol ) and experimentally measured (∆Gexp sol ) free energies of solvation for the neutral test-set T1 of small organic molecules (see Section 5.3.3). The dashed gray line represents an ideal correlation. Calculations are done using the “HSE06” functional and “tight” integration grid and basis settings (see text) and corresponding MPE parameters for set T1 as listed in Table 5 in the appendix.

exp

∆Gsol (kcal/mol) −100

−50 T2 (neutrals)

0

−2

0

0

T4 (anions)

−50

∆Gsol (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

∆Gsol (eV)

Page 45 of 75

T3 (cations)

−4

−100

−4

−2

0

exp

∆Gsol (eV)

Figure 13: Correlation between calculated (∆Gsol ) and experimentally measured (∆Gexp sol ) free energies of solvation for the test-sets T2, T3, and T4 (see Section 5.3.3). The dashed gray line represents an ideal correlation. Calculations are done using the “HSE06” functional and “tight” integration grid and basis settings (see text) and corresponding MPE parameters listed in Table 2 (printed in bold).

45

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

parameters for every eventuality. In order to determine the errors of using a parameter set fitted with a given functional to another one we therefore examine such cross-combinations of different parameters and DFT functionals for test set T1 in Table 4. While this study shows Table 4: Mean Absolute Errors for Cross-Combination of Parameter Sets for Test Set T1a,b

a

optimized for → used with ↓ PBE l PBE t PBE rt RPBE t PBE0 t HSE06 t All energies are given in units

PBE RPBE PBE0 HSE06 l t rt t t t 47.2 56.6 55.8 59.6 49.7 50.1 49.7 46.5 46.1 47.9 47.9 47.6 49.0 46.8 46.3 48.2 47.6 47.4 52.7 46.7 46.5 47.5 50.2 49.8 47.4 54.3 53.5 56.9 48.9 49.1 47.4 53.7 52.9 56.2 48.7 48.9 b of meV; l=light, t=tight, and rt=really tight integration grid settings in FHI-aims.

some deviations of the solvation free energies determined with one density functional and solvation parameters determined with another one, these deviations remain relatively small. As a best-practice approach Table 4 implies that parameters determined with highest computational settings and a high-accuracy hybrid functional perform best even for calculations with lower settings and cheaper generalized gradient approximation (GGA) functionals. Further focusing on the test-set T1, Figure 12 illustrates the correlation of calculated and experimentally measured free energies of solvation for this test-set with the fitted solvation parameters for the “PBE” functional and “tight” integration grid and basis settings. The MAE of this correlation is 46.5 meV and the RMSD amounts to 64.9 meV. All combinations listed under “T1” in Table 5 lead to very similar correlation graphs and error values with MAEs ranging from 47.4 to 50.1 meV and RMSDs from 66.3 to 70.7 meV. We checked the robustness of our fit by using different subsets of the full set T1 as training sets (see Section A.2 in the appendix) all of which result in similar MAE (45.3 to 51.4 meV) and RMSD values (64.9 to 68.4 meV) for the whole test-set. Using test-set T2 as fit set yields slightly different parameters (cf. Table 5 in the appendix) despite both being sets of neutral, small, and mostly organic molecules. However, 46

ACS Paragon Plus Environment

Page 46 of 75

Page 47 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the obtained MAE (41.6 to 43.3 meV) and RMSD values (57.2 to 59.1 meV) are similar and said parameter differences are comparable to the ones observed when fitting on different subsets of T1 (cf. Section A.2 in the appendix). Including singly charged cations in the fit—i.e. using test-sets T2 and T3—again changes the parameters. In Tables 2 and 5, only the MAEs of the whole training set are shown (54.6 to 57.2 meV; RMSDs in the range of 83.6 to 84.5 meV) which are larger than the ones found when using only T2 as training set (see above). Evaluating the deviations for the different subsets separately reveals that the MAE and RMSD values for T1 and T2 are very similar as representatively shown in Table 3 for the HSE06 hybrid functional and “tight” integration grid and basis settings. Although the major increase in deviations can be attributed to the cationic subset T3 which shows MAE values in the range of 108 to 111 meV and RMSD values from 157 to 162 meV—depending on DFT functional and basis settings— relative errors in the set T3 are typically lower than in T2 as the absolute free energies of hydration is significantly larger for T3 (2 to 5 eV) than for T2 (< 1 eV). The deviation on the same cations for the same parameters can even be decreased down to 84.2 to 85.6 meV by selectively clustering certain “problematic” ions with one explicit water molecule as done in test-set T3c. A specific fit including these selectively clustered cations, i.e. fitting to test-set T2+T3c, can only slightly decrease this deviation to 79.4 to 82.4 meV and in general yields very similar parameters and errors as fits to T2+T3. Adding singly charged anions to the training set—i.e. using test-sets T2, T3, and T4— massively changes the non-electrostatic parameters α and β and also yields significantly larger MAEs on the whole training set (134 to 152 meV) as well as on the subsets (65.8 to 71.0 meV for T2, 195 to 214 meV for T3, and 397 to 466 meV for T4) as listed in Table 6. Similar observations are made for the RMSD values. This is consistent with the findings of previous work 43 employing an iso-density approach and the same non-electrostatic model where the authors conclude to propose two different parameter sets for neutral and cationic and for anionic molecules. Very recent investigations 59 reach a better description of ions by

47

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

using soft-spheres around the atoms of a molecule as cavity instead of an iso-density cavity. Nonetheless, the authors still propose two different parameter sets indicating limitations of the very simple non-electrostatic model shared in this work to describe both types of ions simultaneously. Note that fitting the model to test-set T4 alone is, of course, possible and yields acceptable accuracy compared to other implicit solvation models 58–60 with MAEs in the range of 161 to 178 meV. However, the obtained parameter sets (see Table 6 in the appendix) differ strongly for different functionals, basis sets and integration grid settings. This can partially be remedied by fitting again to a test-set of selectively clustered anions, T4c, yielding a more uniform set of parameters with respect to computational settings and lower MAEs (128 to 141 meV). All of the anionic fits lead to much larger iso-density values and mostly also to different signs in the non-electrostatic parameters compared to the neutral and cationic fits and such optimized parameters produce huge errors when applied to test-sets T1, T2, T3, or T3c. Finding a general description of neutral, cationic, and anionic molecules using the same set of parameters is beyond the scope of this work and may possibly require modifications to the presented methodology. We thus concentrate on neutral and cationic solutes, which are already of great interest e.g. in the fields of photo-electrocatalysis, 61 and for which common parameters can be found. As the most general set of parameters we thus propose the settings for the high-accuracy HSE06 functional fitted to test-sets T2 and T3 (highlighted in bold in Table 2). For easier reference we term these the MPE-Solvation Parameters for Neutrals and Cations (MPE-SPANC). Figure 13 shows the correlation of experimental versus calculated solvation free energies using the MPE-SPANC parameters for test-sets T2, T3 and T4, illustrating again the good model performance for neutrals and cations. −2 The optimized MPE-SPANC non-electrostatic parameters, α = 2.91 meV ˚ A and β = −3 −2 −2.25 meV ˚ A , agree well with the findings of previous work 2 (0.69 to 3.12 meV ˚ A and

−3 −0.50 to −2.18 meV ˚ A for the two-parameter model) which provided the model for non-

48

ACS Paragon Plus Environment

Page 48 of 75

Page 49 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

electrostatic contributions employed in this work. The iso-density value ̺iso is more difficult to compare directly since Andreussi et al. 2 use a smooth dielectric function with an onset −3

ρmin and an offset parameter ρmax . The optimized value of ̺iso = 12.1 me ˚ A

lies between

those two parameters or is at least close to the offset parameter ρmax of approximately 10 to −3

34 me ˚ A

. Another previous study 42 suggests an optimal iso-density value of approximately −3

3.4 to 13 me ˚ A

which also compares well with our result.

Finally, inspired by ideas put forward by Fisicaro and co-workers 59 and as also previously employed by Andreussi and co-workers 2 we determined optimized parameter sets with the volume-dependent non-electrostatic term constrained to zero, i.e. β = 0, which is included in Tables 5 and 6 in the appendix. Compared to the three-parameter model (̺iso , α, β), this two-parameter model (̺iso , α) obviously shows less accurate results with MAEs increased by roughly 20 % for neutral solutes (test-sets T1 and T2), 10 % for neutral and cationic solutes (T2+T3 and T2+T3c), and less than 5 % for anionic solutes (T4 and T4c). Eliminating an explicit dependence on the cavity volume, however, serves as a stepping stone for future calculations of solvation effects on extended surfaces. In analogy to MPE-SPANC, we term the parameter set for the high-accuracy HSE06 functional fitted to test-sets T2 and T3 as given in Table 2 MPE-SPANC-surf. 5.3.4

Timings

Having demonstrated the accuracy of our MPE scheme for a representative test-set we now turn to the computational cost of the method. As outlined, the overhead compared to vacuum calculations is due to cavity generation, solution of the SLE, and the construction of the reaction potential. In Figure 14 we depict the relative overhead compared to a vacuum calculation for both PBE and the HSE06 hybrid functional. Note that for both functionals the relative overhead is only a fraction of the total cost of the respective vacuum calculation (approximately 10 to 20 % for PBE and 2 % for HSE06). While for the relatively cheap PBE functional the share of MPE on the total computing time with the number of atoms in

49

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 50 of 75

Page 51 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

6

Discussion and Conclusions

We presented a re-formulation of the classic MPE implicit solvation method geared towards high efficiency and accuracy in all-electron electronic structure codes with localized basis sets. The FHI-aims code used in this work is especially well suited for the MPE method as it provides an efficient way of evaluating the Hartree potential at arbitrary points in realspace in its standard SCF cycle, thus greatly reducing the overhead necessary to perform implicit solvation calculations. In this context the main computational task in applying the MPE solvation model is the numerical solution of an over-determined system of equations arising from smoothness and boundary conditions at the interface. The size of this system of linear equations is governed by the chosen degree of determination, reflected in the number of points at the solvation cavity surface. We carefully gauged the degree of determination necessary to maximize the rank of the MPE coefficient matrix while at the same time keeping the computational cost low, to find that the optimal accuracy and efficiency is already reached at a threefold over-determination. We also presented a novel method of finding an evenly distributed set of points at the solvation cavity surface, necessary to set up the MPE equations in the first place. To this end, we utilized a molecular dynamics scheme of cavity point walkers based on a fictitious potential energy surface designed to yield fast convergence towards the desired set. Given such a set, we also put forward a way to determine surface area and volume of the point cloud using a local Voronoi-tessellation which we found to be in excellent agreement with algorithms in dedicated reference codes. We tested the reliability and accuracy of our implementation on three classes of benchmark systems. First, to rule out numerical and implementation errors, we compared our MPE code to the analytical solution of the Born point-charge model, finding near perfect agreement. Second, for a ≈ 140000 element 2D test-set, we compared the calculated solvent response potential to that calculated using a high-accuracy FEM method, which would be impractical for a DFT code utilizing non-Cartesian grids, such as FHI aims. There, our method again showed very good correlation with the reference method, and very small 51

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

standard deviations already at low multipole moment expansion orders. The general applicability of our set of methods to realistic systems was then shown for different molecular test-sets (neutral and singly charged) all of which have already been applied by other implicit solvation studies. Based on a fit to a combination of test-sets we presented an optimized parameter set (MPE-SPANC) which showed very good accuracy for both neutral and cationic solutes. A variant of MPE-SPANC without non-electrostatic volume term (MPE-SPANC-surf) was given to facilitate the model’s use in future work. For all combinations of density functionals and optimized parameter sets, our method showed mean absolute errors with respect to experimentally measured free energies of hydration below 49 meV for neutral molecules and 111 meV for cations (85 meV for the selectively clustered set). These numbers compare very well to the performance of other implicit solvation models with a similar non-electrostatic model. 2,20,43,59 For neutral molecules in water, a better description can be achieved by the so-called “SMx” models (with reported MAEs of 24 meV for SM8, 62 26 to 41 meV for SMD, 63 25 to 36 meV for SM12 58 ) or by CMIRS 60 (with a reported MAE of 34 meV) and other models that employ more elaborate descriptions of the non-electrostatic terms. All of these models, however, rely on at least a few up to many more tunable parameters. Similar to the findings of others using the same very simple nonelectrostatic model used here together with an iso-density 43 or with a soft-sphere cavity, 59 we were unable to determine a single set of parameters to describe neutral, cationic and anionic molecules at the same time even remotely as accurately as neutrals/cations and anions separately. A parameter fit exclusively for anions showed mean absolute errors below 178 meV (141 meV for the selectively clustered set) which is also in good agreement with the performance of other implicit solvation methods. 43,58–60 Finally, in determining the relative overhead of our method over the respective vacuum calculations we showed that our (not yet fully optimized) MPE implementation only adds / 20 % (GGA) or < 5 % (hybrid) to the total cost of a DFT energy calculation for the here considered small organic solutes. For larger systems the overhead cost will even further be

52

ACS Paragon Plus Environment

Page 52 of 75

Page 53 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

decreased. At this low cost the presented DFT+MPE implicit solvation method constitutes an appealing approach to effectively treat solvation effects in first-principles calculations.

7

Acknowledgments

The authors gratefully acknowledge support from the Solar Technologies Go Hybrid Initiative of the State of Bavaria and the German Science Foundation DFG (Grant OB425/4-1). S.M.’s research is carried out in the framework of Matheon supported by the Einstein Foundation Berlin. V.B. acknowledges an August-Wilhelm Scheer Visiting Professorship at the Technical University of Munich, 2016-17. We thank Oliviero Andreussi and Nicola Marzari for providing access to their solvation test set.

A A.1

Benchmark System Small Organic Molecules Parametrization Results for Different DFT Functional Approximations, Integration Grids and Basis Settings

The results for neutral and cationic solutes are summarized in Table 5 whereas parametrization results involving anionic solutes are shown in Table 6.

A.2

Complementary Statistical Analysis

In order to test the robustness of the parametrization procedure (see Section 5.3.1) we divide the whole test-set T1 of 240 small organic molecules described in Section 5.3.3 in different ways, each time into a training set used to fit the MPE parameters (̺iso , α, and β) and a validation set. The training sets thereby consist of 80, 120, or 160 molecules randomly drawn from the full test-set in 16 different ways each. For every individual training set the parameters are optimized as described in Section 5.3 using the same computational settings.

53

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 54 of 75

Table 5: Parameter Optimization Results For Neutral and Cationic Molecules Onlya xcb

set.c

 ̺iso  −3 me ˚ A



α  −2 meV ˚ A

PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

13.5 11.0 15.4 12.0 15.6 12.0 15.7 12.1 12.5 10.0 12.5 10.0

T1 (240 solutes) 4.56 0.869 5.58 0.815 5.76 0.818 5.70 0.798 4.81 0.744 4.83 0.733

PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

14.6 12.8 16.3 14.2 16.5 14.1 16.6 14.4 14.1 12.2 14.1 12.2

T2 (274 solutes) 3.63 0.999 4.08 0.989 4.28 0.989 4.08 0.999 3.77 0.920 3.76 0.908

PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

T2+T3 (326 solutes) 12.4 2.73 11.9 0.908 12.5 2.80 12.0 0.761 12.3 2.79 11.8 0.746 12.2 2.65 11.8 0.732 12.1 2.88 11.6 0.848 12.1 2.91 11.7 0.851



β  −3 meV ˚ A

MAEd (meV)

−3.98 0 −5.24 0 −5.42 0 −5.42 0 −4.40 0 −4.42 0

47.2 53.6 46.5 56.3 46.3 56.3 47.5 57.9 48.9 56.8 48.9 57.0

−2.78 0 −3.35 0 −3.56 0 −3.36 0 −3.04 0 −3.04 0

42.8 48.7 41.9 49.8 41.6 49.9 43.3 51.0 42.8 50.2 42.9 50.3

−1.97 0 −2.23 0 −2.23 0 −2.11 0 −2.21 0 −2.25 0

54.9 58.9 55.4 60.3 55.6 60.4 57.2 61.4 54.7 59.9 54.6 60.0

T2+T3c (326 solutes) PBE l 11.6 2.65 −1.97 50.7 PBE l 11.2 0.818 0 54.7 11.8 2.70 −2.18 51.8 PBE t PBE t 11.4 0.692 0 56.4 11.7 2.68 −2.17 52.1 PBE rt 11.2 0.675 0 56.6 PBE rt RPBE t 11.6 2.53 −2.03 53.4 11.2 0.654 0 57.5 RPBE t 11.4 2.75 −2.14 51.4 PBE0 t PBE0 t 11.0 0.778 0 56.2 11.5 2.77 −2.17 51.3 HSE06 t HSE06 t 11.0 0.772 0 56.3 a For all computational settings, two parameter sets are given: one with a volume-dependent non-electrostatic contribution (determined by parameter β), the other without; b DFT exchange-correlation functional; c FHI aims basis-set and integration grid settings with abbreviations l=light, t=tight, and rt=really tight; d Mean absolute error.

54

ACS Paragon Plus Environment

Page 55 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 6: Parameter Optimization Results Including Anionic Moleculesa xcb

set.c

 ̺iso  −3 me ˚ A



β  −3 meV ˚ A

MAEd

0.221 0 0.497 0 0.649 0 0.659 0 0.109 0 0.104 0

134.5 134.2 148.3 147.7 149.1 148.1 151.8 150.8 137.2 136.9 137.8 137.6

(386 solutes) 0.927 −0.185 0.760 0 0.512 0.078 0.581 0 0.406 0.206 0.591 0 0.435 0.137 0.556 0 1.08 −0.351 0.769 0 1.05 −0.332 0.754 0

118.2 118.4 130.4 130.3 131.1 130.9 133.8 133.7 120.1 120.5 120.5 120.9

α  −2 meV ˚ A



PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

T2+T3+T4 (386 solutes) 15.1 0.750 15.1 0.943 15.6 0.347 15.6 0.778 15.6 0.225 15.5 0.773 15.4 0.159 15.4 0.733 15.4 0.866 15.4 0.961 15.4 0.861 15.4 0.952

PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

T2+T3c+T4c 13.6 13.6 13.9 13.9 13.9 13.9 13.8 13.9 13.8 13.8 13.8 13.8

PBE PBE PBE PBE PBE PBE RPBE RPBE PBE0 PBE0 HSE06 HSE06

l l t t rt rt t t t t t t

64.4 64.5 73.5 70.4 72.9 72.7 71.8 71.8 67.3 67.3 67.3 66.2

T4 (60 solutes) 2.89 4.03 −6.05 3.83 −6.20 4.18 −7.04 3.77 −0.547 4.63 −0.773 4.38

1.73 0 16.40 0 16.43 0 17.02 0 8.01 0 8.25 0

(meV)

161.1 161.5 172.7 181.4 172.5 181.5 177.9 187.1 160.6 163.7 161.5 164.8

T4c (60 solutes) PBE l 45.9 11.02 −12.59 132.2 PBE l 31.6 0.323 0 133.9 48.0 6.82 −7.68 137.1 PBE t PBE t 39.5 0.450 0 136.6 48.1 6.77 −7.62 137.2 PBE rt 36.9 0.117 0 138.1 PBE rt RPBE t 46.7 5.92 −6.91 141.2 37.4 0.004 0 141.9 RPBE t 44.8 8.84 −9.63 127.6 PBE0 t PBE0 t 34.1 0.685 0 127.2 45.4 9.02 −9.87 127.9 HSE06 t HSE06 t 33.9 0.592 0 128.0 a For all computational settings, two parameter sets are given: one with a volume-dependent non-electrostatic contribution (determined by parameter β), the other without; b DFT exchange-correlation functional; c FHI aims basis-set and integration grid settings with abbreviations l=light, t=tight, and rt=really tight; d Mean absolute error.

55

ACS Paragon Plus Environment

error (meV)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

parameters (units see legend)

Journal of Chemical Theory and Computation

size of training set 120

80 30

  ˚ −3 ρ me A

20

  ˚ −2 α meV A

Page 56 of 75

160

240

a b c d e f g h i j k l mn o p

a

  ˚ −3 β meV A

10 0 −10 80 70 60 50 40 30 20 10

a b c d e f g h i j k l mn o p

a b c d e f g h i j k l mn o p

MAE (fit) RMSD (fit)

MAE (test) RMSD (test)

MAE (total) RMSD (total)

Figure 15: From the full test-set T1 with 240 molecules (cf. Section 5.3.3), subsets consisting of 80, 120, and 160 molecules (top axis) are randomly drawn, 16 times each (labeled a-p on the center axis). Every set is individually used as training set in a parameter fitting procedure as described in Section 5.3.1 with the same computational settings as previously used for the “PBE” functional and “tight” integration grid and basis settings (cf. Section 5.3.2). The upper panel depicts the resulting parameter sets (̺iso , α, β). The lower panel shows the mean absolute error (MAE, in red) and the root-mean-square deviation (RMSD, in blue) of the calculated free energies of solvation with respect to the experimental values considering the full test-set (“total”, as circles), only the training set (“fit”, as triangles up), or only the validation set (“test”, as triangles down) which consists of all molecules of the full test-set that are not in the corresponding training set.

56

ACS Paragon Plus Environment

Page 57 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Representatively for all combinations of functional and integration grid/basis settings investigated above, we only consider “PBE tight” here. The results of all optimizations are illustrated in Figure 15. Although the optimized parameter values can vary significantly— especially when only one third of the test-set is used as training set—the MAE and RMSD values on the total test-set vary only slightly: MAEs in the range of 45.3 to 51.4 meV for 80, 45.7 to 49.7 meV for 120, and 45.8 to 47.4 meV for 160 molecules in the training set; RMSDs in the range of 64.9 to 68.4 meV for 80, 64.9 to 66.5 meV for 120, and 64.9 to 65.3 meV for 160 molecules in the training set. Thus, a strong correlation seems to exist between all three parameters which has previously only been observed for the non-electrostatic parameters. 2 Due to this correlation, a wide range of parameter combinations can lead to similar accuracies, which means that any one of these combinations is comparatively robust. On the other hand, the results of Figure 15 also show that with the given models for electrostatics and non-electrostatics, no further improvement of the accuracy seems possible. We defer a more thorough study of this phenomenon to future work.

B

Number of Points Per Center for a Given Degree of Determination

The following approximation is based on a homonuclear spherical cluster with a volume Vtot equal to the sum of the size-independent “atomic” volumes, V1

Vtot = Ncenters V1

(57)

With the following relationship between surface and volume of a sphere O ∝ V 2/3

57

ACS Paragon Plus Environment

(58)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 58 of 75

the ratio of the total surface area of the ensemble of atoms in the finite system (molecule, cluster etc), Otot , and the surface area of a single atom, O1 , is 2/3

Otot V 2/3 = tot = Ncenters 2/3 O1 V1

(59)

In order to obtain the same density of points on the surfaces Otot and O1 (assuming that the points are evenly distributed), the above ratio has to be equal to the ratio of the total number of points, Np , and the number of points per atom, P1 , which directly leads to eq 28.

C

Definition of the Local Coordinate Systems

Having determined an evenly distributed set of points p on the cavity surface, the next step is to construct a local Cartesian coordinate system {tp,1 , tp,2 , np } such that np is the vector locally perpendicular to the interface and tp,1 , tp,2 are two local tangent vectors. In any generalized gradient and higher calculation, density gradients are already used in the energy calculation such that np can be taken as the normalized local gradient at no additional computational cost.

np =

∇̺(p) ||∇̺(p)||2

(60)

The tangent vectors tp,1 and tp,2 —as e.g. used in eq 6c—are then chosen such as to fulfill the conditions

tp,1 × tp,2 = np

(61)

tp,1 · tp,2 = 0

(62)

and

58

ACS Paragon Plus Environment

Page 59 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Yet, these leave the rotation of the tangent vectors around np as an arbitrary remaining degree of freedom. For simplicity we therefore align tp,1 perpendicular to the z direction of the global coordinate system, if possible, otherwise perpendicular to the x-axis.

D

Calculation of the Intersection Point of Two Lines in 2D

In order to determine the intersection point ˜ zij of two lines (cf. eqs 40 and 41)

˜li :









(63a)



(63b)

qi,2  1 q˜i,1  −˜ ˜l(λi ) = 1 q ˜ i + λi v ˜ i =   + λi   2 2 q˜ q˜ i,1

i,2

and

˜lj :







qj,2  1 q˜j,1  −˜ ˜l(λj ) = 1 q ˜ j + λj v ˜ j =   + λj   2 2 q˜ q˜ j,2

j,1

with given point vectors q ˜i/j and non-collinear direction vectors v ˜i/j , we have to solve the equation ˜l(λi ) = ˜l(λj )

(64)

which leads to the following system of equations 1 1 q˜i,1 − λi q˜i,2 = q˜j,1 − λj q˜j,2 2 2

(65a)

1 1 q˜i,2 + λi q˜i,1 = q˜j,2 + λj q˜j,1 2 2

(65b)

59

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 60 of 75

Note here, that only the solution to λi (or λj ) is required to determine the intersection point via eq 63a (or eq 63b, respectively). Without loss of generality—as we will see later on—we assume that q˜j,1 6= 0. This allows us to express λj in terms of λi using eq 65b

λj = λi

q˜i,1 1 q˜i,2 − q˜j,2 + q˜j,1 2 q˜j,1

(66)

Inserting eq 66 into eq 65a and solving for λi yields

λi =

qj,2 − q˜i,2 ) + q˜j,1 (˜ qj,1 − q˜i,1 ) 1 q˜j,2 (˜ 2 q˜i,1 q˜j,2 − q˜j,1 q˜i,2

(67)

Note that the absolute value of the denominator is simply the area of a parallelogram spanned by v ˜i and v ˜j which is only zero if the two lines are collinear—a case which we have excluded beforehand. The solution in the case of q˜j,1 = 0 is also covered in the expression of eq 67 which can easily be verified by inserting q˜j,1 = 0 in eq 65b and solving for λi . The general expression for the intersection point ˜ zij in eq 42 is obtained by inserting eq 67 into eq 63a.

E

Distance Scaling for Density-Walkers

The distance scaling factor is derived from the following considerations: Given two points i line and j on a circle, the ratio of the length of the arc, sarc ij , and a straight line, sij , between

them is given by θij sarc ij 2 = θ sline sin( 2ij ) ij

60

ACS Paragon Plus Environment

(68)

Page 61 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

where θij is the angle between the two position vectors. Squaring eq 68 and using a trigonometric half angle formula, we straightforwardly obtain the following relation sarc ij line sij

!2

=

θij2 2(1 − cos θij )

(69)

Since cos θij can easily be calculated from the normal vectors on those points, cf. eq 36, we express θij2 as a function of cos θij θij2 = (arccos(cos θij ))2

(70)

and expand this function in a Taylor series at cos θij = 1, i.e. for small angles θ

θij2 = 2(1 − cos θij ) +

(1 − cos θij )2 3 +

 (1 − cos θij )3 + O (1 − cos θij )4 (71) 11.25

Truncating this series at third order, we obtain the following distance ratio sarc ij = line sij

r

1+

(1 − cos θij )2 (1 − cos θij )3 + 6 22.5

(72)

which is equal to the employed scaling factor.

F

Finite Element Solution of the Generalized Poisson Equation

F.1

Generalized Poisson Equation in Cylindrical Coordinates

For performance reasons we restrict ourselves here to systems of axial symmetry and thus use cylindrical coordinates denoted as (ρ, ζ), where any dependency on the rotation angle ϕ is

61

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 62 of 75

dropped due to symmetry. In order to check for finite size errors within the same simulation box, a scaled coordinate system (r, z) is introduced

ρ = lr r

(73a)

ζ = lz z

(73b)

with corresponding scaling factors lr and lz . The scalar field gradient in these coordinates reads

∇=

1 ∂ 1 ∂ ˆ r+ ˆ z lr ∂r lz ∂z

(74)

The divergence of a vector field U is given by

∇·U=

r) 1 ∂(U · ˆ z) 1 ∂(rU · ˆ + lr r ∂r lz ∂z

(75)

Using eqs 74 and 75, we can express the generalized Poisson equation, eq 2, in coordinates of (r, z) ∂ ∂r



rε0 ε(lr r, lz z) ∂Φ lr2 ∂r



∂ + ∂z



rε0 ε(lr r, lz z) ∂Φ lz2 ∂z

 = −4π · r̺(lr r, lz z) (76)

In the following, we will first find an expression for the relative dielectric function ε(ρ, ζ). Thereafter, eq 76 will be revisited and further simplified.

62

ACS Paragon Plus Environment

Page 63 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

F.2

Spherical Cavity and Smoothing of Dielectric Function

For the sake of simplicity, the spherical cavity of radius rc in the model case studied here is centered at the origin of the coordinate system. Analogously to eq 3, it is defined by the function C(ρ, ζ), which is < 0 inside of the cavity and > 0 outside C(ρ, ζ) =

p ρ2 + ζ 2 − r c

(77)

For the MPE calculations, the dielectric function is directly given by eq 3 with the function C(ρ, ζ) defined in eq 77 εMPE (ρ, ζ) = εb Θ [C(ρ, ζ)] + (1 − Θ [C(ρ, ζ)])

(78)

where the relative dielectric permittivity is constant inside, εMPE = 1, and outside, εMPE = εb , of the cavity. The Heaviside step function Θ in eq 78, however, is numerically problematic in the finite elements method. Thus, we introduce a (one-dimensional) smooth switching function Λ with the generic form of

Λa (σ) =

and the derivative

    0,    

σ < −a

0.5 + 0.75 σa − 0.25       1,  h    0.75 1 −

dΛa a =  dσ  0,

 σ 2 a

i

 σ 3 a

, σ ∈ [−a, a]

(79)

σ>a

, σ ∈ [−a, a]

(80)

else

with the real positive parameter a that defines the width of the switching region. This

63

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 64 of 75

function switches smoothly from Λa (−a) = 0 to Λa (a) = 1 and its derivative is continuous at the domain boundaries. Replacing the step function Θ in eq 78 by Λ, a corresponding smooth dielectric function suitable for the finite element solver is obtained

εFEM (ρ, ζ; a) = εb Λa [C(ρ, ζ)] + (1 − Λa [C(ρ, ζ)])

(81)

The two dielectric functions are exactly the same when the width of the smooth switching region goes to 0:

εMPE (ρ, ζ) = lim εFEM (ρ, ζ; a) a→0

(82)

Of course, a has to be > 0 in order to obtain the desired smoothness of the dielectric function. Thus, eqs 81 and 78 are not identical which causes an intrinsic discrepancy between the MPE and FEM results. The adaptivity of the integration grid used in KARDOS, however, allows for an efficient way of minimizing this error. Starting the calculations with a relatively broad switching region and then gradually narrowing it down leads to a very refined integration grid at the cavity surface whereas the rest of the domain can be sampled on a much coarser level. The parameter a is then reduced until convergence of the FEM result is observed.

F.3

Regularization of the Potential

In the model systems studied in this section, the source term ̺ in the generalized Poisson equation, cf. eq 76, consists of one or several electric poles pi (monopole, dipole, etc.) located at positions ri = (ρi , ζi )

̺(r) =

X i

pi δ(r − ri )

(83)

Regularization of eq 76 with a source term ̺0 for which we know the solution Φ0 to 64

ACS Paragon Plus Environment

Page 65 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Poisson’s equation, cf. eq 1,

△Φ0 = −

4π̺0 ε0

(84)

serves to cure the numerical problems in the finite element method connected with the δ function in the source term, eq 83, and the resulting pole in the potential. This is done by splitting the full potential in two contributions, Φ0 and Φ′ Φ = Φ 0 + Φ′

(85)

The linearity of Poisson’s equation and the relation

∇ · (τ U) = U · ∇τ + τ ∇ · U

(86)

allow us to rewrite eq 2 in the following way

∇ · (ε∇Φ′ ) + (∇Φ0 ) · (∇ε) + ε △Φ0 +

4π̺ =0 ε0

(87)

Inserting eq 84 into eq 87 and assuming that all electric poles are located within the cavity where the dielectric function is exactly 1, cf. eq 78, one sees that a good choice for the regularization source ̺0 is

̺0 = ̺

(88)

This way, the last two terms on the left-hand side of eq 87 cancel out and we arrive at a tailored equation for our model systems

∇ · (ε∇Φ′ ) − E0 · ∇ε = 0

65

ACS Paragon Plus Environment

(89)

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 66 of 75

The regularization scheme used here is very similar to the one used in the MPE model where the total potential is split into a polarization, ΦP , and a scaled Hartree potential, ′ ε−1 b ΦH , cf. eq 8. Thus, Φ cannot directly be compared to ΦP , but to the slightly modified

Φ′P = ΦP + (ε−1 b − 1)ΦH

F.4

(90)

Integration Weights on Triangular Grid

The adaptive sparse triangular grid created by the finite element solver consists of nonoverlapping triangles defined by the function t(i, j, k)

t(i, j, k) =

   1,   0,

ri , rj , rk form a triangle

(91)

otherwise

where ri , rj , rk are points on the integration grid. This triangulation information is used to assign an area element dAi in cylindrical coordinates to each integration grid point ri with T  coordinates ρi ζi equal to one third of the area of all triangles it contributes to form dAi =

X1 j,k

6

||(rj − ri ) × (rk − ri )||2

(92)

The three-dimensional volume element dVi in cylindrical coordinates is then simply

dVi = 2πρi dAi

F.5

(93)

Computational Details for the Finite Element Solver

In all FEM calculations, we consider a rectangular domain with r ∈ [0, 1] and z ∈ [−1, 1] employing the scaling factors lr = lz = 50. We assume Dirichlet boundary conditions except for the symmetry axis where we impose Neumann boundary conditions. The Dirichlet

66

ACS Paragon Plus Environment

Page 67 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

boundary values for monopole terms used in the regularization of the potential for the FEM are calculated analytically while the higher order terms (dipoles and quadrupoles) are assumed to be zero. We employ an initial mesh of 5 × 9 equidistant regular grid points and linear finite elements. The adaptive refinement is controlled with a hierarchical basis error estimator 64 for which we choose a tolerance of 1 × 10−6 Eh e−1 . We restrict the refinement procedure such that the grid gets at most 12 times refined and we do not exceed 5 000 000 grid points in total. This results in an effective spatial resolution of 3 × 10−3 a0 and estimated errors below 1 × 10−4 Eh e−1 . Our initial grid is too coarse to resolve the cavity. We therefore start with a broad smoothing function Λ with a = 1 to solve this problem. We then reduce a stepwise to the targeted value of 0.01, always employing the resulting mesh and solution of the previous step as initial guess.

References (1) Blum, V.; Gehrke, R.; Hanke, F.; Havu, P.; Havu, V.; Ren, X.; Reuter, K.; Scheffler, M. Ab initio molecular simulations with numeric atom-centered orbitals. Comp. Phys. Commun. 2009, 180, 2175–2196. (2) Andreussi, O.; Dabo, I.; Marzari, N. Revised self-consistent continuum solvation in electronic-structure calculations. J. Chem. Phys. 2012, 136, 064102–064102–20. (3) Marenich, A. V.; Kelly, C. P.; Thompson, J. D.; Hawkins, G. D.; Chambers, C. C.; Giesen, D. J.; Winget, P.; Cramer, C. J.; Truhlar, D. G. Minnesota Solvation Database – version 2012. 2012. (4) Churg, A.; Warshel, A. Control of the redox potential of cytochrome and microscopic dielectric effects in proteins. Biochemistry 1986, 25, 1675–1681.

67

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(5) Bashford, D.; Karplus, M. pKa’s of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry 1990, 29, 10219–10225. (6) Onufriev, A.; Bashford, D.; Case, D. A. Exploring protein native states and large-scale conformational changes with a modified generalized born model. Proteins: Struct., Funct., Bioinf. 2004, 55, 383–394. (7) Onuchic, J. N.; Wolynes, P. G. Theory of protein folding. Curr. Opin. Struct. Biol. 2004, 14, 70–75. (8) Tipmanee, V.; Oberhofer, H.; Park, M.; Kim, K. S.; Blumberger, J. Prediction of Reorganization Free Energies for Biological Electron Transfer: A Comparative Study of Ru-Modified Cytochromes and a 4-Helix Bundle Protein. J. Am. Chem. Soc. 2010, 132, 17032–. (9) Gohda, Y.; Schnur, S.; Groß, A. Influence of water on elementary reaction steps in electrocatalysis. Faraday Discuss. 2009, 140, 233–244. (10) Li, Y.-F.; Liu, Z.-P.; Liu, L.; Gao, W. Mechanism and activity of photocatalytic oxygen evolution on titania anatase in aqueous surroundings. J. Am. Chem. Soc. 2010, 132, 13008–13015. (11) Koper, M. T. M. Theory of multiple proton–electron transfer reactions and its implications for electrocatalysis. Chem. Sci. 2013, 4, 2710–2723. (12) Sakong, S.; Naderian, M.; Mathew, K.; Hennig, R. G.; Groß, A. Density functional theory study of the electrochemical interface between a Pt electrode and an aqueous electrolyte using an implicit solvent method. J. Chem. Phys. 2015, 142, 234107. (13) Stecher, T.; Reuter, K.; Oberhofer, H. First-Principles Free-Energy Barriers for Photoelectrochemical Surface Reactions: Proton Abstraction at TiO 2 (110). Phys. Rev. Lett. 2016, 117, 276001. 68

ACS Paragon Plus Environment

Page 68 of 75

Page 69 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(14) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum Mechanical Continuum Solvation Models. Chem. Rev. 2005, 105, 2999–3094. (15) Kohn, W.; Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 1965, 140, A1133. (16) Jones, R. O. Density functional theory: Its origins, rise to prominence, and future. Rev. Mod. Phys. 2015, 87, 897. (17) Born, M. Volumen und Hydratationsw¨arme der Ionen. Z. Physik 1920, 1, 45–48. (18) Miertuˇs, S.; Scrocco, E.; Tomasi, J. Electrostatic interaction of a solute with a continuum. A direct utilizaion of AB initio molecular potentials for the prevision of solvent effects. Chem. Phys. 1981, 55, 117–129. (19) Klamt, A.; Schuurmann, G. COSMO: a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc., Perkin Trans. 2 1993, 2, 799–805. (20) Ringe, S.; Oberhofer, H.; Hille, C.; Matera, S.; Reuter, K. Function-Space-Based Solution Scheme for the Size-Modified Poisson–Boltzmann Equation in Full-Potential DFT. J. Chem. Theory Comput. 2016, 12, 4052–4066. (21) Kirkwood, J. G. Theory of Solutions of Molecules Containing Widely Separated Charges with Special Application to Zwitterions. J. Chem. Phys. 1934, 2, 351–361. (22) Onsager, L. Electric Moments of Molecules in Liquids. J. Am. Chem. Soc. 1936, 58, 1486–1493. (23) Rivail, J.-L.; Rinaldi, D. A quantum chemical approach to dielectric solvent effects in molecular liquids. Chem. Phys. 1976, 18, 233–242.

69

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

´ (24) Dillet, V.; Rinaldi, D.; Angy´ an, J. G.; Rivail, J.-L. Reaction field factors for a multipole distribution in a cavity surrounded by a continuum. Chem. Phys. Lett. 1993, 202, 18– 22. (25) Rinaldi, D.; Bouchy, A.; Rivail, J.-L.; Dillet, V. A self-consistent reaction field model of solvation using distributed multipoles. I. Energy and energy derivatives. J. Chem. Phys. 2004, 120, 2343–2350. (26) Havu, V.; Blum, V.; Havu, P.; Scheffler, M. Efficient integration for all-electron electronic structure calculation using numeric basis functions. J. Comput. Phys. 2009, 228, 8367–8379. (27) Ren, X.; Rinke, P.; Blum, V.; Wieferink, J.; Tkatchenko, A.; Sanfilippo, A.; Reuter, K.; Scheffler, M. Resolution-of-identity approach to Hartree–Fock, hybrid density functionals, RPA, MP2 and GW with numeric atom-centered orbital basis functions. New J. Phys. 2012, 14, 053020. (28) Ihrig, A. C.; Wieferink, J.; Zhang, I. Y.; Ropo, M.; Ren, X.; Rinke, P.; Scheffler, M.; Blum, V. Accurate localized resolution of identity approach for linear-scaling hybrid density functionals and for many-body perturbation theory. New J. Phys. 2015, 17, 093020. (29) Scherlis, D. A.; Fattebert, J.-L.; Gygi, F.; Cococcioni, M.; Marzari, N. A unified electrostatic and cavitation model for first-principles molecular dynamics in solution. J. Chem. Phys. 2006, 124, 074103. (30) Hayt, W. H., Jr; Buck, J. A. Engineering Electromagnetics, 6th ed.; McGraw-Hill, 2001; Chapter 5, pp 144–150. (31) Cances, E.; Mennucci, B. The escaped charge problem in solvation continuum models. J. Chem. Phys. 2001, 115, 6130–6135.

70

ACS Paragon Plus Environment

Page 70 of 75

Page 71 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(32) Ruiz-L´opez, M. F. Solvation effects on molecules and biomolecules; Springer, 2008; pp 23–38. (33) Delley, B. An all-electron numerical method for solving the local density functional for polyatomic molecules. J. Chem. Phys. 1990, 92, 508–517. (34) Steinborn, E. O.; Ruedenberg, K. In Advances in Quantum Chemistry; L¨owdin, P.-O., Ed.; Academic Press, 1973; Vol. 7; pp 46–51. (35) Capelli, A.; Garbieri, G. Corso di analisi algebrica, 1st ed.; F. Sacchetto, 1886; Chapter 5, pp 373–390. (36) Bj¨orck, ˚ A. Numerical Methods for Least Squares Problems; Other Titles in Applied Mathematics; Society for Industrial and Applied Mathematics, 1996. (37) Anderson, E.; Bai, Z.; Bischof, C.; Blackford, S.; Demmel, J.; Dongarra, J.; Du Croz, J.; Greenbaum, A.; Hammarling, S.; McKenney, A. et al. LAPACK Users’ Guide, 3rd ed.; Society for Industrial and Applied Mathematics: Philadelphia, PA, 1999. (38) Blackford, L. S.; Choi, J.; Cleary, A.; D’Azevedo, E.; Demmel, J.; Dhillon, I.; Dongarra, J.; Hammarling, S.; Henry, G.; Petitet, A. et al. ScaLAPACK Users’ Guide; Society for Industrial and Applied Mathematics: Philadelphia, PA, 1997. (39) Fattebert, J.-L.; Gygi, F. Density functional theory for efficient ab initio molecular dynamics simulations in solution. J. Comput. Chem. 2002, 23, 662–666. (40) Chipman, D. M. Comparison of solvent reaction field representations. Theor. Chem. Acc. 2002, 107, 80–89. (41) Zhan, C.-G.; Bentley, J.; Chipman, D. M. Volume polarization in reaction field theory. J. Chem. Phys. 1998, 108, 177–192. (42) Zhan, C.-G.; Chipman, D. M. Cavity size in reaction field theory. J. Chem. Phys. 1998, 109, 10543–10558. 71

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(43) Dupont, C.; Andreussi, O.; Marzari, N. Self-consistent continuum solvation (SCCS): The case of charged systems. J. Chem. Phys. 2013, 139, 214110. (44) Lebedev, V. I.; Laikov, D. N. A quadrature formula for the sphere of the 131st algebraic order of accuracy. Dokl. Math. 1999, 59, 477–481. (45) Frenkel, D.; Smit, B. Understanding Molecular Simulation, 2nd ed.; Academic Press, Inc.: Orlando, FL, USA, 2001. (46) Cignoni, P.; Corsini, M.; Ranzuglia, G. Meshlab: an open-source 3d mesh processing system. ERCIM news 2008, 73, 6. (47) Erdmann, B.; Lang, J.; Roitzsch, R. KARDOS–User’s Guide. Konrad-Zuse-Zentrum f¨ ur Informationstechnik: Berlin, 2002. (48) Jones, E.; Oliphant, T.; Peterson, P.; others, SciPy: Open source scientific tools for Python. 2001–; http://www.scipy.org/, [Online; accessed 2017-01-06]. (49) Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing 1995, 16, 1190– 1208. (50) Zhu, C.; Byrd, R. H.; Lu, P.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran Subroutines for Large-scale Bound-constrained Optimization. ACM Trans. Math. Softw. 1997, 23, 550–560. (51) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 1996, 77, 3865. (52) Hammer, B.; Hansen, L. B.; Nørskov, J. K. Improved adsorption energetics within density-functional theory using revised Perdew-Burke-Ernzerhof functionals. Phys. Rev. B 1999, 59, 7413.

72

ACS Paragon Plus Environment

Page 72 of 75

Page 73 of 75

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(53) Adamo, C.; Barone, V. Toward reliable density functional methods without adjustable parameters: The PBE0 model. J. Chem. Phys. 1999, 110, 6158–6170. (54) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 2003, 118, 8207–8215. (55) Heyd, J.; Scuseria, G. E.; Ernzerhof, M. Erratum:“Hybrid functionals based on a screened Coulomb potential”[J. Chem. Phys. 118, 8207 (2003)]. J. Chem. Phys. 2006, 124, 219906. (56) Shivakumar, D.; Deng, Y.; Roux, B. Computations of Absolute Solvation Free Energies of Small Molecules Using Explicit and Implicit Solvent Model. J. Chem. Theory Comput. 2009, 5, 919. (57) Lide, D. R., Ed. CRC handbook of chemistry and physics: a ready-reference book of chemical and physical data, 82nd ed.; CRC Press, 2001; Chapter 5, p 2. (58) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Generalized Born Solvation Model SM12. J. Chem. Theory Comput. 2013, 9, 609–620. (59) Fisicaro, G.; Genovese, L.; Andreussi, O.; Mandal, S.; Nair, N. N.; Marzari, N.; Goedecker, S. Soft-Sphere Continuum Solvation in Electronic-Structure Calculations. J. Chem. Theory Comput. 0, 0, null. (60) You, Z.-Q.; Herbert, J. M. Reparameterization of an Accurate, Few-Parameter Implicit Solvation Model for Quantum Chemistry: Composite Method for Implicit Representation of Solvent, CMIRS v. 1.1. J. Chem. Theory Comput. 2016, 12, 4338–4346. (61) Oberhofer, H.; Reuter, K. First-principles thermodynamic screening approach to photocatalytic water splitting with co-catalysts. J. Chem. Phys. 2013, 139, 044710. (62) Cramer, C. J.; Truhlar, D. G. A Universal Approach to Solvation Modeling. Acc. Chem. Res. 2008, 41, 760–768. 73

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(63) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378–6396. (64) Lang, J. Adaptive Multilevel Solution of Nonlinear Parabolic PDE Systems; Lecture Notes in Computational Science and Engineering; Springer Berlin Heidelberg, 2001; Vol. 16.

74

ACS Paragon Plus Environment

Page 74 of 75

Page 75 of 75

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment