Fast Domain Decomposition Algorithm for Continuum Solvation

Subscriber access provided by DUKE UNIV

Article

A Fast Domain Decomposition Algorithm for Continuum Solvation Models: Energy and First Derivatives Filippo Lipparini, Benjamin Stamm, Eric Cances, Yvon Maday, and Benedetta Mennucci J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/ct400280b • Publication Date (Web): 03 Jul 2013 Downloaded from http://pubs.acs.org on July 9, 2013

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A Fast Domain Decomposition Algorithm for Continuum Solvation Models: Energy and First Derivatives Filippo Lipparini,∗,† Benjamin Stamm,‡ Eric Cancès,¶ Yvon Maday,‡,§,k and Benedetta Mennucci⊥ UPMC Univ. Paris 06, Institut du Calcul et de la Simulation, F-75005 Paris, France, UPMC Univ Paris 06, UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris, France, Université Paris-Est, CERMICS, Project-team Micmac, INRIA-Ecole des Ponts, 6 & 8 avenue Blaise Pascal, 77455 Marne-la-Vallée Cedex 2, France, Institut Universitaire de France, Brown Univ, Division of Applied Maths, Providence, RI, USA, and Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Via Risorgimento 35, 56126 Pisa, Italy E-mail: [email protected]

Abstract In this contribution, an efficient, parallel, linear scaling implementation of the Conductorlike screening model (COSMO) is presented, following the domain decomposition (dd) algorithm recently proposed by three of us. The implementation is detailed and its linear scaling ∗ To

whom correspondence should be addressed Univ. Paris 06, Institut du Calcul et de la Simulation, F-75005 Paris, France ‡ UPMC Univ Paris 06, UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris, France ¶ Université Paris-Est, CERMICS, Project-team Micmac, INRIA-Ecole des Ponts, 6 & 8 avenue Blaise Pascal, 77455 Marne-la-Vallée Cedex 2, France § Institut Universitaire de France k Brown Univ, Division of Applied Maths, Providence, RI, USA ⊥ Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Via Risorgimento 35, 56126 Pisa, Italy † UPMC

1 ACS Paragon Plus Environment


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

properties, both in computational cost and memory requirements, are demonstrated. Such behavior is also confirmed by several numerical examples on linear and globular large-sized systems, for which the calculation of the energy and of the forces is achieved with timings compatible with the use of polarizable continuum solvation for molecular dynamics simulations.

1

Introduction

Continuum (or implicit) solvation models are nowadays largely used computational approaches to include solvent effects in the simulation of properties and processes of (supra)molecular systems in condensed phase. 1–7 Their easiness of use combined with a favorable cost/effectiveness ratio has allowed their application in very different research fields, from chemistry to biology and material sciences. Due to the very different typical ranges of accuracy and dimensions of these fields, continuum solvation models have been combined with both classical molecular mechanics (MM) approaches and quantum-mechanical (QM) descriptions. Nowadays, many formulations are available in the most common computational softwares: they differentiate both in terms of specific modellistic aspects and computational implementations. In all cases, however, the starting point is common, namely the electrostatic problem of a charge distribution contained in a cavity of proper shape and dimension 6,8 built within a dielectric continuum characterized by its permittivity ε. Among the possible strategies developed so far to solve such a problem, one of the most successful approaches introduces the concept of apparent surface charge (ASC) to represent the polarization of the dielectric. The ASC allows in fact to reduce the entire problem to the cavity surface without the need to account for the volume of the dielectric. Within the ASC approach, different formulations have been given in the years. Among them, we quote here the Polarizable Continuum Model (PCM) by Tomasi and coworkers, the "conductor-like screening model" (COSMO) originally developed by Klamt and Schüürmann 9 and the "surface and simulation of volume polarization for electrostatics" (SS(V)PE) method by Chipman. 10,11 As a matter of fact, the PCM is now an acronym which collects a family of methods combining different solutions of the same


Page 2 of 36

Page 3 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


electrostatic problem (namely DPCM, 12 IEFPCM, 13–15 CPCM 16,17 ) with different formulations (VPCM 18,19 ) and/or different computational implementations (IPCM, 20 CSCPCM 21 ). Likewise, COSMO is now available in implementations which largely differentiate with respect to the original formulation, suffice here to quote the “smooth COSMO” (S-COSMO) model of York and Karplus 22 and the “Switching/Gaussian” (SWIG) approach by Lange and Herbert. 23,24 To further increase the complexity of the picture and the difficulties for a user to understand the differences among the many formulations, we have to recall that a given continuum model, for example COSMO or IEFPCM, can give different results when used in different softwares (Gaussian, QCHEM, ADF, GAMESS, just to quote the most used ones) as each software has its own definition of the cavity and of the numerical strategy used to mesh the surface cavity into the finite elements (or points) used to represent the ASC. From this brief summary it is clear that state-of-the-art continuum solvation models are the result of a complex combination of physical, mathematical and numerical expertises which, during the years, has made them more and more accurate in the solution of the original electrostatic problem and more and more efficient in their numerical performances. This increased accuracy and efficiency is indeed one of the main aspects that characterize the evolution of these models in the last years and their large applicability, now extending well beyond the field of small molecules in simple homogeneous and isotropic solvents. 25,26 Moreover, most of the recent formulations have been generalized to include derivatives with respect to internal (namely geometrical) and external parameters (such as electric and magnetic fields). This further extension has allowed to apply continuum solvation models to study geometries of solvated systems, to predict reaction mechanisms and to simulate spectroscopies. 27 With the present paper, we add a further step in this evolution, reporting an efficient computational implementation of a new numerical approach to solve the electrostatic problem of continuum solvation models derived from the Domain Decomposition (dd) methods. 28 The common point of all the present implementations of the various continuum solvation methods is that the numerical methods used to solve the integral equations belongs to the family of Galerkin methods. In particular, two main strategies of discretization are currently employed, the first one being based on a boundary element approximation, the second one on a non-



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

conformal Galerkin approximation which makes use of Gaussian basis functions to expand the ASC. Nevertheless, all the implementations start by defining a suitable mesh on the surface of the cavity, which is used to position the basis functions, let them be spherical gaussians 21,22,24 or piecewise constant functions: 12,29 the approximation basis set is then used to represent the integral operators, so that the problem is transformed into the solution of a linear system, for which either matrix inversion or iterative approaches 30 can be used. As all the integral operators involved in continuum solvation produce, given the ASC, either a potential or a field, the fast multipole method (FMM) 29,31 can be used to compute the matrix-vector products implied in any iterative procedure. The computational implementation that we propose here is based on a completely new path with respect to all the discretization schemes used so far to solve for the ASC. For now, only the COSMO (or C-PCM) equations will be considered: the extension of our numerical procedure to the IEF-PCM is object of active investigation. The main idea is to divide the domain, i.e., the cavity, in a set of subdomains simple enough that one can either solve the problem analytically, or numerically but with a very limited computational effort. Here the subdomains are the spheres composing the molecular cavity which are centered on the atoms forming the solute molecule and for which a closed expression for the solution to the COSMO equation exists. We will show how the discretization method produces a very sparse linear system, which can be solved efficiently with an iterative procedure. Notice that the sparsity of the system implies the linear scaling properties of the method, which we will demonstrate in this paper, without relying on fast summation techniques. Such an efficient implementation paves the way to the use of PCM and similar methods both in molecular dynamics simulations and in large QM/MM/Continuum calculations, where the cost of solving the integral equations largely dominates the overall computational effort. The details of the numerical approach will be given in section 2 whereas its extension to the analytical derivatives will be presented in section 3. In section 4 we will show how to achieve both computational costs and memory requirements which scale linearly with the number of atoms. Some numerical experiments will be used in section 5 to demonstrate the properties of our algorithm; finally, we will draw some conclusions and perspectives.


Page 4 of 36

Page 5 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


2

Theory

In this section, we will briefly summarize the main aspects of the dd method that we will use to solve the COSMO 9 equations for a Van der Waals molecular cavity. As the theory has been thoroughly described in a parallel paper by some of the authors, 32 we will present here only the discretized equations, on the implementation of which we will focus in section 4. We consider a solute molecule of M atoms carrying a classical charge distribution M

ρ(r) =

∑ q j δ (r − R j ),

j=1

where q j is the charge of the jth atom and R j its position. Notice that more complex classical distributions, including higher order or polarizable multipoles, 33–37 can be also treated with a straightforward generalization. The van der Waals molecular cavity is defined as a union of spheres centered at the atoms and radius the atomic Van der Waals radius, scaled for a constant factor, that reads as: Ω=

M [

Ω j.

j=1

Here we call Ω j the sphere with center R j (i.e. the position of the jth atom) and radius r j . In COSMO, the solute and the solvent interact by mutually polarizing each other: the solvation energy of the molecule is defined as the interaction of the solute’s density of charge with the reaction potential W , which is the contribution to the total electrostatic potential due to the polarization of the environment:

Es =

1 f (εs ) 2

Z

Ω

ρ(r)W (r)dr =

M 1 f (εs ) ∑ q jW (R j ). 2 j=1

(1)

Here, f (εs ) is a constant scaling factor depending on the solvent dielectric constant εs used to take into account the non-conductor nature of the solvents and the reaction potential W is the unique solution to the boundary value problem 38    −∆W (r)= 0  

for r ∈ Ω,

W (s)= −Φ(s)

for s ∈ Γ.

5

ACS Paragon Plus Environment

(2)


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 36

In the above equations, Γ = ∂ Ω is the surface of the cavity Ω and Φ the potential generated by ρ in vacuo, given by M

∀r ∈ R3 ,

Φ(r) =

qj

∑ |r − R j | .

(3)

j=1

The usual way to compute the solvation energy Es is to use the fact that it is possible to represent the reaction potential by means of an apparent surface charge (ASC) σ which produces the potential W : 38

∀r ∈ Ω,

W (r) =

Z

σΓ (s) ds. |r − s| Γ

(4)

The ASC is then computed by imposing that the potential it produces at Γ is equal to minus the solute’s potential, as in eq. 2. This defines an integral equation on Γ:

∀s ∈ Γ,

Z

σΓ (s′ ) ′ ds = (SΓ σ )(s) = −Φ(s). ′ Γ |s − s |

(5)

Such an integral equation is usually discretized according to some boundary element scheme; here, we follow a completely different path. The main idea of the Schwarz Domain Decomposition method is to divide the domain, i.e., the cavity (and not the surface), in a set of simple overlapping subdomains which, for the COSMO problem in a Van der Waals cavity, are obviously the single spheres composing the cavity. Let us consider a Van der Waals cavity as the one sketched in figure 1. Each of the spheres composing the cavity has a portion of its surface Γ j exposed to the solvent (Γej ) and a portion which is buried into the cavity (Γij ) and, in particular, is contained in one or more other neighboring spheres. Unfortunately, it is not possible to solve the standard COSMO problem as a collection of independent problems on a single sphere, which would be very easy, since, in order to do so, one should have a boundary condition not only on the external portion of the surface (which is provided by the original COSMO problem), but also on the internal one. To better clarify this point, let us consider the sphere Ω2 in figure 1: we know the boundary condition on Γe2 , but not on Γi2 . The domain decomposition method gives a strategy to setup an iterative procedure where, at each step, for each sphere Γ j , the boundary condition on Γij


Page 7 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 1: Domain decomposition of a Van der Waals cavity.

e 2 3

4 i 2 is computed as the reaction potential obtained at the previous step in the sphere that intersect Ω j or, if more spheres intersect Ω j at the same point, as the weighted average of the previous step potentials of each intersecting sphere. A pictorial representation of the traditional and dd

Figure 2: In the standard approach, the integral equation eq. 5 is solved on the molecular cavity Γ = ∂ Ω (left). In the ddCOSMO method, we solve a coupled system of integral equations on M copies of the unit sphere (right). For j = 2, the blue, red, green, black and magenta sets are those where N2 (s) = 0, / N2 (s) = {1}, N2 (s) = {3}, N2 (s) = {4} and N2 (s) = {3, 4} respectively. j=3

j=2

j=1

j=4

approaches is given by figure 2. Let us consider sphere Ω1 : at iteration it, we will solve an integral equation ∀s ∈ Γ1 ,

(it)

(SΓ1 σ 1,(it) )(s) = g1 (s),



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

where (it)

g1 (s) =

   −Φ(s),

Page 8 of 36

s ∈ Γe1 ,

  W (it−1) (s), s ∈ Γi . 1 2

Notice that we have introduced a local ASC σ 1 (s), which is not related to the global ASC. Always referring to figure 2, for sphere Ω2 we need to take into account multiple intersections. Let N2 (s) be the list of spheres that intersect Ω2 at s ∈ Γ2 (see figure 2 for a graphical description of the neighbor list N2 (s)) and let |N2 (s)| be the number of intersecting spheres: the integral equation that we will solve for sphere Ω2 will be

∀s ∈ Γ2 , where (it) g2 (s)

=

(it)

(SΓ2 σ 2,(it) )(s) = g2 (s),

   −Φ(s),

  ∑k∈N (s) 1 W (it−1) (s), 2 |N2 (s)| k

s ∈ Γe2 , s ∈ Γi2 .

More in general, for each sphere Ω j we will solve

∀s ∈ Γ j , where (it) g j (s)

=

(it)

(SΓ j σ j,(it) )(s) = g j (s),

   −Φ(s),

  ∑k∈N (s) j

(6)

s ∈ Γej (it−1) 1 (s), |N j (s)| Wk

s∈

(7)

Γij .

For later convenience, we will introduce here a more compact notation for the definition of the boundary condition. Let χk be the characteristic function of Ωk , i.e.,

χk (r) =

   1, r ∈ Ωk ,

  0, r ∈ / Ωk .

A point s ∈ Γ j will be external if and only if the characteristic function of all the neighbors of


Page 9 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Ω j is zero, i.e., if χk (s) = ∑ ω jk (s) = 0, |N j (s)| k∈N (s) k∈N (s)

∑ j

j

χk (s) |N j (s)| .

where we have introduced the normalized weights ω jk (s) = eq. 7 as (it) g j (s)

= − 1−

∑

!

ω jk Φ(s) +

k∈N j (s)

This allows us to rewrite

(it−1)

∑

ω jkWk

(s).

(8)

k∈N j (s)

The final step in order to obtain the ddCOSMO linear system of coupled integral equations is to recognize that the reaction potential on the neighboring spheres can be expressed as the potential produced by the local apparent surface charge on such spheres by means of eq. 4, which is also valid for the local ASC σ k in Ωk :

∀s ∈ Γ j ,

(it−1) Wk (s)

fjk σ k,(it−1) )(s) = = (S

Z

Γk

σ k,(it−1) (y) dy. |s − y|

(9)

fjk operator, which, given the local ASC on sphere Ωk , Notice that we have introduced the S

computes the potential produced by σ k at some point s on Γ j . By substituting in eq. 8 and putting everything together, we finally get the coupled equations for each sphere Ω j :

∀s ∈ Γ j ,

(SΓ j σ

j,(it)

)(s) = − 1 −

∑

!

ω jk Φ(s) +

k∈N j (s)

∑ k∈N j (s)

fjk σ k,(it−1) )(y)(s). ω jk (S

(10)

Notice how eq. 10 represents a Jacobi iteration for a linear system of (integral) equations, which defines the ddCOSMO model:

∀s ∈ Γ j ,

2.1

j

(SΓ j σ )(s) = − 1 −

∑

!

ω jk Φ(s) +

k∈N j (s)

∑ k∈N j (s)

fjk σ k )(y)(s). ω jk (S

(11)

Explicit computation of the ddCOSMO energy

In this subsection, we will discretize and compute explicitly all the contributions to eq. 11. In order to do so, we will expand each local ASC in a truncated basis set of real spherical m=−l,l harmonics {Ylm }l=0,N (see appendix A for more details). This is particularly convenient as

all the quantities involved in eq. 11 are either the potential produced by the solute or the



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 36

potential produced by a local ASC: in order to compute the latter, we can easily use a truncated multipolar expansion. For the sake of simplicity, before proceeding we will rescale all the spheres to the unit sphere S2 and introduce accordingly the scaled local ASCs as

∀s ∈ S2 , σ j (s) = r j σ j (R j + r j s). Let then us represent σ j in terms of a truncated expansion in spherical harmonics:

m σ j (s) = ∑[X j ]m l Yl (s), lm

where we put for brevity N

∑ := ∑ lm

l

∑

.

l=0 m=−l

The left-hand side of eq. 11 is easily computed in the spherical harmonics representation:

(SS2 σ j )(s) =

Z

S2

m ′ 4π ∑lm [X j ]m m l Yl (s ) ′ ds = ∑ [X j ]m l Yl (s). |s − s′ | 2l + 1 lm

This means that the SS2 operator is diagonal in the spherical harmonics basis, and in particular, if we call L j j its representation (the j j subscript means that both s and the integration variable belong to the same sphere Γ j ): ′

= Lllmm ′

4π δll ′ δmm′ . 2l + 1

(12)

The computation of the reaction potential produced by a local ASC on a neighboring sphere is more complex. Let v jk (s) be the vector pointing from the center of sphere Ωk to s ∈ Γ j , i.e. v jk (s) = R j + r j s − Rk , and let t jk (s) be the length of v jk (s) divided by the radius of sphere Ωk and s jk (s) be the unit vector that represents the direction of v jk (s), i.e.,

t jk (s) :=

|R j + r j s − Rk | rk

and

s jk (s) :=

R j + r j s − Rk . |R j + r j s − Rk |


Page 11 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


If Ωk is a sphere intersecting Ω j at s ∈ Γij and σk is the (scaled) solution on Ωk , the potential produced by σk at s (see eq. 9) is fjk σk )(s) = (S

Z

S2

σk (s′ ) ds′ . |t jk (s)s jk (s) − s′ |

(13)

Introducing the spherical harmonics expansion for σk and using the spherical harmonics addition theorem to expand the denominator, one gets: fjk σk )(s) = ∑ (S lm

4π jk l m jk [Xk ]m k [t (s)] Yl [s (s)]. 2l + 1

fjk , we need to perform a further integraIn order to get the representation matrix [L jk ] of S

tion. Unfortunately, this integral can not be computed analytically: a numerical quadrature is mandatory. For this particular problem, the Lebedev quadrature rule is particularly appropriate. N

g Let {yn , wn }n=1 be the Ng Lebedev points and weights for the unit sphere:

′

[L jk ]llmm =− ′

Z

S2

Ng

′

fjkY m′ ](s)Y m (s)ds ≃ − ∑ wnY m (yn )ω jk ω jk (s)[S n l l l n=1

′ ′ 4π (t jk )l Ylm′ (snjk ) 2l ′ + 1 n

where we have added the weight function as we need to integrate only on the portion of surface which is inside Ωk and, for brevity, we have put tnjk = t jk (yn ), snjk = s jk (yn ), ωnjk = ω jk (yn ).

(14)

The numerical quadrature introduces a source of discontinuity of the computed quantities with respect to the atomic positions, as changing the distance between two atoms (and therefore, between two spheres), an integration point can become suddenly exposed or buried (see figure 3 for a pictorial representation of the problem). In order to avoid this problem, it is possible to regularize the switching between internal and external surface by introducing a smoothed



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 36

Figure 3: By changing the distance of two spheres, some integration point can suddenly become buried or exposed

characteristic function χη (t) defined as follows: 1 χη (t) = pη (t) 0

if t ≤ 1 − η, if 1 − η < t < 1,

(15)

if t ≥ 1,

where the parameter η identifies a “switching region” (see figure 4 for a pictorial view) between internal and external and where

pη (t) = η −5 (1 − t)3 (6t 2 + (15η − 12)t + 10η 2 − 15η + 6). Using such a definition for the characteristic function, we can introduce a new set of weights

Figure 4: Sharp and smooth switching between external and internal points


Page 13 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


for the right-hand side of eq. 11 which allows for a smooth definition of all the computed quantities. In particular, let

χnjk = χη (tnjk ),

fnj =

∑

χnjk ,

dnj =

min( fnj , 1) fnj

k∈N j

,

(16)

where N j is the list of all neighbors of sphere j (we use the convention 0/0 = 0 in the definition of dnj ). We can now define

Wnjk = dnj χnjk ,

Unj = 1 −

Wnjk .

∑

(17)

k∈N j

Notice that Unj can be used to identify whether a point yn belongs to the external portion of Γ j (Unj = 1), to a zone of switching (0 < Unj < 1) or whether the point is completely buried inside the cavity (Unj = 0). Notice that fnj = 0 if the point yn belongs to the Γej , fnj ≥ 1 if yn is buried inside the cavity and 0 < fnj < 1 if the point belongs to a switching region. We can use the newly defined Wnjk weights in the definition of the L jk matrix blocks Ng

′

m jk [L jk ]mm ll ′ = − ∑ wnYl (yn )Wn n=1

′ ′ 4π (tnjk )l Ylm′ (snjk ) ′ 2l + 1

(18)

and the Unj weights to compute the spherical harmonics expansion of the molecular potential Ng

[g j ]m l

= − ∑ wnYlm (yn )Unj Φnj ,

(19)

n=1

where we have introduced M

Φnj =

qk

∑ |R j + r j yn − Rk | .

k=1

We have now all the pieces to define the linear system that represents the discretized ddCOSMO coupled integral equations:      

L11 .. .

... .. .

L1M .. .

LM1 . . . LMM

     

X1 .. . XM





    =    

g1 .. . gM




  .  

(20)


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 36

For the sake of brevity, we will also write eq. 20 as LX = g. Notice that, as Wnjk vanishes if the spheres j and k do not overlap, only the off-diagonal blocks relative to intersecting spheres are non-zero: the matrix L is therefore block sparse. To conclude, the COSMO polarization energy can be calculated as

Es =

M M √ π f (εs ) ∑ q j [X j ]00 = f (εs ) ∑ hX j , Ψ j i, j=1

(21)

j=1

where we have introduced the auxiliary function

[Ψ j ]m l =

√

πq j δl0 δm0

and the notation m ha, bi = ∑[a]m l [b]l . lm

We will show in section 4 how it is possible to implement an efficient, iterative solution to the ddCOSMO equations achieving linear scaling in both computational cost and memory requirements.

3

Analytical derivatives

In this section, we will describe the procedure to compute the analytical derivatives of the COSMO solvation energy. By differentiating eq. 21 with respect to the position of the center of the i-th sphere: M

M

−Fi = ∇i Es = f (εs ) ∑ Ψ j , ∇i X j = f (εs ) ∑ M

= f (εs ) ∑ j=1

*

j=1

Ψ j, ∑ k

L−1 jk ∇i gk

+

j=1 M

− f (εs ) ∑

j=1

*

*

k=1

!+

−1

−1

M

∑

Ψ j , ∇i M

"

L−1 jk gk

Ψ j , ∑ L (∇i L)L k=1


g jk k

+

.

(22)

Page 15 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


By rearranging: M

−Fi = f (εs ) ∑

j=1

*

M

M

T ∑ (L−1 jk ) Ψk , k=1

|

{z sj

}|

∇i g j − ∑ (∇i L jk )Xk k=1

{z

!+

hi j

}

M

= f (εs ) ∑ hs j , hi j i,

(23)

j=1

where s = (s1 , s2 , . . . sM )T is the solution to the adjoint linear system LT s = Ψ,

(24)

Ψ = (Ψ1 , Ψ2 , . . . ΨM )T and ′

′

m m mm [hi j ]m l = ∇i [g j ]l − ∑(∇i [L jk ]ll ′ )[Xk ]l ′ .

(25)

k

The computation of the forces therefore involves two separate steps: solving the adjoint linear system and assembling the explicit derivative terms [hi j ]m l . For the adjoint linear system we notice that



  T L =  

T L11

.. . T L1M

... .. .

T LM1

.. .

T . . . LMM

where the diagonal blocks of the L matrix are self-adjoint: ′

[LTj j ]mm ll ′ = δll ′ δmm′



  ,  

(26)

4π 2l + 1

and the off-diagonal blocks are defined as follows: ′

=− [LkT j ]llmm ′

′m ′ 4π 4π I( j,k) = − wnYlm′ (yn )Wnk j (tnk j )l Ylm (skn j ). [ck j ]m ′ ∑ ll 2l + 1 2l + 1 n

Notice that, of course, also LT is a block sparse matrix. The explicit contributions can be divided into two sets, as in eq. 25: the ones involving the derivatives of the RHS, and therefore of the solute’s potential, and the ones involving the



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 36

derivatives of the L matrix. The first term is obtained by differentiating eq. 19: Ng

[Gi j ]m l =

∑ wnYlm (yn ) n=1

" j Φn ∇iUnj +Unj ∇i Φnj .

(27)

Here, the derivatives of the potential are standard quantities. The derivatives of the weights are

∇iUni = −Zin ,

∇iUnj =

1 ′ ji ji p (t )s , i ∈ N j Ri η n n

(28)

where 1 ′ ik ik pη (t )s k∈Ni Rk

Zin =

∑

and pη′ (t) = −

30 (1 − t)2 (t − 1 + η)2 η5

is the derivative of the switching function. The second family of contributions arising from eq. 25, i.e., the ones of the form

[Ki j ]m l =∑

∑

l ′ m′ k∈N j

′

′

m ∇i [L jk ]mm ll ′ [Xk ]l ′ ,

(29)

involves the evaluation of the gradient of the product of several terms, which produces very cumbersome expressions. Notice that three different cases arise, depending on whether i = j 6= k, i = k 6= j or j 6= i 6= k. For the sake of brevity, we will report here only the final expression of the i = j case: 4π [Kii ]m l = ∑ ∑ ′ k∈Ni l ′ m′ 2l + 1

(

I(i,k)

δi j

∑ n

S(i,k)

+

∑ n

′ ′ wn m ik m′ ik Yl (yn )Wnik δl ′ ≥1 (tnik )l −1 l ′Ylm′ (sik )s + ∇Y (s ) ′ n n n l Rk

′ ′ wn m i ′ ik ik Yl (yn )(tnik )l Ylm′ (sik n )dn pη (tn )sn Rk

I(i,k)

−

δ fni >1

∑ n

′ ′ i 2 ik i wnYlm (yn )(tnik )l Ylm′ (sik n )(dn ) χn Zn

)

′

[σk ]m l′ ,

(30)

where δ fni >1 = 1 if fni > 1 and zero otherwise. For the sake of brevity, we have identified with


Page 17 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


I( j, k) the set of points on Γ j that are inside Ωk , i.e., I( j, k) = {yn | tnjk < 1}, and with S( j, k) ⊂ I( j, k) the set of points on Γ j that are in the switching region between spheres j and k, i.e., S( j, k) = {yn | 1 − η < tnjk < 1}. The other cases produce similar expressions that are reported in appendix B. Such contributions can be assembled with the geometrical quantities already defined and the gradients of the basis functions. We will show in section 4 how it is possible to compute the forces efficiently and with linear scaling in computational cost and memory requirements.

4

Efficient, parallel and linear scaling implementation

This section is devoted to the description of the actual implementation of the ddCOSMO equations. We will show the feasibility of computing all the needed quantities with both computational costs and memory requirements linear with respect to the number of atoms. We will start by discussing the iterative procedure to solve the linear system eq. 20; we will then show how the contributions to the forces can also be computed achieving linear scaling cost by discussing the terms already presented in section 3. In appendix A we will dedicate some attention to an efficient procedure for the evaluation of a SH set given a vector on the unit sphere, which is the most CPU intensive task involved in the algorithm. To compute the interaction energy (eq. 21), we need to compute the discretized local ASCs, i.e., their expansion coefficients [X j ]m l , by solving the linear system in eq. 20. This can be efficiently done by means of Jacobi iterations, i.e., solving for each j, at iteration it: (it)

l [L j j X j ]m l = [g j ]m −

′

(it−1) m′ ]l ′

∑ ∑ [L jk ]mm ll [Xk

k∈N j l ′ m′

′

(it)

:= [V j ]m l .

(31)

Notice that eq. 20 is the discretized equivalent to eq. 10. The diagonal blocks of the L ma-



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

trix are diagonal (see eq. 12), which means that, once the right-hand side (RHS) has been assembled, the linear system is trivially solved. Not only: the sparsity of the L matrix ensures that only the spheres that intersect j will give a contribution to the RHS: each iteration will therefore require a computational effort which is, for each sphere, proportional to the number of neighbors, which is much smaller than the total number of atoms for a large enough molecule. Notice that the number of neighbors is determined by the molecular connectivity: the scaling properties will hence be similar for a linear or a globular molecule. This is a key feature of the ddCOSMO algorithm which is not shared by other fast algorithms, like the Fast Multipole Method (FMM) used in conjunction with a standard finite element discretization of the COSMO equations. 29 We will now detail our iterative procedure.

Preliminary steps Read the positions of the centers of the spheres and the charges; choose N

N

g g and the associated weights {wn }n=1 ; at each grid point, and build the Lebedev grid {yn }n=1

−l≤m≤l compute {Ylm (yn )}0≤l≤N . Assemble a neighbor list and other quantities needed to handle

the sparse storage: all the quantities that formally depend on two sphere indexes like the tnjk can be precomputed and stored in memory using O(Ng × M) words of memory. Allocate the needed amount of memory and compute all the geometrical quantities defined in eqs. 14, 16 and 17. As a last preliminary step, compute the solute’s potential Φnj . This can be done efficiently using standard fast summation techniques, like the FMM. In this implementation, we employed the fully adaptive FMM3DLIB 39,40 package for such a purpose. In simulation softwares, this step is already handled by the standard machinery and can be considered as the input to the ddCOSMO module. The RHS can finally be assembled by weighting the potential with the Unj factors. Notice that the RHS will be equal to the electrostatic potential on the external points, but will be non-vanishing also in the switching region. The total cost of this step is linear thanks to the FMM and requires one to allocate a vector gnj of size M × Ng , plus some (linear) scratch for the FMM procedure. The iterative procedure consists of three main steps: calculation of the RHS, solution of the diagonal system and test of the convergence.


Page 18 of 36

Page 19 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Step 1: Calculation of the RHS The first step is the most computationally demanding and can be described as follows. We will need a scratch vector Pnj of size M × Ng . For each sphere j, loop over the grid points yn and check whether they are external (i.e., if Unj = 1) or internal. If the points are external, put Pnj = gnj , else loop over the neighboring spheres k ∈ N j . If yn ∈ Ωk , i.e., if tnjk < 1, compute {Yml (snjk )}, where snjk is the projection of yn on Γk , then compute α =∑ lm

4π (it−1) m (t jk )l Yml (snjk )[Xk ]l . 2l + 1 n

(32)

We can now accumulate the contribution to Wnj from the k-th sphere

Pnj = Pnj + α ×Wnjk

(33)

and, when the loop over the neighboring spheres is complete, add the contribution due to the solute’s potential in the switching region (i.e., where 0 < Unj < 1). Finally, by numerical quadrature, we compute (it)

[V j ]lm =

Ng

∑ wnYlm (yn )Pnj ,

(34)

n=1

which is the right-hand side of eq. 31. See algorithm 1 for a sketched summary of the computation of Pnj . j

Algorithm 1 Computation of the Pn vector for sphere j for n = 1, Ng do j if Un = 1 then j j Pn = gn else for k ∈ Ni do jk if tn < 1 then jk Assemble {Ylm (sn )} (it−1)

Compute α as in eq. 32 using [Xk j j jk Pn = Pn + α ×Wn end if end for j j j Pn = Pn + gn end if end for

]



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 36

Here, we notice some important features of the algorithm. First, the total computational cost of a single iteration can be estimated by O(M × Ng ), where the prefactor depends on the average number of neighbors, on the number of points which are buried inside the cavity and on the maximum angular momentum N used for the SH expansion. Second the quantity of memory needed scales linearly with respect to the size of the system as well, as the only quantities that we need to store are two vectors of size M × Ng , i.e., Pnj and gnj , and three (it)

(it)

(it−1) l ]m .

vectors of size M × (N + 1)2 , i.e., [V j ]lm , [X j ]lm and [X j

Finally, for each sphere j,

the only information needed to perform the iteration are either geometrical parameters, the solute’s potential and quantities that can be computed on the fly like the SH, or the solution vector on the neighboring spheres at the previous iteration: it is hence possible to parallelize the algorithm by distributing each sphere or group of spheres on a separate processor.

Step 2: Solution of the diagonal system Once the RHS of eq. 31 has been computed, it is possible to compute the solution at the iteration step. Due to the fact that the SS2 operator is diagonal in the SH representation, this step can be done trivially: (it)

[X j ]m l =

2l + 1 (it) l [V j ]m . 4π

(35)

Step 3: Convergence check and acceleration For each sphere, it is now possible to evaluate

v u u ([X (it) ]m − [X (it−1) ]m )2 j j (it) (it−1) l l ∆ j = kX j − X j k = t∑ , l + 1 lm

(36)

where we have chosen to use the energy norm of the increment. When the vector ∆ has been assembled, we exit from the parallel region, evaluate the RMS norm of ∆ and compare it with a threshold to check the convergence. This simple iterative procedure is known as the Jacobi (it)

iterative method. The incremental solution X j

(it−1)

− Xj

can be also used as an error vector

for the DIIS 41 extrapolation, which has proven to be a good way to accelerate the convergence of the Jacobi iterations. A summary of the iterative procedure is sketched in algorithm 2. A similar procedure can be used to solve the adjoint linear system, which is a mandatory step for the computation of the forces.


Page 21 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Algorithm 2 Algorithm for the iterative solution of the linear equations. Read the spheres centers and radii, compute the Lebedev grid and SH at each grid point jk jk jk j jk j j j Compute the neighbor lists and the quantities tn , sn , χn , dn , Wn , Un , Zn and Qn j j j Compute gn = Un Φn for it = 1, ItMax do Parallel Region (j) for j = 1, M do j Assemble the Pn vector as in algorithm 1 using [X (it−1) ] (it) Assemble [V j ]m l as in eq. 34 (it)

Solve for [Xk ]lm as in eq. 35 Evaluate ∆ j end for End Parallel Region Compute k∆k if Converged then Return else Perform DIIS extrapolation and save [X (it) ]. end if end for We will now show that also the explicit contributions to the forces can be computed with linear effort. As said in section 3, the contributions involving the derivatives of the matrix are very cumbersome: we will focus on the one reported in eq. 30. By fully expanding the contribution arising from such term (which we will denote KiA ) we get: (

I(i,k)

KiA = ∑[si ]m l lm

∑∑ k∈Ni l ′ m′

4π 2l ′ +1

∑ n

S(i,k)

+

∑ n

′ ′ wn m ik m′ ik Yl (yn )Wnik δl ′ ≥1 (tnik )l −1 l ′Ylm′ (sik )s + ∇Y (s ) ′ n n n l Rk

′ ′ wn m i ′ ik ik Yl (yn )(tnik )l Ylm′ (sik n )dn pη (tn )sn Rk

I(i,k)

−

δ fni >1

∑ n

′ ′ i 2 ik i wnYlm (yn )(tnik )l Ylm′ (sik n )(dn ) χn Zn

)

′

[σk ]lm′ .

(37)

For each sphere i, loop over the yn grid points and, for each yn , loop over the neighboring spheres k ∈ Ni . If yn ∈ Ωk , i.e., if tnik < 1, compute a SH set {Ylm (sik n )} and its derivatives



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 36

m {∇Ylm (sik n )} . Now, compute all the sums involving [Xk ]l :

′ 4π ik l ′ −1 ′ m′ ik ik m′ ik (t ) (s )s + ∇Y (s ) [Xk ]m l Y ′ ′ n n n n l l l′ , ′ l ′ =1 m′ =−l ′ 2l + 1 N

a=

l′

∑ ∑

′ ′ 4π m′ (tnik )l Ylm′ (sik n )[Xk ]l ′ . ′ l ′ m′ 2l + 1

b= ∑

Then, accumulate the sum over the neighbors: put

v = v+

Wnik × a + (dni )2 (χnik )2 Zin × b. Rk

If yn belongs to the switching region between i and k, i.e., if 1 − η < tnik < 1, accumulate also the following contribution:

v = v+

2 i ik ′ ik ik d χ p (t )s × b. Rk n n η n n

Finally, evaluate the sum involving [si ]m l m α = ∑[si ]m l Yl (yn ), lm

and accumulate all the contributions for each yn :

KiA = KiA − wn × α × v.

(38)

Notice that we used a quantity of scratch memory which does not depend on the size of the system and that we did an amount of work which is O(M × Ng ), where the prefactor depends again on the average number of neighbors, on the number of points buried inside the cavity and on the maximum angular momentum N used for the SH expansion. Notice also that each i contribution can be computed in parallel by distributing each sphere or group of spheres on a separate processor. The other two terms involving the derivatives of the matrix can be computed in a similar fashion starting from the same quantities already used for eq. 38. The terms arising from the derivatives of the RHS of eq. 11 involve both the computation


Page 23 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


of the derivatives of the solute’s potential, for which the FMM can be used to achieve linear scaling in computational cost, and the evaluation of eq. 28. For these latter terms, as all the two-index quantities are nonzero only for neighboring spheres, both the computational cost and the memory requirements remain linear with respect to the number of spheres M.

5

Numerical results

In this section we present some test cases to show the performance of our methodology. In particular, we will confirm numerically the linear scaling properties of our algorithm for both linear and globular systems, together with the overall excellent timings. We choose here three categories of systems: a chain of alanine peptides, a large, tetrameric protein and its subunits and a spherical cluster of water molecules. The first system is linear, while the latter two are globular; furthermore, both the alanine chain and the water cluster can be made arbitrarily large by simply adding more and more residues. The algorithm, as described in section 4, has been implemented in a stand-alone FORTRAN 77 code. OpenMP shared memory parallelism has been used to parallelize the calculation, by distributing the spheres among the processors. All the calculations for which we report absolute timings have been performed on a single node equipped with two Xeon [email protected] hexacore processors and 24 GB of DDR3@1333MHz physical memory. The ifort FORTRAN compiler from the Intel Parallel Studio XE 2013 suite of compilers has been used together with machine specific optimization flags (-xSSE4.2) and aggressive optimization (-O3). As the elapsed time of a computation can be subject to small variation, due for instance to little differences in the background activity of the operating system, all the computation have been repeated ten times: in the figures, the standard deviation on the timings is reported as an error bar. We report in the following the elapsed time for the solution of the ddCOSMO linear system eq. 20 and adjoint problem eq. 24 and the time needed to compute the explicit contributions to the forces that involve ddCOSMO-related quantities. These three operations are, in fact, the ones which depend explicitly on the algorithm used to solve the ddCOSMO equations; to obtain the total elapsed time of a computation, one needs to add the



time to compute the cavity-related quantities, the potential and the potential derivatives. As we have already mentioned, the potential and potential derivatives are computed in our code by means of the FMM3DLIB; as far as the cavity-related quantities (i.e., the Lebedev quadrature points and weights and the neighbor lists) efficient implementations already exist: 21,29,42 we will not deal with these aspects in this communication. We also report the quantity of memory allocated for the computation, which we divide in basis (i.e., the memory needed to store the solution, the scratch arrays, the external potential the grid and other basic quantities) and geometrical (i.e., the memory needed to store the geometrical parameters tnjk , snjk , . . .): this latter memory storage can be avoided by computing the geometrical quantities on-the-fly. The discretization parameters used for all the computations are η = 0.1, N = 10 and Ng = 302; the threshold for the convergence of the iterative solution (see eq. 36) has been fixed to 10−6 . In

Figure 5: Linear molecules: alanine chains as a test case. The elapsed time to solve the ddCOSMO equations, the adjoint equations, and to compute the explicit contributions to the forces are reported together with the memory needed to perform the computation and to precompute the geometrical parameters Alanine chains 90

1600

direct system adjoint system Forces (matrix part) Memory - cavity and scratch Memory - Geometrical parameters

80

1400

70

1000 50 800 40 600 30

Memory (Integer MegaWords)

1200

60 Wall Time (s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 36

400

20

200

10 0 0

5000

10000

15000 20000 25000 Number of atoms

30000

35000

0 40000

figure 5, the results for various alanine chains are reported. The number of residues per chain varies in a range from 20 units (i.e., 2000 atoms), to 400 units (40000 atoms). To give a taste of the computational size of the problem, the number of basis functions (i.e., the size of the linear system) varies from more than two hundred thousands up to almost five millions, the average number of neighbors per sphere is approximately 20 and, on average, approximately 255


Page 25 of 36

points out of 302 lye on the internal surface. As stated in section 4, these latter two quantities give a measure of the computational cost per iteration per sphere. More details can be found in the supporting information. It is possible to see how the predicted linear scaling behavior shows in figure 5, where the noisy curves are to be ascribed to the non-uniform memory access of the various cores and to the use of the processors cache, which is determined by the compiler and non trivial to analyze. It is remarkable that also the memory requirements scale perfectly linearly with the size of the system. Notice also how the quantity of memory necessary to store the geometrical parameters is larger than the one needed for the computation: even smaller memory requirements can be obtained by computing the geometrical parameters on-the-fly. To see how this affects the computation, we report in figure 6 the wall time required

Figure 6: In core vs. on-the-fly solution of the linear system for alanine chains. 90

2000

time in core memory in core time on-the-fly memory on-the-fly

80

1800

1400 60 1200 50 1000 40 800 30 600 20


1600

70

Wall Time (s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


400

10

200

0 0

5000

10000

15000 20000 25000 Number of atoms

30000

35000

0 40000

to solve the equations by precomputing the geometrical parameters (“in core”), which is taken from figure 5, and the time to solve the same equations but by calculating all the parameters on-the-fly. The differences are minimal, with the on-the-fly algorithm requiring only a very small amount of time more than its in core counterpart; on the other hand, the memory required is much smaller: the on-the-fly algorithm is the way to go when moving to very large systems. Finally, the absolute timings are per se impressive: considering the size of the linear systems implied (up to almost 5 × 106 ), it is noteworthy that only a few minutes are required to compute the COSMO polarization energy and contributions to the forces even for very large



systems on a single node. Similar results are obtained for non linear molecules, like a protein.

Figure 7: Globular molecules: Hemoglobin chains as a test case. The elapsed time to solve the ddCOSMO equations, the adjoint equations, and to compute the explicit contributions to the forces are reported together with the memory needed to perform the computation and to precompute the geometrical parameters Hemoglobin and its subunits 350


20 18

300

16

250

14 200

12 10

150

8 100

6


22

Wall Time (s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 36

4 50 2 0

0 2000

3000

4000

5000 6000 Number of atoms

7000

8000

9000

In figure 7, the various timings are reported for a biological structure (PDB structure 3S48, 43 to which we will refer improperly as Hemoglobin), by starting from one subunit up to the full protein. It is interesting to notice how both the average number of internal points (257–259) and of neighbors per sphere (18) are actually smaller than for the linear chains: we can expect the cost per iteration for the globular protein to be similar or even smaller than the one for the alanine chains. To confirm this in a more systematic way, we run the usual set of calculations on spherical clusters of water molecules, which we cut starting from a pre-equilibrated cubic box. The timings are reported in figure 8; the sizes of the clusters vary between 411 and 31104 atoms, the average number of neighbors between 12 and 15 and the average number of internal points between 236 and 255. Again, a very nice linear scaling behavior is shown also for this very globular system, both for the CPU time and for the memory requirements. Here we see that the average number of neighbors is again much lower while, for large enough clusters, the number of internal points is similar to the one for the alanine chains: we expect the cost of the computation for the spherical water cluster to be smaller than for the linear molecules. To make the comparison easier, the time to solve the ddCOSMO equations are reported for each class


Page 27 of 36

Figure 8: Globular systems: spherical water clusters as a test case. The elapsed time to solve the ddCOSMO equations, the adjoint equations, and to compute the explicit contributions to the forces are reported together with the memory needed to perform the computation and to precompute the geometrical parameters Spherical Water Clusters 50

900


45 40

800

Wall Time (s)

600 30 500 25 400 20 300

15


700

35

200

10

100

5 0 0

5000

10000

15000 20000 Number of atoms

25000

30000

0 35000

Figure 9: Comparison of the elapsed times to solve the ddCOSMO equations for different classes of molecules Elapsed time to solve COSMO equations 80

Alanine chains Water clusters Hemoglobin subunits

70

60

50 Wall Time (s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


40

30

20

10

0 0

5000

10000

15000 Number of atoms

20000


25000

30000


of systems in figure 9, showing that the computations for the water clusters are in fact faster than the ones for the linear molecules. This is a very remarkable feature of the ddCOSMO algorithm, as the nice, linear scaling behavior depends on the topology of the molecular system rather than on its size: the linear scaling regime can thus be accessed also for globular, not so large systems.

Figure 10: Speed gain by using more shared memory processors. 16

Algorithm Theoretical

14

12

10 Speed gain

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 36

8

6

4

2

0 0

10

20

30 40 Number of cores

50

60

70

As already mentioned, the ddCOSMO algorithm is suitable for a parallel implementation. In this work, only some basic shared memory parallelism has been introduced; nevertheless, it is possible to see from figure 10 that the parallel behavior of the code is good up to 40-48 cores, after which a plateau is observed. An improved parallel implementation is currently under development. We conclude this section with a remark on the accuracy of the computed forces with respect to the tightness of the convergence of the linear equations. In figure 11, the RMS error on the forces with respect to both numerical derivatives and a very tight computation for increasingly tight convergence thresholds is reported for a chain of 5 alanine peptides. We notice that even with the sleaziest threshold, an accuracy better than 10−5 Hartree/Bohr is already obtained: an accuracy of 10−6 Hartree/Bohr, which can be considered safe both for molecular dynamics simulations and for geometry optimizations is achieved with the 10−6 threshold used for all the simulations reported here.


Page 29 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


Figure 11: Error on the computed forces for a 5 alanines chain depending on the convergence threshold for the iterative solver. In the left group of columns, we report the RMS difference between the forces computed by means of analytical and numerical derivatives (with a finite step of 10−5 Bohrs) for different stopping criteria; in the right group of columns, we report the RMS difference between the analytical derivatives for a given stopping criterion and the ones obtained with a 10−10 threshold. RMS Error on forces for a five alanines chain 1e-05

10−4 10−5 10−6 10−7 10−8

1e-06

1e-07

1e-08

1e-09 Vs Numerical

6

Vs Converged

Conclusions

In this paper an implementation of the domain decomposition paradigm recently presented by some of us is described. The linear scaling properties in both CPU time and memory are detailed and illustrated by means of several numerical examples. The methodology is promising and paves the way to the application of polarizable continuum methods to molecular dynamics simulation. Starting from the present, several directions are worthy to be explored, both from a theoretical and a computational point of view. In particular, the extension of the algorithm to the PCM model and to solutes described by means of a quantum mechanical methodology are currently actively investigated. Further refinements in the definition of the cavity, or in the way to decompose the cavity in spheres may also improve the computational efficiency of the ddCOSMO scheme.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Acknowledgments This work was supported in part by the France-Berkeley Fund, the ANR Manif, and the French state funds managed by CALSIMLAB and the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

A Efficient evaluation of the spherical harmonics and their gradient In this appendix, we illustrate how to efficiently compute a real SH set (and possibly its gradient) at a given point on the unit sphere s = (x, y, z), like any snjk vector. We recall that

where

 √  |m|   Clm Pl (cos θ ) 2 cos(mφ ),     m Yl (θ , φ ) = C0 P0 (cos θ ) l l      √  Cm P|m| (cos θ ) 2 sin(mφ ), l l Clm = |m|

is a normalization constant and Pl

s

01 dn χn )dn pη (tn )(tn ) Yl′ (sn )sn [σi ]ml′ n I( j,i)

δik +

∑

Finally, for j 6= i 6= k: KCi = δ fnj >1

′ ′ ′ 4π 1 I( j,k)∩S( j,i) wnYlm (yn )(dnj )2 χnjk (tnjk )l Ylm′ (snjk )pη′ (tnji )snji [σk ]lm′ ∑ ∑ ∑ ′ +1 R 2l i n k∈N \{i} l ′ m′ j


Page 32 of 36

Page 33 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


References (1) Tomasi, J.; Persico, M. Chem. Rev. 1994, 94, 2027–2094. (2) Honig, B.; Nicholls, A. Science 1995, 268, 1144–9. (3) Roux, B.; Simonson, T. Biophys. Chem. 1999, 78, 1–20. (4) Cramer, C. J.; Truhlar, D. G. Chem. Rev. 1999, 99, 2161–2200. (5) Orozco, M.; Luque, F. Chem. Rev. 2000, 100, 4187–4225. (6) Tomasi, J.; Mennucci, B.; Cammi, R. Chem. Rev. 2005, 105, 2999–3093. (7) Klamt, A. WIREs Comput. Mol. Sci. 2011, 1, 699–709. (8) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. B 2009, 113, 6378–6396. (9) Klamt, A.; Schuurmann, G. J. Chem. Soc., Perkin Trans. 2 1993, 799–805. (10) Chipman, D. J. Chem. Phys. 1999, 110, 8012–8018. (11) Chipman, D. M. J. Chem. Phys. 2006, 124, 224111. (12) Miertus, S.; Scrocco, E.; Tomasi, J. Chem. Phys. 1981, 55, 117–129. (13) Cancès, E.; Mennucci, B.; Tomasi, J. J. Chem. Phys. 1997, 107, 3032–3041. (14) Cancès, E.; Mennucci, B. J.Math. Chem. 1998, 23, 309–326. (15) Mennucci, B.; Cancès, E.; Tomasi, J. J. Phys. Chem. B 1997, 101, 10506–10517. (16) Barone, V.; Cossi, M. J. Phys. Chem. A 1998, 102, 1995–2001. (17) Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. J. Comput. Chem. 2003, 24, 669–681. (18) Lipparini, F.; Scalmani, G.; Mennucci, B.; Cancès, E.; Caricato, M.; Frisch, M. J. J. Chem. Phys. 2010, 133, 014106.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(19) Lipparini, F.; Scalmani, G.; Mennucci, B.; Frisch, M. J. J. Chem. Theory Comput. 2011, 7, 610–617. (20) Foresman, J.; Keith, T.; Wiberg, K.; Snoonian, J.; Frisch, M. J. Phys. Chem. 1996, 100, 16098–16104. (21) Scalmani, G.; Frisch, M. J. J. Chem. Phys. 2010, 132, 114110. (22) York, D.; Karplus, M. J. Phys. Chem. A 1999, 103, 11060–11079. (23) Lange, A. W.; Herbert, J. M. J. Phys. Chem. Lett. 2010, 1, 556–561. (24) Lange, A. W.; Herbert, J. M. J. Chem. Phys. 2010, 133, 244111. (25) Mennucci, B. J. Phys. Chem. Lett. 2010, 1, 1666–1674, and references therein. (26) Mennucci, B. WIREs Comput. Mol. Sci. 2012, 2, 386–404. (27) Mennucci, B., Cammi, R., Eds. Continuum Solvation Models in Chemical Physics; Wiley, New York, 2007. (28) Quarteroni, A.; Valli, A. Domain decomposition methods for partial differential equations; Oxford Science Publications, Oxford, 1999. (29) Scalmani, G.; Barone, V.; Kudin, K.; Pomelli, C.; Scuseria, G.; Frisch, M. Theor. Chem. Acc. 2004, 111, 90–100. (30) Cammi, R.; Tomasi, J. J. Comput. Chem. 1995, 16, 1449–1458. (31) Greengard, L.; Rokhlin, V. J. Comput. Phys. 1987, 73, 325–348. (32) Stamm, B.; Cancès, E.; Maday, Y. Domain decomposition for implicit solvation models. Submitted for publication. (33) Steindal, A. H.; Ruud, K.; Frediani, L.; Aidas, K.; Kongsted, J. J. Phys. Chem. B 2011, 115, 3027–3037. (34) Lipparini, F.; Barone, V. J. Chem. Theory Comput. 2011, 7, 3711–3724.


Page 34 of 36

Page 35 of 36

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60


(35) Caprasecca, S.; Curutchet, C.; Mennucci, B. J. Chem. Theory Comput. 2012, 8, 4462– 4473. (36) Boulanger, E.; Thiel, W. J. Chem. Theory Comput. 2012, 8, 4527–4538. (37) Lipparini, F.; Cappelli, C.; Barone, V. J. Chem. Theory Comput. 2012, 8, 4153–4165. (38) Cancès, E. In Continuum Solvation Models in Chemical Physics; Mennucci, B., Cammi, R., Eds.; Wiley, New York, 2007; Chapter 1.2, pp 29–48. (39) Cheng, H.; Greengard, L.; Rokhlin, V. J. Comput. Phys. 1999, 155, 468 – 498. (40) The software package FMM3DLIB is publicly available at http://www.cims.nyu. edu/cmcl/fmm3dlib/fmm3dlib.html (accessed April 13, 2012). (41) Pulay, P. Chem. Phys. Lett. 1980, 73, 393–398. (42) Cossi, M.; Scalmani, G.; Rega, N.; Barone, V. J. Chem. Phys. 2002, 117, 43–54. (43) The structure can be downloaded at http://www.rcsb.org/pdb/explore/ explore.do?structureId=3S48.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 12: Graphical TOC - For Table of contents only.


Page 36 of 36

Fast Domain Decomposition Algorithm for Continuum Solvation

Recommend Documents