Molecular Mechanics Simulation of

(14−18) The subset can also be kept smaller, so that the MM model remains relatively ..... FlexMD uses the Atomic Simulation Environment (ASE)(56) l...
1 downloads 0 Views 1021KB Size
Subscriber access provided by Kaohsiung Medical University

Dynamics

Accurate QM/MM Simulation of Aqueous Solutions with Tailored MM Models Tao Jiang, Stanislav Simko, and Rosa E. Bulo J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b01218 • Publication Date (Web): 05 Jun 2018 Downloaded from http://pubs.acs.org on June 5, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Accurate QM/MM Simulation of Aqueous Solutions with Tailored MM Models Tao Jiang,∗,† Stanislav Simko,‡ and Rosa E. Bulo‡ ´ †Laboratoire de chimie, Ecole Normale Sup´erieure de lyon, 46 all´ee d’italie, 69007 Lyon, France ‡Inorganic Chemistry and Catalysis group, Debye Institute for Nanomaterials Science, Utrecht University, Universiteitsweg 99, 3584 CG Utrecht, The Netherlands E-mail: [email protected]

Abstract In recent years, quantum mechanical / molecular mechanical (QM/MM) methods have emerged that are designed specifically for chemical reactions in water. Despite the many advances, a remaining problem is that the patchwork of QM and MM descriptions changes the solvent structure. In a solvent as intricately connected as water, such structural changes can alter a chemical process even across large distances. Examples of structural artefacts in QM/MM water include density accumulation at the QM/MM boundary, decreased order, and density differences between regions. These issues are mostly apparent if the difference between the QM and the MM model is very large, which is often the case with water models. Here we assess the QM/MM performance of simple MM models that are specifically parameterized to match selected data from a QM simulation of bulk water. To this end, we introduce a novel MM model (PM6(DH+)-EFF) that reproduces PM6-DH+ water properties. We also assess a recent PBE-DFT based MM model (PBE-EFF) that reproduces structural properties of bulk

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

water simulated with PBE-DFT. Both models consist solely of tabulated potential energy terms for interactions between atom pairs. We compare the matched QM/MM results (PBE-DFT / PBE-EFF and PM6-DH+ / PM6(-DH+)-EFF) with those from mismatched QM/MM simulations (PM6-DH+/PBE-EFF). The mismatched simulations reflect issues similar to those reported for other mismatched QM/MM pairs. The matched simulations yield very good results, with water structures that barely deviate from the QM reference. In view of these findings, we strongly recommend adoption of specifically parameterized MM models in the QM/MM simulation of chemical processes in water.

1

Introduction

Water is at the center of all lives on earth, and it is the solvent for most of the chemical processes in nature. It is also more and more commonly used as the solvent in organic synthesis and homogeneous catalytic reactions. 1–3 While most organic solvents only affect reactions via heat and mass transfer, water influences reactivity and selectivity in many more ways. 4 Examples include charge transfer through hydrogen bonding, direct participation of the solvent in the reaction via proton transfer, and structural effects across long distances. 5 Simulating these processes then requires very large molecular models, which in turn dictate the use of configuration sampling techniques to explore the many degrees of freedom. In addition, chemical reactions can only be modeled accurately using quantum mechanical (QM) models to directly describe the electronic structure. It is possible to combine these two requirements; a purely QM description of a model aqueous solution with molecular dynamics (MD) as a sampling technique. 6–9 However, due to the tremendous computational demand of such simulations, the (periodic) molecular models are restricted to less than one hundred water molecules, and the simulations to several tens of picoseconds. A way to enable the use of large model systems and longer simulation times is to exploit the fact that chemical reactions generally involve only local changes in the electronic structure. It is therefore an 2

ACS Paragon Plus Environment

Page 2 of 39

Page 3 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

acceptable approximation to describe only a small reactive region at electronic resolution (with a QM model), while the interaction with the rest of the solvent is described at atomic resolution, using a molecular mechanics (MM) model.

Figure 1: Schematic representation of the QM/MM approach for aqueous solutions. The accuracy of the description is reflected in the size of the molecules, with the large and small water molecules representing QM and MM water respectively. The “reactive” solute molecule in this example is a (yellow) water molecule.

A drawback of adopting a multi-scale approach for something as structured as an aqueous solution is that the patchwork of descriptions is immediately apparent from the structure of the solvent. The structural effect goes beyond a local reflection of the different equilibrium water structures for the two models. The multi-scale approach often results in a general under-structuring throughout the solution, 10 and sometimes - if direct interaction between QM-MM molecule pairs is preferred over QM-QM interactions - water density accumulations at the QM/MM boundary are observed. 11 In addition, the different pressures inherent to the models result in molecule exchange across the boundary, leading to anomalous water densities in the different areas even in fixed volume (NVT) simulations. Due to these factors, there

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

have been several examples of discrepancies between results from fully QM and QM/MM simulations that go beyond solvent structure only. For example, proton transfer rates were affected, 12,13 which was attributed to an altered configuration of the QM/MM transition states compared to fully QM simulations. 13 Solving the density problem is crucial for main stream application of QM/MM methods for the simulation of chemical reactions in water. If all above issues follow from the differences between the QM and the MM model, then logic dictates that minimizing this difference should improve results significantly. Generally, the two models differ significantly, as QM models often yield over-structured water, whereas MM models are mostly parametrized to reproduce the experimental structure and density. The ideal MM model in a QM/MM simulation would match the QM model completely, but still retain the efficiency required of MM computations. This MM efficiency, however, comes from a set of radical simplifications, which inevitably affect the behavior of the system. Since it is not possible for an MM model to replicate all properties of a QM system, the best possible model would display a selected subset of QM properties. The subset can be very large, which can be achieved with the highly parametrized complex MM models developed with machine learning techniques. 14–18 The subset can also be kept smaller, so that the MM model remains relatively simple. In the field of coarse grained simulations it has long been the practice to match selected interactions of the simple coarse grained system to the results of atomistic MM simulations. 19 Examples of matched properties are the forces between pairs of particles (force matching methods), 20–22 and pair radial distribution functions (using iterative Boltzmann inversion). 23 Similar approaches have been used to develop MM models based on QM results, 24–27 most notably the work of Wang et al., who constructed ab initio MM models using force matching (FM) to the QM results from a QM/MM simulation. 28,29 The goal of their work, however, was not to improve QM/MM results, but simply to create very reliable MM models. The authors never addressed the performance with respect to the above-mentioned QM/MM issues, despite the fact that they had QM/MM results with matching QM and MM models available.

4

ACS Paragon Plus Environment

Page 4 of 39

Page 5 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Recently, Fritsch et al. applied a combination of FM and iterative Boltzman inversion (IBI) to create a simple MM model that matches many features of QM bulk water, specifically, water equilibrated with the density functional theory (DFT) Perdew, Burke and Ernzerhof (PBE) functional. 30 The model potential energy is composed of bonded and nonbonded pair interaction terms. The bonded interactions consist of two distance depended terms (O-H and H-H), and an angle dependent term (H-O-H). The nonbonded interactions contain contributions from all three possible pair interactions between the two atom types (O-O, O-H, H-H). In their two-step approach the authors first used FM to obtain those potential energy terms that minimize the deviation of the obtained average forces on each atom from the QM reference. In a second step the IBI method was used to construct a correction to the non-bonded energy terms, based on the deviation from the reference of the pair radial distributions of all three non-bonded pairs. The obtained energy and force terms were stored in tabulated form. This procedure yields an effective MM model that is potentially ideal for the MM role in accurate QM/MM simulations. Another very important issue with QM/MM simulations of aqueous solutions relates to the diffusivity of water molecules. It is often crucial for a proper description of solutesolvent charge transfer to include in the QM description (at least) those water molecules that interact directly with the reactants (Figure 1). However, during the course of a simulation these water molecules easily diffuse away from a reactive site, to be replaced by other water molecules that may not have explicit electrons to transfer to the (QM) reactant. In a standard QM/MM setup, the QM (or MM) character of the diffusing atoms is fixed throughout the simulation, 31–37 yielding this approach unsuitable. In recent years, two types of QM/MM methods have emerged that specifically tackle this diffusivity issue. The first type, constrained QM/MM, constrains the QM water molecules within the region in which a reaction may happen. The second type, adaptive QM/MM, allows the molecules to adapt their resolution (from QM to MM or vice versa) as they move in or out of the reactive region. 38–40 The two approaches each have advantages and disadvantages, which are reflected

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

in the structure of the QM/MM solution. 10 The Flexible Inner Region Ensemble Separator (FIRES) method is a very sophisticated version of the constrained approach, with an active region size that is flexibly defined. 41 The solvent molecules within a sphere around the QM center are classified as QM molecules prior to the simulation. A constraining potential is then placed on the surface of this sphere, keeping the MM molecules out. The exact radius of the walled-off volume is determined by the position of the QM molecule furthest from the QM center, and varies throughout the simulation. In this manner, the system can adjust the density of the QM region to the underlying energy surface. While the imposed constraint alters the dynamics with respect to an unconstrained system, it has been shown that if such a constraining potential is infinitely steep, equilibrium properties are the same as those of a parent (non-constrained) system. However, to propagate the system in an MD simulation, the infinitely steep potential needs to be replaced by a soft wall. In this case the FIRES method can only retain correct equilibrium properties if the two regions are described by the same model. 41 When two distinct models are used for the two regions, these equilibrium properties change. 10,42 This manifests in varying degrees of accumulation of water molecules at the boundary between the two regions. Adaptive QM/MM methods have already been applied to several reactions in water and have provided insights in an efficient manner. 43–45 The simplest method is termed Abrupt, due to the abruptly changing description of the solvent molecules as they cross a predefined QM/MM boundary. Simulations with the Abrupt method require very strong thermostats to correct for the local heating caused by the non-continuous potential energy surface. However, if only equilibrium quantities are required, the Abrupt method yields surprisingly good results, producing water structure comparable to the structure obtained with the most sophisticated adaptive QM/MM methods available. 39,46 Those sophisticated adaptive QM/MM methods define a transition region where the description of the solvent molecules changes continuously, as they cross between QM and MM description. Differencebased Adaptive Solvation (DAS), one flavor of continuous adaptive QM/MM, does not re-

6

ACS Paragon Plus Environment

Page 6 of 39

Page 7 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

quire extreme measures for temperature regulation, and results in time-dependent properties that are significantly more reliable than those obtained with the Abrupt method. 10,39 In this work we assess the QM/MM performance of two MM models: The PBE-EFF model of Fritsch et al., 30 and a new PM6-(DH+)-EFF model that we developed using the same procedure. We combine the two MM models with matching QM models (PBE-EFF and PM6-DH+ 47,48 respectively), in a QM/MM simulation of a periodic water box. To confirm that a match is necessary a mismatched QM/MM pair (PM6-DH+/PBE-EFF) was also tested. It should be noted that the mismatch of the latter pair is not as large as that of any pair that involves a more conventional MM model, since all models used here display the characteristic overstructuring of QM water. The combination of PM6-DH+ and PBE-EFF is therefore a more stringent test of the sensitivity of QM/MM simulations to differences between models than previously obtained. We use the FlexMD Python library 49 to perform the QM/MM simulations, and implemented a new feature that handles tabulated pair potentials (which are highly non-standard). We then tested two adaptive QM/MM methods (Abrupt and DAS), as well as the constrained FIRES QM/MM method to control diffusion across the QM/MM boundary. The Abrupt and DAS methods were chosen for their (relatively) reliable reproduction of water structure, and FIRES specifically for its problems at the QM/MM boundary. The results suggest that indeed our new tailored approach provides a general solution to the structural problems commonly encountered in QM/MM simulations of aqueous systems. This paper is organized as follows: Section 2 describes some background on the parametrization procedure used in the development of the new PM6(-DH+)-EFF model, and on the use of tabulated force fields in simulations. In Section 3 the practical details of the parametrization and implementation are presented, as well as details on all simulations. In Section 4 we discuss the performance of the new MM models in QM/MM simulations of bulk water. We conclude this section with a single example of a simulation of methanol in water, to demonstrate the value of the MM water models in the study of aqueous solutions. Finally,

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

we present a summary of the conclusions in Section 5.

2

Theory

The development of the MM model requires a training set consisting of atomic forces and radial distributions from a reference QM simulation. The first step is then to use the atomic forces to create an initial set of parameters, using the non-iterative force matching (FM) procedure. The non-bonded parameters of the new MM model are then refined using iterative Boltzmann inversion (IBI), which optimizes the fit to the atomic radial distributions. A schematic representation of the process can be found in Figure 2. In this section we describe the theoretical background of; (i) cubic splines, as they are used in the FM procedure, (ii) the FM method applied to fit both the PBE-EFF and the PM6-(DH+)-EFF potentials, (iii) the IBI used to refine the potentials from the FM fit to better match the reference atomatom radial distributions, and (iv) the smooth cubic splines that are used in the FlexMD implementation of the new feature that fits cubic splines to the tabulated potential energies at simulation time.

Figure 2: A schematic overview of the entire fitting procedure of the MM model.

8

ACS Paragon Plus Environment

Page 8 of 39

Page 9 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

2.1

Cubic Splines

A spline is often used to fit a smooth analytical function to a set of m numerical data points ((x1 , y1 ), ..., (xm , ym )). In this work, we use splines in the FM procedure, and again at simulation time, to convert the tabulated potential energy values to analytical form. A cubic spline is a numerical function that is piecewise constructed from third order polynomial functions. The pieces of the spline are connected at a set of n control points xk (k = 1, ..., n) known as knots. These knots don’t necessarily coincide with the m x-values of the supplied data set. The spline function f (x) should pass as close as possible through the m data values (xi , yi ). The cubic spline ensures continuity of the function at the knots, as well as continuity of the first and second derivatives. A cubic spline function f (x) with n knots consists of n − 1 pieces, each of which is a separate third order polynomial. Instead of defining a spline in terms of the functional form of its pieces it is common practice to define a spline as a linear combination of a set of known spline basis functions χi (x), or B-splines. The B-spline functions contributing to a spline f (x) are themselves spline functions of the same order as f (x) (cubic in this discussion), with knots at the same xi values. A cubic B-spline χi (x) only has nonzero value over the range (xi : xi+4 ). A set of cubic B-splines only forms a complete basis over the region where three B-splines have non-zero value. Five B-splines (χ1 (x), χ2 (x), ..., χ5 (x)) form a complete basis only over the region spanned by the central three points (x4 : x6 ). In general, a cubic spline function f (x) that is defined over a region (x1 : xn ) can be exactly expressed as a linear combination of a set of n + 2 cubic B-splines f (x) =

n−1 X

αi χi (x).

(1)

i=−2

In this expression, αi is the weight assigned to each basis function. When fitting a spline function f (x) to a data set, the parameters αi are optimized, ensuring that the spline function values at the m data points xi meet the desired values yi . This is done by minimizing the

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 39

sum-of-squares

ζ(~ α) =

m X

(yi − f (xi , α ~ ))2 =

i=1

m X i=1

yi −

n−1 X

2 αk χk (xi ) .

(2)

k=−2

with respect to the vector α ~ containing the weights αk of the contributing B-splines χk (x). Increasing the number of knots n of the spline function f (x) improves the fit with the data points ((x1 , y1 ), ...(xm , ym )), resulting in a smaller value for ζ(~ α). If the knots coincide exactly with the points in the data set, the least squares equation has an exact solution.

2.2

Force Matching

The effective force fields are fit to the results of first-principles MD simulations using the force matching (FM) method, 20–22 as commonly applied in the process of coarse graining atomistic interactions. 19 The target potential energy function is defined as a sum of bonded and non-bonded terms that differ for different pairs of atom types (O-O, O-H, H-H). The non-bonded (intermolecular) interactions between H-H, H-O and O-O atoms are chosen to be spherically symmetric functions Eβnbi βj (rij ), with rij the distance between atom i and atom j. The index βi represents the atom type of atom i. The intra-molecular interactions consist of bonded terms and angular terms. The bonded terms Eβb i βj (rij ) are non-zero if (βi ,βj ) equals (O, H), (H, O) or (H, H). The angular term Eβθi βj βk (θijk ) is only non-zero if (βi , βj , βk ) equals (H, O, H). The non-zero H-H bonded interaction is added to take into account the correlations between the stretching and bending modes. According to Fritsch et. al., 30 it considerably improves the resulting 2-dimensional distribution of the H-O-H angle and O-H bond length when compared to the reference first-principles results. The total energy is now written as,

~ = E EF F (R)

X i