Multistep Explicit Solvation Protocol for Calculation ... - ACS Publications

Dec 4, 2018 - Cody M. Sterling† and Ragnar Bjornsson*†‡ .... Microsoft cofounder Paul G. Allen, who died from complications of non-Hodgkin's lym...
0 downloads 0 Views 3MB Size
Subscriber access provided by University of Winnipeg Library

Dynamics

A Multi-Step Explicit Solvation Protocol for Calculation of Redox Potentials Cody M. Sterling, and Ragnar Bjornsson J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00982 • Publication Date (Web): 04 Dec 2018 Downloaded from http://pubs.acs.org on December 7, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

A Multi-Step Explicit Solvation Protocol for Calculation of Redox Potentials

Cody M. Sterling1, Ragnar Bjornsson1,2* 1

Science Institute and Department of Chemistry, University of Iceland, Dunhagi 3, 107 Reykjavík, Iceland 2

Department of Inorganic Spectroscopy, Max-Planck-Institut für Chemische Energiekonversion, Stiftstrasse 34-36, 45470 Mülheim an der Ruhr, Germany * E-mail: [email protected]

ABSTRACT The calculation of molecular redox potentials in aqueous solution presents a challenge to quantum chemistry due to the need to calculate charged, open-shell species experiencing large solvent effects. Traditionally, redox potentials are calculated via the use of density functional theory and continuum solvation methods but such protocols have been found to often suffer from large errors, particularly in the case of aqueous solution. While explicit solvation models hold promise of higher accuracy to describe solvent effects in general, their complicated use and lack of well-defined, reliable protocols has hindered their adoption. In this study, we present an explicit-solvation-based approach for the calculation of molecular redox potentials. We combine the use of affordable semi-empirical QM/MM molecular dynamics (making use of the recently proposed GFN-xTB method by Grimme et al.) for both redox states and use the linear response approximation to relate vertical ionization energies to the adiabatic redox potential. Simulation length, averaging over snapshots, and accounting for bulk and polarization effects are systematically evaluated using phenol as a working example. We find that it is crucial to reliably account for bulk solvation effects in these calculations, as well as polarization effects which we divide up into short-range and long-range contributions. The short-range polarization contribution is accounted for via QM-region expansion while the long-range contribution is accounted for via Drude-polarizable QM/MM. Our multi-step protocol has been coded to be used in a fully automatic way in a local version of Chemshell. It has been evaluated on a large test set of oxidation potentials of organic molecules and found to give gas-solution redox shifts with a mean absolute error of 0.13 eV with respect to experiment, compared to mean absolute errors of 0.26 and 0.21 eV with CPCM and SMD continuum models, respectively.

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1. INTRODUCTION Redox reactions comprise a large portion of common chemical reactions such as those taking place in batteries, the combustion of fossil fuels, photovoltaics, environmental chemistry, molecular redox catalysis, and many biological processes. Because it involves the rapid displacement of a charged electron, the redox potential can be very sensitive to solvation or other environmental effects.1 There are two main frameworks for the treatment of solvation effects within quantum chemistry: implicit (continuum) and explicit solvation.2 Continuum solvation models treat the solvent as a dielectric medium defined by an experimentally-measured dielectric constant ε. Polarizable continuum models account for the mutual polarization between the solute and the dielectric solvent reaction field and provide a computationally efficient way of accounting for long-range solvent electrostatic effects as there are no solvent degrees of freedom to account for in the system.3 The SMD solvation model4 is a recent popular model that accounts for bulk electrostatics via the integral-equation formalism of polarizable continuum models via optimized radii to generate the continuum cavity and accounts for non-electrostatic contributions via atomic surface tensions. However, because the continuum approach does not include actual solvent molecules in the calculation (mixed cluster-continuum calculations being an exception), continuum solvation models neglect specific close-range solute-solvent interactions such as hydrogen bonding and van der Waals effects or more specific solvent structure around the solute. Hydrogen bonding, in particular, has been shown to be important in the accurate computation of redox potentials.5 Despite these shortcomings, continuum solvation models are vital tools in computational chemistry and can often yield redox potentials to useful accuracy (especially for non-aqueous solvents), though the accuracy can depend on the model being used6 and which DFT method it is used with. Recent studies by Schlegel and coworkers have demonstrated that calculations of acidity constants, a related property, are greatly improved by a mixed cluster-continuum approach.7,8 Such mixed clustercontinuum approaches may thus be a useful tool for some systems but introduce challenges associated with multiple solvent molecule conformations around the solute. Work on reduction potentials of transition metal complexes by Rulíšek and coworkers9,10 has highlighted the challenges associated with transition metal redox chemistry in aqueous solution where errors of 1 V or higher can be found. Adding explicit water molecules was found to improve the results. The SMD model and the COSMO-RS11 model (which goes beyond bulk electrostatics of the regular COSMO model12) were found to be an improvement over older continuum models, but challenges remain with highly charged species. A recent benchmarking study by Pantazis and coworkers13 compared calculated gas phase ionization energies and aqueous redox potentials (using continuum solvation models) for a test set of organic molecules to experimental data in the gas phase and the aqueous phase. The gas phase ionization energy problem could be improved considerably by the use of wavefunction theory, especially coupled cluster theory instead of density functional theory approximations.

ACS Paragon Plus Environment

2

Page 2 of 35

Page 3 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The recent availability of a near linear-scaling local correlation method for open-shell systems, DLPNO-CCSD(T),14 opens the door to more affordable and more accurate calculations of redox processes of even larger systems. As demonstrated in the study by Pantazis and coworkers,13 however, improved gas phase ionization energies do not necessarily translate into improved aqueous redox potentials, as the continuum solvation model was identified as the main source of error in calculated oxidation potentials when compared to aqueous experimental data. A mean absolute error of 0.28 eV (and maximum error of 1.02 eV) was still present in these calculations, even when the gas phase problem was treated by accurate wavefunction theory and the solvation contribution via a DFT-continuum approach (TPSS-SMD). The study also demonstrated the problem of error cancellation that can occur with errors from the solvation model and the DFT approximation, highlighting the necessity of separating the errors. In contrast to the pure implicit approach, explicit solvation accounts for solute-solvent effects by introducing actual solvent molecules into the calculation instead of a continuum. This allows for the recovery of the explicit interactions neglected in continuum solvation, which places it on more solid theoretical grounding, but comes at a steep computational cost due to the system size increase and introduces the need to perform sampling as well as the question of the number of solvent molecules needed. The Sprik group has successfully employed periodic DFT-based molecular dynamics to compute redox potentials and acidity constants of small organic and inorganic molecules.15 Redox potentials were identified as a more difficult property than acidity constants due to the open-shell nature of 1-electron redox potentials and underestimation of the energy gap of the extended states of the solvent. Hybrid functionals were found to be more accurate than GGA functionals, likely due to the opening up of the bandgap. The problem of system size increase can be helped by the use of QM/MM methodology where the system can be divided into a chemically important region (e.g. the solute) treated with quantum mechanics (QM) and a second region (e.g. the solvent) which can be described by simpler molecular mechanics (MM). This greatly reduces the computational cost of calculating very large systems but also requires the molecular mechanics forcefield to be accurate enough to describe the solvent-solvent interactions as well as the solute-solvent interactions during both the dynamics and the redox energy calculations. In 2012 Van Voorhis and coworkers studied redox potentials of first-row transition metal complexes via QM/MM calculations based on the linear response approach and compared the results to those from continuum solvation models.5 They demonstrated that a QM/MM treatment (using the B3LYP functional for the QM region and polarizable forcefields for the MM region) for a small test set of transition metal complexes (mostly water-ligated) outperformed a treatment with continuum models, impressively reducing the RMSD from 1.48 V to 0.36 V. The short-range solute-solvent structure, mostly induced by hydrogen bonding, was identified as the main physical effect distinguishing the explicit results from the implicit results. A more recent article from the Van Voorhis group explores using a fixed QM density in the course of a QM/MM molecular dynamics simulation and reports good

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

agreement with traditional QM/MM simulations, suggesting an effective route to reduce the computational cost of the dynamics.16 Other recent studies employing QM/MM-based protocols for the calculation of redox potentials come from the groups of Arey, Krylov, Bravaya, and Ghosh.17,18,19,20,21 These studies have all employed effective fragment potentials (EFPs)22,23 as the MM method for taking into account the explicit solvent effects and either EOM-IP-CCSD24,25 or ωB97X26 as the QM method for the redox energies. The molecular dynamics trajectories have either been simulated using classical forcefields or DFT-based QM/MM. Krylov and coworkers19 developed a computational protocol based on the linear response approximation for describing the redox properties of phenol and phenolate and validated it by comparison to the experimental redox potential and to new experimental valence photoemission measurements (probing the vertical ionization energies). Arey and coworkers used a similar protocol but discussed the importance of accounting for effects beyond linear response and compared in detail their results to experimental data for aniline, phenol, methoxybenzene, imidazole, and dimethylsulfide.18 Bravaya and coworkers have highlighted bulk effects as an important factor to consider when employing spherical droplet models.20 Ghosh and coworkers have recently discussed a biased sampling approach to reduce the computational cost for these types of simulations where multiple snapshots are required for appropriate averaging.21 These studies have demonstrated that explicit solvation protocols are indeed capable of computing redox potentials to a promising accuracy (~0.1-0.3 eV) but also demonstrate the complexity of the problem as the protocols are expensive and results from different protocols for the same molecules (phenol in particular) do give slightly different results. Furthermore, these protocols have so far only been tested on a limited number of systems, possibly due to the lack of fully automatic protocols. This study introduces a multi-step explicit solvent protocol for computing molecular redox potentials from molecular dynamics trajectories. In contrast to previous studies, we employ a semi-empirical/classical approach for affordable molecular dynamics simulations and, instead of a single solvation model (such as implicit models or EFPs) for describing the environment in the redox calculation, we use a combination of non-polarizable QM/MM, increased QM regions for capturing short-range polarization effects and a polarizable MM model for longer-range polarization effects. Furthermore, bulk solvation effects are estimated via extrapolation using multiple-sized MM water droplets. The focus of this paper is largely on the accurate and efficient calculation of the gas-solvent redox shift, i.e. the solvation contribution to the redox potential, but we also discuss affordable and accurate coupled cluster calculations for the gas phase ionization energy. The structure of the article is as follows: first, theoretical aspects of redox potentials are discussed and the general structure of the multi-step protocol is explained. Following a discussion of specific computational details, a step-by-step application of the protocol is demonstrated using phenol as a test system. Finally, the results of the protocol on a large test set of organic molecules are discussed and compared to the results from continuum solvation calculations.13

ACS Paragon Plus Environment

4

Page 4 of 35

Page 5 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

2. THEORY AND METHODOLOGY In this study we will exclusively be discussing 1-electron oxidation reactions involving a neutral, closed-shell molecule M. It is convenient to describe solution phase redox processes in terms of a thermodynamic cycle as shown in Figure 1. Here the free energy of oxidation, ∆𝐺#$ , in solution of molecule M is connected to the gas phase adiabatic ionization energy, 𝐼𝐸'() , of M and the solvation free energies, ∆𝐺)#*+ , of the two redox states of the molecule.

Figure 1. Thermodynamic cycle for calculation of the free energy of oxidation of molecule M in solution. 𝐼𝐸'() represents the adiabatic free energy change of oxidation in the gas phase, ∆𝐺#$ is the adiabatic free energy change of oxidation in solution, and ∆𝐺)#*+,-./0 and ∆𝐺)#*+,#$ are the free energies of solvation for the neutral species and oxidized radical, respectively.

The free energy change of the solute being oxidized in solution can hence conveniently be divided up into a gas phase and solvation contribution: ∆𝐺#$ = 𝐼𝐸'() + ∆∆𝐺)#*+

(1)

∆∆𝐺)#*+ = ∆𝐺)#*+,#$ − ∆𝐺)#*+,-./0

(2)

and the free energy change is related to the aqueous oxidation potential by the Nernst equation: 𝐸9 =

∆𝐺#$ − 𝑆𝐻𝐸 𝑛𝐹

(3)

where 𝑛 is the number of electrons (here 𝑛 = 1) and 𝐹 is Faraday’s constant. 𝐼𝐸'() in Eq. 1 is the gas phase adiabatic ionization energy and ∆∆𝐺)#*+ , defined in Eqs. 1 and 2, is here referred to as the gas-solution redox shift; it represents the effect of solvation on 𝐼𝐸'() to yield the oxidation free energy change in solution and is the difference in free energy of solvation between the oxidized

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 35

and neutral molecules. 𝑆𝐻𝐸 is the absolute potential of the standard hydrogen electrode; the value 4.28 V, which neglects the surface potential contribution,2 is used in this work. These terms can be calculated separately, and ideally both the gas phase ionization energy and the solvation contribution are calculated with a high level (HL) theory: ?@ ?@ ?@ ∆𝐺#$ = 𝐼𝐸'() + ∆∆𝐺)#*+

(4)

which should then yield the free energy of oxidation and hence redox potential at a high level of theory. ?@ The 𝐼𝐸'() term can be calculated quite accurately with coupled cluster theory; the CCSD(T)based composite protocol by Pantazis and coworkers13 gives a MAE of only 0.05 eV compared to experimental reference ionization energies for a test set of organic molecules (with some updated reference values as discussed in Table S2). Using modern local coupled cluster methodology, such as the DLPNO-CCSD(T) method in ORCA,14,27,28 a coupled cluster ?@ calculation of the 𝐼𝐸'() term is affordable for many molecules. The accurate and affordable ?@ calculation of 𝐼𝐸'() should thus not be an obstacle for the calculation of redox potentials to an accuracy aim of ~0.1 V. ?@ The calculation of the ∆∆𝐺)#*+ term at a high-level of theory such as coupled cluster theory, however, is infeasible. We note that implicit solvation calculations with charge distributions from correlated wavefunctions are typically not available and such calculations would be costly (requiring a relaxed wavefunction density) and, more importantly, the accuracy limited by the continuum representation of the solvent. The use of a high-level of theory for explicit solvation is also difficult due to the large number of energy calculations required for the dynamics and vertical ionization energy steps (particularly when expanded QM regions are used, as discussed later) and so it is convenient to use some lower level (LL) theory that approximately reproduces the higher-level solvent shift: ?@ @@ ∆∆𝐺)#*+ ≈ ∆∆𝐺)#*+

(5)

Eq. 1 can then be rewritten as follows for the LL theory: @@ @@ @@ ∆∆𝐺)#*+ = ∆𝐺#$ − 𝐼𝐸'()

(6)

@@ This study focuses on a multi-step protocol for calculating the ∆∆𝐺)#*+ term. This term can be straightforwardly calculated at the DFT-continuum level of theory (which we compare to in the end), but here we introduce a multi-step QM/MM protocol for calculating it instead, necessitating: a) sufficient sampling of solvent molecules (here using QM/MM molecular dynamics as detailed in section 3.1) and b) a way of calculating the redox process itself.

Marcus theory represents the free energy curves of each state in a redox reaction (here showing the oxidation reaction, so the neutral → oxidized state) as diabatic quadratic curves with equal

ACS Paragon Plus Environment

6

Page 7 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

curvature,29 shown in Figure 2. This representation is better the more similar the two geometries are.

Figure 2. Marcus free energy curves for reduced and oxidized states. 𝜆 is the solvent reorganization free energy, 〈𝑉𝐼𝐸 〉 and 〈𝑉𝐸𝐴〉 represent averaged values for the vertical ionization energy (VIE) and electron affinity (VEA), calculated as the difference in energy due to oxidation of the neutral and oxidized geometries, respectively, and 𝛥𝐺#$ is the adiabatic free energy change of the oxidation reaction.

The curves are quadratic because of the assumption that the solvent reacts linearly to solute changes. This is called the linear response approximation (LRA) and allows us to treat the oxidation event as a transition between these states with an average solvent change. We can define the free energy of oxidation as: @MN Δ𝐺#$ = 〈𝑉𝐼𝐸 〉 − 𝜆

(7)

@MN Δ𝐺#$ = 〈𝑉𝐸𝐴〉 + 𝜆

(8)

where 𝜆 is the solvent reorganization free energy, and 〈𝑉𝐼𝐸 〉 and 〈𝑉𝐸𝐴〉 represent averaged values for the vertical ionization energy (VIE) and electron affinity (VEA), calculated as the difference in energy due to oxidation of the neutral and oxidized geometries, respectively. These equations can be rewritten to yield the free energy of oxidation as the average of the ensemble average values of VIE and VEA: @MN Δ𝐺#$ =

〈𝑉𝐼𝐸 〉 + 〈𝑉𝐸𝐴〉 2

(9)

By combining Eqs. 6 and 9, the gas-solvent redox shift equation thus becomes: @@ ∆∆𝐺)#*+ ≈

〈𝑉𝐼𝐸 〉@@ + 〈𝑉𝐸𝐴〉@@ @@ − 𝐼𝐸'() 2

ACS Paragon Plus Environment

7

(10)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 35

The use of the LRA greatly simplifies the calculation of the adiabatic redox energy from an MD simulation by relating it to the average vertical energy changes over the course of reduced and oxidized molecular dynamics trajectories. In this study we propose an efficient and accurate @@ calculation scheme for calculation of 〈𝑉𝐼𝐸 〉 and 〈𝑉𝐸𝐴〉 and hence ∆∆𝐺)#*+ (within the limits of ?@ the LRA) which can then be combined with an accurate estimate of 𝐼𝐸'() . The accuracy of the of 〈𝑉𝐼𝐸 〉 and 〈𝑉𝐸𝐴〉 terms depends on several factors: a) the representation of bulk solvent by a finite number of solvent molecules; b) the approximate polarization of the solute by the solvent, which can be divided up into short-range and long-range contributions; c) the QM level of theory used; and d) the amount of sampling employed, both the length of the MD trajectory and the number of snapshots used to compute the average VIE and VEA. To make these calculations both accurate and computationally feasible we account for the different contributions in a stepwise manner and using different levels of theory. Initially, we do inexpensive calculations of a finite system size over multiple MD trajectory snapshots that @@ mostly ignore bulk effects and solvent polarization (∆∆𝐺)#*+,S-S0S(* ). This is then followed by @@ separate calculations for computing a short-range polarization term (∆∆∆𝐺)#*+,TMUV#* ), a long@@ range polarization term (∆∆∆𝐺)#*+,@MUV#* ), and finally a term accounting for bulk solvation @@ @@ (∆∆∆𝐺)#*+,W/*X ). As will become clear, dividing up the ∆∆𝐺)#*+ (or VIE/VEA terms) into different contributions in this way allows one to apply different levels of theory as well as different number of snapshots for each term that makes the protocol computationally efficient. This scheme is outlined in Eq. 11 and is visualized in Figure 3. Section 3.2 discusses the levels of theory used to calculate each term. @@ @@ @@ @@ @@ ∆∆𝐺)#*+ ≈ ∆∆𝐺)#*+,S-S0S(* + ∆∆∆𝐺)#*+,W/*X + ∆∆∆𝐺)#*+,TMUV#* + ∆∆∆𝐺)#*+,@MUV#*

ACS Paragon Plus Environment

8

(11)

Page 9 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

@@ Figure 3. Representation of regions used in snapshot calculations for ∆∆𝐺)#*+ estimation. The green oval @@ represents the ∆∆𝐺)#*+,S-S0S(* calculation (where only the solute is polarized by a limited number of MM @@ correction (where point charges via electrostatic QM/MM), the blue circle represents the ∆∆∆𝐺)#*+,TMUV#* the solute and solvent mutually polarize each other, quantum mechanically), the orange circle represents @@ correction (where an outer MM solvation shell and the inner QM solvation shell the ∆∆∆𝐺)#*+,@MUV#* @@ polarize each other via Drude-polarizable QM/MM), and the arrows represent the ∆∆∆𝐺)#*+,W/*X @@ correction (accounted for via extrapolation of ∆∆𝐺)#*+,S-S0S(* with multiple spheres of different sizes).

3. COMPUTATIONAL DETAILS 3.1. QM/MM MOLECULAR DYNAMICS Separate molecular dynamics trajectories at the semi-empirical QM/MM level of theory for the reduced and oxidized solute were carried out, from which snapshots for VIE/VEA calculations were taken. The QM/MM molecular dynamics simulations were carried out in a locally-modified version of Chemshell 3.7.30,31,32,33 A sphere of TIP3P34 water molecules had the solute inserted into its center, using neutral or oxidized gas phase solute geometries, and clashing water

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 35

molecules (generally ~4 water molecules for the solutes in this study) were removed. Water spheres of different sizes were used, as defined in Table 1. An outer frozen layer of ~3 Å (see corresponding number of water molecules in Table 1) was used to avoid evaporation and a spherical boundary potential of 3 hartree/Bohr2 was additionally used to avoid active water molecules making their way through the frozen layer (potential acting 1 Å into the frozen layer). Another short-range spherical potential of 3 hartree/Bohr2 acting 5 Å from the center, applied only to the solute atoms, was used to keep the solute approximately centered during the dynamics. A timestep of 1 fs was used to integrate Newton’s equations of motion using the leapfrog algorithm and a temperature of 300 K was maintained using a Nosé-Hoover 4-chain thermostat with a time constant of 0.02 ps.35,36,37 Separate neutral and oxidized (with an electron removed from the QM solute) QM/MM molecular dynamics simulations were run for 100 ps with 100 snapshots taken from the final 40 ps of each simulation, for a total of 200 snapshots to be used in the vertical single-point energy calculations. In each simulation, only the solute was treated quantum-mechanically (semiempirical QM) and all water molecules were treated using the rigid TIP3P water model using the TIP3P atom charges and Lennard-Jones site on oxygen. QM-MM electrostatic interactions were described by either mechanical or electrostatic embedding and the short-range QM-MM interactions by Lennard-Jones (LJ) potentials. The QM codes used were MNDO38 and xTB.39 The MM code used was DL_POLY40 as implemented in Chemshell.

Table 1. List of MM solvent sphere sizes used in this study in the MD simulations. Sphere radii, total number of water molecules in sphere, and number of frozen water molecules on the surface (~3 Å layer) are given. The total number of water molecules in each simulation varies slightly due to removal of water molecules when solute is inserted.

Name

Radius (Å)

# Total water molecules

# Frozen water molecules

I

16.2

509

302

II

19.0

866

328

III

32.8

4,618

931

IV

40.7

8,984

1,440

The size of the solvation system, the QM code used, and the electrostatic/LJ treatment vary in the different subsections of section 4. In sections 4.1-4.4, the QM method used for the QM/MM MD simulations was the semi-empirical method OM341 (within MNDO) with electrostatic embedding of the TIP3P point charges, and CHARMM LJ parameters42,43 were used for the phenol atoms.

ACS Paragon Plus Environment

10

Page 11 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Section 4.1 uses only water sphere II (as defined in Table 1), while sections 4.2-4.4 use all four sphere sizes from Table 1 when demonstrating the effect of water sphere size for the various corrections in Eq. 11. Section 4.5 compares the accuracy of semi-empirical QM methods (GFN-xTB39 and PM344,45) and electrostatic treatments in the QM/MM MD simulations. The MD simulations for the results in this subsection were performed using OPLS-AA LJ potentials on the solute due to better availability of LJ parameters for organic molecules.46,47 No LJ potentials were used for polar hydrogens, and the r0 values for O and N were scaled by 0.97 for a better description of solutesolvent hydrogen bonding behavior. The spherical potential acting on the solute (mentioned above) was changed to apply 3 Å from the radius of the solute (defined as the radius of the smallest sphere that encloses all atoms when centered at the geometric centroid) in its starting position. This reflects different solute sizes. Water sphere III was used for all MD simulations in this section.

3.2. QM/MM SINGLE-POINT CALCULATIONS All single-point QM/MM calculations for calculating the redox energy were carried out using the same Chemshell version as in the MD simulations. The QM code was ORCA 4.048,49,50 and the MM code was DL_POLY. Drude-polarizable QM/MM was performed as implemented in Chemshell using the CHARMM Drude oscillator.51 The energy terms in Eq. 11 can be calculated at different levels of theory (different QM levels and/or different MM levels) in order to reduce the computational cost. The final protocol as discussed in section 4.5 is defined below while sections 4.1-4.4 discuss how we arrived at this specific protocol. The low-level (LL) QM theory discussed below and abbreviated as “ωB97X” consists of the range-separated hybrid functional ωB97X,26 the def2-SVP basis set,52 the RIJCOSX approximation for Coulomb and Exchange integrals,53,54 and the def2/J auxiliary basis set.55 •





@@ Z[\]^ ∆∆𝐺)#*+,S-S0S(* = ∆∆𝐺)#*+,𝑰𝑰𝑰 ; meaning the solute is treated with ωB97X and all water molecules of the III water sphere were treated as TIP3P point charges and employing electrostatic embedding. @@ Z[\]^ ∆∆∆𝐺)#*+,W/*X = ∆∆∆𝐺)#*+,W/*X ; this bulk correction term uses the same level of QM and MM theory as above but uses an extrapolation procedure based on multiple water spheres which is defined and discussed in section 4.2. @@ Z[\]^ ∆∆∆𝐺)#*+,TMUV#* = ∆∆∆𝐺)#*+,TMUV#* ; for the short-range polarization correction, the solute and neighboring water molecules within 5 Å are included in the ωB97X calculation; all other water molecules in the III sphere are treated with TIP3P point charges and with electrostatic embedding.

ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60



@@ Z[\]^ ?_U`a ∆∆∆𝐺)#*+,@MUV#* = ∆∆∆𝐺)#*+,@MUV#* or ∆∆∆𝐺)#*+,@MUV#* ; for the long-range polarization @@ correction, the same QM region as in the ∆∆∆𝐺)#*+,TMUV#* term is used. As discussed in section 4.4 it is useful to use a cheaper QM level of theory here, such as the HF-3c method (HF with a minimal basis set and 3 corrections).56 Water molecules within 20 Å of the QM region are treated using the polarizable SWM4-DP model57 while all other water molecules are treated with TIP3P point charges.

Z[\]^ @@ This protocol above defines the ∆∆𝐺)#*+ term which is one way of describing the ∆∆𝐺)#*+ Z[\]^ @@ term of Eq. 11. Following the calculation of ∆∆𝐺)#*+ , i.e. our definition of ∆∆𝐺)#*+ , we can ?@ combine it with 𝐼𝐸'() as per Eq. 4 to yield an approximate adiabatic free energy of oxidation ?@ from which the oxidation potential is derived according to Eq. 3. The 𝐼𝐸'() here is calculated in ORCA 4.0 using the DLPNO-CCSD(T) method using a two-point extrapolation of def2-SVP and def2-TZVP HF and correlation energies to the basis set limit as implemented in ORCA.58

4. RESULTS AND DISCUSSION @@ In the following sections we demonstrate the effectiveness of approximating ∆∆𝐺)#*+ term according to Eq. 11. MD simulations were carried out at the OM3/MM level of theory in sections 4.1-4.4 while we opted for the GFN-xTB semi-empirical method in section 4.5 as discussed later. For single-point QM/MM calculations, ωB97X/def2-SVP was used for the QM region and TIP3P for the MM level of theory in sections 4.1-4.4 unless otherwise specified. The gassolution redox shift ∆∆𝐺)#*+ of phenol is taken as an illustrative example in sections 4.1-4.4 while the final protocol is evaluated on a larger test set in section 4.5.

4.1. MOLECULAR DYNAMICS SAMPLING @@ Our protocol relies on calculating the ∆∆𝐺)#*+ term from an appropriate number of snapshots from long enough molecular dynamics trajectories (one for each charge state). The MD simulation length was explored by carrying out two 1 ns simulations (OM3/TIP3P level of Z[\]^ was calculated according theory) for the neutral and oxidized state of phenol and then ∆∆𝐺)#*+,𝑰𝑰 to Eq. 10 for the II water sphere, where only electrostatic QM/MM single-point energies were calculated (no correction for short/long polarization or bulk effects was performed). Z[\]^ Figure 4 shows calculated ∆∆𝐺)#*+,𝑰𝑰 values using snapshots taken from different sections of the 1 ns simulation of phenol in the II system. Each column corresponds to 100 snapshots (from each neutral and oxidized trajectory) taken from the indicated time interval (during the last 40 ps) except for the last column that consists of the combined 1000 snapshots taken from the whole 1 ns simulation. The variance in the data across the whole timescale is only ~0.1 eV, with the 0100 ps value being only ~0.05 eV off from the aggregate 0-1000 ps value. This data suggests 100

ACS Paragon Plus Environment

12

Page 12 of 35

Page 13 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

@@ ps MD sampling to be appropriate for calculating the ∆∆𝐺)#*+ term, at least for solutes similar to phenol.

Z[\]^ Figure 4. Comparison of ∆∆𝐺)#*+,𝑰𝑰 values for various 100 ps sampling periods in a 1 ns simulation of phenol. 100 snapshots were taken from the last 40 ps of each 100 ps interval, e.g. from 60-100 ps, 160200 ps, etc. Results indicate that a 0-100 ps simulation provides sufficient sampling for an accurate Z[\]^ ∆∆𝐺)#*+,𝑰𝑰 value relative to a full 1 ns sample.

Z[\]^ The number of snapshots used for calculating ∆∆𝐺)#*+,𝑰𝑰 was also explored for the 0-100 ps Z[\]^ simulation. The running average of ∆∆𝐺)#*+,𝑰𝑰 as a function of number of snapshots is shown in Z[\]^ Figure 5. The results demonstrate that ∆∆𝐺)#*+,𝑰𝑰 shows convergence to ~0.02 eV after approximately 50-60 snapshots. This result is in good agreement with the results of Ghosh et al.19

These results generally indicate that choosing 50-60 snapshots from 100 ps MD simulations Z[\]^ should be sufficiently accurate (to ~0.02 eV) for calculating ∆∆𝐺)#*+ as demonstrated for our example of phenol. Other larger or more flexible molecules may require more snapshots and perhaps longer simulation times as well. In our protocol we have for now settled on the use of 100 snapshots taken from 100 ps MD simulations as a balance between computational cost and precision. These results, however, also demonstrate a severe error present in ∆∆𝐺)#*+ when approximated Z[\]^ as ∆∆𝐺)#*+,𝑰𝑰 alone. The experimental ∆∆𝐺)#*+ for oxidation of phenol is in a range of -2.67 to 2.73 eV (based on a 1.50-1.56 V potential experimental range of phenol and a gas phase IE of

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Z[\]^ 8.51 eV; see Table S2 for more details) but, as Figure 4 and Figure 5 show, ∆∆𝐺)#*+,𝑰𝑰 is approximately -1.5 eV, an error of more than 1 eV. As we will show, the reason for this strong disagreement is due to the lack of accounting for solvent polarization and bulk effects. The next sections discuss the additional terms of Eq. 11 that account for these effects.

Z[\]^ Figure 5. Running average of ∆∆𝐺)#*+,𝑰𝑰 (eV) for the 100 snapshots taken during the 60-100 ps interval of a 100 ps MD simulation of phenol.

4.2. BULK CORRECTION A drawback of spherical QM/MM or droplet models as opposed to periodic QM/MM models is the potential lack of bulk solvent effects since only a limited number of water molecules are present. This issue has been discussed in a recent paper by Thiel and coworkers59 where a comparison of periodic QM/MM models was made. Finite spherical models were found to compare favorably with periodic QM/MM models for potential energy and free energy profiles. Bravaya and coworkers,20 on the other hand, demonstrated the presence of substantial bulk effects when computing redox potentials of a few organic molecules (including phenol) using the spherical QM/MM approach. The authors explored extrapolation of energies obtained from different sized spheres but ended up recommended a 1-point extrapolation procedure based on a Born solvation equation. In this section we analyze the bulk effect on the redox potential for phenol. QM/MM MD simulations using different sized spheres (as defined in Table 1) were performed, Z[\]^ snapshots were extracted from them, and ∆∆𝐺)#*+ calculated as before. As Figure 6

ACS Paragon Plus Environment

14

Page 14 of 35

Page 15 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Z[\]^ Z[\]^ shows, ∆∆𝐺)#*+,𝑰𝑰 gives a value of -1.50 eV, a smaller sphere gives ∆∆𝐺)#*+,𝑰 = -1.35 eV, a Z[\]^ Z[\]^ larger sphere ∆∆𝐺)#*+,𝑰𝑰𝑰 gives -1.70 eV and ∆∆𝐺)#*+,𝑰𝑽 gives a value of -1.73 eV. When these energies are plotted vs. 1/R, where R is the radius of the water sphere, we get a linear relationship. The figure demonstrates that there is a substantial bulk effect present; the difference between the bulk limit (linear extrapolation) and the III sphere is 0.3 eV, while it is 0.5 eV for the II sphere. This substantial bulk effect is in line with the results of Bravaya and coworkers.

Figure 6. The effect of different sized water spheres on the calculated gas-solution redox shift, Z[\]^ Inverse-radius linear extrapolation of ∆∆𝐺)#*+ energies (100 snapshots) from different-sized Z[\]^ MD simulations (100 ps) gives the bulk limit energy ∆∆𝐺)#*+,W/*X . Extrapolation to the bulk limit here

Z[\]^ ∆∆𝐺)#*+ .

Z[\]^ , of -0.30 eV, for the reference sphere III (used in the final results in a bulk correction, ∆∆∆𝐺)#*+,W/*X protocol).

cd\]e , we have opted for To conveniently account for this bulk effect in our protocol, ∆∆∆𝐺)#*+,W/*X extrapolation to the bulk limit using different sized spheres but instead of performing multiple MD simulations with different sized spheres we only perform MD simulations on a single sphere, the III system. From the snapshots extracted from the simulations using III, we then cut the model down to 7 smaller sphere sizes (28-10 Å, in 3 Å steps) and perform the single-point calculations on them and use 8-point extrapolation of the energies to get the bulk limit energy. cd\]e The bulk correction, ∆∆∆𝐺)#*+,W/*X , is then the difference between the bulk limit energy cd\]e Z[\]^ ) and the energy of the original water sphere, here ∆∆𝐺)#*+,𝑰𝑰𝑰 . (∆∆𝐺)#*+,W/*X

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

4.3. SHORT-RANGE QM POLARIZATION Z[\]^ In the previous calculations the ∆∆𝐺)#*+ term is described via electrostatic embedding singlepoint QM/MM where the solute is described quantum mechanically but all solvent molecules described via point charges as defined in the TIP3P water force field. While electrostatic embedding in this way accounts for some polarization of the solute, this will only be a crude estimate of the real polarization effect of a nearby water molecule and there is complete neglect of solvent polarization. Figures S1-S4 in the supporting information demonstrate the effect of different MM water models on the vertical ionization energy of phenol (described at the ωB97X/aug-cc-pVQZ level) when surrounded by 1 or 2 water molecules. Polarizable MM water models are found to be an improvement over non-polarizable MM water models for these hydrogen-bonding interactions but not as reliable as describing the water molecule quantum mechanically by a reduced basis set like def2-SVP (minimal basis sets such as MINIX or STO3G do not give good results, however). To accurately account for mutual solute-solvent polarization (as well as solvent-solvent polarization) we opted to describe the short-range polarization effects by QM-region expansion where nearby solvent molecules, like the solute, are described by the ωB97X functional and the def2-SVP basis set. In this context we note that redox potentials that involve anionic species may require diffuse basis sets and basis set dependency will be explored in more detail in a future study. The QM-region expansion was only performed at the single-point level, not during the dynamics as that would require an adaptive QM/MM approach due to QM and MM solvent molecule exchange. Different sized QM regions were defined by expanding the QM region by including all water molecules that had any atom within X Å of a solute atom. Using phenol again as an example, we show different sized QM regions in Figure 7.

ACS Paragon Plus Environment

16

Page 16 of 35

Page 17 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 7. Representative images of QM region sizes around phenol. “QM X Å” refers to a QM region including all solvent molecules which have an atom within X Å of a solute atom for each snapshot. Atom numbers represent approximate averages across trajectories and system sizes.

Z[\]^ is shown in Figure 8 where different sized The effect of QM-region expansion on ∆∆𝐺)#*+ solvent spheres have also been used: there is a noticeable increase in the magnitude of the Z[\]^ ∆∆𝐺)#*+ as we more accurately describe the interactions between the solute and solvent. As we get to the QM3-QM4 region, a level of convergence is obtained and there is only a small difference between QM4 and QM6 values, despite more than doubling the number of QM atoms. To strike a balance between computational accuracy and cost, in the final protocol we opted for a QM-region Z[\]^ expansion of “QM 5” and define a short-range polarization correction, ∆∆∆𝐺)#*+,TMUV#* , which is defined as the difference between “QM 0” value and the “QM 5” value for sphere size III. Z[\]^ For our example of phenol, a correction of ∆∆∆𝐺)#*+,TMUV#* = -0.37 eV was obtained, representing the increase in the solvent shift in going from a complete point-charge representation of the solvent to accounting for short-range (5 Å) solvent polarization effects. This correction is only calculated for a single sphere size (III) in the final protocol, which is justified as Figure 8 demonstrates almost

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

equal slopes for different QM-region series. This contribution should account for most of the important solvent polarization effects that are present, leaving aside long-range polarization effects discussed in the next section.

Z[\]^ Figure 8. Effect of increasing QM region on bulk solvation effect extrapolation on ∆∆𝐺)#*+ . The Z[\]^ displayed ∆∆∆𝐺)#*+,TMUV#* is calculated using the III radius for convenience, and shows a -0.37 eV change in the gas-solvent redox shift when including waters up to 5 Å from the solute molecule.

We note in this context that use of the ωB97X functional in these calculations was important for reliable behavior. The range-separated hybrid functional ωB97X includes 100% HF exchange in the long-range and is thus free from long-range self-interaction error that we believe is crucial for DFT calculations of redox properties on large QM systems. When attempting to use GGA functionals in an effort to reduce computational cost, we saw no clear indication of convergence where the VIE was calculated for a snapshot with increasing QM region, as shown in Figure S5; ωB97X and HF, however, showed clear signs of convergence and were close in energy to DLPNOCCSD(T) results. We interpret this as being due to long-range self-interaction error, effectively suggesting that non-hybrid functionals are of limited use in computational redox chemistry and that only QM methods that exhibit approximately correct long-range behavior should be used (this would include post-HF methods, range-separated hybrid functionals and the HF method). The QM-region expansion strategy used to calculate the results in Figure 8 would be prohibitively expensive for a general protocol, even when performed for a single water sphere, due to the current choice of averaging over 100 snapshots per trajectory resulting in a total of 400 single-point energy calculations with large QM regions (200 of which would be expensive unrestricted hybrid-DFT calculations). As we anticipate short-range polarization to be a fairly systematic effect, we have chosen to restrict the QM-region expansion calculations to only a limited number of snapshots (5 was chosen here). Instead of a random choice of these snapshots, the selection of these snapshots

ACS Paragon Plus Environment

18

Page 18 of 35

Page 19 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

can be made by making use of the complete snapshot data available at the “QM 0” level of theory. We can then choose snapshots that are closest to the average of the full number of snapshots, i.e. snapshots that are representative of the average instead of random snapshots which may not be. As Figure 9 demonstrates, this is an approach that is justifiable based on the data obtained for phenol, and Table S1 in the supporting information gives further data that justifies this approach.

Figure 9. Representation of the strategy for choosing snapshots for computing the expensive QM-region Z[\]^ Z[\]^ expansion contribution, i.e. ∆∆∆𝐺)#*+,TMUV#* . The blue lines show the ∆∆𝐺)#*+ term evaluated for each individual snapshot, while the yellow line shows the running average. Upper plot shows the data evaluated with a QM region of only the solute, QM 0, while the bottom plot shows the data calculated Z[\]^ with the QM 5 region (see Figure 7). The 5 snapshots with ∆∆𝐺)#*+ values closest to the average over Z[\]^ all 100 snapshots with the QM 0 region (black line) can be used to estimate the full average ∆∆𝐺)#*+ value with the QM 5 region; this estimated average is shown in the bottom with the red line and as shown, lies close to the true total average of all 100 snapshots at the QM 5 level with an error of only 0.023 eV.

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The representative-snapshot or biased-sampling approach used here shows considerable promise and its accuracy will be investigated more in future work. The number of snapshots used (5) is currently an arbitrary choice that will be further investigated in future studies. We note that recent studies have employed similar biased-sampling approaches to cut down on the number of snapshots needed to calculate the final property from MD simulations.21,60 In refs. 21 and 60 a descriptor, such as the density or the classical interaction energy, was first used to calculate the average for a large number of snapshots from which snapshots for the final property evaluation were selected. In our protocol the property is the same in both cases, but a low level of theory (solute-solvent interactions described as point charges via QM/MM electrostatic embedding) is used to calculate the average and to find suitable snapshots for calculating at the higher-level of theory (where short-range solute-solvent interactions are calculated by QM).

4.4. LONG-RANGE MM POLARIZATION While increasing QM-region expansion should account for all the solute-solvent and solventsolvent polarization effects present, this becomes prohibitively expensive even with our representative-snapshot strategy. While Figure 8 demonstrated a level of convergence as we approached QM regions of 4-6 Å (solvent molecule included if within 4-6 Å of any solute atom) we anticipated a longer-range polarization effect present that could not be realistically captured by QM-region expansion. We investigated this long-range polarization effect via the use of polarizable MM forcefields. Drude-polarizable models have been successfully used to model bulk liquid effects61 and are being extensively developed as the next generation of forcefields for liquid and biomolecular simulations. Our investigation of phenol interacting with 1-2 water molecules (Figures S1-4) demonstrated that Drude polarizable water forcefields are considerably more accurate than nonpolarizable water forcefields like TIP3P for the calculation of vertical ionization energies, at least for short-range polarization effects and give similar accuracy to the effective fragment potential method that has been used in recent explicit solvation studies. To capture long-range polarization effects on the gas-solution redox shift, we first define the QM-region as the same QM-region used to capture the short-range polarization effects (QM 5 is used here) and then define a polarizable MM region that surrounds the QM region. The size of the polarizable MM region was systematically explored for the phenol system (see Figure S6) and a 20 Å region showed convergence in our calculations. We note that allowing all MM waters to be polarizable resulted in polarization artifacts for water molecules at the surface of the water sphere. Other water molecules were described by TIP3P point charges as before. The polarizable MM model used was the SWM4-DP model and we used the polarizable QM/MM implementation in Chemshell. Polarizable QM/MM ensures mutual polarization of QM and MM regions via the self-consistent field method. Initially the full MM environment polarizes the QMregion and by calculating the QM forces acting on the Drude particles, their positions change. This position change results in a change of polarization of the QM region and this loop is iterated until convergence is achieved (we chose 0.0005 hartree as the convergence threshold). By

ACS Paragon Plus Environment

20

Page 20 of 35

Page 21 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

computing the VIE/VEA after Drude-particle convergence and subtracting the VIE/VEA obtained before the inclusion of Drude particles, we define our long-range polarization Z[\]^ correction ∆∆∆𝐺)#*+,@MUV#* . Figure 10 shows the effect of the addition of SWM4-DP (shown in orange) compared to a full TIP3P treatment (shown in yellow) at a QM region of 5 Å, yielding a long-range polarization Z[\]^ of -0.20 eV. The effect here is smaller than that of the short-range contribution of ∆∆∆𝐺)#*+,@MUV#* polarization but is still substantial.

Figure 10. Bulk-solvation extrapolation for SWM4-DP calculations with a QM region of 5 Å. Addition Z[\]^ of SWM4-DP polarization causes a -0.20 eV change to ∆∆𝐺)#*+ .

Because our setup relies on the use of a large QM region and the computation of expensive QM forces multiple times until convergence of the Drude particle positions is reached, we explored the use of a cheaper QM method for these calculations in the final protocol and settled on the use of Grimme’s HF-3c method (Hartree-Fock method with the minimal MINIX basis set) that was found to behave almost identically to ωB97X as shown in Figure S6.

4.5. FINAL RESULTS AND BENCHMARKING

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Having now fully defined Eq. 11 and justified its components and the calculation details for the example of phenol, we can now calculate the full gas-solution redox shift of phenol as: Z[\]^ Z[\]^ Z[\]^ Z[\]^ Z[\]^ ∆∆𝐺)#*+ = ∆∆𝐺)#*+,𝑰𝑰𝑰 + ∆∆∆𝐺)#*+,W/*X + ∆∆∆𝐺)#*+,TMUV#* + ∆∆∆𝐺)#*+,@MUV#* = -1.70 + (-0.30) + (-0.37) + (-0.20) = -2.57 eV. .$V This value can be compared to the experimental range of ∆∆𝐺)#*+ ≈ -2.67 to -2.73 eV (see Table S2 for details about the reference data), demonstrating good agreement with the experimental range. Our protocol is based on the linear response approximation (LRA) and we note that previously Arey and coworkers and Bravaya and coworkers have computed a correction18,20 for non-LRA effects on the redox potential. Such calculations are based on classical forcefield free energy simulations and, while it is unclear to us how reliable such a correction is, it has been estimated to be approximately -0.1 eV for phenol. Adding this -0.1 eV correction to our calculated value results in -2.67 eV which is now within the experimental range. This level of agreement between theory and experiment (with or without the non-LRA correction) for such a complicated multi-step protocol is very satisfying and we note that continuum solvation models result in considerably worse redox shifts of -2.2 eV (CPCM) and -2.3 eV (SMD) for phenol. All the steps in our elaborate protocol have been coded so that the whole protocol can be performed automatically in a parallelized fashion for essentially any molecule. To demonstrate that our protocol is generally valid (not just for phenol) we have calculated a larger test set of molecules, the set of organic molecules previously investigated by Pantazis and coworkers, shown in Figure 11.13 Their work supplied computational CCSD(T)/CBS references for gas phase ionization energies and listed experimental references for both gas phase IEs and aqueous reduction potentials. The test set provides a convenient way of benchmarking our protocol and comparing to the results from continuum solvation protocols. The experimental reference values used in the present study were taken from the study by Pantazis (shown in Tables 3 and S2) with a few exceptions as detailed in Table S2.

ACS Paragon Plus Environment

22

Page 22 of 35

Page 23 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

6.1. MOLECULES IN THE TEST SET

aniline

dimethylsulfide

p-cyanophenol

p-methylphenol

anisole

diethylamine

indole

N-methylaniline

p-methoxyanline

phenol

dimethylamine

p-chloroaniline

p-methoxyphenol

pyrrolidine

piperidine

dimethyldisulfide

p-chlorophenol

p-methylaniline

thioanisole

13 List of molecules molecules used test the final accuracy. for which experimental IEs and Figure 11. The testFigure set of46. organic bytoPantazis andprotocol coworkers aqueous redox potential data were available (Table S2). The reference data for 𝐼𝐸 and 𝐸 for all molecules are obtained from Tables S4 and S19, respectively, from Pantazis et al. and are reported in Table 16, with the exception of the 𝐸 value for p-chlorophenol as discussed later.

Because our protocol mainly uses ωB97X for the calculation of the explicit ∆∆𝐺)#*+ term, and Table 16. Reference values for 𝐼𝐸 , ΔΔ𝐺𝑠𝑜𝑙𝑣 , and 𝐸 for test set molecules. 𝐼𝐸 values come from thisCCSD(T)/cc-pVTZ functional was not investigated by Pantazis and coworkers, we have recalculated calculations with LPNO-CCSD[Q/5] extrapolation. 𝐸 values come from experiment and the implicit values are derived using the reference 𝐼𝐸 and 𝐸 values with respect to the SHE (4.28 V) ΔΔ𝐺 solvation ∆∆𝐺)#*+ terms with this functional. This allows for a more direct comparison between according to Equation 6.4. explicit and implicit solvation. The continuum solvation models tested were CPCM62 and SMD,4 both implemented 4.0. Molecule in ORCA𝑰𝑬 ΔΔ𝑮𝒔𝒐𝒍𝒗 (eV) 𝑬𝟎 vs. SHE (V) 𝒈𝒂𝒔 (eV) aniline

7.76

-2.46

1.02

The workanisole in the previous sections code and CHARMM as 8.28was conducted with -2.38OM3 as the QM 1.62 8.00 for the MD simulations. -2.36 1.36 the Lennard-Jones As the OM3 semi-empirical diethylamineparameter source 8.39 -2.84 1.27 dimethylamine method is not available for many elements (including sulfur which some molecules in the test set 8.10 was always going -2.14to be necessary. 1.68 dimethyldisulfide contain) a more general replacement Recently, Grimme and 8.72 -3.05 1.39 dimethylsulfide coworkersindole introduced GFN-xTB, a semi-empirical -2.47 tight-binding method1.08 that includes a specific 7.83 globalN-methylaniline parameterization for geometries, frequencies,-2.24 and noncovalent interactions for essentially 7.47 0.95 39 the whole periodic table (elements 7.781-86). An attractive -2.82 feature of GFN-xTB 0.68 is the lack of pairp-chloroaniline 8.46 -2.74 1.44 p-chlorophenol specific potential parameters (common in tight-binding methods) making the method useful as a 9.08 -3.09 1.71 p-cyanophenol general black-box method for fast approximate QM calculations. Additionally, the use of a restricted open-shell approach for open-shell systems makes it a fast option for the open-shell cations that are encountered here. Grimme and coworkers demonstrated in particular that geometries of molecules are described quite accurately by GFN-xTB. 75

ACS Paragon Plus Environment

23

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 35

For the test set, we compared GFN-xTB as the QM component in the QM/MM MD simulations (with either electrostatic or mechanical embedding). Additionally, the semi-empirical PM3 method with electrostatic embedding was compared as it (unlike OM3) has parameters available for all elements in the test set. The Lennard-Jones parameters to describe the solute-solvent interactions were changed to OPLS-AA due to a better availability of parameters for organic molecules. The CHARMM and OPLS-AA parameters are generally found to be similar for molecules where mutual parameters were available. The accuracy of Lennard-Jones parameters (or other short-range potentials) and suitability with respect to the QM-MM electrostatic embedding was not investigated much in this study but will be an important feature to study further. Encouragingly, a previous study by Cui and coworkers suggests that redox potentials are relatively resilient with regards to the Lennard-Jones potentials used.63 Additionally, as discussed @@ in sections 3.2 and 4.4, the QM method used for the ∆∆∆𝐺)#*+,@MUV#* correction was changed to HF-3c as implemented in ORCA 4.0 due to considerable time savings with little loss in accuracy. The results of the protocol for gas-solvent redox shifts and redox potentials for the test set with the 3 different QM/MM MD protocols are summarized in Table 2 and compared to results of continuum solvation models, while Table 3 gives the values for each of the components in Eq. 11 for the entire test set and the experimental reference data.

Z[\]^ Table 2. Comparison of our ∆∆𝐺)#*+ protocol using 3 different MD protocols: GFN-xTB and PM3 with electrostatic embedding (elstat), and GFN-xTB with mechanical embedding (mech) with CPCM (ωB97X) and SMD (ωB97X) solvation models for the test set by Pantazis et al. Shown are mean absolute errors (MAE), mean errors (ME) and maximum errors (MaxE) compared to experiment for both redox potentials (V) and gas-solvent redox shifts ∆∆𝐺)#*+ (eV). See Tables 3 and S2 for reference data. For redox potentials, the ∆∆𝐺)#*+ energies were combined with gas phase IEs from DLPNO-CCSD(T) calculations (except for implicit solvation data).

a

Solvation model

𝑬𝟎 MAE (ME) (V)

𝑬𝟎 MaxE (species) (V)

∆∆𝑮𝝎𝑩𝟗𝟕𝑿 MAE 𝒔𝒐𝒍𝒗 (ME) (eV)

∆∆𝑮𝝎𝑩𝟗𝟕𝑿 MaxE 𝒔𝒐𝒍𝒗 (species) (eV)

CPCM

0.19 (-0.03)a

0.90 (phenol)

0.26 (0.19)

0.64 (phenol)

SMD

0.17 (-0.03)a

0.38 (pcyanophenol)

0.21 (0.19)

0.63 (pcyanophenol)

QM/MM MD: GFN-xTBelstat

0.15 (0.04)b

0.48 (pcyanophenol)

0.13 (0.03)

0.47 (pcyanophenol)

QM/MM MD: PM3elstat

0.19 (0.15)b

0.39 (dimethyldisulfide)

0.18 (0.14)

0.33 (dimethyldisulfide)

QM/MM MD: GFN-xTBmech

0.40 (-0.38)b

0.65 (pmethoxyaniline)

0.41 (-0.39)

0.69 (pmethoxyaniline)

Gas phase IE at the ωB97X/def2-SVP level.

ACS Paragon Plus Environment

24

Page 25 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

b

Gas phase IE at the DLPNO-CCSD(T)/CBS level (def2-SVP/def2-TZVP extrapolation).

The results in Table 2 reveal that the explicit solvation protocol with GFN-xTB QM method and electrostatic embedding during the QM/MM MD simulation results in the lowest mean absolute Z[\]^ error for redox potentials (0.15 V) and the gas-solvent redox shift ∆∆𝐺)#*+ (0.13 V). We note that the experimental gas-solvent redox shift is derived using the theoretical gas phase reference IE values from Pantazis and coworkers. This is done for consistency purposes as the theoretical values should be very accurate and, as the calculations are performed identically, they are arguably more consistent gas phase references than experimental ones. Using the experimental Z[\]^ gas phase IEs instead changes the results very slightly (the MAE for ∆∆𝐺)#*+ goes from 0.133 V to 0.138 V). Z[\]^ demonstrating that the The error in redox potential is only slightly higher compared to ∆∆𝐺)#*+ Z[\]^ main source of redox potential error is due to the ∆∆𝐺)#*+ term. When comparing the Z[\]^ ∆∆𝐺)#*+ error of the explicit protocol to the implicit models, a noticeable improvement in MAE is evident and interestingly the explicit protocol appears to be mostly free from systematic errors, having an ME of only 0.03 eV, compared to considerable systematic overestimation of the implicit models. The maximum error is also generally lower for the explicit protocol than for the implicit models. These results indicate that our explicit protocol is mostly free from systematic errors for this test set, suggesting that we are accounting for most of the major Z[\]^ . When comparing errors in redox potentials in Table 2 it should be contributions to ∆∆𝐺)#*+ kept in mind that for the implicit models we deliberately calculated the gas phase contribution at the same level of theory as the solvation term while for the explicit protocol we employ DLPNOCCSD(T)/CBS for the gas phase contribution. This is done to highlight the fact that redox potential errors for implicit models appear to benefit from favorable error cancellations between the solvation error and the gas phase error, because, as shown in Table 1, the redox potential Z[\]^ errors for CPCM and SMD are smaller than the errors in ∆∆𝐺)#*+ and without systematic errors. Such favorable error cancellations are unlikely to hold true generally and must be kept in mind when comparing explicit solvation and implicit solvation protocols.

Table 2 also shows the results when employing a slightly different QM/MM MD protocol, PM3elstat. Here the NDDO-based PM3 semi-empirical model is employed with electrostatic embedding and this shows slightly worse results. As GFN-xTB is a semi-empirical method designed for accurate molecular geometries, and was found to give better structural parameters than other NDDO-based methods, we suspect that this quality translates into slightly better gassolvent redox shifts. Furthermore, seeing as GFN-xTB is available for a large portion of the periodic table (and was even found to work well for transition metal complex geometries) this bodes well for the general applicability of our protocol to more complex solutes. We additionally performed MD simulations with GFN-xTB where mechanical embedding was used instead of electrostatic embedding. In mechanical embedding, the QM-MM electrostatic coupling is

ACS Paragon Plus Environment

25

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 35

handled at the classical level via point charges for the QM atoms. This offers some computational advantages for the QM/MM MD simulations, but unfortunately the accuracy is Z[\]^ adversely affected with the mean absolute error in ∆∆𝐺)#*+ rising to 0.41 eV and the max error becoming 0.69 eV. This indicates that electrostatic embedding of solvent charges is crucial for accurate behavior of solute-solvent interactions during the course of the MD simulation, likely due to a more accurate hydrogen-bonding structure during the MD simulation, and additionally suggests that replacing the QM/MM MD by cheaper MM MD will not provide the accuracy necessary. Z[\]^ Table 3 shows the value for each term in ∆∆𝐺)#*+ as well as the high-level 𝐼𝐸'() energy and the final redox potential. As shown, generally the largest additional term that is added to the Z[\]^ main ∆∆𝐺)#*+,𝑰𝑰𝑰 term (electrostatic embedding only for sphere III) is the short-range Z[\]^ , calculated via QM-region expansion around the solute, polarization term, ∆∆∆𝐺)#*+,TMUV#* approximately -0.35 eV for the test set. This is followed by the bulk solvation term, Z[\]^ ∆∆∆𝐺)#*+,W/*X , estimated via extrapolation of smaller-sized water spheres, approximately -0.20 ?_U`a eV, and lastly the long-range polarization term ∆∆∆𝐺)#*+,@MUV#* , calculated via polarizable QM/MM via Drude-polarizable MM water models, is approximately -0.08 eV. These terms are all fairly large in size and accounting for them all in some way is likely crucial to an accurate explicit solvation model, at least when dealing with redox properties. The standard deviation of these terms are 0.06, 0.05, and 0.02 eV, for short-range, bulk, and long-range contributions, respectively, demonstrating that these corrections are relatively consistent across all species investigated here and suggesting that a simplified protocol with empirical corrections could be successful. Finally, we note that Table 3 shows the results of the new protocol for phenol with Z[\]^ value of -2.67 eV slightly different values for the different contributions and a new ∆∆𝐺)#*+ compared to the previous value of -2.57 eV. The difference stems mainly from the different QM/MM MD protocol and a slightly different extrapolation procedure.

Z[\]^ Z[\]^ Table 3. Calculated values for the ∆∆𝐺)#*+,𝑰𝑰𝑰 term and the correction terms ∆∆∆𝐺)#*+,W/*X , Z[\]^ ?_U`a ∆∆∆𝐺)#*+,TMUV#* , and ∆∆∆𝐺)#*+,@MUV#* for each species investigated in the test set in Figure 11, as well as r@stuUvv , the final calculated oxidation potential 𝐸 9,a(*a , values for the gas phase ionization energy 𝐼𝐸'()

.$V and the reference experimental 𝐸 9,.$V and derived ∆∆𝐺)#*+ . See Table S2 for details about the reference data. All values in eV. 𝒆𝒙𝒑

∆∆𝑮𝝎𝑩𝟗𝟕𝑿 𝒔𝒐𝒍𝒗,𝑰𝑰𝑰

∆∆∆𝑮𝝎𝑩𝟗𝟕𝑿 𝒔𝒐𝒍𝒗,𝒃𝒖𝒍𝒌

∆∆∆𝑮𝝎𝑩𝟗𝟕𝑿 𝒔𝒐𝒍𝒗,𝑺𝑹U𝒑𝒐𝒍

∆∆∆𝑮𝑯𝑭U𝟑𝒄 𝒔𝒐𝒍𝒗,𝑳𝑹U𝒑𝒐𝒍

∆∆𝑮𝝎𝑩𝟗𝟕𝑿 𝒔𝒐𝒍𝒗

𝑰𝑬𝑫𝑳𝑷𝑵𝑶U𝑪𝑪 𝒈𝒂𝒔

𝑬𝟎,𝒄𝒂𝒍𝒄

∆∆𝑮𝒔𝒐𝒍𝒗

𝑬𝟎,𝒆𝒙𝒑

aniline

-1.93

-0.26

-0.42

-0.06

-2.68

7.74

0.78

-2.46

1.02

anisole

-1.68

-0.18

-0.27

-0.09

-2.21

8.30

1.81

-2.38

1.62

diethylamine

-1.77

-0.18

-0.44

-0.05

-2.44

8.02

1.30

-2.36

1.36

dimethylamine

-1.96

-0.22

-0.46

-0.11

-2.75

8.38

1.35

-2.84

1.27

ACS Paragon Plus Environment

26

Page 27 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

dimethyldisulfide

-1.74

-0.31

-0.30

-0.06

-2.41

8.15

1.47

-2.43

1.39

dimethylsulfide

-2.17

-0.20

-0.35

-0.04

-2.75

8.77

1.74

-2.78

1.66

indole

-1.83

-0.18

-0.33

-0.08

-2.41

7.82

1.13

-2.31

1.24

N-methylaniline

-1.79

-0.25

-0.38

-0.07

-2.48

7.45

0.69

-2.24

0.95

p-chloroaniline

-1.85

-0.14

-0.35

-0.09

-2.42

7.77

1.07

-2.49

1.01

p-chlorophenol

-1.91

-0.20

-0.31

-0.08

-2.49

8.51

1.74

-2.73

1.45

p-cyanophenol

-1.95

-0.20

-0.37

-0.10

-2.62

9.09

2.19

-3.09

1.71

p-methoxyaniline

-1.59

-0.17

-0.28

-0.09

-2.12

7.16

0.75

-2.05

0.79

p-methoxyphenol

-1.56

-0.17

-0.23

-0.12

-2.08

7.75

1.39

-2.19

1.23

p-methylaniline

-1.82

-0.15

-0.37

-0.09

-2.43

7.48

0.77

-2.29

0.92

p-methylphenol

-1.81

-0.14

-0.37

-0.10

-2.42

8.22

1.52

-2.54

1.38

phenol

-1.93

-0.26

-0.37

-0.10

-2.67

8.55

1.60

-2.73

1.53

piperidine

-1.71

-0.19

-0.39

-0.06

-2.35

8.10

1.47

-2.45

1.34

pyrrolidine

-1.77

-0.19

-0.39

-0.13

-2.48

8.08

1.32

-2.52

1.26

thioanisole

-1.77

-0.26

-0.27

-0.09

-2.38

7.98

1.32

-2.25

1.45

Z[\]^ Figure 12 plots the calculated ∆∆𝐺)#*+ using the GFN-xTBelstat QM/MM MD protocol versus the experimental ∆∆𝐺)#*+ for all molecules in the test set when including some or all of the contributions from short-range and long-range polarization and bulk effects. This figure Z[\]^ demonstrates nicely how we get systematic improvements to ∆∆𝐺)#*+ via the correction terms Z[\]^ as we get closer to the ideal diagonal line. The full ∆∆𝐺)#*+ term results in points close to the ideal line, showing generally little systematic error, although for systems with large ∆∆𝐺)#*+ there may be some systematic underestimation error present. The system with the largest experimental ∆∆𝐺)#*+ term (-3.09 eV) and experimental redox potential (1.71 V), p-cyanophenol, is the main outlier in our comparison (error of 0.47 eV). The experimental potential is derived from the potential of the deprotonated species and acidity constants (one of which is estimated)64 but the data seems unlikely to be much in error. The p-cyanophenol system thus requires further study.

ACS Paragon Plus Environment

27

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 35

Z[\]^ Figure 12. Calculated ∆∆𝐺)#*+ values using the GFN-xTBelstat QM/MM MD protocol, graphed against

the experimental ∆∆𝐺)#*+ values with the orange diagonal line indicating perfect agreement. The different series represent the addition of the corrections from Eq. 11, showing systematic improvement as terms are added.

5. CONCLUSIONS In this study we propose an accurate explicit solvation protocol for calculating redox potentials of small molecules in aqueous solvation. This is accomplished via separate calculations of the gas phase ionization energy 𝐼𝐸'() (via accurate and affordable DLPNO-CCSD(T) calculations) and the gas-solvent redox shift, ∆∆𝐺)#*+ , via a multi-step QM/MM-based protocol based on the linear response approximation. The final equations are shown below: 𝐸9 =

∆𝐺#$ − 𝑆𝐻𝐸(4.28 𝑉) 𝑛𝐹

r@stuUvvTr(Œ) Z[\]^ ∆𝐺#$ = 𝐼𝐸'() + ∆∆𝐺)#*+ Z[\]^ Z[\]^ Z[\]^ Z[\]^ ?_U`a ∆∆𝐺)#*+ ≈ ∆∆𝐺)#*+,𝑰𝑰𝑰 + ∆∆∆𝐺)#*+,W/*X + ∆∆∆𝐺)#*+,TMUV#* + ∆∆∆𝐺)#*+,@MUV#*

(12) (13) (14)

Z[\]^ The ∆∆𝐺)#*+ term is calculated by first generating trajectories of both redox states of the solute using affordable semi-empirical (GFN-xTB) QM/MM dynamics with ~4,600 MM water molecules (water sphere III). Snapshots from these trajectories (100 snapshots for each trajectory) are then used to calculate vertical QM/MM redox energies (using range-separated hybrid DFT at the ωB97X/def2-SVP level) which are used to calculate an initial free energy of Z[\]^ . As demonstrated in this oxidation according to the linear response approximation, ∆∆𝐺)#*+,𝑰𝑰𝑰

ACS Paragon Plus Environment

28

Page 29 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Z[\]^ study, accounting accurately for short-range polarization effects, ∆∆∆𝐺)#*+,TMUV#* , as well as ?_U`a Z[\]^ and bulk effects, ∆∆∆𝐺)#*+,W/*X , is vital to the accuracy long-range polarization, ∆∆∆𝐺)#*+,@MUV#* of such a protocol. In our protocol this is performed via separate correction steps that account for these effects in an accurate, albeit currently costly, manner as we opted for describing nearby solute-solvent interactions fully quantum mechanically, with large QM regions thus used in , Z[\]^ ?_U`a ∆∆∆𝐺)#*+,TMUV#* and ∆∆∆𝐺)#*+,@MUV#* terms. However, by calculating the costly corrections only for 5 snapshots that are representative of the average (estimated at a lower level of theory), the computational cost is reduced considerably. Use of a more accurate polarizable forcefield from the beginning might reduce the cost of these corrections as smaller QM-regions could then be employed. Important to the accuracy was also to use a well-behaved QM method and the rangeseparated hybrid functional ωB97X that reduces long-range self-interaction error fits this bill. QM method dependency as well as basis set effects will be explored more in a future study.

In general, our multi-step protocol shows great promise and, importantly, most of the errors are fully controllable in our setup. Higher accuracy can be obtained via increasing simulation time, number of snapshots, QM-region size, changing QM method etc. A crucial feature of an expensive explicit solvation model is that the accuracy be better than the cheaper implicit solvation models. This was achieved in our study, not only for our test molecule phenol but also for the larger test set of 19 organic molecules where we get an improved mean absolute error of only 0.13 V of the ∆∆𝐺)#*+ term compared to the CPCM and SMD models (mean absolute errors of 0.26 and 0.21 V), practically cutting the error in half. These results clearly demonstrate the consistency and accuracy of our pilot explicit solvation protocol for redox potentials and suggest that high-accuracy thermochemistry of redox processes in the solution phase is within reach. An explicit solvation protocol like ours, however, certainly suffers from a high computational cost compared to implicit solvation models. The SI shows the number of energy (or energy+gradient) calculations used to evaluate all terms in Eq. 14. The MD simulations are currently the costliest part of the protocol but an improved MD code and parallelization should reduce the cost considerably in the future. There is still plenty of room for improvement in accuracy and future refinements of our protocol will likely involve modifications to the bulk correction (or a switch to periodic calculations for this term), as well as further studying of the QM-MM interaction during the MD simulation. We also note that previous studies18,20 have suggested the need for accounting for effects beyond linear response (classical corrections have indicated that such corrections can be about ~0.1 eV in magnitude) which will need to be explored in a future study. Quantum tunneling effects have recently been proposed to be of relevance as well.65 Our protocol has been implemented into a development version of the Chemshell code and it allows for these calculations to be performed in an automatic manner. This will allow us to test our methodology on more systems in the near future such as transition metal complexes with aqua and cyanide ligands that present severe challenges to current solvation models.10

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ACKNOWLEDGEMENTS RB acknowledges support from the Icelandic Research Fund, grant no. 141218051 as well the University of Iceland Research Fund.

SUPPORTING INFORMATION AVAILABLE: The supporting information contains data comparing different MM levels of theory for describing the VIE for the phenol+nH2O systems, figure showing the behavior of different QM methods for the VIE with QM-region expansion of a water sphere, details about the polarizable QM/MM term and the biased sampling procedure, details about the experimental and computational reference data for the test set used and an overview of the number of calculations required in the protocol. This information is available free of charge via the Internet at http://pubs.acs.org

REFERENCES 1

Winget, P.; Cramer, C. J.; Truhlar, D. G. Computation of Equilibrium Oxidation and Reduction Potentials for Reversible and Dissociative Electron-Transfer Reactions in Solution. Theor. Chem. Acc. 2004, 112, 217-227. 2 Marenich, A. V.; Ho, J.; Coote, M. L.; Cramer, C. J.; Truhlar, D.G. Computational electrochemistry: prediction of liquid-phase reduction potentials. Phys. Chem. Chem. Phys. 2014, 16, 15068-15106. 3 Cramer, C. J.; Truhlar, D. G. Implicit Solvation Models: Equilibria, Structure, Spectra, and Dynamics. Chem. Rev. 1999, 99, 2161–2200. 4 Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378–6396. 5 Wang, L.-P.; Van Voorhis, T. A Polarizable QM/MM Explicit Solvent Model for Computational Electrochemistry in Water. J. Chem. Theory Comput. 2012, 8, 610–617. 6 Guerard, J. J.; Arey, J. S. Critical Evaluation of Implicit Solvent Models for Predicting Aqueous Oxidation Potentials of Neutral Organic Compounds. J. Chem. Theory Comput. 2013, 9, 5046–5058. 7 Thapa, B.; Schlegel, H. B. Calculations of pKa’s and Redox Potentials of Nucleobases with Explicit Waters and Polarizable Continuum Solvation, J. Phys. Chem. A 2015, 119, 5134-5144.

ACS Paragon Plus Environment

30

Page 30 of 35

Page 31 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

8

Thapa, B.; Schlegel, H. B. Improved pKa Prediction of Substituted Alcohols, Phenols, and Hydroperoxides in Aqueous Medium Using Density Functional Theory and a Cluster-Continuum Solvation Model. J. Phys. Chem. A 2017, 121, 4698-4706. 9 Srnec, M.; Chalupský, J.; Fojta, M.; Zendlová, L.; Havran, L.; Hocek, M.; Kývala, M.; Rulíšek, L. Effect of Spin−Orbit Coupling on Reduction Potentials of Octahedral Ruthenium(II/III) and Osmium(II/III) Complexes, J. Am. Chem. Soc. 2008, 130, 10947-10954. 10 Rulíšek, L. On the Accuracy of Calculated Reduction Potentials of Selected Group 8 (Fe, Ru, and Os) Octahedral Complexes. J. Phys. Chem. C 2013, 117, 16871–16877. 11 Klamt, A. J. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena, J. Phys. Chem. 1995, 99, 2224-2235. 12 Klamt, A.; Scüürmann, G. COSMO: a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc. Perkin Trans. 1993, 0, 799-805. 13 Isegawa, M.; Neese, F.; Pantazis, D. A. Ionization Energies and Aqueous Redox Potentials of Organic Molecules: Comparison of DFT, Correlated Ab Initio Theory and Pair Natural Orbital Approaches. J. Chem. Theory Comput. 2016, 12, 2272–2284. 14 Saitow, M.; Becker, U.; Riplinger, C.; Valeev, E. F.; Neese, F. A New Near-Linear Scaling, Efficient and Accurate, Open-Shell Domain-Based Local Pair Natural Orbital Coupled Cluster Singles and Doubles Theory. J. Chem. Phys. 2017, 146, 164105. 15 Cheng, J.; Liu, X.; VandeVondele, J.; Sulpizi, M.; Sprik, M. Redox Potentials and Acidity Constants from Density Functional Theory Based Molecular Dynamics. Acc. Chem. Res. 2014, 47, 3522–3529. 16 Vaissier, V.; Van Voorhis, T. Adiabatic Approximation in Explicit Solvent Models of RedOx Chemistry. J. Chem. Theory Comput. 2016, 12, 5111–5116. 17 Tentscher, P. R.; Seidel, R.; Winter, B.; Guerard, J. J.; Arey, J. S. Exploring the Aqueous Vertical Ionization of Organic Molecules by Molecular Simulation and Liquid Microjet Photoelectron Spectroscopy. J. Phys. Chem. B 2015, 119, 238–256. 18 Guerard, J. J.; Tentscher, P. R.; Seijo, M.; Samuel Arey, J. Explicit Solvent Simulations of the Aqueous Oxidation Potential and Reorganization Energy for Neutral Molecules: Gas Phase, Linear Solvent Response, and Non-Linear Response Contributions. Phys. Chem. Chem. Phys. 2015, 17, 14811–14826. 19 Ghosh, D.; Roy, A.; Seidel, R.; Winter, B.; Bradforth, S.; Krylov, A. I. First-Principle Protocol for Calculating Ionization Energies and Redox Potentials of Solvated Molecules and Ions: Theory and Application to Aqueous Phenol and Phenolate. J. Phys. Chem. B 2012, 116, 7269– 7280. 20 Tazhigulov, R. N.; Bravaya, K. B. Free Energies of Redox Half-Reactions from FirstPrinciples Calculations. J. Phys. Chem. Lett. 2016, 7, 2490–2495. 21 Bose, Samik, and Debashree Ghosh. An Interaction Energy Driven Biased Sampling Technique: A Faster Route to Ionization Spectra in Condensed Phase. J. Comput. Chem. 2017, 38, 2248–57. 22 Gordon, M. S.; Freitag, M. A.; Bandyopadhyay, P.; Jensen, J. H.; Kairys, V.; Stevens, W. J. The Effective Fragment Potential Method: A QM-Based MM Approach to Modeling Environmental Effects in Chemistry. J. Phys. Chem. A 2001, 105, 293–307. 23 Gurunathan, P. K.; Acharya, A.; Ghosh, D.; Kosenkov, D.; Kaliman, I.; Shao, Y.; Krylov, A. I.; Slipchenko, L. V. Extension of the Effective Fragment Potential Method to Macromolecules. J. Phys. Chem. B 2016, 120, 6562–6574. ACS Paragon Plus Environment

31

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

24

Page 32 of 35

Stanton, J. F.; Bartlett, R. J. The equation of motion coupled-cluster method. A systematic biorthogonal approach to molecular excitation energies, transition probabilities, and excited state properties, J. Chem. Phys. 1993, 98, 7029–7039. 25 Stanton, J.F.; Gauss, J. Analytic energy derivatives for ionized states described by the equation-of-motion coupled cluster method, J. Chem. Phys., 1994, 101, 8938–8944. 26 Chai, J.-D.; Head-Gordon, M. Systematic Optimization of Long-Range Corrected Hybrid Density Functionals. J. Chem. Phys. 2008, 128, 084106. 27 Liakos, D. G.; Neese, F. Improved Correlation Energy Extrapolation Schemes Based on Local Pair Natural Orbital Methods. J. Phys. Chem. A 2012, 116, 4801–4816. 28 Riplinger, C.; Neese, F. An Efficient and near Linear Scaling Pair Natural Orbital Based Local Coupled Cluster Method. J. Chem. Phys. 2013, 138, 034106. 29 Marcus, R. A. Chemical and Electrochemical Electron-Transfer Theory. Ann. Rev. Phys. Chem. 1964, 15, 155–196. 30 Chemshell, a Computational Chemistry Shell, see www.chemshell.org. 31 Metz, S.; Kästner, J.; Sokol, A. A.; Keal, T. W.; Sherwood, P. ChemShell-a Modular Software Package for QM/MM Simulations: ChemShell. WIRES: Comp. Mol. Sci. 2014, 4, 101–110. 32 Kästner, J.; Carr, J. M.; Keal, T. W.; Thiel, W.; Wander, A.; Sherwood, P. DL-FIND: An Open-Source Geometry Optimizer for Atomistic Simulations. J. Phys. Chem. A 2009, 113, 11856–11865. 33 Sherwood, P.; de Vries, A. H.; Guest, M. F.; Schreckenbach, G.; Catlow, C. R. A.; French, S. A.; Sokol, A. A.; Bromley, S. T.; Thiel, W.; Turner, A. J.; et al. QUASI: A General Purpose Implementation of the QM/MM Approach and Its Application to Problems in Catalysis. J. Mol. Struct.: THEOCHEM, 2003, 632, 1–28. 34 Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926–935. 35 Nosé, S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 1984, 52, 255–268. 36 Hoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A 1985, 31, 1695–1697. 37 Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé-Hoover chains: The canonical ensemble via continuous dynamics. J. Chem. Phys. 1992, 97, 2635-2643. 38 W. Thiel, Program MNDO, Mülheim, Germany, 1999 39 Grimme, S.; Bannwarth, C.; Shushkov, P. A Robust and Accurate Tight-Binding Quantum Chemical Method for Structures, Vibrational Frequencies, and Noncovalent Interactions of Large Molecular Systems Parametrized for All Spd-Block Elements (Z = 1–86). J. Chem. Theory Comput. 2017, 13, 1989–2009. 40 Smith, W.; Yong, C. W.; Rodger, P. M. DL_POLY: Application to Molecular Simulation. Mol. Sim. 2002, 28, 385–471. 41 Dral, P. O.; Wu, X.; Spörkel, L.; Koslowski, A.; Weber, W.; Steiger, R.; Scholten, M.; Thiel, W. Semiempirical Quantum-Chemical Orthogonalization-Corrected Methods: Theory, Implementation, and Parameters. J. Chem. Theory Comput. 2016, 12, 1082–1096. 42 Vanommeslaeghe, K.; Hatcher, E.; Acharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; Guvench, O.; Lopes, P.; Vorobyov, I.; et al. CHARMM General Force Field: A Force Field for Drug-like Molecules Compatible with the CHARMM All-Atom Additive Biological Force Fields. J. Comput. Chem. 2009, 31, 671-690.

ACS Paragon Plus Environment

32

Page 33 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

43

Yu, W.; He, X.; Vanommeslaeghe, K.; MacKerell, A. D. Extension of the CHARMM General Force Field to Sulfonyl-Containing Compounds and Its Utility in Biomolecular Simulations. J. Comput. Chem. 2012, 33, 2451–2468. 44 Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods I. Method. J. Comput. Chem. 1989, 10, 209–220. 45 Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods II. Applications. J. Comput. Chem. 1989, 10, 221–264. 46 Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the OPLS AllAtom Force Field on Conformational Energetics and Properties of Organic Liquids. Acc. Chem. Res. 1989, 22, 184–189. 47 Robertson, M. J.; Tirado-Rives, J.; Jorgensen, W. L. Improved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. J. Chem. Theory Comput. 2015, 11, 3499–3509. 48 Neese, F. The ORCA Program System: The ORCA Program System. WIRES: Comp. Mol. Sci. 2012, 2, 73–78. 49 Neese, F. Software Update: The ORCA Program System, Version 4.0: Software Update. WIRES: Comp. Mol. Sci. 2018, 8, e1327. 50 Neese, F. Prediction of Molecular Properties and Molecular Spectroscopy with Density Functional Theory: From Fundamental Theory to Exchange-Coupling. Coord. Chem. Rev. 2009, 253, 526–563. 51 Lamoureux, G.; Roux, B. Modeling induced polarization with classical Drude oscillators: Theory and molecular dynamics simulation algorithm. J. Chem. Phys. 2003, 119, 3025. 52 Weigend, F.; Ahlrichs, R. Balanced Basis Sets of Split Valence, Triple Zeta Valence and Quadruple Zeta Valence Quality for H to Rn: Design and Assessment of Accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297. 53 Neese, F.; Wennmohs, F.; Hansen, A.; Becker, U. Efficient, approximate and parallel HartreeFock and hybrid DFT calculations. A 'chain-of-spheres' algorithm for the Hartree--Fock exchange. Chem. Phys. 2009, 356, 98-109. 54 Izsák, R.; Neese, F. An overlap fitted chain of spheres exchange method. J. Chem. Phys. 2011, 135, 144105. 55 Weigend, F. Accurate Coulomb-Fitting Basis Sets for H to Rn. Phys. Chem. Chem. Phys. 2006, 8, 1057. 56 Sure, R.; Grimme, S. Corrected Small Basis Set Hartree-Fock Method for Large Systems. J. Comput. Chem. 2013, 34, 1672–1685. 57 Lamoureux, G.; MacKerell, A. D.; Roux, B. A Simple Polarizable Model of Water Based on Classical Drude Oscillators. J. Chem. Phys. 2003, 119, 5185–5197. 58 Neese, F.; Valeev, E. F. Revisiting the atomic natural orbital approach for basis sets: robust systematic basis sets for explicitly correlated and conventional correlated ab initio methods? J. Chem. Theory Comput. 2011, 7, 33-43. 59 Vasilevskaya, Tatiana, and Walter Thiel. Periodic Boundary Conditions in QM/MM Calculations: Implementation and Tests. J. Chem. Theory Comput. 2016, 12, 3561–70. 60 Kessler, Jiří, Martin Dračínský, and Petr Bouř. Parallel Variable Selection of Molecular Dynamics Clusters as a Tool for Calculation of Spectroscopic Properties. J. Comput. Chem. 2012, 34, 366–71. 61 Lemkul, J. A.; Huang, J.; Roux, B.; MacKerell Jr., A. D. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983-5013. ACS Paragon Plus Environment

33

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

62

Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. Energies, Structures, and Electronic Properties of Molecules in Solution with the C-PCM Solvation Model. J. Comput. Chem. 2003, 24, 669– 681. 63 Riccardi, D.; Li, G.; Cui, Q. Importance of van Der Waals Interactions in QM/MM Simulations. J. Phys. Chem. B 2004, 108, 6467–6478. 64 Canonica, S.; Hellrung, B.; Wirz, J. Oxidation of Phenols by Triplet Aromatic Ketones in Aqueous Solution. J. Phys. Chem. A 2000, 104, 1226–1232. 65 Rybkin, V. V.; VandeVondele, J. Nuclear Quantum Effects on Aqueous Electron Attachment and Redox Properties, J. Phys. Chem. Lett. 2017, 8, 1424-1428.

ACS Paragon Plus Environment

34

Page 34 of 35

Page 35 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

For Table of Contents Only

ACS Paragon Plus Environment