Assessing Statistical Uncertainties of Rare Events ... - ACS Publications

Jul 28, 2017 - we propose a method based on the Poisson distribution to assess the statistical uncertainty of reaction rate constants obtained from re...
0 downloads 0 Views 440KB Size
Subscriber access provided by UNIVERSITY OF THE SUNSHINE COAST

Letter

Assessing Statistical Uncertainties of Rare Events in Reactive Molecular Dynamics Simulations Leif C. Kröger, Wassja A. Kopp, Malte Döntgen, and Kai Leonhard J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00524 • Publication Date (Web): 28 Jul 2017 Downloaded from http://pubs.acs.org on July 29, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Assessing Statistical Uncertainties of Rare Events in Reactive Molecular Dynamics Simulations Leif C. Kröger, Wassja A. Kopp, Malte Döntgen, and Kai Leonhard∗ Chair of Technical Thermodynamics, RWTH Aachen University, D-52062 Aachen, Germany E-mail: [email protected]

Phone: +49 241 80 98174. Fax: +49 241 80 92255

Abstract Reactive molecular dynamics (MD) simulations are a versatile tool which allow for studying reaction pathways and rates simultaneously. However, most reactions will be observed only a few times in such a simulation due to computational limitations or slow kinetics and it is unclear how this will influence the obtained rate constants. Therefore, we propose a method based on the Poisson distribution to assess the statistical uncertainty of reaction rate constants obtained from reactive MD simulations.

Introduction Fundamental research and industrial utilization of chemical, 1 biochemical, 2 and biological 3 processes require knowledge and in-depth understanding of the underlying reaction kinetics governing these processes. A rapidly emerging technique for kinetics predictions are reactive molecular dynamics (MD) simulations, 4 as these simulations allow for simultaneous discovery ∗

To whom correspondence should be addressed

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

of reactions and prediction of their kinetics. 5 Reactive MD simulations, however, are typically limited to several thousands of molecules and several nanoseconds, thus access only a limited part of the phase space. As a consequence, many reactions are observed rarely, if at all. In this work, we present a rigorous scheme for assessing statistical uncertainties of rare events observed in reactive MD simulations. In principle, reactive MD simulations allow the chemical system to evolve naturally 6 and are therefore often referred to as in-silico experiments. 7 As early as in the seventies, reactive MD simulations were used as scientific tool. In pioneer works, Bunker et al. 8 and Valencich et al. 9 performed a sequence of trajectory studies of hot atom reactions for Tritium and Methane, 8 and CH5 . 9 Later Hase et al. 10 developed an analytical potential surface and studied decomposition reactions of the ethyl radical on more than 3000 individual classical trajectories. 11 Today, from the pathways and rates observed in such in-silico experiments, full reaction networks can be obtained from single simulations. 5 At the same time, reaction kinetics of many reactions are predicted from a small number of rare reaction events, raising concerns regarding their statistical uncertainties. Unfortunately, there is little hope that this rare event problem will be solved in the future. Although acceleration techniques for MD simulations allow to study normally inaccessible parts of the phase space, most of these techniques require biasing the potential energy surface of the studied system, which makes it challenging to reconstruct the unbiased reaction kinetics. 12 However, if the unbiased information can be obtained from a biased potential, which is the case in many meta-dynamics studies, 13,14 the presented scheme of this work for assessing statistical uncertainties of rare events observed in reactive MD simulations can be applied as well. Nevertheless, it seems highly unlikely that in-silico experiments will become competitive to real-world experiments with respect to the number of molecules and reachable timescale, e.g. several moles of molecules and several seconds, respectively. Although more and more reactions will become accessible for in-silico experiments due to technological advances, the rare event problem will just shift to reactions inaccessible in current reactive MD simulations.

2

ACS Paragon Plus Environment

Page 2 of 20

Page 3 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

While a single rare event is sufficient for discovering a reaction pathway, 5 obtaining accurate rate constants from this single rare event is almost impossible due to statistical uncertainties. In fact, quantification of these uncertainties comes with a dilemma: Without further knowledge, it is almost impossible to judge whether one was “lucky” in observing one event at all or whether one was “unlucky” observing just one event. These difficulties are in our opinion one of the key aspects which prevent a more wide-spread use of reactive MD simulations for the prediction of rate constants. In this work, we present a new approach of quantifying the uncertainty of reaction rate constants obtained from reactive MD simulations. This new approach utilizes the probability distribution function (PDF) of reaction events and has a wide range of implications for the analysis of reactive MD simulations. We will use this new approach to: 1. Set up in-silico experiments in a way that a particular reaction event is observed with a desired probability, 2. compute the rate constants of reaction events in reactive MD simulations, 3. compute the statistical error of the computed rate constant from a single simulation, 4. estimate upper limits for the rate constants of reactions which are not observed in a reactive MD simulation, 5. improve the statistical accuracy of obtained reaction rate constants, 6. elucidate the flaws of arithmetic averaging. In addition, we show that the statistical error is often much smaller than the methodological error, even though only a few reaction events have been observed. With this work, we want to emphasize the value of every single reactive MD simulation, even though the reactions of interest might not have been observed in a particular simulation.

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 20

Reaction Events The (rare) events of several natural phenomena, such as the number of decays due to radioactivity, 15 are distributed according to the Poisson distribution. 16 The Poisson distribution is known for centuries 17,18 and provides a formula of the probability Pλ (n) of observing n events if λ events are expected:

Pλ (n) =

λn exp [−λ] . n!

(1)

This function is valid for discrete values of n ∈ N0 and continuous values of λ ∈ R>0 . The product of two Poisson-distributed random variables with the expectation values λa and λb is again a Poisson-distributed random variable with an expectation value λa + λb . 19 In general, the Poisson distribution is a reasonable model when a variety of assumptions are valid 20 and it can be shown that these assumptions apply to reaction events: 1. The number of reaction events should be clearly countable. 20 This assumption is valid in case of chemical reactions, as reaction events are well defined and can be clearly distinguished. 2. The rate at which reaction events occur is constant. 20 This assumption is valid given the general form of a reaction rate r = k · C, if the rate constant k and the product of the reactants concentrations C are constant. While the rate constant k is the constant proportionality factor between the reaction rate r and the product of the reactants concentrations C, the latter changes as reactions consume reactants and form products. As a consequence, the reaction rate r changes whenever a reaction event is observed, but is constant between two such events, which can be clearly seen in species profiles of a MD simulations. Figure 1 exemplary shows such species profiles. Therefore, the reaction rate r is piece-wise constant for a reactive MD simulation, meaning the above assumption is valid for each of these pieces. 1

1

This assumption is also valid if the number of molecules is significantly larger than the number of reaction

4

ACS Paragon Plus Environment

Page 5 of 20

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 20

the expected number of reaction events of a certain reaction λA is determined by:

λA = kA (T ) · V C∆t ,

(2)

where kA is the rate constant of reaction A, V is the volume, and ∆t is the length of the simulation interval. Although Eq. (2) is only valid if C is constant, the aforementioned piece-wise constant nature of C allows for using the formulation for λA for each piece. Therefore, a reactive MD simulation can be interpreted as a chain of independent pieces, or sub-simulations, which are separated by reaction events. The probability of observing a reaction event of reaction A in a particular sub-simulation is again described by Eq. (2). Due to the previously mentioned additivity of Poisson-distributed random variables, the expected number of events in each sub-simulation can be added up to obtain the expected number of events of the total simulation:

λA = k A ·

M X

Vi Ci ∆ti ,

(3)

i=1

with M being the number of sub-simulations into which the total simulation is divided. As each of these independent sub-simulations could be performed as separate trajectories, exactly the same formulation could be used to combine the information gained from several separately performed trajectory simulations. It should further be noted that the rate constant kA is only constant for a given temperatureand pressure-condition. Consequently, Eq. (3) can be only applied to k(T ) or K(T, p) if the respective variables are kept constant. For temperature-dependent reactions, it would be more general to say that the energy-distribution of the reactants must be the same prior to reaction. For pressure-dependent reactions, it would also be sufficient if the reaction is in its high-pressure limit.

6

ACS Paragon Plus Environment

Page 7 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Results The above Poisson-based formulation of reaction events can be used to design simulations, analyze statistical uncertainties of rare events, and combine information from multiple trajectory simulations. For setting up a reactive MD simulation, it is necessary to estimate how long that simulation should run to actually observe anything. Often, an estimate of the rate constant e kA (T ) of a reaction A is available in literature or can be obtained via analogies to similar

reactions. In such a case, the Poisson distribution can be employed to estimate the necessary simulation time ∆t which is required to observe reaction A at least once. The probability of observing at least one event of reaction A PλA (n > 0) can be expressed as:

PλA (n > 0) = 1 − PλA (n = 0) = 1 − exp (−λA ) ,

(4)

where λA is described by Eq. (2). The probability of observing at least one reaction event of reaction A rapidly increases with time and converges to unity. In fact, this relation was found by Joshi et al. 21 from an analysis of 192 parallel replica reactive MD simulations. When assuming constant reactants concentrations and constant volume for pre-simulation evaluation, Eqs. (2) and (4) yield the simulation time required to observe a reaction event with probability PλA :

∆t = −

ln(1 − PλA (n > 0)) . e kA (T )V C

(5)

As can be seen in the denominator of the expression for ∆t, this form could also be used to estimate the required volume V or the required product of reactants concentrations C, i.e. the initial number of reactants. It is obvious from Eq. (5) that inaccurate estimates of the rate constants e kA (T ) cause similar inaccuracies in ∆t. Further, we point out estimates taken from literature or analogies may vary significantly from the underlying rate constants 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

in MD simulations. Nevertheless, having an estimate of the required simulation time ∆t is a valuable tool for setting up in-silico experiments. Rate constants can be directly obtained from reactive MD simulations, as the total number of reaction events can be easily counted, using the ChemTraYzer software package for example. 5 Although the present Poisson-based analysis would require knowledge of λA , which is unknown initially, it is reasonable to assume that the observed number of events NA corresponds to the most probably λA . 0.5 N =1 N =3 N =5

λPois

0.4 P (λ) / %

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 20

0.3 0.2

0

λup

λlow

0.1

0

1

2

3

4 λ

5

6

7

8

Figure 2: Poisson probability distribution functions for N = 1, 3, and 5, the most probable number of events λPois , and the upper and lower bounds of the confidence interval, as discussed later. Figure 2 shows the PDFs of the expected number of events λ in case of one (N = 1), two (N = 3), and five (N = 5) observed events. It can be seen that the PDF is highly asymmetric, but it becomes symmetric in case of large N since the Poisson distribution converges to the binomial distribution. 15 Further, it can be seen from Fig. 2 that the PDF has a maximum at the observed number of events N , which can also be proven mathematically. Consequently, it is reasonable to use the highest probability to define the rate constant k Pois (T ) of a reaction observed during reactive MD simulation. If NA reaction events of reaction A are observed, 8

ACS Paragon Plus Environment

Page 9 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the rate constant kAPois (T ) can be calculated from Eq. (3) according to: kAPois (T ) =

NA M P

(6)

.

Vi Ci ∆ti

i=1

This form is consistent with the rate constant formulation derived from macroscopic balancing of reactions in the work of Döntgen et al. 5 The confidence interval of Poisson-distributed reaction events can be calculated from the Poisson PDF. Two assumptions are required in order to define the lower and upper bounds of a two-sided confidence interval: Firstly, the confidence level X is defined as2 :

I:

X =

Zλup

Pλ (n = N ) dλ = −

b=0

λlow

=

N X λb

N X λb

low

b=0

b!

!

exp [−λlow ] −

b!

!

λup exp [−λ] λlow !

N X λbup b=0

b!

exp [−λup ] ,

(7)

(8)

where λlow and λup are the lower and upper bounds of the two-sided confidence interval, respectively. The level of confidence is set by X , which ranges from zero to unity. A common choice for X is a 95% confidence, which corresponds roughly to 2σ in case of normal distribution. Secondly, the probability of the upper and lower bounds are chosen to be identical:

II :

!

Pλup (n = N ) = Pλlow (n = N ) λN λN up exp [−λup ] = low exp [−λlow ] , N! N!

(9) (10)

which accounts for the asymmetric form of the Poisson PDF. Figure 2 shows the Poisson PDF The formulas are given for λ for reasons of readability. The obtained λs can be transformed into k Pois (T ) using Eq. (3). 2

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 20

for one observed event, as well as the most probable λPois and the upper and lower bounds of the confidence interval. The particular approach for defining the confidence interval is chosen due to the asymmetric form of the PDF (cf. Fig. 2). When one would replace assumption II for example with excluding the same probability range on both sides, one could even exclude the most probable λ. The resulting equation system is non-linear and cannot be solved analytically. Therefore, we transformed Eqs. (8) and (10) and solve them for λlow using the Gamma function Γ (x), the incomplete Gamma function 22 Γ (x, a), and the Lambert W function 23 W(x) because these functions are implemented in standard mathematical software. This leads to the following formulation of the two previous constraints:

I:

X Γ (N + 1) = −Γ (N + 1, λup ) + Γ (N + 1, λlow ) ⇒ λlow = Γ−1 (N + 1, X Γ (N + 1) + Γ (N + 1, λup )) ,

(11) (12)

and

II :

λlow



  λup λup = −N W − . exp − N N

(13)

The complete derivation of Eqs. (12) and (13) can be found in the supporting information. In case no reaction event was observed, the Poisson-based formulation can be used to calculate an upper bound for the respective reaction rate constant. It is fact, that any reaction, observed or not, has a rate constant larger than zero. If a reaction has not been observed, however, no statement about the actual rate constant can be made, but about the lower and upper bounds. With the lower bound set to zero: λlow = 0, the upper bound can be obtained analytically from Eq. 8: λup = − ln (1 − X ). For example, if ten molecules of type A and ten molecules of type B exist together in the same simulation box (V = 4096 Å3 ) for one nanosecond but do not react, one can conclude from the previous equation and Eq. 10

ACS Paragon Plus Environment

Page 11 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(2) that all bimolecular reactions of molecules A and B have reaction rate constants which are smaller than kup = 7.39 · 1010 cm mol−1 s−1 with a confidence level of X=0.95. Hereby, it becomes obvious that simulations in which no desired reaction event was observed still contain valuable information about these events. In fact, when only simulations that contain a desired reaction are analyzed and others are neglected, the denominator of Eq. (6) is artificially reduced, which leads to an overestimation of the rate constant. Consequently, we advise fellow researchers to also analyze unobserved reactions and publish the respective upper bounds as supporting information for use in later studies. We are aware that a large number of species can be observed during reactive MD simulations and that this results in a large number of possible reactions. Nevertheless, many of the observed species do not exist at the same time so that the number of actually possible reactions is limited. Utilization of previously published data would allow to efficiently reduce statistical uncertainties of rate constants obtained from reactive MD simulations. Furthermore, it is needed to avoid the overestimations described before. Although this is already possible by analyzing the previous and new trajectory data as a whole, utilization of previously published data is currently limited by the accessibility of that data. Currently, the major issue is that hard disk output of trajectory simulations easily reaches several gigabytes, which cannot be published online, typically. With the present approach, the rate constant information obtained from reactive MD simulations are condensed to a few numbers, thus can be easily published. In order to demonstrate which information is needed from previous simulations, Eq. (6) can be rewritten as:

k Pois (T ) =  M P

i=1

Nprevious + Nnew  M  P Vi Ci ∆ti + Vi Ci ∆ti previous

i=1

.

(14)

new

Thereby, only two piece of information of previous studies are required per reaction: The

11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

number of reaction events Nprevious and the "concentration integral"

Page 12 of 20

M P

i=1

Vi Ci ∆ti



. If previous

the rate constant is given instead of the "concentration integral", the latter can be calculated from Eq. (6). If the upper and lower bounds are given instead of the number of events, the latter can be calculated from the equations used to derive the confidence interval (Eqs. (8) and (10)). This approach for merging information is also applicable for reactions not observed during simulations. Again, we highlight the importance of providing lower and upper bounds along the respective rate constants, so that other researchers can fully utilize the information of previous simulations. The previously used arithmetic averaging of multiple trajectories 5 should not be used any further for combining reactive MD simulation results. Arithmetic averaging holds for normal distributions, but in case of rare events the asymmetric Poisson distributions applies. Although giving the same answer for NA → ∞, the differences become apparent when comparing the Poisson-based formulation to the arithmetic average-formulation, which is given by:

kAar (T )

S 1X = Mj S j=1 P

NA,j

,

(15)

Vi,j Ci,j ∆ti,j

i=1

where S is the total number of different simulations. The comparison of k ar (T ) (cf. Eq. (15)) and k Pois (T ) (cf. Eq. (6)) reveals that they become identical if only one simulation is considered (S=1). This can also be achieved if several different simulations are treated as one long simulation. However, no confidence interval can be obtained in this case when using arithmetic averaging. Figure 3 shows the evolution of the rate constants obtained every 0.1 ns from four idependent simulations with the two different methods. Reaction events can be clearly observed in the figure. Further, the four simulations analyzed had the same setup regarding number of molecules and box-size. This is a requirement when using the arithemtic averaging method, 12

ACS Paragon Plus Environment

Page 13 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

but is not necessary in the newly presented Poisson-based formulation. The most fundamental difference between these two formulation is that the arithmetic averaging confidence interval, as well as the resulting arithmetically averaged rate constants depend on the number of simulations S. For the arithmetic average-formulation, it makes a difference whether one long simulation is analyzed or whether this simulation is analyzed as multiple consecutive short simulations. The Poisson method would lead to the same results in both cases. Table 1 presents a comparison of k ar (T ) to k Pois (T ) of our previous work on methane combustion 5 and shows the correct confidence intervals for a confidence level of X = 0.95. It can be seen that rate constants increase or decrease moderately when the reactive MD simulations are analyzed using the Poisson-based formulation. This is also true for the confidence interval, as long as a sufficient number of reactions has been observed. In case only a single reaction event has been observed, the upper bound is almost doubled when using the Poisson-based formulation and the lower bound can be quantified for the first time. Table 1: Comparison of the rate constants and the confidence intervals using the arithmetic average-based fromulation and the Poisson-based formulation to analyze reactive MD simulations of methane combustion. 5 Döntgen et al. 5 also compared the rate constants obtained from arithmetic averaging to literature values. Reaction

˙ 3+H ˙ CH4 → CH ˙ 3 + HO ˙2 CH4 + O2 → CH ˙ → CH ˙ 3 + H2 O CH4 + OH

T

N

k ar

(K) 2000 2200 2400 2000 2200 2400 2000 2200 2400

(-) 1 6 20 6 11 17 5 22 44

cm ) (s−1 ) ( mol·s 8.05e+05 4.61e+06 1.41e+07 5.37e+09 9.42e+09 1.23e+10 1.91e+13 5.58e+12 1.11e+13

3

13

kar ar klow

(-) ∞ 1.82 1.22 1.92 1.33 1.64 2.94 1.56 1.27

ACS Paragon Plus Environment

ar kup kar

k Pois

(-) 2.00 1.45 1.18 1.48 1.25 1.39 1.66 1.36 1.21

cm ) (s−1 ) ( mol·s 6.85E+05 3.63E+06 1.89E+07 4.60E+09 8.97E+09 1.85E+10 5.58E+12 9.98E+12 9.36E+12

3

kPois Pois klow

(-) 12.50 2.33 1.43 2.33 1.79 1.56 2.38 1.43 1.27

Pois kup kPois

(-) 3.93 1.94 1.37 1.94 1.63 1.47 1.95 1.38 1.25

Journal of Chemical Theory and Computation

1.4

kPois kar

1.2 k/107 (s− 1)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 20

1 0.8 0.6 0.4 0.2 0

1

1.5

2

2.5

3 t (ns)

3.5

4

4.5

5

˙ 3+H ˙ obtained Figure 3: Instantaneous reaction rate constants of the reaction CH4 → CH from four independent simulations using arithmetic averaging (k ar ) and the Poisson based method (k Pois ). Four out of ten independent trajectories at a temperature of 2400 K with a total of five reaction events are analyzed and the rate constants are obtained every 0.1 ns using both methods. Near 4.7 ns two simulation events took place. The simulations were performed as part of our previous work. 5 Please refer to Döntgen et al. 5 for details.

Statistical vs. methodological errors In order to put the statistical uncertainty of rate constants into context, one should briefly recall the errors of force fields and QM methods in general: A variety of different parametrizations of reactive force fields exist in the literature, which are all trained for unique, but different applications. Therefore, it is impossible to give an absolute accuracy for the force field method. This is why the ReaxFF force field 4 and the parametrization of Ashraf et al. 24 is used as an example here. In the recent work of Ashraf et al. 24 the ReaxFF combustion parameters were retrained to accurately model the initial oxidation kinetics and combustion of syngas. In their work, the authors have validated their newly trained force field extensively against QM data. 24 The results of this study indicate that this ReaxFF force field rarely exhibits extreme energy deviations up to 80 kJ mol−1 . This deviation would influence rate constant predictions by a factor of 2 · 104 at a temperature of 1000 K and still by a factor of 140 at a temperature of 2000 K. In case of Born-Oppenheimer MD (BOMD), a variety 14

ACS Paragon Plus Environment

Page 15 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

of different DFT methods are available and it depends on the size of the studied system and the available computer resources which of these methods is used for simulation. One of the most widely used DFT method is B3LYP, which was benchmarked by Zheng et al. 25 to have a mean unsigned error (MUE) in energy of roughly 17 kJ mol−1 . This would cause the rate constants to deviate by a factor of 7.8 to 2.8 at temperatures of 1000 K and 2000 K, respectively. Neglect of other effects, such as tunneling, would result in additional errors of less than a factor 2 above 1000 K. 26 The errors do have an even higher effect when studying reactions at lower temperature, i.e. in microgel synthesis, where temperatures of less than 100 ◦C are of interest. 27 However, the statistical uncertainty obtained by the Poisson method is independent from temperature, it solely depends on the number of observed events. The comparison of the statistical and methodological errors of the above mentioned methods clearly indicates that the statistical uncertainties of singly observed reactions are of the same order as the error of DFT methods typically used for BOMD simulations. Concerning classical fore fields, the statistical uncertainties are orders of magnitude smaller compared to the underlying systematic errors. As a consequence, we advise fellow researchers to focus on high-level methods rather than on vast phase space sampling in reactive MD studies. A high sampling rate of reaction events will reduce the uncertainty of the rate constants by one order of magnitude in the best case, while higher levels of theory or a more accurate parametrization of a force field have the potential to increase the accuracy by several orders of magnitude.

Conclusions In this study we present a Poisson-based formulation which allows to obtain rate constants and their confidence intervals from reactive MD simulations. The rate constants as well as the confidence intervals obtained by this Poisson-based formulation are independent from the number of simulations performed, which makes this approach superior to the arithmetic

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

average-formulation. Further, it enables the estimation of upper bounds for rate constants of reactions which have not been observed during a simulation. Consequently, we demonstrate a practical way of utilizing the results of previous (published) simulations and therefore advise all fellow researchers to share upper and lower bounds of reactions even though the reaction might not have been observed in a particular simulation. In fact, this information needs to be included in future studies in order to avoid overestimation of the rate constants. It was demonstrated that statistical errors are usually less than one order of magnitude even though few reaction events were observed. A comparison to other sources of error, such as the general uncertainty of a force field or a DFT method revealed that methodological errors are at least of the order of statistical errors. We therefore advise fellow researchers to focus on high-level methods or accurately trained force fields, rather than on a high sampling rate of reaction events. In conclusion, rate constants derived from many reaction events simulated with a low-level method is more uncertain compared to rate constants derived from few reaction events simulated with a computationally expensive high-level method.

Acknowledgement The authors thank the Deutsche Forschungsgemeinschaft (DFG) for support within SFB 985 ”Functional microgels and microgel systems”.

Supporting Information Available Detailed derivation of the confidence boundary formulations. This material is available free of charge via the Internet at http://pubs.acs.org/.

References (1) Yang, G.; Pidko, E. A.; Hensen, E. J. M. Mechanism of Bronsted acid-catalyzed conversion of carbohydrates. J. Catal. 2012, 295, 122–132.

16

ACS Paragon Plus Environment

Page 16 of 20

Page 17 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(2) Keener, J.; Sneyd, J. Mathematical Physiology I: Cellular Physiology. 2009. (3) Hatch, M. D.; Slack, C. R. Photosynthesis by sugar-cane leaves: a new carboxylation reaction and the pathway of sugar formation. Biochem. J. 1966, 101, 103. (4) van Duin, A. C. T.; Dasgupta, S.; Lorant, F.; Goddard, W. A. ReaxFF: a reactive force field for hydrocarbons. J. Phys. Chem. A 2001, 105, 9396–9409. (5) Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated discovery of reaction pathways, rate constants, and transition states using reactive molecular dynamics simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524, PMID: 26575551. (6) Döntgen, M. Reaction models from reactive molecular dynamics and high-level kinetics predictions. Ph.D. thesis, RWTH Aachen University, 2017. (7) Danchin, A.; Médigue, C.; Gascuel, O.; Soldano, H.; Hénaut, A. From data banks to data bases. Res. Microbiol. 1991, 142, 913 – 916. (8) Bunker, D. L.; Pattengill, M. D. Trajectory studies of hot-atom reactions. I. tritium and methane. J. Chem. Phys. 1970, 53, 3041–3049. (9) Valencich, T.; Bunker, D. L. Trajectory studies of hot atom reactions. II. An unrestricted potential for CH5. J. Chem. Phys. 1974, 61, 21–29. (10) Hase, W. L.; Wolf, R. J.; Sloane, C. S. Trajectory studies of the molecular dynamics of ethyl radical decomposition. J. Chem. Phys. 1979, 71, 2911–2928. (11) Hase, W. L.; Buckowski, D. G. Dynamics of ethyl radical decomposition. II. Applicability of classical mechanics to large-molecule unimolecular reaction dynamics. J. Comput. Chem. 1982, 3, 335–343.

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(12) Hamelberg, D.; Mongan, J.; McCammon, J. A. Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. J. Chem. Phys. 2004, 120, 11919–11929. (13) Fleming, K. L.; Tiwary, P.; Pfaendtner, J. New Approach for Investigating Reaction Dynamics and Rates with Ab Initio Calculations. J. Phys. Chem. A 2016, 120 . (14) Fu, C. D.; Oliveira, L. F. L.; Pfaendtner, J. Determining energy barriers and selectivities of a multi-pathway system with infrequent metadynamics. J. Chem. Phys. 2017, 146, 014108. (15) Arfken, G.; Weber, H. Mathematical methods for physicists 6th ed. by George B; 2005. (16) Good, I. Some statistical applications of Poisson’s work. Statistical science 1986, 157– 170. (17) Poisson, S. D.; Schnuse, C. H. Recherches sur la probabilité des jugements en matière criminelle et en matière civile; Meyer, 1841. (18) Moivre, A. De mensura sortis, seu, de probabilitate eventuum in ludis a casu fortuito pendentibus; Clements, 1711. (19) Haight, F. A. Handbook of the Poisson distribution; Wiley, 1967. (20) Feller, W. An introduction to probability theory and its applications: volume I ; John Wiley & Sons London-New York-Sydney-Toronto, 1968; Vol. 3. (21) Joshi, K. L.; Raman, S.; van Duin, A. C. T. Connectivity-based parallel replica dynamics for chemically reactive systems: from femtoseconds to microseconds. J. Phys. Chem. Lett. 2013, 4, 3792–3797. (22) Abramowitz, M.; Stegun, I. A. Handbook of mathematical functions: with formulas, graphs, and mathematical tables; Courier Corporation, 1964; Vol. 55. 18

ACS Paragon Plus Environment

Page 18 of 20

Page 19 of 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(23) Corless, R. M.; Gonnet, G. H.; Hare, D. E. G.; Jeffrey, D. J.; Knuth, D. E. On the LambertW function. Advances in Computational Mathematics 1996, 5, 329–359. (24) Ashraf, C.; van Duin, A. C. Extension of the reaxFF combustion force field toward syngas combustion and initial oxidation kinetics. J. Phys. Chem. A 2017, 121, 1051– 1068. (25) Zheng, J.; Zhao, Y.; Truhlar, D. G. The DBH24/08 database and its use to assess electronic structure model chemistries for chemical reaction barrier heights. J. Chem. Theory Comput. 2009, 5, 808–821. (26) Kopp, W. A.; Langer, R. T.; Döntgen, M.; Leonhard, K. Hydrogen abstraction from n-butyl formate by H. and HO.2 . J. Phys. Chem. A 2013, 117, 6757–6770. (27) Kröger, L. C.; Kopp, W. A.; Leonhard, K. Prediction of chain propagation rate constants of polymerization reactions in aqueous NIPAM/BIS and VCL/BIS systems. J. Phys. Chem. B 2017, 121, 2887–2895.

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 ACS Paragon Plus Environment

Page 20 of 20