Subscriber access provided by TULANE UNIVERSITY
A: New Tools and Methods in Experiment and Theory
Transferable Kinetic Monte Carlo Models with Thousands of Reactions Learned from Molecular Dynamics Simulations Enze Chen, Qian Yang, Vincent Dufour Decieux, Carlos A. Sing-Long, Rodrigo Freitas, and Evan J. Reed J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.8b09947 • Publication Date (Web): 08 Feb 2019 Downloaded from http://pubs.acs.org on February 13, 2019
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
Transferable Kinetic Monte Carlo Models with Thousands of Reactions Learned from Molecular Dynamics Simulations Enze Chen,† Qian Yang,‡ Vincent Dufour Decieux,¶ Carlos A. Sing-Long,§ Rodrigo Freitas,¶ and Evan J. Reed∗,¶ †Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, United States ‡Computer Science and Engineering Department, University of Connecticut, Storrs, CT 06269, United States ¶Department of Materials Science and Engineering, Stanford University, Stanford, CA 94305, United States §Institute for Mathematical and Computational Engineering, Pontificia Universidad Cat´olica de Chile, Santiago, Chile E-mail:
[email protected] 1
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Abstract Molecular dynamics (MD) simulation of complex chemistry typically involves thousands of atoms propagating over millions of time steps, generating a wealth of data. Traditionally these data are used to calculate some aggregate properties of the system and then discarded, but we propose that these data can be reused to study related chemical systems. Using approximate chemical kinetic models and methods from statistical learning, we study hydrocarbon chemistries under extreme thermodynamic conditions. We discover that a single MD simulation can contain sufficient information about reactions and rates to predict the dynamics of related yet different chemical systems using kinetic Monte Carlo (KMC) simulation. Our learned KMC models identify thousands of reactions and run four orders of magnitude faster than MD. The transferability of these models suggests that we can viably reuse data from existing MD simulations to accelerate future simulation studies and reduce the number of new MD simulations required.
2
ACS Paragon Plus Environment
Page 2 of 25
Page 3 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
Introduction Simulations of complex chemical systems under extreme temperature and pressure are of significant interest to the combustion, 1 organic chemistry, 2 and astrophysics 3 communities, among others. Gas giant planetary interiors, for example, are thought to be comprised of condensed-phase hydrocarbons undergoing shock-induced transitions that are fundamental to planetary structure, 4 and one way we can study these systems is through atomistic simulations. While molecular dynamics (MD) simulations are capable of simulating thousands of atoms using density functional theory and up to millions of atoms using reactive potentials such as ReaxFF, 5,6 these simulations can nonetheless require weeks on high performance parallel machines to reach merely nanosecond timescales. Furthermore, the results of any single MD simulation cannot be directly extrapolated to related systems. Every adjustment to initial species concentrations requires a new expensive simulation. However, for systems involving the same atom types under the same thermodynamic conditions, it is likely that much of the same potential energy surface is traversed over the course of these simulations. In this work, we demonstrate that the information contained in a single or few MD simulations can be sufficient to construct stochastic models of reaction chemistry that are able to rapidly predict the dynamics of related chemical systems. These stochastic models can be used to simulate new systems using the Gillespie Stochastic Simulation algorithm (SSA), 7 which is equivalent to kinetic Monte Carlo (KMC) and runs orders of magnitude faster than MD. Our methodology reduces the need for new expensive MD simulations of chemical systems for which existing MD simulations may contain enough information, creating an opportunity to greatly speed up our ability to iterate on atomistic simulation studies of complex chemistry. The development of faster alternatives to traditional MD simulations has been widely studied in the literature. 8 In the past twenty years, scientists have tried to address the issue of accelerating MD simulations using bias potentials, 9,10 enhanced sampling, 11 parallel replica methods, 12 coupled KMC/MD methods, 13,14 machine learning, 15 and hardware improvements including GPU computing 16 and massively parallel architectures. 17 As can be 3
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
seen from the wide variety of proposed methods, accelerating MD simulations is a difficult challenge that requires different methods for different types of systems. This difficulty largely stems from the inherently serial nature of MD as it integrates the equations of motion over time. KMC methods offer significant improvements over MD in terms of computational tractability by propagating the system in time over a given set of states rather than atomic configurations. Each step of KMC advances in time on the timescale of transitions between system states, which is many orders of magnitude longer than each timestep of MD, since the latter must be on the timescale of atomic vibrations. However, the set of transition rates input into KMC methods are typically derived from transition state theory, which can be expensive when thousands of reactions or more are relevant. 18 An additional challenge is identifying the entire set of transitions, in our case chemical reactions, relevant to a particular system. 19 Approaches to discovering relevant chemical reaction pathways using ab initio methods have recently been developed by Wang and co-workers. 2,20 Typically in chemistry applications, the next step after reaction pathway discovery is to estimate the individual reaction rates, preferably using a quantum method combined with nudged elastic band calculations. In this work, we study a set of chemical reactions and reaction rate constants as a learned network of interacting transitions between states instead of as separate chemical pathways. Our data-driven method builds a stochastic model of the dynamics of the chemical system with minimal assumptions about the properties of the potential energy surface, including for example entropic effects. The criteria for the usefulness of our model is simply whether it can predict the results of corresponding MD simulations. We compare the ability of a KMC simulation derived from such a large-scale network of thousands of learned reactions and estimated rate constants to successfully predict the results of new MD simulations. We find that the reaction network learned from a single or few MD trajectories can efficiently and accurately predict the kinetic evolution of new MD simulations with different chemical composition.
4
ACS Paragon Plus Environment
Page 4 of 25
Page 5 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
MD simulation
64 liquid C4H10 H2 H2 54
Parameter Estimation
5C2CH26H6
KMC model with 10,000 reactions
Training Error
CHCH 4 36 4
Test Error 216 liquid CH4
and 2000 2000 more! C33H H66 and more! 1 C
Figure 1: A single MD simulation initialized with 64 isobutane molecules at 3300 K and 40.53 GPa (approximate conditions within the interior of Neptune and Uranus 3 ) was performed to obtain training data containing thousands of decomposition products. From the MD data, we trained a stochastic model consisting of elementary reactions and their corresponding reaction rate constants estimated using maximum likelihood. The stochastic KMC models were used to predict the time evolution of a chemical system independently initialized with 216 methane molecules under the same thermodynamic conditions. We close the loop by comparing the species concentration trajectories obtained by the KMC simulations and the MD simulation. Comparisons on the isobutane system represent a training error since the KMC model was trained from the isobutane MD data, while comparisons on the methane system represent a test error since the KMC model was not trained on the methane MD data. Our framework is outlined in Fig. 1. First, we use MD to simulate a high temperature, high pressure condensed phase system using ReaxFF 5 initialized with 64 molecules of isobutane (C4 H10 ). The MD simulation data contained thousands of decomposition products, including linear, branched, and cyclic hydrocarbons. We train our stochastic model with these data using methods developed in prior work. 21,22 We first use bond length and minimum bond duration criteria to find a set of elementary reactions that describes the chemical system of interest with optimal model complexity, and then use maximum likelihood to estimate the corresponding reaction rate constants. With the full set of reactions and reaction rates, we compare simulations of our stochastic model using the SSA with the molecular concentration trajectories observed from MD simulations. We first make this comparison for the training data initialized with 64 molecules of isobutane to obtain training error. We then compare the KMC simulation to a completely independent MD simulation initialized with 216 molecules of condensed phase methane (CH4 ) under the same thermodynamic conditions to obtain test error. These conditions are representative of the extreme environments 5
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
in combustion and shock compression, and it would be extremely expensive to compute multiple reactive MD simulations for iterative studies of these complex chemical systems. Consequently, our results suggest that the reuse of existing MD data to build fast KMC models that can efficiently simulate related chemical systems has the potential to enable fast screening of reaction chemistry for combustion, catalysis, energetic materials, and other technologically and economically important applications.
Methods Molecular Dynamics Simulation Isobutane (C4 H10 ) and methane (CH4 ) under high temperature and high pressure were the two independent hydrocarbon systems used in this study. We used LAMMPS 23,24 to simulate a computational cell of 64 isobutane molecules at 3300 K and 40.53 GPa, the approximate thermodynamic conditions of gas giant planetary interiors, 3 for approximately 450 ps. We used the ReaxFF potential from Mattsson et al. 25 The atoms were initialized with random velocities and the computational cell was equilibrated at 111 K for 3 ps using a NVT thermostat (constant volume, constant temperature) with a temperature damping parameter of 100 time steps. Then the temperature and pressure were ramped to their final values over 50000 time steps of 0.12 fs each. A NPT thermostat (constant pressure, constant temperature) was used with a temperature damping parameter of 12 time steps and a pressure damping parameter of 500 time steps. The training data were obtained from a MD simulation using a time step of 0.12 fs and a NPT thermostat with a temperature damping parameter of 12 time steps and a pressure damping parameter of 120 time steps. We also simulated 216 methane molecules with the same parameters and potential field for approximately 550 ps. The first 10 ps of each MD simulation was discarded before parameter estimation to allow vibrational modes to equilibrate.
6
ACS Paragon Plus Environment
Page 6 of 25
Page 7 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
Bond Length and Minimum Bond Duration We first identified molecular species and chemical reactions from the isobutane-initialized MD simulation. Atoms are considered to be bonded if they are within a given threshold distance (bond length) for a given period of time (minimum bond duration). Similarly, bonded atoms are considered unbonded only if they stay beyond the bond length for at least one bond duration. The bond lengths we chose were taken from a previous work: 26 1.98 ˚ A for C-C bonds, 1.09 ˚ A for H-H bonds, and 1.57 ˚ A for C-H bonds. These were reported to be the location of the minima of the radial distribution functions between first and second nearest neighbors in a ReaxFF MD simulation of methane at similar temperatures and pressures. The choice for the minimum bond duration, τ , has important consequences on the resulting stochastic model of the chemical reaction network. 22 Namely, it is important to choose a minimum bond duration that is short enough for reactions to be considered elementary, i.e. a single energy barrier is crossed, but also long enough to avoid counting short atomic vibrations as true reactions. Consequently, different values of τ lead to different molecular concentrations, chemical reactions, and reaction rates. In this statistical model, we define a “molecular species” as any species that satisfies the given bond duration (τ ) and bond length criteria, and note that this definition may include molecules as well as transient species and collision complexes. Recent work suggests that this broader definition could truly be important in combustion and planetary environments. 27 Therefore, τ is best considered as a parameter that clusters all atoms in the system at a given time into groups that must participate in reaction pathways as a unit, where a group may be as small as a single H atom, or as large as a polymer. By increasing τ , we are simplifying the model by considering certain reaction pathways, defined by changes in these groups over time, not as independent reactions but instead as fluctuations within a species. Thus a series of reactions at a smaller τ might manifest itself as a single reaction at longer τ . Specifically, increasing τ decreases the complexity of the stochastic model by decreasing the number of reactions identified in the chemical reaction network. In this study, we studied a range of τ spanning three orders 7
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 25
of magnitude from 0.012 ps to 1.440 ps to tune our stochastic models. We then chose the optimal τ for our model that minimized the relative error according to equation (1). q P T 1 T
RelErr =
q ES
1 T
t=1
PT
(XMD [t] − ES {XSSA [t]})2
t=1
(XSSA,i [t] − ES {XSSA [t]})
2
(1)
Here, X[t] represents the molecular concentration of a species of interest at timestep t, ES represents taking the average concentration of S simulations, and T is the total number of sampled time steps. The numerator reflects the root-mean-square error (RMSE) in molecular concentration between the single MD simulation and the mean of S stochastic simulations. The denominator represents a baseline error for the amount of fluctuation in a single simulation in the stochastic model by calculating the average RMSE between a single stochastic simulation and the average of all S stochastic simulations. We then calculate the `2 -norm over the three largest-population molecules according to equation (2) to establish a single error metric over a set of molecules of interest.
RelErr`2 =
qP
i
RelErr2i
(2)
To find our best model, the relative error is plotted as a function of τ in Fig. 2a, and the optimal choice from the training data minimizing RelErr`2 is indicated by the dashed black line, which we find corresponds to τ ∗ = 0.048 ps.
Chemical Master Equation and Parameter Estimation Next we give a quick review of the Gillespie Stochastic Simulation algorithm (SSA) 28,29 and our parameter estimation method developed in previous work. 22 The chemical master equation gives the probability at any time t of being in a given state. We can associate with every reaction a propensity function aj (x), defined such that aj (X(t))dt is the probability that reaction j will occur in the time interval [t, t+dt) given molecular concentrations X(t) at
8
ACS Paragon Plus Environment
Page 9 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
time t. This propensity function is proportional to a combinatorial argument of the reactant molecules, hj (X(t)), according to 29
hj (X(t)) ∝
Xm (t)
unimolecular reaction
Xm (t)Xm0 (t) bimolecular reaction 1 Xm (t)(Xm (t) − 1) dimerization reaction 2
where the constant of proportionality is known as the reaction rate coefficient, kj . Let nj (t, t + τ ) denote the number of times each reaction j occurs in the time interval [t, t+τ ). Assuming that our value for τ is small enough such that aj (X(t))dt is approximately constant in that interval, then we can approximate nj (t, t + τ ) by Poisson random variables that are conditionally independent given the molecular concentration X(t) at time t. In our parameter estimation method, 22 we then proceed using maximum likelihood estimation to obtain the following result for the estimated reaction rate coefficients kj : PN PT kj =
τ
i=1 t=1 P N PT i=1
nj,i (t, t + τ )
t=1
hj (Xi (t))
(3)
where the inner summation is across all T time steps and the outer summation is across N independent MD simulations under the same thermodynamic conditions. We use the system of reactions j and the corresponding reaction rate coefficients kj as input parameters to the SSA, following the “direct” method described by Gillespie. 7
Results and Discussion Stochastic Simulation Training and Test Data We first test the ability of our learned KMC model with nearly 10,000 reactions to successfully reproduce the dynamics of the chemical system it was trained on. Figure 2 shows the
9
ACS Paragon Plus Environment
The Journal of Physical Chemistry
Isobutane model training error
3
40
2 H2 CH4 C2H6
1.5
L2 norm
1
H2 molecules
Relative error
KMC vs MD training data
50
2.5
30 training MD = 0.012 = 0.048 = 0.288 avg 5 MD
20 10
0.5
0
10
=*
-2
10
-1
10
0
0
100
(a) KMC vs MD training data 50
10
40 30
training MD = 0.012 = 0.048 = 0.288 avg 5 MD
20 10 0 200
300
C2H6 molecules
12
100
300
400
(b)
60
0
200
Time (ps)
Bond duration = (ps)
CH4 molecules
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 25
KMC vs MD training data
8 6 training MD = 0.012 = 0.048 = 0.288 avg 5 MD
4 2 0
400
0
100
200
Time (ps)
Time (ps)
(c)
(d)
300
400
Figure 2: (a) Relative training error in the stochastic model (equation (1)) as a function of the minimum bond duration. The relative error is defined to be the deviation from the MD trajectory divided by the magnitude of fluctuations in the stochastic model so that values of one or less are ideal (shaded region). The dashed black line indicates the optimal τ (0.048 ps) used for the remainder of the study, selected as the point with lowest `2 -norm training error over the three largest-population molecules (equation (2)). (b)-(d) Mean SSA trajectories for three values of τ were plotted against the corresponding MD training data for the three largest-population decomposition product molecules (H2 , CH4 , C2 H6 ). In all figures, 30 separate stochastic simulations were averaged to form the mean SSA trajectory which exhibits smaller fluctuations and shows close agreement with the single MD trajectory (dotted gray). The black trajectory represents the average of 5 additional MD simulations performed for the first 100 ps only, with error bars representing 1 standard deviation. molecular concentration trajectories of H2 , CH4 , and C2 H6 from the SSA and MD simulation of the system initialized with 64 molecules of C4 H10 . These are the three dominant decompo-
10
ACS Paragon Plus Environment
Page 11 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
sition products with the highest average concentration across all time steps. The simulation results display reasonable agreement between our stochastic models and the corresponding MD trajectory, especially after approximate equilibrium is reached after ∼ 150 ps. It is important to note that our method can simulate the system over 450 ps using the SSA in just a few minutes on a laptop computer, whereas the corresponding MD simulation with the ReaxFF potential took several weeks on a high performance parallel machine. Thus our learned KMC model performs four orders of magnitude faster than traditional MD and still produces similar molecular concentration trajectories. The initial discrepancy in concentration trajectories could arise from the stochastic nature of the single MD simulation and the assumptions of our model. We performed five more MD simulations with the same initial conditions for 100 ps and averaged the five additional MD concentration trajectories to produce the black MD trajectories in Fig. 2. Particularly for CH4 in Fig. 2c, the average MD trajectory at short times is smoother and shows closer agreement to the initial average KMC trajectory. In addition, the KMC model relies on three important simplifying assumptions: (1) that the system is well-stirred, (2) that the reactions are elementary so that the reaction rate constants kj are constant over time, and (3) that the probabilities of reactions occurring are independent. Since these assumptions can only be satisfied approximately, the stochastic model will necessarily incur some error in comparison to MD. Nevertheless, we will show that the usefulness of our learned model can be demonstrated by its ability to extrapolate to different chemical systems. Finally, we note that the five 100 ps MD simulations were for comparison only and no KMC parameters were trained from that data. One might expect a good stochastic model to successfully reproduce the MD data it was trained from. Next, we explore the potential for our stochastic model to reproduce the chemical evolution of systems with completely different initial chemical composition. Using the same set of observed reactions and estimated reaction rate coefficients from the isobutane-trained model, we used the SSA to simulate a system with an initial concentra-
11
ACS Paragon Plus Environment
The Journal of Physical Chemistry
Isobutane model test error H (training)
C H (training)
H2 (test) CH4 (training) CH4 (test)
C2H6 (test)
3
2
6
2
L norm (test)
2 1
number of molecules
2
4
Relative error
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 12 of 25
KMC simulation of initially 216 CH 4 molecules, using KMC model trained from MD simulation of initially 64 C4H10 molecules (runtime: minutes)
200
150 C4 H1 0 CH4 H2 C2 H6
100
MD simulation of initially 216 CH 4 molecules (runtime: weeks)
50
0 10
-2
10
-1
10
0
0
0
= (ps)
100
200
300
400
time (ps)
(a)
(b)
Figure 3: (a) Relative test error in the stochastic models as a function of the minimum bond duration, with values of one or less being ideal (shaded region). Test error (solid lines) is compared with baseline training error from a methane-trained model (dashed lines). The optimal value of τ found during training on the isobutane MD data (dashed black line) is close to the minimum `2 -norm test error when the model is compared with the methane MD test data. The extrapolation performance of the model is relatively insensitive to the choice of τ within a broad optimal range. (b) Extrapolation from a stochastic model trained on an isobutane-initialized MD system and tested against a methane-initialized MD system. 30 separate stochastic simulations were averaged to form the mean SSA trajectory (dark solid lines) which exhibits smaller fluctuations and shows close agreement with the MD trajectory (light dotted lines). tion of 216 CH4 molecules. This simple adjustment of the initial molecular concentration did not require any additional MD simulations or parameter estimation steps, allowing us to perform a single simulation in less than two minutes on a laptop computer. We then compared the resulting concentration trajectories of H2 , CH4 , and C2 H6 from the SSA with the corresponding molecular concentration trajectories from an independent MD simulation initialized with 216 CH4 molecules, and plotted the error in Fig. 3. The SSA simulations were able to achieve close agreement with the MD test data. Our stochastic models successfully predicted the time evolution of H2 , CH4 , and C2 H6 in the methane-initialized test system, demonstrating transferability from the isobutane-initialized system they were trained on.
12
ACS Paragon Plus Environment
Page 13 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
Error Analysis For the training data, we compared the root-mean-square error between the mean SSA trajectory and the MD data against the average magnitude of fluctuation in the stochastic simulations (see Methods). We use this unitless relative error metric (equation (1)) to account for the fact that the inherent fluctuation in the molecular concentrations in any single simulation (MD or SSA) can be large, and we only have a single MD simulation. We plot this relative error as a function of τ in Fig. 2a. We consider a pair of atoms to be bonded if their spatial separation is less than a bond distance criterion for a time duration longer than τ , which we take to be an adjustable parameter of the model. Intuitively, if the minimum bond duration τ is too small, the model is likely to overfit to noise and erroneously capture atomic fluctuations as reactions. On the other hand, if τ is too large, the model may smooth out the signal too much and miss elementary reactions. When τ is too large, elementary reactions are combined into non-elementary reactions that do not obey the statistics assumed by the SSA. Thus the minimum bond duration is a principal optimization step for constructing accurate stochastic models. To choose an optimal bond duration τ ∗ , we calculated the `2 -norm of the relative error for CH4 , H2 , and C2 H6 according to equation (2) and found its minimum value, which corresponds to τ = 0.048 ps, a physically reasonable value corresponding to several atomic vibrations. We use this optimal τ ∗ to construct our stochastic model with 9799 total reactions for subsequent testing. We also computed the relative error for each of our stochastic models trained from the isobutane-initialized MD but tested against the methane-initialized MD. We compared this to the relative error for a separate stochastic model consisting of reactions and estimated reaction rates directly trained from the methane-initialized MD (Fig. 3). As before, the black dashed line in this figure indicates τ ∗ = 0.048 ps. The relative error from a methanetrained model (colored dashed lines) serves as a baseline for comparison with the test error using the isobutane-trained model. Figure 3 shows that the `2 -norm test error over the three largest-population molecules for our optimal stochastic model with τ ∗ = 0.048 ps was among 13
ACS Paragon Plus Environment
The Journal of Physical Chemistry
the lowest out of all the stochastic models over varying τ . While the error grows for larger values of τ approaching 1 ps, the growth may be moderated by the significantly longer average lifetimes of dominant species in the range of several to tens of picoseconds. Furthermore, the weak dependence of the stochastic model on τ in the studied range is an interesting finding in and of itself. It demonstrates that, for the given system and thermodynamic conditions, many possibly independent reaction pathways can be eliminated without introducing a large error in the dynamics of the key species. At the same time, many reaction pathways involving what have traditionally been considered transient species and collision complexes are still important. For example, we note that while we find approximately 10,000 reactions for the optimal τ = 0.048 ps, we still identify over 7,000 reactions at τ = 0.96 ps. 15
8
Number of clusters
Number of clusters
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 25
10
5
training MD single KMC average KMC
0
test MD single KMC average KMC
6 4 2 0
0
100
200
300
400
0
100
Time (ps)
200
300
400
Time (ps)
(a)
(b)
Figure 4: Molecular concentration trajectories of all molecules with more than 5 carbon atoms, comparing KMC simulation using the optimal stochastic model with (a) the isobutane-initialized MD training data and (b) the methane-initialized MD test data. Both simulations capture the approximate rate of growth of large carbon clusters in their respective hydrocarbon system.
Large Carbon Clusters In addition to the largest-population molecules of interest, we also tracked the evolution of large carbon clusters, defined here as any molecule with more than five carbon atoms. These 14
ACS Paragon Plus Environment
Page 15 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
clusters formed as side products of reactions and in some cases grew to include over 400 atoms, which comprise almost 50% of all atoms in our isobutane-initialized computational cell. As seen in Fig. 4, the training and test simulations both show qualitative agreement between the simulated cluster count trajectory and the cluster counts obtained from the underlying MD data. The model performed best in the first 100 ps, largely capturing the onset and rate of growth in the number of clusters in both the training and test systems. These plots show that our stochastic models are able to successfully model not only the dynamics of the largest-population molecules, but also less frequently observed molecules of interest, such as large carbon clusters. It is possible to speculate that improvements in the reaction descriptors have the potential to improve the accuracy of these models, particularly for larger clusters.
Comparison of Reaction Networks Finally, we address the overarching goal of training a stochastic model capable of generalizing to different chemical systems. The system initialized with 64 molecules of C4 H10 and the system initialized with 216 molecules of CH4 differ not only in their initial states but also in their hydrogen to carbon atom ratios. We use Principal Components Analysis (PCA) to visualize the different regions in chemical space traversed by these two systems as they evolve towards equilibrium. 30 Figure 5a displays the MD trajectories of both systems projected onto the first and second principal components (PCs) in chemical space of the methane-initialized system. The PCs correspond to the directions of largest variance in the data, where each data point is a vector of molecular populations at a given timestep in the MD trajectory. The methane-initialized system is shown in blue while the isobutane-initialized system is shown in red; the arrows and color gradients indicate the evolution of the systems over time. We can see that although the systems seem to be converging, they in fact do not overlap in chemical space at all within this representation. To understand why it might be possible to extrapolate from one system to another 15
ACS Paragon Plus Environment
The Journal of Physical Chemistry
Similarity Between Models: Number of Reactions Optimal reduced methane-trained model
208
Optimal reduced isobutane-trained model
324
490
Full isobutane-trained model: 9799
(a)
(b) Counts 250 200
0
150
-5
100 50
200
Number of molecules
5
log(k), methane model
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 25
MD H 2 CH 4 C2H6
150 100 50 0
-10 -10
-5
0
5
0
log(k), isobutane model
100
200
300
400
Time (ps)
(c)
(d)
Figure 5: (a) Principal Components Analysis (PCA) in molecular concentration space of the MD data to visualize their time evolution (indicated by arrows). The isobutane-initialized MD data (red, vertical) are projected onto the first and second principal components of the methane-initialized MD data (blue). We observed no overlap of the two systems. (b) A Venn diagram comparing the reactions from the best reduced models trained from each set of MD data reveals 324 reactions in common between the two reduced models. (c) A loglog plot comparing the 324 estimated reaction rate constants kj from both reduced models. Darker points indicate reactions with higher observed counts in the MD simulations, and the overlaid y = x line indicates that the estimated kj values for these 324 reactions were similar despite being derived from separate MD simulations. (d) SSA trajectories simulating the methane-initialized test system using only the 324 shared reactions and their estimated rate constants.
16
ACS Paragon Plus Environment
Page 17 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
that does not significantly overlap in chemical space, we study the set of reactions that are important to each system. We reduce our optimal stochastic model by eliminating the most infrequent reactions using one of the model reduction methods discussed in a previous work. 22 Briefly, we apply a count-based threshold f and eliminate the reactions from our model that occur less than f times by setting the corresponding reaction rate constant to 0. Primarily, we use this reduction method to identify a core subset of reactions that govern the dynamics of important molecules, taken here to be the ones with highest average concentration (H2 , CH4 , C2 H6 ). Reduction allows us to focus on the salient reactions and reaction rates that affect our species of interest instead of analyzing all 9799 elementary reactions. Ultimately, we found that over 91% of the reactions can be eliminated from the full reaction network without increasing the relative error in modeling isobutane decomposition, leaving just 814 reactions in the reduced stochastic model. Figure 5b shows the similarity between our best reduced isobutane-trained model (with 814 reactions) and the best reduced methane-trained model (with 532 out of 2024 reactions) constructed in a previous work. 22 The diagram reveals a significant overlap of 324 reactions in common between the reaction networks of both stochastic models. This overlap of reactions suggests that our two kinetic models used a similar set of features for predicting their respective system’s dynamics. We then compare the reaction rate constants kj of the 324 shared reactions identified in both models above using a log-log plot (Fig. 5c). The close proximity to the line with unity slope indicates that the estimated reaction rate constants for the same reaction are very similar in the two models, despite being derived from two distinct chemical systems. Darker points indicate reactions with a higher number of occurrences and generally fall closer to the dashed line. This result agrees with the fact that our parameter estimation method gives better estimates for reactions that are more frequently observed. The similarity in the values of kj across two different chemical systems suggests that the data are reasonably consistent with our assumption that the observed reactions in the optimal models are elementary and therefore have a single characteristic rate constant.
17
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Intuition for why chemical extrapolation of these kinetic models can work is also discussed in Yang et al. 31 In Fig. 5d, we use only these 324 shared reactions and their rate constants kj (trained from the isobutane-initialized MD) to simulate the dynamics of the methane-initialized test system, and we show the resulting trajectories for H2 , CH4 , and C2 H6 compared with the MD test data. There is close agreement for all three species and the `2 -norm of the relative errors (2.38 molecules) computed with equation (2) is similar to that of the full stochastic model (2.36 molecules). This result is revealing about the reason why we were able to extrapolate from the C4 H10 system to the CH4 system: the set of elementary reactions that are good descriptors for the CH4 system is also important for the C4 H10 system, and the estimated rate constants are similar when estimated from either system.
Extrapolating from Methane-initialized MD Using this methodology, we find that one can also perform extrapolation in the reverse direction; that is, training a kinetic model on methane-initialized MD data and testing against isobutane-initialized MD data. Fig. 6 shows that we can use the set of reactions and estimated reaction rates learned from the methane-initialized MD data to predict the concentration trajectories of dominant species in the isobutane-initialized system. We observe that the KMC trajectories reach approximately the same equilibrium concentrations as the MD simulations. It might be surprising that extrapolation from methane to isobutane works at all because one might not expect to observe reactions involving isobutane during methane decomposition. In fact, such reactions do occur in small numbers over the course of the methane decomposition MD simulation. While the largest-population decomposition products are most commonly studied in such simulations, this extrapolation study suggests that there is valuable information about other species provided by the fluctuations of low-population species. 18
ACS Paragon Plus Environment
Page 18 of 25
Page 19 of 25
60 50
number of molecules
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
40 30 20 10 0 0
100
200
300
400
500
time (ps)
Figure 6: Extrapolation from a stochastic model trained on a methane-initialized MD system to an isobutane-initialized MD system. The dark solid lines represent concentration trajectories from the SSA and the light dotted lines are MD concentration trajectories for the corresponding molecule. There is good agreement between the two simulations for the concentrations of the dominant decomposition products after several hundred picoseconds. The initial rates of isobutane decomposition and of growth in the other molecules differ by an approximate factor of two. A factor of two variation in the initial chemical decomposition rate could result from reduced statistics for important reactions that are relatively rare in the training MD. Also, it is known that third-body effects can change the reaction rate constant by a factor of 2 or more 32 for some reactions. Third-body effects are not explicitly encoded in our elementary reaction features but are rather implicitly included in our estimated reaction rates, and thus could be a source of some discrepancies when extrapolating from one system to another.
Conclusions The results presented in this paper suggest that a KMC model can be trained from MD data to capture the principal dynamics of the system chemistry. This stochastic model can then efficiently simulate a range of chemical systems starting from different initial compositions. We first optimize our stochastic model by varying the minimum bond duration,
19
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
τ . We show that this procedure is able to produce a KMC model that can extrapolate to a different hydrocarbon system that evolves in a completely non-overlapping region of chemical space. Extrapolation is possible not only for the largest-population species, but also for less frequently observed but important species such as large carbon clusters. We develop the intuition that this extrapolation works well when the chemical system used to train the stochastic model contains the set of elementary reactions that are important for the dynamics of the system being extrapolated to. Since simulation of the KMC model is four orders of magnitude faster than conventional MD, this methodology has the potential to save an enormous amount of computational expense. This method unlocks the possibility of taking an archive of existing MD simulations and learning the dynamics to simulate related, complex chemical systems. We anticipate that the use of existing MD data for building fast KMC models will enable efficient chemical screening for combustion, catalysis, and materials chemistry, among other technologically and economically important applications.
Acknowledgement This material is based upon work supported by the Vice Provost for Undergraduate Education REU Program at Stanford University and the Department of Energy National Nuclear Security Administration under Award Number DE-NA0002007. This work was also partially supported by NSF grant DMR-1455050. The authors thank E. D. Cubuk for helpful comments and discussions.
Competing Financial Interests The authors declare no competing financial interests.
20
ACS Paragon Plus Environment
Page 20 of 25
Page 21 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
References (1) Harper, M. R.; Van Geem, K. M.; Pyl, S. P.; Marin, G. B.; Green, W. H. Comprehensive Reaction Mechanism for n-Butanol Pyrolysis and Combustion. Combust. Flame 2011, 158, 16–41. (2) Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Mart´ınez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat Chem 2014, 6, 1044–1048. (3) Kraus, D.; Vorberger, J.; Pak, A.; Hartley, N. J.; Fletcher, L. B.; Frydrych, S.; Galtier, E.; Gamboa, E. J.; Gericke, D. O.; Glenzer, S. H. et al. Formation of Diamonds in Laser-Compressed Hydrocarbons at Planetary Interior Conditions. Nature Astronomy 2017, 1, 606–611. (4) Ross, M. The Ice Layer in Uranus and Neptune—Diamonds in the Sky? Nature 1981, 292, 435–436. (5) van Duin, A. C. T.; Dasgupta, S.; Lorant, F.; Goddard, W. A. ReaxFF: A Reactive Force Field for Hydrocarbons. J. Phys. Chem. A 2001, 105, 9396–9409. (6) Aktulga, H. M.; Knight, C.; Coffman, P.; O’Hearn, K. A.; Shan, T.-R.; W., J. Optimizing the Performance of Reactive Molecular Dynamics Simulations for Many-Core Architectures. Int. J. High Perform. C. 2018, 0, 1–18. (7) Gillespie, D. T. A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions. J. Comput. Phys. 1976, 22, 403–434. (8) Uberuaga, B. P.; Voter, A. F. In Radiation Effects in Solids; Sickafus, K. E., Kotomin, E. A., Uberuaga, B. P., Eds.; Springer Netherlands: Dordrecht, 2007; pp 25–43. (9) Voter, A. F. Hyperdynamics: Accelerated Molecular Dynamics of Infrequent Events. Phys. Rev. Lett. 1997, 78, 3908–3911.
21
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(10) Hamelberg, D.; Mongan, J.; McCammon, J. A. Accelerated Molecular Dynamics: A Promising and Efficient Simulation Method for Biomolecules. J. Chem. Phys. 2004, 120, 11919–11929. (11) Pan, A. C.; Weinreich, T. M.; Piana, S.; Shaw, D. E. Demonstrating an Order-ofMagnitude Sampling Enhancement in Molecular Dynamics Simulations of Complex Protein Systems. J. Chem. Theory Comput. 2016, 12, 1360–1367. (12) Voter, A. F. Parallel Replica Method for Dynamics of Infrequent Events. Phys. Rev. B 1998, 57, R13985. (13) Violi, A.; Sarofim, A. F.; Voth, G. A. Kinetic Monte Carlo-Molecular Dynamics Approach to Model Soot Inception. Combust. Sci. Technol. 2004, 176, 991–1005. (14) Kabbe, G.; Wehmeyer, C.; Sebastiani, D. A Coupled Molecular Dynamics/Kinetic Monte Carlo Approach for Protonation Dynamics in Extended Systems. J. Chem. Theory Comput. 2014, 10, 4221–4228. (15) Botu, V.; Ramprasad, R. Adaptive Machine Learning Framework to Accelerate Ab Initio Molecular Dynamics. Int. J. Quantum Chem. 2015, 115, 1074–1083. (16) Liu, W.; Schmidt, B.; Voss, G.; M¨ uller-Wittig, W. Accelerating Molecular Dynamics Simulations Using Graphics Processing Units with CUDA. Comput. Phys. Comm. 2008, 179, 634–641. (17) Shaw, D. E.; Deneroff, M. M.; Dror, R. O.; Kuskin, J. S.; Larson, R. H.; Salmon, J. K.; Young, C.; Batson, B.; Bowers, K. J.; Chao, J. C. et al. Anton, A Special-Purpose Machine for Molecular Dynamics Simulation. Commun. ACM 2008, 51, 91–97. (18) Voter, A. F. In Radiation Effects in Solids; Sickafus, K. E., Kotomin, E. A., Uberuaga, B. P., Eds.; Springer Netherlands: Dordrecht, 2007; pp 1–23.
22
ACS Paragon Plus Environment
Page 22 of 25
Page 23 of 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
(19) Battaile, C. C.; Srolovitz, D. J. Kinetic Monte Carlo Simulation of Chemical Vapor Deposition. Annu. Rev. Mater. Res. 2002, 32, 297–319. (20) Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649. (21) Yang, Q.; Sing-Long, C. A.; Reed, E. J. L1 Regularization-Based Model Reduction of Complex Chemistry Molecular Dynamics for Statistical Learning of Kinetic Monte Carlo Models. MRS Adv. 2016, 1, 1767–1772. (22) Yang, Q.; Sing-Long, C. A.; Reed, E. J. Learning Reduced Kinetic Monte Carlo Models of Complex Chemistry from Molecular Dynamics. Chem. Sci. 2017, 8, 5781–5796. (23) Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J. Comput. Phys. 1995, 117, 1–19. (24) LAMMPS Molecular Dynamics Simulator. http://lammps.sandia.gov, 1995. (25) Mattsson, T. R.; Lane, M. D.; Cochrane, K. R.; Desjarlais, M. P.; Thompson, A. P.; Pierce, F.; Grest, G. S. First-Principles and Classical Molecular Dynamics Simulation of Shocked Polymers. Phys. Rev. B 2010, 81, 054103. (26) Qi, T.; Reed, E. J. Simulations of Shocked Methane Including Self-Consistent Semiclassical Quantum Nuclear Effects. J. Phys. Chem. A 2012, 116, 10451–10459. (27) Burke, M. P.; Klippenstein, S. J. Ephemeral Collision Complexes Mediate Chemically Termolecular Transformations That Affect System Chemistry. Nat Chem 2017, 9, 1078–1082. (28) Gillespie, D. T. Approximate Accelerated Stochastic Simulation of Chemically Reacting Systems. J. Chem. Phys. 2001, 115, 1716–1733.
23
ACS Paragon Plus Environment
The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(29) Higham, D. J. Modeling and Simulating Chemical Reactions. SIAM Rev. 2008, 50, 347–368. (30) Hess, B. Similarities Between Principal Components of Protein Dynamics and Random Diffusion. Phys. Rev. E 2000, 62, 8438–8448. (31) Yang, Q.; Sing-Long, C. A.; Chen, E.; Reed, E. J. In Computational Approaches for Chemistry Under Extreme Conditions; Goldman, N., Ed.; Springer International Publishing: Basel, 2019; p 28. (32) Tur´anyi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer-Verlag: Berlin Heidelberg, 2014.
24
ACS Paragon Plus Environment
Page 24 of 25
Page 25 of 25
TOC Graphic 200 180 160 140
4
CH molecules
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
The Journal of Physical Chemistry
MD
KMC
120 100 0
100
200
300
Time (ps)
25
ACS Paragon Plus Environment
400