Performance of ACE-Reaction on 26 Organic Reactions for Fully

finds out the most essential part of reaction networks just from reactants and products ... analysis makes possible to find reaction pathways very eff...
1 downloads 0 Views 2MB Size
Subscriber access provided by UNIV OF LOUISIANA

Article

Performance of ACE-Reaction on 26 Organic Reactions for Fully Automated Reaction Network Construction and Microkinetic Analysis Jin Woo Kim, Yeonjoon Kim, Kyung Yup Baek, Kyunghoon Lee, and Woo Youn Kim J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.9b02161 • Publication Date (Web): 10 May 2019 Downloaded from http://pubs.acs.org on May 11, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Performance of ACE-Reaction on 26 Organic Reactions for Fully Automated Reaction Network Construction and Microkinetic Analysis Jin Woo Kim§†, Yeonjoon Kim§†#, Kyung Yup Baek†, Kyunghoon Lee†, and Woo Youn Kim*†‡ †Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea ‡KAIST Institute for Artificial Intelligence, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea

ACS Paragon Plus Environment

1

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 46

Abstract

Accurate analysis of complex chemical reaction networks is necessary for reliable prediction of reaction mechanism. Though quantum chemical methods provide a desirable accuracy, large computational costs are unavoidable as considering numerous reaction pathways on the networks. We proposed a graph-theoretic approach combined with chemical heuristics (named as ACE-Reaction) in the previous work [Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Chem. Sci. 2018, 9, 825-835.], which automatically and rapidly finds out the most essential part of reaction networks just from reactants and products, and here we extended it by incorporating a stochastic approach for microkinetic modeling. To show its performance and broad applicability, we applied it to 26 organic reactions which include 16 frequently appeared functional groups. As a result, we could demonstrate that ACE-Reaction successfully found the accepted mechanism of all reactions most within a few hours on a single workstation, and additional microkinetic modeling automatically discovered new competitive paths as well as a major path.

ACS Paragon Plus Environment

2

Page 3 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Introduction Elucidation of reaction mechanisms provides deeper understanding of chemical reactions. To study mechanisms at the atomistic level, computational approaches based on quantum chemistry have attracted great attention as a powerful tool. In particular, density functional theory (DFT) was extensively utilized to obtain energy profiles along a given reaction pathway thanks to its reasonable accuracy with low computational costs.1–3 For prediction of reaction mechanism, it is necessary to explore numerous possible reaction pathways and obtain their accurate energy profiles for comparison. However, it is computationally too demanding to perform DFT calculations for every pathway.4,5 Therefore, most computer-aided mechanism studies have been limited to supporting experiments by considering only a few reaction pathways proposed by chemical intuition and/or experimental evidences. In this regard, there is a growing interest in the development of automated methods for exploring multiple reaction pathways with a high efficiency. The new methods include various types such as knowledge-based, reaction dynamics, and graph-theoretic methods.6,7 The knowledge-based methods utilize pre-defined8,9 data-driven10 reaction rules, and chemical concepts such as reactivity11–13. They were successfully applied to several reaction examples, however, chemical concepts and databases specified in a certain set of reactions can limit a broader applicability. In the reaction dynamics methods, high-energy dynamics simulations are

ACS Paragon Plus Environment

3

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 46

carried out,14–18 or the constrained reaction Hamiltonian is formulated19,20 to discover physically relevant paths by inducing reactions between molecules. Molecular dynamics can be also accelerated by applying an artificial force between reactive molecules.21,22 Prediction using reaction dynamics is free from reaction-specific rules, but it is challenging for these approaches to cope with growing computational costs of quantum-chemical calculations as the size of systems increases. Therefore, it is of importance to ensure a balance between reliability and efficiency in developing automated reaction prediction methods. To that end, the graph-theoretic approach is particularly useful, because molecular graph analysis makes possible to find reaction pathways very efficiently and yet reliably by using chemical heuristics while minimizing expensive quantum chemical calculations.19,20,23–27 Each method uses a unique idea, but overall they share the following strategy. First, enumeration of molecular graphs can generate chemically feasible reaction intermediates which can be regarded as vertices in a reaction network. The resulting intermediates are connected with each other to make elementary reactions corresponding to edges in the network. Then, an activation energy of each elementary reaction can be obtained from transition state (TS) calculations. Finally, an appropriate analysis of the network determines most favorable reaction pathways. We have also devised a graph-theoretic approach which is called ACE-Reaction. We reported the technical details of ACE-Reaction in ref 28. It follows a similar graph enumeration strategy with those of Suleimanov’s and Zimmerman’s methods. The two methods carry out TS calculations at each enumeration step and thus rule out reaction paths with high energy barrier on the fly. The distinctive feature of ACE-Reaction is to first construct a reaction network without TS calculations and then extract an essential part of the reaction network using the concept of “principle of minimum chemical distance”, which states that most chemical reactions proceed

ACS Paragon Plus Environment

4

Page 5 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

along a pathway with the minimum dissociation and formation of bonds.29,30 The extracted subnetwork is small enough to obtain accurate activation energies for all elementary reactions in that with affordable computational cost. Thus, it can be very fast and yet reliable. For example, we have shown that without transition state calculations ACE-Reaction could successfully find out the accepted mechanisms of the Claisen ester condensation and the cobalt-catalyzed hydroformylation reactions within one hour on a single workstation having 16 Intel Xeon cores.28 However, applications of those graph-theoretical methods including ours have been limited to a few example reactions. To show their broad applicability, hence, it is necessary to carry out more extensive studies with various types of organic reactions. Here, we assessed performance of ACE-Reaction for 26 organic reactions including most common functional groups composed of C, H, O, N, S, and P atoms. The list of the reactions is in Table 1. For each reaction, we analyzed the efficiency of our method in terms of the size of a reaction network extracted for transition state calculations. Then, we measured its accuracy by checking out whether or not the resulting network includes accepted mechanisms. This is particularly important to achieve both efficiency and accuracy, because transition state calculations are the most time-consuming part in many cases. As an extension of our previous work, we also implemented a new feature called the microkinetic modeling; rate constants of all elementary reactions in the network are first obtained using transition state calculations and the Eyring-Polanyi equation, and then differential rate equations are solved. It gives a reaction progress profile of concentrations of all species in the network and eventually the most plausible reaction pathway on the network.31 It was also employed in other works to validate reliability of predicted reaction pathways.15,16,20 In this work, we applied the microkinetic modeling to the benzoin condensation reaction as an example and compared the result with the previous works.

ACS Paragon Plus Environment

5

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 46

In what follows, we first describe the procedure of ACE-Reaction and computation details. Then, we show the performance of ACE-Reaction for 26 organic reactions and the result of microkinetic study on the benzoin condensation reaction. Finally, we draw conclusions with future outlook.

Methods 2.1 Workflow of ACE-Reaction

Figure 1. Schematic workflow of ACE-Reaction. AC stands for adjacency matrix, 𝐸𝑅, 𝐸𝑡𝑜𝑙, and 𝐸𝑃𝑀6 denote the computed energy of reactants, the tolerance energy for energy-wise screening of intermediates, and the computed energy at the PM6 level, respectively.

ACS Paragon Plus Environment

6

Page 7 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 1 shows a schematic workflow of ACE-Reaction. It is composed of four consecutive steps. Details of the first three steps can be found in ref 28. Here we briefly summarize them and explain the last step with more details. Step 1 aims to sample chemically possible intermediates. We enumerate the molecular graph of reactants represented in terms of atom connectivity (AC). The newly generated molecular graphs are subjected to an on-the-fly 3D structural conversion and screening process to sample intermediates. Each AC matrix is converted to the corresponding 3D geometry including possible conformers at a molecular mechanics level accuracy.32 In this process, inappropriate molecular graphs are removed according to chemical heuristics such as atomic valences and formal charges. Then, the remaining molecules are further optimized using semi-empirical methods such as PM633 and DFTB34. Molecules whose connectivities are changed during optimization are also excluded. The remaining ones are regarded as chemically plausible intermediates and hence used as new starting molecular graphs in the next enumeration cycle. This is repeated until no more new molecular graphs are generated or it reaches the maximum number of cycles given in the input. The former condition was used in this work. Noting that in most organic reactions only a few atoms are directly involved in bond dissociation and formation during propagation, we select a set of namely the active atoms from reactants as an input. Only those are subjected to the graph enumeration to avoid combinatorial explosion. In Step 2, the reaction network is constructed via graph analysis. Before making connections between intermediates, we further screen out energetically unstable molecules. We remove molecules with energies above a threshold that is given by the sum of reactants energy (𝐸𝑅) and a tolerance energy (𝐸𝑡𝑜𝑙). Despite such screening processes with heuristics and energetics, there would be still too many intermediates remained. Apparently, a small fraction of them will form

ACS Paragon Plus Environment

7

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 46

the most probable reaction pathways. To remove irrelevant molecules, we invoke the principle of minimum chemical distance (𝐶𝐷).29,30 The chemical distance is defined as the sum of the minimum numbers of bond dissociations and formations (𝑁𝑏𝑟𝑜𝑘𝑒𝑛 and 𝑁𝑓𝑜𝑟𝑚𝑒𝑑, respectively) necessary for transformation between two intermediates 𝐼𝑖 and 𝐼𝑗 as follows. 𝐶𝐷(𝐼𝑖, 𝐼𝑗) = 𝑁𝑏𝑟𝑜𝑘𝑒𝑛 + 𝑁𝑓𝑜𝑟𝑚𝑒𝑑 (1) Since reactants and products are given as an input, one can measure the 𝐶𝐷s of an intermediate from reactants and products, respectively. If its distances are too far, it is expected that the intermediate is unlikely to appear in a favorable reaction pathway according to the principle of minimum chemical distance. This idea can be implemented as follows: 𝐶𝐷(𝑅, 𝐼) +𝐶𝐷(𝐼,𝑃) ≤ 𝐶𝐷(𝑅,𝑃) + Δ, (2) where the so-called digression factor Δ denotes a parameter that controls the tightness of the criterion. It should be noted that eq 2 with equal sign gives an ellipse on a chemical space whose focal points are the reactants and products. The radius of the ellipse is adjusted by the digression factor Δ. Molecules inside the ellipse are relatively close to the reactants and products in terms of 𝐶𝐷. Therefore, only those molecules are used to form nodes in a reaction network. Then, any two nodes satisfying the following criteria are connected as an elementary reaction, resulting in a reaction sub-network that includes most probable intermediates. 𝑁𝑏𝑟𝑜𝑘𝑒𝑛 ≤ 2 and 𝑁𝑓𝑜𝑟𝑚𝑒𝑑 ≤ 2 (3) To determine most favorable reaction pathways within the network, one can perform DFT calculations to compare the energy profiles of all possible pathways. However, large computational costs are inevitable if the network contains a number of elementary reactions. Step 3 is intended to extract the smallest reaction network that can be deduced without quantum chemical calculations. According to the principle of minimum chemical distance, chemical

ACS Paragon Plus Environment

8

Page 9 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

reactions favor the shortest pathway in terms of 𝐶𝐷. Thus, we find out the shortest pathway passing through each intermediate in the network using the Dijkstra algorithm35 and combine those of all intermediates to make a new network which we call the minimal reaction network. If necessary, the 2nd and 3rd shortest paths for each intermediate can be also considered by employing the Yen’s algorithm36. In Step 4, we finally determine the kinetically most favorable path in the minimal reaction network. For rigorous analysis, we first obtain the activation energy of each elementary reaction via transition state calculations at the DFT level accuracy. The details of the transition state calculations are described in the following section (Computational Details). The resulting activation energy is used to obtain the corresponding rate constant using the Eyring-Polanyi equation37 ‡

𝑘=

𝐾𝐵𝑇 ― Δ𝐺 𝑅𝑇 ℎ 𝑒

, (4)

where 𝑘 is the rate constant, 𝑘𝐵 is the Boltzmann constant, 𝑇 is temperature, ℎ is the Planck’s constant, 𝑅 is the gas constant, and Δ𝐺 ‡ is the Gibbs activation energy. The microkinetic analysis with the rate constants is performed using the Gillespie’s stochastic simulation algorithm (SSA).38–40 It assumes that the number of elastic (non-reactive) collisions is much greater than the number of inelastic (reactive) collisions. As a corollary, the location of chemical species is uniformly distributed in the reactor and the velocity of them follows the MaxwellBoltzmann distribution. The direct SSA method updates change in the number of each chemical species in a virtual reactor according to a selected elementary reaction at each time step. The selection of elementary reaction is done stochastically by considering the size of the rate constants. As a result, we obtain the reaction progress profiles of all molecules in the minimal reaction network. Subsequently, we can determine the most kinetically favorable reaction

ACS Paragon Plus Environment

9

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 46

pathway in addition to major/minor products. Previous works show that the SSA was able to reproduce experimental results on organic reactions.15,16,20 2.2 Computational Details Since graph enumeration can produce an intractable number of new molecular graphs, we adopt the screening processes in Step 1 and 2 as described above. The geometry optimization after the 3D conversion has been done by using the PM6 method as implemented in Gaussian 09.41 The energy-wise screening efficiency in Step 2 is sensitive to 𝐸𝑡𝑜𝑙. If it is too high, too many intermediates will remain, but if it is too low, important ones may also be screened out. In this work, we used 𝐸𝑡𝑜𝑙 of 70 kcal/mol for all reactions in consideration of the inaccuracy of PM6 energy and the upper boundary of activation energy of organic reactions at room temperature (~25 kcal/mol). The digression factor Δ was set to 4. For transition state calculations, we used B3LYP/6-31+g(d,p) as implemented in Gaussian 09. The CPCM method was employed for solvation at 298.15 K and 1 atm.

ACS Paragon Plus Environment

10

Page 11 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 2. (a) 16 representative functional groups involved in the 26 organic reactions studied in this work. (b) Hypothetical functional groups used to screen out intermediates during graph enumeration. To show broad applicability of ACE-Reaction, we applied it to 26 organic reactions of which mechanisms and products are well-known. Those reactions were selected because they involve 16 functional groups displayed in Figure 2(a) that frequently appear in organic reactions. There would be hypothetical functional groups which induce instability of a given molecule and so unlikely to appear in reactions under ambient condition. Figure 2(b) shows such examples. Graph enumeration can generate molecules with the unstable hypothetical functional groups, which is apparently waste of computation time. In this regard, we excluded molecules with the functional groups in Figure 2(b). Diradical molecules including carbene were also regarded as unstable. For further screening, we applied the following rules to all reactions. The number of rings in a molecule should be less than 3. The formal charge of each atom cannot be less than -1 or greater than +1. Similarly, each molecule cannot be charged to less than -1 or greater than +1. The other screening criteria is listed as follows: (1) 5 ≤ (total # of rings in an intermediate), (2) 5 ≤ (# of molecules constituting an intermediate), (3) Intermediates of which connectivity is changed after semi-empirical optimization, (4) Intermediates of which semi-empirical optimization jobs are not terminated normally. Active atoms in reactant molecules should be carefully selected to compromise between reliability and efficiency. Figure 3 illustrates the active atoms selected for the Benzoin condensation reaction along with the accepted mechanism. The cases of the other 25 reactions are presented in supporting information.

ACS Paragon Plus Environment

11

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 46

Figure 3. Example of how to choose active atoms (red circles) for the cyanide-catalyzed benzoin condensation reaction with its accepted mechanism. All reactive atoms can be assigned as active ones.

Results and Discussion 3.1 Reaction Network Analysis of 26 Organic Reactions. Table 1. Simulation Results for 26 Organic Reactions

0.4 (0.4 + 0.0 + 0.0)

# of total atoms 16

# of active remarks atoms 4

(9, 22)

2.2 (2.2 + 0.0 + 0.0)

22

5

42

(22, 99)

2.3 (2.2 + 0.0 + 0.1)

12

6

Conia-Ene

22

(9, 14)

2.9 (2.8 + 0.1 + 0.0)

26

6

Beckmann rearrangement

30

(24, 75)

3.2 (3.1 + 0.0 + 0.1)

13

6

Michael

16

(13, 41)

6.1 (6.0 + 0.1 + 0.0)

27

5

Neber rearrangement

14

(14, 41)

6.7 (6.4 + 0.3 + 0.0)

47

5

reaction name

N1a

(NV, NE)b

Ttot (T1 + T2 + T3)c (min)

Barton

6

(6, 11)

Henry

9

1,3-butadiene + HCl

issue 1d

issue 2e

ACS Paragon Plus Environment

12

Page 13 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Ramberg-Bäcklund

45

(41, 183)

8.2 (7.6 + 0.1 + 0.6)

19

6

Arbuzov

57

(8, 17)

15.6 (15.4 + 0.1 + 0.0)

36

8

Schotten-Baumann

75

(22, 72)

16.3 (16.1 + 0.1 + 0.1)

19

8

azo coupling

8

(8 19)

17.3 (17.2 + 0.1 + 0.0)

38

4

Favorskii rearrangement 75

(29, 121)

21.9 (21.5 + 0.2 + 0.2)

32

7

Dieckmann condensation

(32, 165)

26.0 (25.4 + 0.3 + 0.3)

32

8

65

Table 1. (continued)

reaction name

N1

a

(NV, NE)b

Ttot (T1 + T2 + T3

# of # of total active remarks atoms atoms

Fischer esterification

228

(33, 143)

29.9 (29.6 + 0.1 + 0.2)

21

11

Wolff-Kishner reduction

19

(16, 52)

33.8 (33.4 + 0.3 + 0.1)

41

6

Baylis-Hillman

98

(26, 106)

40.1 (39.4 + 0.5 + 0.1)

35

6

Darzens

110

(39, 132)

40.6 (39.7 + 0.8 + 0.2)

32

7

CN- catalyzed benzoin 51 condensation

(23, 80)

51.6 (51.4 + 0.2 + 0.1)

30

7

Oxy-Cope rearrangement

622

(108, 643)

51.6 (42.3 + 0.9 + 8.4)

17

8

Benzylic acid rearrangement

142

(25, 107)

116.2 (115.8 + 0.3 + 0.2)

28

7

Cannizzaro

117

(46, 213)

143.4 (142.0 + 1.0 + 0.4)

35

9

Diazotization

243

(93, 1128)

188.8 (171.7 + 0.4 + 16.7)

20

9

2100 (15, 39)

257.2 (256.8 + 0.3 + 0.1)

19

10

96

280.8 (280.2 + 0.5 + 0.1)

33

8

acid catalyzed reaction Baeyer-Villiger

aldol

(27, 83)

)c (min)

issue 3f

ACS Paragon Plus Environment

13

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 46

Bischler-Napieralski

957

(157, 882)

1100.5 (1071.9 + 2.4 + 26.1) 30

9

Mannich

658

(26, 80)

4336.1 (4335.7 + 0.3 + 0.1)

8

29

issue 1d

aNumber

of intermediates sampled at step1. bNumber of vertices (NV) and number of edges (NE). time (step1 + step2 + step3). dAdjustment of 𝐸𝑡𝑜𝑙. eConnectivity change of the welldefined intermediate due to PM6 calculation. fSize extension of minimal network

cComputation

Table 1 summarizes the calculation results of 26 organic reactions. First of all, total computation time varies from seconds to days depending on reactions. There are three main factors determining the computation time; (i) number of total atoms, (ii) number of active atoms, and (iii) efficiency of screening by the pre-imposed heuristic rules. The first one is due to computation time to obtain molecular energy and geometry using a semi-empirical method such as PM6, which is the most time-consuming part in Step 1 to Step 3. For example, the BaylisHillman reaction and the Beckmann rearrangement reaction have the same number of active atoms as 6, but the former with 35 atoms took much longer time (39.4 min) in Step 1 than 3.1 min of the latter with 13 atoms. It should be noted that most of the total time was taken for PM6 calculations in Step 1. The second factor determines the number of molecular graphs that can be obtained from enumeration of AC matrix. In general, more active atoms cause rapid increase of computation time due to combinatorial complexity. The Ramberg-Bäcklund and acid-catalyzed aldol reactions are the example of this case. Though they have a same number of atoms (19), the latter has more active atoms (10) than that of the former (6), resulting in longer computation time. However, it is difficult to estimate computation time with only these two factors. If the given heuristic rules remove a large number of enumerated molecular graphs, the following timeconsuming process such as PM6 calculation becomes fast. The Darzens and Favorskii rearrangement reactions are such cases. On the contrary, the Mannich reaction took the longest

ACS Paragon Plus Environment

14

Page 15 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

time because the screening by the heuristic rules was ineffective. These three factors affect computation time of each reaction with different extents, resulting in such a large variation of total computation time. The numbers of intermediates with respect to the computation time in Step 1 clearly demonstrate high efficiency of our graph enumeration scheme to find chemically plausible intermediates. In principle, combinatorial enumeration of molecular graph in an entire chemical space gives an intractable number of intermediates. However, use of appropriate active atoms and heuristic rules restrict the chemical space only around reactants and products. As a result, computation time in the enumeration becomes negligible compared to that of PM6 calculations. For instance, the Mannich reaction took only 0.9 min in enumerating molecular graphs out of 4335.7 min. A large number of intermediate candidates would still remain after Step 1. Table 1 shows that the reaction network construction and extraction in Step 2 and 3 play a crucial role in finding most plausible intermediates. In the aldol reaction, for example, only 15 intermediates were left out of 2100 after Step 3. For the other reactions, the final networks contained less than 50 intermediates, except for the Bischler-Napieralski, the Oxy-Cope rearrangement, and the diazotization reactions. Even for the three exceptions, 157, 108, and 93 vertices were left after ruling out 84 %, 83 %, and 62 % of intermediates obtained in Step 1, respectively. Such great reduction by the graph-theoretic approach makes transition state calculations required in Step 4 affordable. Analysis on the computation times in Table 1 manifests high efficiency of our method. Efficiency is meaningful only if reliability is guaranteed. To assess the reliability of our method, we examined whether or not the final network of each reaction contains the generally accepted

ACS Paragon Plus Environment

15

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 46

mechanism of the reaction. Consequently speaking, we could successfully find the accepted mechanism for all the 26 reactions. However, there are three issues that should be addressed. Two of them are caused by the PM6 method employed in this study. Its advantage is apparently in fast calculation of molecular energy and geometry, but it can provoke severe errors due to crude approximations, leading to the first issue. In the present experiment, we removed intermediates with 70 kcal/mol higher energy than that of reactants for each reaction. In general, 𝐸𝑡𝑜𝑙 of 70 kcal/mol is high enough compared to the maximally allowed energy barrier for reactions under ambient condition. In this case, only 24 reaction networks contained their generally accepted mechanisms. That is because some of important intermediates were removed by the energy criterion in the addition reaction of 1,3-butadiene with HCl and the BischlerNapieralski reaction. To remedy this problem, we increased 𝐸𝑡𝑜𝑙 for the two cases and finally found the missing intermediates at 290 and 90 kcal/mol, respectively. Hence, it is cautious to benefit from the approximated method to achieve a reliable result as well as high efficiency. Inaccuracy of the semi-empirical method can also cause wrong geometry optimization, which is the second issue. Figure 4 shows the known reaction path of the Beckmann rearrangement indicated by the black arrow. We note that I3 was excluded since it was relaxed to the structure of I2 by the PM6 optimization. Thus, our method predicted the resulting path proceeded directly from I2 to I4 as indicated by the green arrow. However, I3 can be readily found through transition state calculations in Step 4.

ACS Paragon Plus Environment

16

Page 17 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 4. The accepted mechanism of the Beckmann rearrangement reaction (the black arrows). The green arrow denotes the direct route from I2 to I4 proposed by our method due to the inaccuracy of PM6 geometry optimization. The final issue to be addressed is about the range of the shortest paths considered when obtaining a minimal reaction network in Step 3. Taking only the first shortest path passing through each intermediate into account was enough to find the accepted mechanism of all reactions except for the diazotisation reaction. In that case, the 2nd shortest path had also to be considered with additional time cost (16.7 min). However the additional cost was tolerable compared to total time. Despite those issues that can be readily resolved with additional affordable costs, the above results manifest that our network construction and extraction scheme is very efficient and yet reliable for most organic reactions.

3.2 Kinetic Analysis of Cyanide Catalyzed Benzoin Condensation Reaction The main purpose of making reaction networks is to find the most favorable reaction path and competitive ones amongst various possible pathways. We could extract the so-called minimal

ACS Paragon Plus Environment

17

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 46

reaction network with the efficient graph-theoretic approach. To check out the validity of the resulting reaction network, we performed TS calculations using DFT and then microkinetic analysis. Specifically, we examined whether or not one can find desired products and accepted mechanism in the network. In addition, the microkinetic analysis may give us possible byproducts and other competitive pathways. To demonstrate this idea, we chose the cyanidecatalyzed benzoin condensation reaction as a test case. The accepted mechanism of the reaction (Figure 3) was suggested by Lapworth42,43 in the early 1900s and has been well accepted. However, the possibility of other reaction pathways has been investigated until recently.44–49 We note that the minimal reaction network of the reaction extracted by our method indeed contained other pathways as well as the known mechanism. Therefore, it is a good example to evaluate the performance of our method. To carry out the microkinetic analysis, we adopted the experimental conditions reported recently in ref 50 amongst various other experiments51–56. Specifically, the volume of a virtual reactor was set to 2.77e-20 L, which contains 5,000 benzaldehydes and 1,665 cyanides. The system temperature 𝑇 was deliberately increased up to 2,500 K to accelerate the reaction in the virtual reactor. We carried out 240 stochastic simulations for statistical significance. More details of these settings are given in supporting information.

ACS Paragon Plus Environment

18

Page 19 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 5. (a) Reaction network of the cyanide-catalyzed benzoin condensation reaction obtained by ACE-Reaction. The elementary reactions of which Δ𝐺 ‡ s are less than 30 kcal/mol for both directions (forward and backward) are shown in red. Here, the forward means the direction from a vertex above to a vertex below, whereas the backward means the other way around. When Δ𝐺 ‡ of the single direction is less than 30 kcal/mol, it is indicated as green (forward) or blue (backward). Otherwise, they are marked in black. (b) The molecular structures of the intermediates in (a). In the minimal network of the cyanide-catalyzed benzoin condensation reaction after Step 3, we found 116 different paths; the minimum and maximum 𝐶𝐷s (eq 1) along the paths were 5 and 11, respectively. We selected the top 50 % shortest paths in terms of 𝐶𝐷, making a new network composed of 20 vertices and 53 edges. Figure 5(a) and 5(b) show the resulting network and the

ACS Paragon Plus Environment

19

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 46

molecular structures corresponding to each vertex. Elementary reactions were marked with nondirectional edges. We performed TS optimizations for the elementary reactions, which succeeded only for 24 reactions indicated by the solid lines; the failed ones were denoted by the dotted lines. As a result, 3 vertices denoted by the gray circles were isolated in the network. The calculated Δ𝐺 ‡ s and all optimized geometries are available in supporting information. The final network contained 7 paths connecting the reactants and the products including the accepted mechanism (Figure 3). They can be divided into two types according to whether passing I1 or 1; their structures are displayed in Figure 5(b). The former case includes the following four paths: path 1) R → I1 → 6 → I4 → P path 2) R → I1 → I4 → P path 3) R → I1 → I2 → I3 → I4 → P (accepted mechanism) path 4) R → I1 → 4 → P The latter case has the following three paths: path 5) R → 1 → 8 → 12 → P path 6) R → 1 → 8 → P path 7) R → 1 → 8 → 11 → P A previous computational study49 on this reaction proposed four possible mechanisms. Two of them correspond to path 3 and path 4. The other two paths were also included in our network: path 8) R → I1 → 1 → 8 → 11 → P and path 9) R→ I1 → 10 → I2 → I3 → I4 → P. We noted that I1 → 1 and 10 → I2 edges were screened out in the minimal network extraction process, but they can be recovered by extending the searching range up to the third shortest paths.

ACS Paragon Plus Environment

20

Page 21 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Comparison of Δ𝐺 ‡ of the rate determining steps (RDSs) of the 7 paths implies that path 1 (RDS = I1 → 6, 29.51 kcal/mol) is favorable against path 3 (RDS = I1 → I2, 39.72 kcal/mol. It should be noted that path 1 was newly proposed in this work. However, we need to consider the propensities of each reaction step which are determined by the amount of species participating in the reaction at a particular point in time along with the rate constants and by the propensities of the other reaction steps. In other words, the direct comparison of the Δ𝐺 ‡ values of the RDSs may not solely determine the most plausible path. Therefore, we need to perform the microkinetic analysis that gives the reaction progress profile of each species and the propensity of each elementary reaction at once.

Figure 6. (a) Reaction progress profile of 20 species involved in the cyanide-catalyzed benzoin condensation reaction. (b) Enlarged view of the lower part of (a). Each name in the legend indicates the 20 species whose structures are given in Figure 5(b).

ACS Paragon Plus Environment

21

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 46

Figure 6 shows the reaction progress profile of the number of molecules in the network as the simulation result. Each data point represents the mean number of species in a uniform time grid over the 240 stochastic simulations. Figure 6(a) includes 20 species participated in the reaction, while Figure 6(b) shows an enlarged view of the lower part of Figure 6(a). At the end of the simulations, 1,469 benzaldehydes as the reactant was consumed to produce 401 benzoins as the desired product (Figure 6(b)). In the meantime, we could observe side products (Figure 6(b)): the aryl compound of the intermediate 13 (named as 13-aryl, 278 molecules produced) which is an enolate form of benzoin, the aryl compound of the intermediate 9 (named as 9-aryl, 34 molecules produced), and HCN (311 molecules produced). We noted that the sum of 13-aryl and 9-aryl molecules is equal to the number of HCN, indicating that they have been made through P → 13 and P → 9. Therefore, those side products can be prevented by a product quenching process in experiments. Cyanide was decreased by about 340 which is close to the number of HCN. That is, most cyanides played a role as a catalyst appropriately. Most intermediates were remained under 15 like in steady states during the reaction progress.

ACS Paragon Plus Environment

22

Page 23 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 7. Energy profile of 3 major pathways of the cyanide-catalyzed benzoin condensation reaction discovered by ACE-Reaction. The numbers in the parentheses indicate net reaction counts. The integer 4 and 6 denote the intermediates in Figure 5(b). The other values indicate the relative free energies of each intermediate and the products with respect to that of the reactants. To find out the main reaction path, we performed further analysis especially by observing the prevalence of the forward or backward reactions of each elementary reaction. We adopted the analysis scheme proposed by Habershon in ref 20. We first calculated net reaction counts by subtracting the counts of backward elementary reaction from that of forward elementary reaction. The results for all elementary reactions are listed in supporting information. Analysis of these values confirmed that path 1, path 3 (the accepted mechanism), and path 4 contributed mostly to the product formation. Figure 7 shows the Gibbs free energy profiles of the three major paths. Here, the reaction count of each elementary reaction is shown in the parentheses beside the TS energy values. Comparison of the energy profiles and the net reaction counts concluded that path 1 is the most favorable, while path 3 and path 4 contributed in similar extents. In

ACS Paragon Plus Environment

23

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 46

particular, path 1 has never been reported before. It should be noted that the “accepted mechanism” does not necessarily mean that it has been proven either experimentally or theoretically. Especially for the benzoin condensation reaction, though path 3 has been known as the accepted mechanism for a long time, new paths have been proposed until recently, and none of them has been clearly proven. For further verification of which one is favored, additional experiments or simulations would be required. However, that is beyond the scope of the present work. Nonetheless, the fact that our method could find both the accepted mechanism and other competitive paths shows that the network-based kinetic analysis is useful for mechanistic study. After all, whole calculations from Step 1 to Step 3 took only 51 minutes and Step 4 required only 20 intermediates and 55 TS optimizations. It was possible because we could extract a key part of the reaction network with the very efficient graph-theoretic approach.

Conclusions We devised a computational method called ACE-Reaction which aims to perform automated prediction of reaction mechanisms through a graph-theoretical approach. The novel feature of our method is to extract a minimal reaction network which contains the key part of a given reaction by using the concept of chemical distance. We successfully applied it to the Claisen ester condensation and the cobalt-catalyzed hydroformylation reactions in the previous study.28 In this work, we performed extensive studies on 26 representative organic reactions to demonstrate the general applicability of our method. Furthermore, we implemented a new feature in the method for microkinetic analysis of reaction networks. We noted that the minimal reaction networks obtained by our method included the accepted mechanisms of all the 26 reactions. In most cases, computation time was less than about a few

ACS Paragon Plus Environment

24

Page 25 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

hours, proving that the graph-theoretic approach is very efficient and yet reliable. This was because enumeration of molecular graphs combined with heuristic rules makes exhaustive sampling possible, and screening process with the concept of chemical distance further removes many irrelevant intermediates. Microkinetic analysis on the benzoin condensation reaction as an example study demonstrated the validity of our method and also potential applicability to find unknown pathways together with side products. The results in this work also uncovered problematic issues that should be resolved in future. First of all, selection of active atoms is critical to compromise between computational accuracy and efficiency. The greater the number of active atoms is, the higher the reliability but the larger the computational cost. At present, it is simply given as input by users. Therefore, an automated method needs to be devised for a systematic performance improvement. Second, semi-empirical methods employed for efficient energy and geometry calculations provoked undesired errors which may cause severe damage to the reliability of our approach. Though those issues can be addressed with additional costs, a more fundamental solution is desired. Recent studies showed that machine learning techniques can be a viable alternative with even faster computation speed. The state-of-the-art deep learning approaches predicted molecular energies within chemical accuracy with respect to DFT energies.57–64 Alphafold showed a promising result toward accurate molecular geometry optimization through a deep learning technique.65 At present, however, such machine learning approaches cover only neutral molecules made of C, H, O, and N due to the lack of molecular big data, whereas reaction intermediates may exist in radical or ionic forms with more diverse atoms such as S, P, and halogen atoms. Therefore, an extended molecular database which encompasses most organic reactions should be required for more

ACS Paragon Plus Environment

25

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 46

general applicability. We believe our approach will become more reliable as well as more efficient by incorporating such deep learning methods in the near future.

ASSOCIATED CONTENT The following files are available free of charge. Active site selection, DFT optimized structures of 24 transition states and intermediates for microkinetic analysis, Gibbs free energies of activation for 24 elementary reactions, how to determine the settings for microkinetic modeling, and the mean net reaction counts with confidence intervals (PDF) AUTHOR INFORMATION Corresponding Author *E-mail: [email protected] Present Addresses #Biosciences

Center, National Renewable Energy Laboratory, 15013 Denver West

Parkway, Golden, Colorado 80401, United States. Author Contributions §These

authors contributed equally.

Notes

ACS Paragon Plus Environment

26

Page 27 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

The authors declare no competing financial interest ACKNOWLEDGMENT This work was supported by Basic Science Research Programs through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF2017R1E1A1A01078109). We also thank Prof. Mu-Hyun Baik of the Center for Catalytic Hydrocarbon Functionalizations, Institute for Basic Science (IBS) and Department of Chemistry at KAIST, for providing the computing resources. REFERENCES (1)

Niu, S.; Hall, M. B. Theoretical Studies on Reactions of Transition-Metal Complexes. Chem. Rev. 2000, 100, 353–405.

(2)

Houk, K. N.; Cheong, P. H.-Y Computational Prediction of Small-Molecule Catalysts. Nature 2008, 455, 309–313.

(3)

Cheong, P. H.-Y.; Legault, C. Y.; Um, J. M.; Çelebi-Ölçüm, N.; Houk, K. N. Quantum Mechanical Investigations of Organocatalysis: Mechanisms, Reactivities, and Selectivities. Chem. Rev. 2011, 111, 5042–5137.

(4)

Levinthal, C. Are There Pathways for Protein Folding? J. Chim. Phys. 1968, 65, 44– 45.

ACS Paragon Plus Environment

27

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(5)

Page 28 of 46

Goedecker, S. Minima Hopping: An Efficient Search Method for the Global Minimum of the Potential Energy Surface of Complex Molecular Systems. J. Chem. Phys. 2004, 120, 9911–9917.

(6)

Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

(7)

Vogiatzis, K. D.; Polynski, M. V.; Kirkland, J. K.; Townsend, J.; Hashemi, A.; Liu, C.; Pidko, E. A. Computational Approach to Molecular Catalysis by 3d Transition Metals: Challenges and Opportunities. Chem. Rev. 2018, 119, 2453-2523

(8)

Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

(9)

Zubarev, D. Y.; Rappoport, D.; Aspuru-Guzik, A. Uncertainty of Prebiotic Scenarios: The Case of the Non-Enzymatic Reverse Tricarboxylic Acid Cycle. Sci. Rep. 2015, 5, 8009.

(10) Segler, M. H. S.; Waller, M. P. Modelling Chemical Reasoning to Predict and Invent Reactions. Chem. Eur. J. 2017, 23, 6118–6128. (11) Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

ACS Paragon Plus Environment

28

Page 29 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(12) Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119. (13) Simm, G. N.; Vaucher, A. C.; Reiher, M. Exploration of Reaction Pathways and Chemical Transformation Networks. J. Phys. Chem. A 2019, 123, 385–399. (14) Martínez-Núñez, E. An Automated Method to Find Transition States Using Chemical Dynamics Simulations. J. Comput. Chem. 2015, 36, 222–234. (15) Varela, J. A.; Vazquez, S. A.; Martinez-Nunez, E. An Automated Method to Find Reaction Mechanisms and Solve the Kinetics in Organometallic Catalysis. Chem. Sci. 2017, 8, 3843–3851. (16) Vázquez, S. A.; Otero, X. L.; Martinez-Nunez, E. A Trajectory-Based Method to Explore Reaction Mechanisms. Molecules 2018, 23, 3156. (17) Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048. (18) Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

ACS Paragon Plus Environment

29

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 46

(19) Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106. (20) Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786– 1798. (21) Maeda, S.; Morokuma, K. Finding Reaction Pathways of Type A + B → X: Toward Systematic Prediction of Reaction Mechanisms. J. Chem. Theory Comput. 2011, 7, 2335–2345. (22) Sameera, W. M. C.; Maeda, S.; Morokuma, K. Computational Catalysis Using the Artificial Force Induced Reaction Method. Acc. Chem. Res. 2016, 49, 763–773. (23) Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392. (24) Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54. (25) Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

ACS Paragon Plus Environment

30

Page 31 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(26) Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259. (27) Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225. (28) Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Efficient Prediction of Reaction Paths through Molecular Graph and Reaction Network Analysis. Chem. Sci. 2018, 9, 825–835. (29) Jochum, C.; Gasteiger, J.; Ugi, I. The Principle of Minimum Chemical Distance (PMCD). Angew. Chemie Int. Ed. 1980, 19, 495-505. (30) Jochum, C.; Gasteiger, J.; Ugi, I. The Principle of Minimum Chemical Distance and the Principle of Minimum Structure Change. Z. Naturforsch. 1982, 37b, 1205–1215. (31) Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, 8, e1372. (32) Kim, Y.; Kim, W. Y. Universal Structure Conversion Method for Organic Molecules: From Atomic Connectivity to Three-Dimensional Geometry. Bull. Korean Chem. Soc. 2015, 36, 1769–1777.

ACS Paragon Plus Environment

31

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 46

(33) Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods V: Modification of NDDO Approximations and Application to 70 Elements. J. Mol. Model. 2007, 13, 1173–1213. (34) Aradi, B.; Hourahine, B.; Frauenheim, T. DFTB+, a Sparse Matrix-Based Implementation of the DFTB Method. J. Phys. Chem. A 2007, 111, 5678–5684. (35) Dijkstra, E. W. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1959, 1, 269–271. (36) Yen, J. Y. Finding the K Shortest Loopless Paths in a Network. Manage. Sci. 1971, 17, 712-716. (37) Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115. (38) Gillespie, D. T. A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions. J. Comput. Phys. 1976, 22, 403–434. (39) Gillespie, D. T. Exact Stochastic Simulation of Coupled Chemical Reactions. J. Phys. Chem. 1977, 81, 2340–2361. (40) Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

ACS Paragon Plus Environment

32

Page 33 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(41) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A., et al. Gaussian 09, Revision D.01; Gaussian, Inc.: Wallingford, CT, 2009. (42) Lapworth, A. XCVI.-Reactions Involving the Addition of Hydrogen Cyanide to Carbon Compounds. J. Chem. Soc., Trans 1903, 83, 995–1005. (43) Lapworth, A. V.-The Action of Halogens on Compounds containing the Carbonyl Group. J. Chem. Soc., Trans 1904, 85, 30–42. (44) Kuebrich, J. P.; Schowen, R. L.; Wang, M.-S.; Lupes, M. E. The Mechanism of the Benzoin Condensation. J. Am. Chem. Soc. 1971, 93, 1214–1220. (45) Castells, J.; López-Calahorra, F.; Domingo, L. Postulation of Bis(Thiazolin-2Ylidene)s as the Catalytic Species in the Benzoin Condensation Catalyzed by a Thiazolium Salt plus Base. J. Org. Chem. 1988, 53, 4433–4436. (46) Martí, J.; López-Calahorra, F.; Bofill, J. M. A Theoretical Study of Benzoin Condensation. J. Mol. Struct. (Theochem) 1995, 339, 179–194. (47) Gronert, S. An Epoxide Intermediate in Nucleophilic Acylations by Thiazolium Precursors. Org. Lett. 2007, 9, 3065–3068.

ACS Paragon Plus Environment

33

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 46

(48) Yamabe, S.; Yamazaki, S. Three Competitive Transition States in the Benzoin Condensation Compared to the Clear Rate-Determining Step in the Cannizzaro Reaction. Org. Biomol. Chem. 2009, 7, 951–961. (49) He, Y.; Xue, Y. Mechanism Insight into the Cyanide-Catalyzed Benzoin Condensation: A Density Functional Theory Study. J. Phys. Chem. A 2010, 114, 9222–9230. (50) Kim, Y.-J.; Kim, N. Y.; Cheon, C. H. Beyond Benzoin Condensation: Trimerization of Aldehydes via Metal-Free Aerobic Oxidative Esterification of Aldehydes with Benzoin Products in the Presence of Cyanide. Org. Lett. 2014, 16, 2514–2517. (51) Breslow, R. On the Mechanism of Thiamine Action. IV. Evidence from Studies on Model Systems. J. Am. Chem. Soc. 1958, 80, 3719–3726. (52) Breslow, R. Hydrophobic Effects on Simple Organic Reactions in Water. Acc. Chem. Res. 1991, 24, 159–164. (53) Linghu, X.; Bausch, C. C.; Johnson, J. S. Mechanism and Scope of the CyanideCatalyzed Cross Silyl Benzoin Reaction. J. Am. Chem. Soc. 2005, 127, 1833–1840. (54) Berkessel, A.; Elfert, S.; Etzenbach-Effers, K.; Teles, J. H. Aldehyde Umpolung by NHeterocyclic Carbenes: NMR Characterization of the Breslow Intermediate in Its

ACS Paragon Plus Environment

34

Page 35 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Keto Form, and a Spiro-Dioxolane as the Resting State of the Catalytic System. Angew. Chemie. Int. Ed. 2010, 49, 7120–7124. (55) Collett, C. J.; Massey, R. S.; Maguire, O. R.; Batsanov, A. S.; O’Donoghue, A. C.; Smith, A. D. Mechanistic Insights into the Triazolylidene-Catalysed Stetter and Benzoin Reactions: Role of the N-Aryl Substituent. Chem. Sci. 2013, 4, 1514–1522. (56) Rehbein, J.; Ruser, S.-M.; Phan, J. NHC-Catalysed Benzoin Condensation-Is It All down to the Breslow Intermediate? Chem. Sci. 2015, 6, 6013–6018. (57) Schütt, K. T.; Sauceda, H. E.; Kindermans, P.-J.; Tkatchenko, A.; Müller, K.-R. SchNet - A Deep Learning Architecture for Molecules and Materials. J. Chem. Phys. 2018, 148, 241722. (58) Schütt, K. T.; Kindermans, P.-J.; Sauceda, H. E.; Chmiela, S.; Tkatchenko, A.; Müller, K.-R. SchNet: A Continuous-Filter Convolutional Neural Network for Modeling Quantum Interactions. Advances in Neural Information Processing Systems 30 2017, 992-1002. (59) Jørgensen, P. B.; Jacobsen, K. W.; Schmidt, M. N. Neural Message Passing with Edge Updates for Predicting Properties of Molecules and Materials. arXiv:1806.03146 2018.

ACS Paragon Plus Environment

35

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 46

(60) Hansen, K.; Montavon, G.; Biegler, F.; Fazli, S.; Rupp, M.; Scheffler, M.; von Lilienfeld, O. A.; Tkatchenko, A.; Müller, K.-R. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies. J. Chem. Theory Comput. 2013, 9, 3404–3419. (61) Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: An Extensible Neural Network Potential with DFT Accuracy at Force Field Computational Cost. Chem. Sci. 2017, 8, 3192–3203. (62) Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326-2331. (63) Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. E. Outsmarting Quantum Chemistry through Transfer Learning. ChemRxiv:6744440 2018. (64) Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 1–5.

ACS Paragon Plus Environment

36

Page 37 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(65) Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T. F. G.; Qin, C.; Zidek, A.; Nelson, A.; Bridgland, A.; Penedones, H.; et al. De Novo Structure Prediction with DeepLearning Based Scoring. Thirteen. Crit. Assess. Tech. Protein Struct. (Abstracts) 2018.

TOC Graphic

ACS Paragon Plus Environment

37

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 46

Your photo

Position, Department, Affiliation

Associate Professor Chemistry KAIST

ACS Paragon Plus Environment

38

Page 39 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Biography

Woo Youn Kim completed his undergraduate degree in chemistry at POSTECH in 2004 and a Ph.D. in computational chemistry under the guidance of Prof. Kwang S. Kim at POSTECH in 2009. He was a postdoctoral fellow in the group of Prof. E. K. U. Gross at Max Planck Institute of microstructure physics. He joined Department of Chemistry at KAST in 2011 and now is a tenured associate professor. His research interests include the development of computational tools including quantum chemistry, graph-theoretical methods, and machine learning techniques.

ACS Paragon Plus Environment

39

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. Schematic workflow of ACE-Reaction. AC stands for adjacency matrix, ER, Etol, and EPM6 denote the computed energy of reactants, the tolerance energy for energy-wise screening of intermediates, and the computed energy at the PM6 level, respectively. 80x51mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 40 of 46

Page 41 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 2. (a) 16 representative functional groups involved in the 26 organic reactions studied in this work. (b) Hypothetical functional groups used to screen out intermediates during graph enumeration. 38x50mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3. Example of how to choose active atoms (red circles) for the cyanide-catalyzed benzoin condensation reaction with its accepted mechanism. All reactive atoms can be assigned as active ones. 39x26mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 42 of 46

Page 43 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 4. The accepted mechanism of the Beckmann rearrangement reaction (the black arrows). The green arrow denotes the direct route from I2 to I4 proposed by our method due to the inaccuracy of PM6 geometry optimization. 39x34mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5. (a) Reaction network of the cyanide-catalyzed benzoin condensation reaction obtained by ACEReaction. The elementary reactions of which ΔG‡s are less than 30 kcal/mol for both directions (forward and backward) are shown in red. Here, the forward means the direction from a vertex above to a vertex below, whereas the backward means the other way around. When ΔG‡ of the single direction is less than 30 kcal/mol, it is indicated as green (forward) or blue (backward). Otherwise, they are marked in black. (b) The molecular structures of the intermediates in (a). 170x108mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 44 of 46

Page 45 of 46 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 6. (a) Reaction progress profile of 20 species involved in the cyanide-catalyzed benzoin condensation reaction. (b) Enlarged view of the lower part of (a). Each name in the legend indicates the 20 species whose structures are given in Figure 5(b). 80x43mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 7. Energy profile of 3 major pathways of the cyanide-catalyzed benzoin condensation reaction discovered by ACE-Reaction. The numbers in the parentheses indicate net reaction counts. The integer 4 and 6 denote the intermediates in Figure 5(b). The other values indicate the relative free energies of each intermediate and the products with respect to that of the reactants. 81x40mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 46 of 46