Automated Transition State Search and Its Application to Diverse

Sep 28, 2017 - Here, we report a highly automated workflow designed to locate transition states for a given elementary reaction with minimal setup ove...
0 downloads 14 Views 2MB Size
Subscriber access provided by AUBURN UNIV AUBURN

Article

An automated transition state search and its application to diverse types of organic reactions Leif D. Jacobson, Art D. Bochevarov, Mark A. Watson, Thomas F. Hughes, David Rinaldo, Stephan Ehrlich, Thomas B. Steinbrecher, S. Vaitheeswaran, Dean M. Philipp, Mathew D. Halls, and Richard A. Friesner J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00764 • Publication Date (Web): 28 Sep 2017 Downloaded from http://pubs.acs.org on September 29, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

An automated transition state search and its application to diverse types of organic reactions Leif D. Jacobson,† Art D. Bochevarov,∗,† Mark A. Watson,† Thomas F. Hughes,† David Rinaldo,‡ Stephan Ehrlich,‡ Thomas B. Steinbrecher,‡ S. Vaitheeswaran,¶ Dean M. Philipp,§ Mathew D. Halls,k and Richard A. Friesner⊥ †Schr¨odinger, Inc., 120 West 45th St, New York, New York 10036 ‡Schr¨odinger GmbH, Dynamostrasse 13, D-68165, Mannheim, Germany ¶Schr¨odinger, Inc., 222 Third St, Suite 2230, Cambridge, Massachusetts 02142 §Schr¨odinger, Inc., 101 SW Main St, Suite 1300, Portland, Oregon 97204 kSchr¨odinger, Inc., 5820 Oberlin Dr, Suite 203, San Diego, California 92121 ⊥Department of Chemistry, Columbia University, New York, New York 10027 E-mail: [email protected] Abstract Transition state search is at the center of multiple types of computational chemical predictions related to mechanistic investigations, reactivity and regioselectivity predictions, and catalyst design. The process of finding transition states in practice is however a laborious multistep operation that requires significant user involvement. Here we report a highly automated workflow designed to locate transition states for a given elementary reaction with a minimal setup overhead. The only essential inputs required from the user are the structures of the separated

1 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

reactants and products. The seamless workflow combining computational technologies from the fields of cheminformatics, molecular mechanics, and quantum chemistry automatically finds the most probable correspondence between the atoms in the reactants and the products, generates a transition state guess, launches a transition state search through a combined approach involving the relaxing string method and the quadratic synchronous transit, and finally validates the transition state via the analysis of the reactive chemical bonds and imaginary vibrational frequencies as well as by the intrinsic reaction coordinate method. Our approach does not target any specific reaction type nor does it depend on training data; instead, it is meant to be of general applicability for a wide variety of reaction types. The workflow is highly flexible, permitting modifications such as a choice of accuracy, level of theory, basis set, or solvation treatment. Successfully located transition states can be used for setting up transition state guesses in related reactions, saving computational time and increasing the probability of success. The utility and performance of the method are demonstrated in applications to transition state searches in reactions typical for organic chemistry, medicinal chemistry, and homogeneous catalysis research. In particular, applications of our code to Michael additions, hydrogen abstractions, Diels-Alder cycloadditions, carbene insertions, and an enzyme reaction model involving a molybdenum complex are shown and discussed.

1 INTRODUCTION The transition state (TS), the activation energy, and the potential energy surface (PES) are among the key concepts of chemistry. 1,2 Though the TS is typically experimentally unobservable (see, however, Refs. 3–5 ) and defined with multiple caveats within transition state theory and physical chemistry, 6,7 this and the associated notions serve as a foundation of the chemist’s understanding of how chemical reactions happen. To a first approximation, the end, equilibrium states of the reaction, which are the reactants and the products, determine the thermodynamic characteristics of the reaction whereas the TS, in conjunction with the reactants and the products, is responsible for the kinetic characteristics. Both the thermodynamic and the kinetic parts of the picture are

2 ACS Paragon Plus Environment

Page 2 of 54

Page 3 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

necessary for an understanding of the reaction mechanism and distribution of product yields.

Figure 1: A schematic representation of transition state search approaches. The prediction of the atomic structures and the energetics of the end states is a firmly established and routine operation in modern quantum chemical calculations but the analogous predictions for the TS manifest significant practical difficulties. This situation is reflected by the different terms that the quantum chemical community uses in reference to the prediction of the atomic structures of the equilibrium and the transition states. While the prediction of the equilibrium atomic structures is known as the “geometry optimization” the analogous prediction for the TS structures is referred to as the “transition state search”. The latter term implies that the process constitutes an actual search as opposed to a relatively straightforward optimization (refinement), and that searching does not necessarily result in finding the mathematically optimal solution. The greatest practical impediment to finding TSs is perhaps coming up with a good TS guess, which is a structure that serves as an approximation to the final, optimized TS structure. For equilibrium structures, there are efficient algorithms that translate two-dimensional or text string representations of the equilibrium structures into reasonable three-dimensional atomic structures, 8 and these structures can serve as an excellent guess in the subsequent geometry optimization. Structures around equilibria on the potential energy surface can be accurately pre-optimized with force fields, various flavors of which have existed for many years, whereas force fields for TS-like 3 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 54

structures are not as straightforward to develop. 9 On the other hand, algorithms that generate a transition state guess for a given reaction are complicated and not universal. A factor compounding the problem is that the outcome of the transition state search often depends critically on the quality of the transition state guess (see Section 2.4). Such a guess is typically constructed in a manual or a semi-automated way (with the help of a previously located TS in a related reaction), and adjusted accordingly in a trial-and-error fashion should the first pass at a transition state search prove to be unsuccessful. It is also worth pointing out that even if a good quality initial guess is available the outcome of the transition state search depends on the complexity of the underlying PES and the sophistication of of the search algorithm, such that the successful localization of the TS is not guaranteed. In the present paper, we describe a new, integrated approach to the problem of locating transition states. The underlying methodology is designed to be general, in that it is not geared toward any particular reaction type, or the use of specific quantum chemical methods. The long term goal is to enable the user to input products and reactants, and to have the program return an accurate, optimized transition state, with minimal need for human intervention. At present, there are a number of fundamental limitations; the reaction must constitute an elementary process, occur on a single potential energy surface, and involve making or breaking of one or more covalent bonds. There are also specific technical issues that can impede effective execution for individual cases. Nevertheless, a wide range of practically important chemistry can be treated, examples of which are provided in subsequent sections. A practical limitation impacting the results reported in the present paper concerns the treatment of large, highly flexible molecules. In some (although not all) such cases, conformational search methods are needed to identify the low energy reactant conformation, and to average over a Boltzmann distribution of initial reactant conformational states if necessary. We avoid these potential problems below by treating relatively rigid (and in many cases small) molecules in our example applications. Incorporation of conformational search and processing of multiple conformations into the algorithm, along the lines of our recent work on pKa prediction, 10 is currently ongoing,

4 ACS Paragon Plus Environment

Page 5 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

and will be necessary to achieve robustness in the general case. Two very different types of algorithms can be employed to construct a transition state initial guess. The first interpolates an intermediate structure from the reactants and products, and is applicable to an arbitrary reaction without any prior results. The second utilizes one or more previously computed transition state structures as a template, facilitating the construction of a more accurate and reliable initial guess based on the presumed similarity of the target reaction with the template. We view both of these algorithms as essential, and our integrated approach enables the user to select which one to employ for any given input reaction. There is also a capability to store and reuse the results of calculations after they are completed, so that when addressing a novel reaction, the interpolation method can be deployed initially, and the results of these calculations can then be used as templates in subsequent modeling. Template generation and storage in such a way can dramatically reduce overall computation times, and increase reliability, when considering a larger number of reactions of a library of congeneric compounds. Once an initial guess is generated, the actual transition state must be located via a numerical optimization algorithm, and the correspondence of the transition state that is located with the desired reaction must be confirmed via calculations of second derivatives of the energy. The protocol that we have developed links these steps together in an automated workflow. The overhead associated with setting up TS search calculations and with analyzing their results is minimized via the deployment of a graphical user interface (GUI) integrated with the Jaguar quantum chemistry program, 11,12 through which the automated workflow is controlled. This tool converts the difficult problem of finding a TS into the relatively easy task of sketching or otherwise creating the structures of the reactants and products. The minimal overhead permits the user to easily set up and launch TS search for multiple reactions, and makes the traditionally laborious TS search calculations more accessible to non-experts. When setting up the TS calculation from the GUI, the user selects various characteristics of the quantum chemical methodology, and treatment of solvation, that define the energy surface to be optimized. For the present paper, we have focused our efforts on density functional theory (DFT)

5 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

methods, which provide efficient computational performance while at the same time yielding very reasonable results for the barrier heights of a wide range of chemical reactions. 13–16 The user chooses the DFT functional and basis set from the GUI as in a normal Jaguar calculation. In the present paper, for most calculations we employ the B3LYP functional 17,18 and a relatively modest double-zeta basis set. Once a transition state is located at this level, it is straightforward to perform a single point calculation (or a local reoptimization if desired) using an (arguably) more accurate functional such as M06-2X 19 along with a larger basis set, if calculation of the absolute barrier height is an important objective. In many cases, relative barriers for a series of reactions can be determined with good accuracy with the B3LYP functional and a double-zeta basis set. While efforts to design more efficient methods to locate saddle points on the PES continue, 20–28 a number of recent works report a high degree of automation of TS search. 25,29–37 These reports are for the most part focused on an automated exploration of a specific type of PES in a defined region of phase space with a goal to locate novel reactions, reaction intermediates, and the corresponding TSs. 25,29–32,34,36,38–40 Applications of these methods to small molecules 31,32,35 and a single reaction type 31 are common but it remains to be seen if in practice these approaches can be applied to larger molecules and diverse reaction types without extensive customization for each new application. In contrast, our goal is not to explore PESs and find multiple TSs in a certain area of energy space but rather to automatically locate a single TS that lies between two given, presumably adjacent PES minima (as shown in Figure 1). An ideal situation for our approach would be the existence of only one TS on the PES path between the two minima. If it turns out that there is more than one TS between the two given minima our program will attempt to locate all these TSs in an iterative fashion. However, the algorithm is not optimized to deal with such cases and we estimate that the chances for successfully locating all the TSs along the path (and even one TS of interest) separating distantly located input minima drop sharply. Finding a TS between two endpoints is seemingly a common operation and not a very lofty goal; theoretical chemists have been doing this in countless studies for decades. And yet locating transition states in a fully automated way with a high frequency of success can be regarded as an

6 ACS Paragon Plus Environment

Page 6 of 54

Page 7 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

important challenge on its own, particularly if we view this task as an elementary unit of work required for pursuing other, more involved research projects. Among broad classes of applications in which TS search is a frequent and indeed an elementary operation are catalyst design, reactivity and regioselectivity predictions, optimization of synthetic pathways and, ultimately, evaluation of synthetic feasibility. Often, dozens of TSs need to be found and analyzed before a conclusion about reactivity in a class of compounds can be made, and computational optimization of an efficient catalyst could require hundreds or thousands of transition state optimizations. Each individual TS search is a difficult and sometimes even an arduous computational task that requires significant expertise and user involvement. Having to perform multiple such calculations in a typical research project creates a barrier to a more rapid catalyst design or reactivity screening. Our objective is to substanially lower this barrier and make each TS search a routine computational operation requiring a minimal user involvement. Achieving this objective requires a combination of algorithmic development and software engineering; these efforts are described in more detail in what follows. The paper is organized as follows. In Section 2, we describe the methodology used to construct the initial guess, covering both the interpolation and template based approaches. In Section 3, technical details of the calculations are presented. In Section 4, applications are given for moderate to large data sets representing four classes of reactions: Michael additions, carbene insertions, hydrogen atom abstractions, and Diels-Alder reactions, as well as a reaction class involving transition metals. We also summarize in more detail the limitations of the current technology in Section 5. Finally, in Section 6, the Conclusion, we summarize our results and briefly discuss future directions.

2 METHOD We seek to construct a workflow that takes as input reactant and product molecules (each defined separately) and returns one or more TS geometries that connect reactants to products. This is

7 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2: The AutoTS workflow. The rectangular-shaped boxes represent tasks while the diamond-shaped boxes represent workflow decisions. See text for the details. in contrast to a typical input for a quantum mechanical computation which would consist of reactant and product complexes. These complexes would have individual molecules positioned in an intelligent way by the user, with similar internal geometries. In addition, atom indexes would have to be assigned consistently. The motivation for such a simplified input is manyfold. First, it is natural to sketch a reaction by drawing 2D representations of molecules such as done in an introductory chemistry course. This permits these types of calculations to be conducted by laboratory, not by computational scientists. Second, removing the burden of atom indexing and complex construction opens the door to easily screening a set for the most or least reactive chemical compounds for a particular reaction. The general organization of our automated TS search workflow is shown in Figure 2 and is described in the text below. The diagram indicates that we start by collecting the definition of the reaction of interest from the user through a GUI. We require the number and type of atoms to be strictly conserved in this definition, not allowing reactions for which the solvent is implicitly an active participant. The individual reactant and product molecules are next assembled

8 ACS Paragon Plus Environment

Page 8 of 54

Page 9 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

into reactant and product complexes and the correspondence of atoms between the reactant and product complexes is determined. Here and in the following discussion we refer to chemical structures in which all atoms are bonded together as molecules whereas the term complex refers to a collection of interacting molecules. The atoms are re-ordered such that there is a direct, one-to-one correspondence of atoms between reactant and product complexes. Once we have properly ordered atoms we manipulate the torsional coordinates as well as the positions and orientations of the individual molecules to generate reactant and product complexes that are connected by the simplest of reaction paths. We then generate a guess at the TS, either by using information from a template TS or by finding the geometry corresponding to the maximum (DFT) energy point along a linearly interpolated path between reactant and product complexes. This guess is optimized using the hybrid quadratic synchronous transit (QST) and eigenvector following approach of Peng and Schlegel. 41 The optimized TS geometry is vetted (see Section 2.5) before confirming that the TS connects the input reactant and product molecules by computing the endpoints of an intrinsic reaction coordinate (IRC). 42 Each of the steps that rely on quantum chemical calculations are prone to failures (such as a failure to meet an optimization criterion) and we attempt to automatically recover from such situations using fallback methods, as shown in Figure 2.

2.1 Input As input we only require a precise definition of the reaction. Optionally, the user can specify a particular potential energy surface (defined by a basis set, density functional and solvation model) and customize values of parameters that control the proposed automated workflow. The reaction is specified by providing a definition of the assumed endpoints, i.e. reactant and product molecules; this includes specifying the charge and spin multiplicity of these molecules as well as the charge and spin multiplicity of the reaction complex. We ask the user to provide these molecules separately, not as complexes. This frees the user from the burden of having to intelligently position the component molecules into reactant and product complexes (this is discussed in Section 2.3). It is important to note that when providing the structures of the reactants and the products the 9 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 54

user assumes that the reactant and product complexes that will be formed from the initial input structures are stable minima on the potential energy surface, at least at the level of theory that the user employs for the calculation. If this assumption turns out to be false, and the reactant complexes cannot exist as stable entities (because they either “decompose” or “react”, changing atom connectivities during a geometry optimization) then the TS search might produce results inconsistent with the user’s initial assumptions about the reaction. Certain σ-complex intermediates, the existence of which is often postulated in textbooks, but which are not located in careful theoretical studies serve as an example of such unstable species. 43,44 Our workflow exits in this situation and the user is required to restart the calculation with inputs that are truly stable on the requested PES. Before any further work is performed on the input molecules, we must define chemical bonds between atoms. This assignment is performed automatically by our code, using the cartesian coordinates and element types as input. We require this information to use various cheminformatics techniques which enable us to label the atoms in the reactant and product complexes consistently (see next section), to correctly determine force field (FF) parameters and, when applicable, to define atomic radii used by the PBF solvation model. 11 For organic molecules, bonding can be automatically assigned in a fairly robust process. This is less true for organometallic molecules and other molecules with metal atoms which display a greater variety of bonding arrangements and distances; this is one weakness of the workflow. We assign bonds by initially connecting atoms separated by a distance that is smaller than the sum of their covalent radii, followed by a protocol to assign a Lewis structure. 11 Chemical changes can accompany geometry optimization, TS optimization and IRC calculations, and, as such, we must re-evaluate the bonding anytime the intermediate output of a quantum calculation is read into our program.

2.2 Determining atom correspondence Computing a bijection, a function which determines the one-to-one correspondence of atoms in a chemical reaction, is required in order to compute distances and paths between the reactant and 10 ACS Paragon Plus Environment

Page 11 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

product complexes. This problem, also known in the literature as reaction mapping, is of more general interest for enumerating possible reaction mechanisms, extracting and processing reaction data from the literature as well as storing and retrieving reaction data from a database. 45–50 There are several approaches to solving this problem. The first approach is based on utilizing some variant of the maximum common substructure (MCS) of the reactant and product structures. 45–48 The second approach searches for a bijection which minimizes the number of broken bonds; 46,49 this idea is also referred to as the minimal chemical distance. The final approach is to search for a molecular graph representing the so-called imaginary transition state, essentially a single graph which represents both the fixed and changing bonds. 50,51 During the preparation of this manuscript we became aware of the work of Crabtree and Mehta, 49 whose approach to this problem mirrors our own. Our active bond searching method described below is essentially equivalent to what they refer to as the Fewest Bonds First with Constructive Count Vector (FBF-CCV) method with the important difference being that we use canonical SMILES strings 52 to uniquely identify isomorphic molecular graphs. This allows us to easily include stereochemical data in the reaction mapping algorithm by using isomeric SMILES. Crabtree and Mehta describe their methodology in a much more formal way than our rather pedagogical description below. One benefit of our method is that it is very easy to implement if one has access to basic cheminformatics tools such as canonical string representation of molecules (canonical SMILES) and substructure searching (SMARTS 53 ). A useful way to represent a chemical reaction is by constructing two, possibly disjointed, molecular graphs consisting of nodes (atoms) and vertices (bonds). A chemical reaction consists of a set of bond addition and deletion operations which transform the reactant graph into the product graph. We refer to the set of bonds that are being formed or broken as the active bonds. While identifying corresponding atoms between the reactants and products in a chemical reaction is a difficult problem, identifying the corresponding atoms between two structures that are conformers can be solved using standard cheminformatics tools. This is because these two molecular graphs are isomorphic. Relatively efficient methods are available to generate all bijections between two

11 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

isomorphic molecular graphs. 54 We note that by deleting all active bonds one can transform the reactant graph and product graph into two sets of graphs in which each member of the first set is a conformer of a member in the second set. Crabtree and Mehta 49 refer to the transformation between the modified reactant and modified product as an identity chemical reaction. The set of active bonds is not unique. For instance, defining all bonds to be active is a valid way of transforming the reactant graph into the product graph. This definition would break all reactant bonds and form all product bonds. However, we make a fundamental assumption that the only transformation of interest is the one that minimizes the number of active bonds. We have not yet considered cases where a preferred minimal solution is chosen amongst several degenerate ones (our algorithm chooses an arbitrary minimal solution in such cases) or non-minimal solutions that correspond to alternative mechanisms. These examples can be important and one is depicted in Figure 3. Cases in which active bonds are incorrectly identified will lead to errors in atom correspondences. If this occurs it is unlikely that the job will find a transition state. The user can, however, intervene and such interventions are briefly described in section 5. The above considerations lead us to an algorithm in which we construct a bijection between reactants and products in a two step process; the first step is to identify the active bonds and the second step is to compute a bijection for each pair of conformers which results from breaking all active bonds. Both of these steps are discussed in more detail below.

Figure 3: A lactonization reaction represented as a hypothetical single step reaction shows two minimal active bond solutions. The number of active bonds is the same in each solution, both pointing to a valid mechanism. Our active bond searching algorithm is not able to select the better of these two solutions. We use two techniques to identify active bonds, and we only use the latter approach when 12 ACS Paragon Plus Environment

Page 12 of 54

Page 13 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the former fails. The first approach is essentially a brute force search (following Crabtree and Mehta 49 ), which is guaranteed to find the correct solution 49 when the number of active bonds is smaller than the maximum number we search (see below). The second technique is an MCS search, 54 which works for any number of active bonds but is not guaranteed to find the minimal active bond solution. 49 The brute force method exploits the fact that, as discussed above, breaking all active bonds results in the reactant and product structures being conformers. We break some number of bonds (Nbr ), generate canonicalized SMILES strings 52 for both reactant and product and test for their equivalence. The algorithm starts by searching for solutions with zero bond breaks and if no solutions are found the number of broken bonds is incremented and the search is repeated. This searching proceeds until either a solution is found or a maximum number of bond breaks is exceeded, in which case the algorithm has failed. The number of sets of bond breaks that are searched can be greatly reduced by requiring that the correct net change in bonding be maintained in each set searched. This filtering of operations is equivalent to the CCV approach discussed by Crabtree and Mehta. 49 At each stage in the above mentioned algorithm one is forced to search at most

Nbond Nbr



combinations

of bonds where 0 ≤ Nbr ≤ Nbond . In the worst case scenario where all bonds need to be broken the cost of the search would be an extraordinary 2Nbond . While it is unlikely that a chemical reaction of interest will break and reform all bonds, to avoid this exponential scaling we search only up to a maximum number of bond breaks. We have chosen a default value of four for this maximum number as we have found that this covers most practical reaction types we have studied thus far. Searching four bond breaks is many times faster than any of the quantum chemical calculations that are performed in the workflow and so it may be possible to increase this limit while simultaneously avoiding the introduction of a bottleneck step. If we fail to discover a solution using this method we utilize a less robust approach. The idea behind this less robust approach is that the common substructures of the reactant and product structures are conserved. Once the conserved substructures are recognized it is easy to determine which bonds are active. In this approach we look for the MCS of the reactant–product pair and

13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 54

then remove this substructure. We next find the MCS of the remaining atoms and continue this procedure until no atoms remain or the remaining fragments have no substructure in common. For each MCS extracted a bijection can be determined, and this initial bijection is improved using the method described below. The problem with this approach is that there is no guarantee that the maximum common substructure is conserved, though in our experience this is usually the case. A counterexample is the glucuronidation reaction shown in Figure 4 where the MCS includes the C – O bond. This results in four active bonds whereas there is a two active bond solution (taking into consideration active bonds on both sides of the reaction), which, unsurprisingly, corresponds to the actual mechanism.

Figure 4: In this glucuronidation reaction the MCS algorithm identifies the maximum common substructure as containing the C – O bond shown in bold blue. Contrary to the typical case, the maximum common substructure in this example is not conserved in reality, since the oxygen atom, which is part of the bold blue bond, is not intact in the process. It is replaced by the oxygen atom from the phenolate and departs with the newly formed phosphate anion. As a result of this maximum common substructure assignment, the MCS algorithm defines active bonds inconsistent with the reality of the physical process whereas a brute force method described in the text correctly identifies the active bonds.

Once the active bonds have been enumerated we can delete all of them to obtain two sets of molecules, one for the reactant side and one for the product side. Each set contains conformers of the other and can be paired as such (using string based matching, for example). For each conformer pair we compute all bijections using a substructure search based on SMARTS pattern searching. 52,54 We choose the bijection which delivers the lowest root mean square distance (RMSD) after superposition. In addition, we require the stereochemistry of tetrahedral atoms to be conserved in the bijection. This situation must be considered even when the four atoms are chemically identical because labeling these identical atoms (with atom indexes) creates a stereocenter. For 14 ACS Paragon Plus Environment

Page 15 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

SN 2 reactions a change in chirality is required and we perform a numbering correction for this reaction type, which is easily detected. Once the bijection is determined the order of atoms is changed so that there is a direct, one-to-one correspondence between the atoms of the reactant and those of the product. While the methods we use for determining active bonds, for example the brute force bond search, have the potential to scale poorly with system size, as discussed above, the computational bottleneck in practice is the cost of quantum chemical calculations. Therefore, there is little ground for concern about the computational efficiency of the determination of the atomic bijection.

2.3 Generating a TS guess The most basic approach we use to generate a guess at the TS is to linearly interpolate a path connecting reactant and product complexes. 55 The linear interpolation is typically performed in distance coordinates though other choices are possible. We next compute the energy at the points along this path using density functional theory and use cubic splines to determine the maximum energy point. Finally, we interpolate the geometry to the point corresponding to this energy maximum. The accuracy of the TS guess computed in this manner is extremely sensitive to the smoothness of the interpolated path. In order to generate a smooth path it is crucial to intelligently position the individual reactant and product molecules as well as “align” their internal coordinates. In essence, we wish to minimize the number of moving parts. For the purposes of generating a TS guess it is not necessary (or even advantageous) to start and end the interpolated path at PES minima, though the bond angles and distances ought to be representative of the reactant and product structures. The geometries we seek are something akin to a so-called near attack conformation. The initial reactant and product molecules are positioned in a complex at the corners of a cube, a 2

× (nx, ny, nz) where a is the side length and nx, ny and nz take on values of -1 or 1. The side

length of the cube is determined by estimating a distance at which the molecules will not overlap, this distance not being critical as it is simply an initial guess which will be optimized later (see 15 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 54

below). All molecules are rotated into a coordinate frame that diagonalizes the inertial tensor in order to give a standard orientation. This process gives a reproducible initial guess at a reaction complex. In order to improve the orientation of the initial and final points along the interpolated reaction path we minimize the cartesian distance between reactant and product geometries. We achieve this by varying translations and rotations of individual molecules as well as varying torsion angles of all rotatable bonds, that is those single bonds that are not involved in ring substructures. This procedure has the potential of creating intramolecular atom collisions, so we supplement the target function with repulsive van der Waals terms from a force field. After minimization, the individual molecules making up the reactant or the product complex typically overlap with one another, and we separate them by translation along a centroid vector until their van der Waals radii are no longer overlapping. We have found that this relatively simple procedure is highly effective. By optimizing the reactant and product complexes over torsional coordinates we sacrifice the ability to study barriers of internal coordinate rearrangements in an effort to enhance our success rate at studying chemical changes. We are implicitly assuming that barriers leading to these internal rearrangements are small compared to the barrier associated with the chemical change. This procedure of manipulating internal coordinates becomes important when the user provides the reactant and product molecules separately because it is easy for the user to provide these molecules in very different conformations. Further, if the user supplies the input molecules via 2D, or even string representations this manipulation becomes essential. This is because it is unlikely that the 2D→3D conversion will create similar conformations of the various molecules. These differing conformations will unnecessarily complicate the reaction path connecting reactant and product complexes unless the internal coordinates are aligned. In addition to interpolation we have the ability to use geometric information from a previously located TS. We refer to this approach as templating and to the predefined TS geometries as templates. TS templates, along with the reactant and product geometries that they connect (as verified by an IRC calculation) are stored in an auxiliary computer file. Several string representations of the

16 ACS Paragon Plus Environment

Page 17 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

input reaction are generated and used to search this template file for matching reactions. Each representation is of a particular order, i, and is constructed by generating SMILES strings for the atoms in the reaction center plus all (i − 1)th nearest neighbors. We tend to use only three orders starting with i = 1. The SMILES patterns of the reactant and product structures are combined with the symbol “≫” as in reaction SMILES. The algorithm has the preference for matches at the highest order attainable. Once it is recognized that a usable template exists, we extract it from the file and compute an atom correspondence between the input structures and template structures. This is done by utilizing an MCS algorithm which computes a bijection for the reaction center atoms. The reaction matching described above guarantees that the reaction center and possibly some number of neighbors make up a substructure shared between the reactant and product. Internal coordinate constraints from the reaction center of the template are extracted and used to perform a constrained optimization of the reactant complex in order to construct a good TS guess. The internal coordinate constraints are chosen by performing the following analysis. We first define a set of internal coordinates which includes all bonded pairs as stretches, all 1-2-3 bonded triplets as angles and all 1-2-3-4 bonded quartets as torsions. We do this for reactant and product structures and take the union of the coordinate sets so that all active bonds are included. This set of coordinates is filtered by requiring that all atoms in the coordinate be active (involved in active bonds). The constraint values are taken to be the value of each of these coordinates for the TS template. Optimizing the reactant geometry using this set of constraints enforces the geometry of the reaction center of the template while allowing the rest of the molecule to relax. We typically perform a constrained optimization using a force field for energy and force evaluation prior to a final constrained optimization using the desired, quantum, level of theory. This method of extracting and using templates is completely automated and frees the user from the burden of keeping track of such templates, manually setting constraints as well as identifying and enforcing atom correspondences. In the case of templating we also need to align reactant and product endpoints to the template to provide a high quality setup for the QST calculation (see Section 2.4). We use an alignment

17 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 54

procedure similar to that described above. The only essential difference between the two is that now we do the alignment in two stages, using the reactant complex and TS complex as input in the first stage and the product complex and TS complex in the second stage. In both of these stages we hold the TS complex fixed, only optimizing the position, orientation and torsional coordinates of the reactant or product complex. As indicated in Figure 2, we use two methods to recover from situations in which the TS optimization fails to converge. The first is to optimize the TS guess. This means that we perform a constrained optimization of the TS guess. The set of constraints that is used is the same as the set used when templating. The second method we use is path relaxation. This is a way to improve the linear path by allowing it to relax towards a path that has zero gradient in the direction perpendicular to the tangent to the path. We have implemented the relaxation stage of the growing string method (GSM) 27,28,56 and refer to it as the relaxing string method (RSM). In this approach we begin with a path which is discretized at a set of points {i} and minimize

F=

1X ⊤ g Pi Pi gi , 2 i i

(1)

where gi is the gradient at point i and Pi = 1 − τi τi⊤ is a matrix which removes the component of the gradient that is tangent (τi ) to the path at point i. Our implementation follows the suggestions in the appendix of the work of Peters et al., 561 It has been previously noted that GSM is particularly effective (as opposed to other string methods) because the growth of the string avoids regions of high potential energy. 27,28 However, we believe that by positioning the reactant and product complexes carefully, as described in Section 2.3 our linear path samples lower energies than it would if one were not to use such an algorithm or, for example, restrict the endpoints to being PES minima. 1

The RSM string is optimized in internal coordinates using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. The string is re-parameterized and path tangents are computed by interpolating each cartesian degree of freedom independently, using cubic splines.

18 ACS Paragon Plus Environment

Page 19 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

2.4 TS convergence Optimization of a TS guess is much more difficult than optimizing to a minimum. The reason is easily understood by inspecting the formula for a quasi-Newton step in a coordinate system that diagonalizes the Hessian (the matrix of second derivatives of energy with respect to nuclear displacements), δ˜ xi = −h−1 ˜i , ii g

(2)

where the step, δ˜ x, and gradient, g ˜, are related to the cartesian step and gradient by a unitary transformation, and h is a diagonal matrix of the eigenvalues of the Hessian. Eq. 2 shows that if the eigenvalue of the Hessian is positive we step opposite the gradient (downhill in energy) and if it is negative we step in the direction of the gradient (uphill in energy). Geometry optimization is easy because one can simply reverse the signs of all negative eigenvalues so that we always proceed downhill and we are guaranteed to eventually arrive at some minimum. In a standard TS optimization we wish to follow one (and only one) eigenvector uphill and all others downhill. Further, we want this chosen eigenvector to qualitatively resemble the reaction coordinate of our reaction of interest. The difficulty with this approach is that in order to have a good eigenvector to follow our TS guess must be in a region of the PES that has a Hessian which is qualitatively similar to that of the actual TS. If this is not true, we do not have a good way to choose an eigenvector to follow and therefore have no hope of converging to a meaningful TS. An attractive approach which has given us a high rate of success when acting on quality inputs, is the method of Peng and Schlegel. 57 This is a hybrid approach that uses quadratic synchronous transit and a quasi-Newton, eigenvector following optimization. We will refer to this method simply as QST. Briefly, the method takes three structures as input: reactant, product and TS guess. The first phase of the calculation is to locate the maximum energy point along a quadratic path going through reactant, TS guess and product. Once the maximum is attained the method switches to an eigenvector following algorithm, choosing that eigenvector to follow which has a significant overlap with the quadratic approximation to the reaction path. If no such vector exists,

19 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 54

the algorithm chooses the vector with the lowest eigenvalue; see Ref. 57 for details. An additional benefit of this approach is that the guess Hessian is improved in the initial stage of the optimization, before any eigenvectors are followed.

2.5 TS verification Once a TS has been successfully optimized it is necessary to validate a number of its properties. This stage is important because we need to be sure that the TS found is the one corresponding to the input reactants and products and that it is not flawed in some way. Such an inspection of the TS is especially valuable for automated TS searches like ours, where user involvement in the process is limited and where obvious problems with the generated TS might otherwise go unnoticed. The first step of the analysis is a pre-screening of the geometry and transition vector, which we refer to as TS vetting. The second step is to verify that the given TS “connects” the reactants and products given as input. The TS vetting involves three criteria which must all be satisfied. The first is that there is exactly one negative eigenvalue (imaginary frequency) of the Hessian. The eigenvector, or mode, corresponding to this eigenvalue is referred to as the transition vector. The second and third requirements, the technical details of which are given in the following paragraph, ensure that the TS geometry has at least one active bond at an intermediate distance and that the transition vector contains motion along an active bond. These are two properties that one might look at when visualizing the TS geometry. This vetting analysis is useful for screening out obviously wrong TS geometries, such as the case when a QST calculation converges to a TS corresponding to some internal motion (e.g. methyl group rotation) instead of the reaction of interest. It is a waste of computational time to proceed to an IRC with this type of TS geometry and it is best to consider this a failed QST calculation. An intermediate value for a bond distance (rij ) between atoms i and j satisfies the relationship

R0 ≤

ricov

rij ≤ R1 , + rjcov 20

ACS Paragon Plus Environment

(3)

Page 21 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

where ricov is the covalent radii of atom i, 11 whereas R0 and R1 are unitless constants which are set to 1.2 and 1.7 by default. To ensure that the transition vector has motion along a bond stretching mode we require that max |hvi |τ i| ≥ S0 i

(4)

where τ is the transition vector, vi is a unit vector representing the ith (active) bond stretching mode and S0 is a constant which defaults to 0.33. This latter criterion is relatively loose but since we will follow up this analysis with an IRC step which ensures the TS is connected to the input endpoints we would rather occasionally accept a poor TS than reject a good one. Once a TS has successfully passed the vetting process we need to check that it truly connects the input endpoints. To do this we compute the IRC, 58 the path leading directly downhill to minima on both sides of the TS, and extract the endpoints of this path. We are only interested in the endpoints and so this method is likely to be overly expensive (as we do not need an extremely accurate IRC), and this constitutes one possible avenue of improvement. The method of Gonzalez and Schlegel 58 is fairly robust but we do encounter difficulties when working with reactions with imaginary frequencies of small magnitude. One such reaction type we have studied is Michael additions involving methylthiolate. One simplification of the IRC we occasionally employ is to fully optimize the geometry after the first IRC point has been found. This procedure attempts to find only three points along the IRC, the reactant, TS and product geometries. We refer to this approximate method as three-point IRC and is essentially equivalent to Goodman and Silva’s “Quick Reaction Coordinate” (QRC). 59 In order to determine whether the TS connects the reactant and the product the endpoints of the IRC are compared to the input structures. We consider two criteria when deciding whether two structures are equivalent. The first is whether or not the two structures are conformers, and the second is that the root mean square deviation (RMSD) is under a certain threshold, typically ˚ By default, we only require that the first criterion be satisfied, and this means we require 0.2 A. that the TS leads to the correct chemical change while allowing for conformational differences. This avoids the burden of searching for additional barriers that lead to conformational changes. 21 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 54

Typically, these are low-lying torsional barriers which do not correspond to the rate-limiting step and are only of minimal interest. If the IRC endpoints are determined to be equivalent to the input endpoints we assume that we have located the correct TS and we report the results. If at least one of the IRC endpoints is a mismatch we must search for another connection (another TS). The manner in which we collect data on minima and their connections is as follows. We start with a list containing two structures,

M = [Rinp , Pinp ] ,

(5)

the input reactant and product, respectively. After the first IRC endpoints have been found, RIRC 1 and PIRC respectively, they are compared to the inputs. If the IRC endpoint is found to be 1 equivalent to the input endpoint we replace the input endpoint with the IRC endpoint; otherwise, it is inserted into the list between the input reactant and product structures. For example, if the reactant pair, but not the product pair, were found to be equivalent we would have a new list,

  IRC M = RIRC 1 , P1 , Pinp .

(6)

In this case we have knowledge of a TS that connects RIRC and PIRC whereas we would still 1 1 require a connection between PIRC and Pinp . We would restart the entire algorithm using PIRC 1 1 and Pinp as the input reactant and product, respectively. We proceed in this manner until either all known reactant and product structures are connected through transition states or we are forced to attempt the same connection twice. In order to enforce this latter criterion we keep a list of all pairs of structures we have previously attempted to connect by locating a TS. Before attempting to locate a new TS connecting two structures we inspect this list to ensure that we are not repeating a search, and this is done by equating structures with the same criterion used to equate input structures and IRC endpoints, described above. If it is determined that we have already searched for this particular connection then we exit the program instead of repeating the failed search. Similarly to any other quantum chemical method, IRC calculations do not always succeed for a 22 ACS Paragon Plus Environment

Page 23 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

variety of reasons including SCF failures and convergence failures due to running out of geometry optimization cycles. In our experience IRC calculations are particularly prone to failure when the TS in question has a small imaginary frequency. Two common failures that we have encountered are the IRC endpoints not moving far enough away from the TS before “converging” and the final endpoints resulting in either two reactants or two products (both sides of the IRC move in the same direction). We inspect the output of the IRC calculation and require that both of the IRC endpoints be neither too close to the TS geometry nor too close to each other, as measured by RMSD. If either of these criterion fails we consider the IRC calculation to have failed. We do not wish to allow these failures to ruin an otherwise successful calculation even though some uncertainty about whether or not the TS truly connects the input reactant and product structures remains. As such, we have no other recourse available than to assume that the TS in question does in fact connect to the input reactant and product structure. This possibly dubious assumption is presented to the user in a summary table in the output file. Recall however, that the TS in question has passed our vetting procedure so there is some assurance that the located TS is the correct one.

3 COMPUTATIONAL DETAILS Our automatic TS predictor, which we call AutoTS, was built on a chassis of Schr¨odinger software libraries and programs which comprise functions for handling SMILES representations of molecules, SMARTS molecular pattern recognition, finding the MCS between two or more structures, performing constrained optimization with force fields and quantum mechanics calculations. The force field used in all calculations as part of the TS search workflow was OPLS2005. 60–62 All quantum mechanics calculations were carried out with Jaguar 12,63 using the pseudo-spectral approximation, which speeds up DFT calculations by a factor of 2 to 5 at a negligible loss of accuracy. 64–68 We use non-mass-weighted coordinates when performing all IRC calculations, technically this yields a minimum energy path (MEP) though since we are only interested in the endpoints of this path the distinction is unimportant. To simulate solution phase processes we used the Poisson-Boltzmann

23 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 54

finite elements method (PBF) which is an implicit solvation model. 69–71 Several combinations of density functionals and basis sets were explored in applications. At this stage of the development process we tend to prefer the traditional functional B3LYP, 17,18 sometimes augmented with dispersion corrections. 72,73 The use of other, presumably more accurate functionals for transition state barriers, 13,16,19,74–77 is possible, and will be attempted in the future, when the highest achievable accuracy of the TS energetics is desirable. As to the basis set, we currently do not have a strong preference for a basis set family; however, the use of diffuse basis functions is often essential to stabilize anionic species. The AutoTS code is parallelized on two levels: independent quantum mechanical calculations, such as energy evaluation on points of a reaction path, can be executed concurrently (distributed parallelization), while each individual Jaguar calculation is in turn parallelized using OpenMP (threaded parallelization).

Figure 5: Correlation between ∆G‡ and log t1/2 for the first 14 Michael additions from Table 2 from the work of Flanagan et al., 78 for which a precise measurement of t1/2 was available. The units of t1/2 are hours. A comparison between the results of our automated approach and that of Flanagan et al. is given. All data are obtained with B3LYP functional and 6-311+G** basis set but with different solvation models: this work used PBF 69–71 and the work of Flanagan et al. used SMD. 79

24 ACS Paragon Plus Environment

Page 25 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

4 APPLICATIONS In this section we report applications of our automated approach to multiple reaction types and discuss its successes and failures. We believe that a successful computational tool for finding TSs should have a broad coverage of chemical space and reaction types. As in any workflow of significant complexity, certain limitations cannot be avoided, but in designing our program we strove to eschew the immediate or eventual suitability to a very narrow class of chemical reactions or applications. The chemical and reaction spaces are practically infinite, and therefore it is impossible to evaluate or predict the success rate of finding TSs even for a small region in the reaction space or even for the chemical space in one reaction class. Our attempts at doing so have proved unsuccessful, as it was easy, at any stage of the test, to come up with reactions or chemical structures that would significantly alter the heretofore determined success rate, swaying that value to better or worse. In order to avoid “subjective” contents of test sets it is therefore preferable to evaluate success rate on pre-defined test sets from the literature. Automated approaches like our TS search are capable of producing a lot of data in a short period of time. When it comes to the calculations presented below, the structures of the transition states, their energies, vibrational frequencies and modes, as well as the energies of the corresponding reactants and products have all been carefully validated before the result of the TS calculation could be declared a success. It is not the objective of the present work, however, to carefully analyze these data and draw chemical conclusions from them. Our main interest lies in verifying that our automated approach is capable of producing the same quality of TSs as when using traditional approaches, establishing the success rate of the automated approach for certain reaction types, and in making conclusions about the readiness of the present version of the code toward related reactions of actual chemical significance. We feel that the restrictions to the length of this publication will not permit us to do justice to the generated novel kinetic and thermodynamic data, and that it might be more appropriate to discuss and analyze these results from the chemist’s point of view elsewhere. Nevertheless, the structures of a number of representative transition states 25 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 54

located by our approach, together with the corresponding energetics data, are given in Supporting Information.

4.1 Michael addition The Michael addition involving the methanethiolate anion (see Figure 5) is one of the most common reaction types used in toxicity studies. 80,81 In particular, this reaction is used for modeling intrinsic reactivity and toxicity of covalent warheads in medicinal chemistry. 78,82 Despite its apparent simplicity, the Michael addition is a difficult reaction to model. The reverse TS barrier in this reaction is sometimes as low as 1 kcal/mol and the corresponding imaginary vibrational frequency is typically below 100 cm−1 . This makes the PES challenging and TS hard to locate. Additionally, the modeling must be conducted with an implicit solvent model because the experimental reactivity or toxicity measurements are performed in the solution phase. The presence of the delocalized negative charge in the TS and in the product suggests the use of diffuse basis functions, which makes the calculations more expensive. Figure 5 shows a prediction of reactivity on the test set of Michael additions of Flanagan and co-workers. 78 Our automated TS search produces slightly different values of ∆G‡ in comparison with those of the original paper, 78 doubtless due to differences in the computational protocols (for example, we are using a different implicit solvation model, PBF), but importantly, the correlation of our transition state barriers with the experimentally measured data is even better. For this data set, the results show that the automated approach compares well with both the previous literature calculations and with experiment.

4.2 Carbene insertion reaction An interesting test case for our automated TS search is a reaction type involving carbene chemistry. Structures with reactive divalent carbon atoms, such as carbenes, provide a good test for the cheminformatics and molecular mechanics layers of our automated workflow, as those are typically designed for tetravalent carbon atoms and could potentially malfunction in unusual valence environments. 26 ACS Paragon Plus Environment

Page 27 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

For our test we chose the work of Mieusset and Brinker, 83 which is dedicated to a theoretical study of insertion of carbenes into the C−H bonds of acetonitrile, methane, and isobutane. The goal of our test was to confirm that the fully automated approach would be able to correctly generate the transition states and their barriers in these difficult reactions, with the results obtained being similar to those of the original work. There were a total of 69 TS calculations in the original work, of which 57 were performed at the B3LYP/6-31G* and 12 at the HF/3-21G* levels of theory. Two of these 69 calculations did not yield an appropriate TS barrier in the original work. Six of our automated calculations failed to find a TS, two of which were those that failed in the original work, but the remaining 63 TSs were successfully found. A parity plot in Figure 6 shows the 63 barriers that could be compared, and demonstrates that there is an excellent agreement between the TS barriers found with the traditional and a completely automated approach. Insignificant differences observed in the results, often on the order of 1 kcal/mol or less, could be explained by slightly different accuracy and convergence settings employed by the two quantum mechanics engines that were used in our work and that of Mieusset and Brinker. Additionally, Mieusset and Brinker applied a frozen bond approximation in a number of their calculations, which might affect the energies slightly. As noted above, a small number (4 out of 63) of additional calculations failed in the automated TS approach as compared to the results reported in the paper. We performed a detailed analysis of these failures, which are technical in nature and should be straightforward to address in a future version of the methodology. Furthermore, even if the automated approach fails, the user is often left with a reaction setup and intermediate results from which a more sophisticated manual analysis can proceed. Concretely, the user will have access to the output from a (failed) TS optimization job, from which a guess at the TS is available. This guess is most likely a useful starting point for beginning a more intensive search for the TS of interest. Often some technical detail will be noticed and fixed simply by inspecting this output (for example, the number of geometry optimization cycles can be increased). The conclusion is then that the current version of the automated TS approach would be very useful in treating the carbene insertion reaction, requiring a relatively

27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

small amount of additional work to address a small number of failed test cases subsequent to the initial application of the methodology, and that future version will be able to systematically reduce the frequency of such problem cases via improvements in various components of the workflow.

Figure 6: Free energies of the transition state barriers in gas phase obtained in the work of Mieusset and Brinker, 83 versus those produced by our completely automated transition state search. The levels of theory, always kept consistent between the two approaches, were B3LYP/6-31G* and HF/3-21G*.

4.3 Hydrogen abstraction Hydrogen abstraction reactions occupy a central role in describing fuel combustion, 84–87 properties of antioxidants, 88–91 and metabolism via cytochrome P450 pathways. 92–94 Recently, a large-scale effort to find TSs in 1393 hydrogen abstraction reactions in which C−H bonds were broken by small radical species, has been undertaken by Bhoorasingh and West. 33 Using an automated approach that required a significant preliminary effort for building a highly specific molecular group tree, these authors found 930, or 67% of hydrogen abstraction TSs. We applied our general transition state search to the reactions of Bhoorasingh and West and found 1202, or 86% of TSs. See Table 1 for the summary of the results. It is important to note that our approach did not require any training data or preliminary 28 ACS Paragon Plus Environment

Page 28 of 54

Page 29 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 7: Comparison between energetics of Diels-Alder reactions proceeding via the concerted and a step-wise mechanism. The data is from the work of Chopin et al 95 and from this work, applying an automated TS search algorithm which used only 2D structures of the reactants and products as input. The intermediate and two TSs of the stepwise mechanism were predicted by the automated TSs without user intervention. The small differences in energies in the case of the stepwise mechanism are assumed to be due to conformational effects or differences in convergence criteria in the two approaches.

29 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 54

adjustments for hydrogen abstractions beyond using templates extracted from a number of successfully found TSs in the first round of application of the program. Our workflow can be applied to any other hydrogen abstraction reaction type (or even a completely different reaction class) with zero preliminary training or adjustments of the algorithm or data, whereas in the group tree approach “a group tree and an initial training set need to be created for each reaction family”. 33 Thus, not only has our automated approach been able to find a substantially larger number of TS on the same test set, but it also had a significantly lower usability overhead. Interestingly, in some cases we found a TS where the approach of Bhoorasingh and West failed, and vice versa. There were three main classes of failures in our automated approach: failures due to technical reasons (numbering atoms, forming reactant complexes, etc), QST failures (convergence issues, more than one imaginary frequency found), and chemical stability failures (reactants or products undergo a chemical transformation, forming or breaking bonds during a geometry optimization process, which most often results in chemical species which are identical to those on the other side of the reaction). The first two issues can and will be addressed by improving our cheminformatics and TS search algorithms. The third issue suggests that there is no barrier in the reaction, which is either an artifact of the combination of the level of theory and the accuracy used or that there is a very low barrier which the optimization algorithm can traverse. Table 1: Performance of our automated TS search on the test set of Bhoorasingh and West 33 consisting of 1393 hydrogen abstractions RH + R′ · = R · +R′ H. Both works used the same level of theory, B3LYP/6-31+G** in gas phase.

TS found Success rate Method transferable? Training required?

Ref. 33 930 68% No Yes2

This work 1202 86% Yes No3

30 ACS Paragon Plus Environment

Page 31 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

4.4 Diels-Alder reaction Cycloadditions represent a cornerstone class of organic synthesis reactions which are used widely for building rings and heterocycles. And among cycloadditions, the Diels-Alder reaction, a [4+2] cycloaddition, is a particularly well-known and well-studied variety. 96 The literature on the Diels-Alder reaction and its mechanisms is vast, even when considering works that deal with theoretical analysis of the reaction’s TSs. To test and demonstrate the applicability of the automated TS search to studying the Diels-Alder reaction we chose to focus on the important topological aspect of its mechanism – whether the reaction occurs in one concerted step or stepwise, which depends on the structures of the reactants. 95,97,98 Before applying our approach to a novel set of reactions it was important to validate it on a set of results known from the literature. For a validation study, we selected the work of Chopin and co-workers, which studied the Diels-Alder reaction both experimentally and theoretically and demonstrated the existence of the concerted and the stepwise mechanisms as a function of the structure of the reactants and the presence of a Lewis catalyst (simulated in theoretical calculations via BF3 .) We used our automated approach on all the reactions studied theoretically by Chopin and co-workers. Even though the quantum chemical programs used by us and by these authors were different, which means possibly different DFT grids and convergence criteria, the results of gas phase calculations, in cases where conformational flexibility was minimal and the basis sets did not involve diffuse basis functions, were quantitatively similar and in some cases extremely similar. Figure 7 demonstrates an example of the agreement between the energies obtained by Chopin and co-workers and by us using an automated approach. The cases where some divergence of the results was observed (typically on the order of a few kcal/mol for the TS barriers and the reaction energies) involved either implicit solvent (different implicit solvation models were used in the two works), diffuse basis functions, or conformational flexibility. A typical example of a case involving conformational flexibility is shown in Figure 8. All the structural details related to the reaction center are extremely well reproduced by the automated approach. The conformational discrepancy lies in a torsion angle some distance away from the reaction center. The TS structure of Chopin and co-workers is actually 0.22 kcal/mol lower in 31 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 54

absolute energy at the level of theory at which the TS was located (B3LYP/6-31G** in gas phase, when both structures’ energies were computed with Jaguar). Although this structural detail seems insignificant for the assignment of the reaction mechanism, and even for the qualitatively good agreement of the energies, it is nevertheless important that our automated approach perform an accompanying conformational search to avoid not completely optimized conformations of endpoints and TSs, something that we are planning to do in the future (see additional discussion in Section 5). Importantly, in all the reactions tested, our automated approach agreed with that of Chopin and co-workers on the number of transition states along the path between the reactants and the products. The transition states that we were able to compare with those available in the supporting information to Ref. 95 were very similar in topological properties of the TS (site of attack, bond lengths and angles). These results gave us assurance that our iterative automated approach is competent at not only finding TSs in the Diels-Alder reaction but also deciding on the mechanism – concerted or stepwise. As a next step in testing and validation of our automated approach, we applied it to a set of 64 Diels-Alder reactions shown in Figure ??. In this test we did not concern ourselves with the synthesizability of the compounds involved in the reactions. All calculations were performed in gas phase starting from 2D structures automatically translated into 3D structures. A set of templates containing the results of previously studied Diels-Alder reactions was used. All located TSs were visually inspected for additional assurance. Four combinations of functionals and basis sets were employed for all 64 reactions: B3LYP/6-31G**, B3LYP/6-31+G*, B3LYP-D3/6-31+G*, and M06-2X/6-31G**. The success rates of finding TSs were evaluated to be between 86 and 89% (see Table S2). We did experiment with using different templates and different initial conformations of the dienes (such as cis- and trans-isomers), and in some cases observed subtle effects of these changes on the final results. The main objective of this test, as in the previous application sections, was to demonstrate the applicability of our automated approach to a virtual screening-like application requiring minimum user intervention and setup overhead. For the present, we conclude that the automated approach is capable of finding TSs in diverse types of Diels-Alder reactions with a high

32 ACS Paragon Plus Environment

Page 33 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

degree of success.

5 KNOWN LIMITATIONS The diversity of chemical reactions is enormous, and even though our workflow is meant to be fairly general and is not geared toward any particular chemistry or application, it is still known to be inapplicable to certain classes of reactions. There are also some classes of reactions that might be in principle approachable, but locating their TS is expected to be extremely challenging. In this section we outline the currently known technical and conceptual limitations of our program as well as mention the types of chemical rearrangements that we do not commit to support and that should be avoided at present.

Figure 8: (a) A pair of superimposed, respective transition states found in Ref. 95 (brown) and by our automated approach (cyan), showing occasional conformational discrepancies between the structures located. The atoms important for interpretation of the structures are labeled with corresponding element names. Note that the assignment of bond order in 3D models of TS structures is not always significant. A 2D structure matching the structures in part (a) is shown in part (b). The transition states in this figure are from the first step of the stepwise reaction shown in Figure 7. Reactions that do not involve breaking or making covalent bonds cannot be handled with our workflow. For example, it is currently impossible to search for a transition state in a transformation from one conformation to another. Examples of this include searching for torsional barriers, conversion from a cis- to a trans- form, a creation of a π-stacking complex from separated components, or breaking of a hydrogen-bond. This limitation stems from the fact that while aligning the reactant 33 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

and product complexes we take the liberty to manipulate the internal coordinates. As discussed in Section 2.3, we have sacrificed the ability to treat internal coordinate rearrangements in order to amplify our success rate in treating chemical changes. It is conceivable that transformations not involving changes in covalent bonds could be handled with our approach in the future. Reactions involving “spectator” fragments or molecules, which do not rearrange their covalent bonds in the course of the reaction, and which are therefore not covalently attached to other reactants or products, are not supported. Examples of such reactions include reactions with explicit, non-reacting, solvent molecules or counter-ions as well as full catalytic cycles input as a single reaction step. The information about how to best position the reactant and product molecules in a reactant or product complex is encoded in the change in bonding. In a sense, the active bonds “couple” the various molecules and help to guide construction of a reaction complex. In our current alignment protocol a spectator molecule’s placement (position and orientation) is independent of the placement of the reacting molecules and for this reason the spectator molecule cannot be placed optimally. One possible solution to this problem is conformational sampling of the placement of these spectators, i.e. optimizing their placement in an energetic sense. Now we would like to bring up the types of reactions that are formally supported but which, in the current implementation of the code, present numerous technical problems. The success rate on such reactions is expected to be low. Into this category fall reactions containing transition metal atoms, reactions undergoing a large number of covalent bond transformations in a concerted way, reactions with highly symmetric or cage-like reactants or products, and reactions which involve stereochemical centers.

Figure 9: A general scheme of 26 = 64 Diels-Alder reactions explored in this work using the automated TS search. X1 − X6 = N, CH. The success rate of finding TSs varied between 86 and 89%, depending on the functional and basis set used. See text and Table S2 for details.

The presence of transition metal atoms is not a problem in itself, as Jaguar, which is the 34 ACS Paragon Plus Environment

Page 34 of 54

Page 35 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

quantum mechanical engine of our automated workflow, has been known to deal with metal-containing systems effectively (see, for example, Refs., 99,100 as well as review 12 and references therein). The problem is rather in the conflict between the intended level of automation of TS search and the inherent complexity, both on the chemical and technological level, of metal-containing systems. A much greater care should be exercised for such systems when solving self-consistent DFT equations, recognizing and defining covalent and dative bonding, applying force fields for “cleaning up” and orienting intermediate structures as well as choosing the appropriate level of quantum theory to correctly account for the physics of the process. These issues, difficult enough in manual processing, become a daunting challenge in automated workflows such as ours. Two observations, however, make us optimistic about the prospect of automating TS calculations for metal-containing systems in the future. First, catalytic cycles, an extremely important target of the TS search, are based on a limited diversity of reaction types, such as oxidative addition, migratory insertion, etc. The diversity of reaction types in all of organometallic chemistry might be significantly larger and too difficult to handle at present, but automated TS searches in catalytic reactions should be feasible. Second, we have already carried out numerous automated or semi-automated TS searches associated with reactions that involve transition metal atoms. We worked with both catalysis-type reactions as well as reactions more common in general organometallic chemistry. Because of the issues mentioned above, presently not addressed in the code, the success rate was modest. There were a few reaction types on which our automated approach worked reliably enough to conduct large-scale screenings of potential catalysts or substrates. Figure 10 shows an example of the application of our automated approach to the concerted oxidation by a model reaction that represents oxidation by the molybdenum-containing enzyme aldehyde oxidase. (see Refs. 101,102 for typical TS calculations on such reactions). The success rate of finding TSs in this reaction was 100% (see Table S4 for details). Note that the greater level of automation provided by our automated workflow enabled us to scan exhaustively the C – H positions of two heterocycles in search of the most reactive site. Reactions with a large number of bonds forming and breaking at the same time present difficulties

35 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 54

to the atom-numbering code, because in such situations its brute force component described in Section 2.2 becomes computationally expensive (though still not competitive with quantum chemical computations), and the fall-back approach, MCS, is not reliable enough for processes with multiple bond rearrangements. It is also not clear whether our QST code can work reliably with complex internal coordinates corresponding to multiple bond breaking processes separated by distance. We recommend studying processes which are characterized by breaking and forming one or two bonds at a time, if possible. It is difficult to treat reactions with species possessing high degrees of symmetry or those containing cage-like formations. This is due to technical difficulties in assigning SMILES strings to represent such structures, which, in turn, makes our atom correspondence algorithms fail. Examples include graphene-like structures including fullerenes and nanotubes. Additionally, some organometallic ligands such as cyclopentadienyl ion present similar difficulties. Reactions involving molecules with stereocenters pose technical difficulties in some cases detailed below. Our atom correspondence algorithm can handle molecules with stereochemistry by specifying the stereochemistry in the identifying SMILES strings, including stereochemistry induced by numbering atoms (see Section 2.2). However, certain difficulties remain, and the first involves honoring the stereochemistry in reactions where the reaction center contains a stereocenter. As a concrete example consider the Michael additions depicted in Figure 5. If the β-carbon of the enone is bonded to some functional group (R) a stereocenter will be present in the product, but not the reactant. While a linear path will tend to conserve this stereo chemistry, our templating algorithm will only constrain the carbon-sulfur distance and so this stereochemistry may or may not be maintained in the TS guess. As a result, the algorithm may converge to a TS with opposite chirality of that requested. The second difficulty involves the management of multiple stereochemical sites which are not in the reaction center. If the user is not careful to input molecules where the stereochemistry of these stereocenters is conserved in all of these non-reactive sites the calculation will likely fail. This is because the reaction path will be very complex as all of these sites must “invert”. In fact, there will be many TSs along this path, since each inversion will be accompanied

36 ACS Paragon Plus Environment

Page 37 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

by some large barrier. While it is easy to write this issue off as a user input error, we have found that it is very easy to make this mistake. Further, this includes sites not usually considered to be chiral, such as pyramidal nitrogen groups. These issues will be addressed in future work. As briefly discussed in the introduction, the algorithm described in this work does not yet incorporate a conformational search of the reactants, TS, or products. For the relatively rigid molecules treated in Section 4, this does not appear to be a serious problem (in view of the agreement with experimental data reported therein). Even for more flexible species, our experience has been that in many cases, the protocol described above yields quite reasonable results. However, we have also found some systems which exhibit significant dependence of the results on the input conformation, and this is an indication that a conformational search and Boltzmann averaging approach are necessary to achieve robustness and accuracy in the general case. Work along this direction is ongoing and will be reported in a subsequent publication. Although it might be evident, for completeness’ sake we would like to add that the automated transition state search workflow presented in this work cannot be applied to very large molecules such as proteins and to periodic structures. The underlying quantum engine, Jaguar, has a practical limit on the size of structures containing 300-500 atoms. 103 Even though the program will work technically even on larger structures, the calculations involving such structures typically take too much time, so that they should not be carried out on a regular basis in practice. With regards to periodic structures, Jaguar is currently not equipped with the code that can handle periodic boundary conditions. When studying reactions in proteins, surfaces and solids, it is recommended that the user work with cluster models, whenever possible. Practically important chemical reactions, even occurring in proteins and on surfaces, have a local nature, and it should be possible to devise a medium-sized cluster model for all but the most unusual such cases. Of the limitations described in this section only those pertaining to transition metal-containing structures are thought to significantly affect the application of the present version of our workflow in practice. The rest of the limitations stem from either difficulties in determining the active bonds and assigning atom correspondences, aligning reactant and product complexes or maintaining

37 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 10: An application of our automated TS code to studying susceptibility of heterocyclic carbon atoms to oxidation. Three heterocycles were studied using a reaction that serves as a model for oxidation by aldehyde oxidase-like enzymes. The figures next to each aromatic carbon are ∆E ‡ in kcal/mol. The lowest barrier, and therefore the predicted most reactive site, is marked in green. All calculations were performed at the B3LYP-D3/LACVP**++ level of theory. correct chirality. A workaround for most of these reactions is to abandon the atom-numeric part of the workflow altogether, and instead number atoms in the reactants and products “manually”, continuing with the automated workflow from there. It is also recommended that the user be aware and careful of stereochemical sites in the reactions under study. Our GUI can assist in identifying and allowing the user to conveniently intervene in such situations as described above. The user can choose to execute only the first part of our workflow and then visualize the reactant and product complexes before proceeding. If the alignment of these complexes appears to be non-ideal the user can use the GUI to manipulate the geometry of the structures by, for example, translating individual molecules or adjusting torsion angles. This type of manipulation can overcome shortcomings in the structural alignment algorithm. The second type of manipulation that may be necessary is an editing of atom indices. While one can visualize these indices in the GUI it can still be quite difficult to spot inaccuracies. Our GUI also allows the user to compute and visualize a linear path between the two complexes which aids in spotting problems in numbering and misaligned stereocenters. There is not currently a convenient graphical tool to adjust the atom indices so the structures would have to be written to a file, edited and re-loaded. Once the structures have been adjusted the user can choose to proceed with these manipulated reaction complexes. These operations allow the user to intervene and get an otherwise failing calculation to succeed.

38 ACS Paragon Plus Environment

Page 38 of 54

Page 39 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

6 CONCLUSION It is hard to overestimate the importance of being able to locate transition states reliably and with minimal hassle. This task is central to numerous applications of quantum chemistry and is essentially responsible for establishing the mechanism of chemical reactions, understanding and tuning reactivity, as well as exploring novel reactive pathways. While numerous semi-automatic approaches to transition state search no doubt exist, the operation of locating the transition state between two given minima is only beginning to emerge as a basic computational task in need of full automation. It is likely to become as vital to applied computational chemistry as the single point energy calculation and the geometry minimization, which have already been fine-tuned in a large number of quantum chemical computer programs. The present work makes an important step toward making TS searches between two well-defined minima routine. We have demonstrated that the automated approach can be applied to diverse reaction types with a high degree of success. Importantly, only the minimal input, namely 2D or 3D structures of the reactants and the products, was required from the user to set up the calculations. No preliminary training or tuning of parameters for different reaction types was needed. It is true that when switching to a new reaction type the user might be willing to experiment with such traditional quantum mechanics settings as the level of theory and accuracy thresholds, but often the default settings can be accepted without compromising the success rate. Using our approach, we have been able to locate over a thousand transition states in five reaction classes: Michael additions, hydrogen abstractions, Diels-Alder cycloadditions, carbene insertions, and oxidation of aromatic carbon atoms by a molybdenum complex. Several other classes of reactions have been treated with our automated workflow with high success rate – SN 2 nucleophilic attacks, nucleophilic attacks on nitriles, hydrogen transfers, cycloadditions beyond the [4+2] type, migratory insertions in organometallic compounds, etc (data not shown). What makes our approach especially attractive to the quantum chemical and modeling community is the ease with which these calculations can be set up and completed, opening new possibilities for research dependent on TS search for non-experts. 39 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 54

Technical improvements are still needed. In this regard, three directions appear to be of the highest importance for the present: improving the overall technical robustness (and consequently, success rate) of the machinery of the automated approach, a more reliable handling of transition metal-containing reactions, and, finally, an introduction of conformational sampling into the workflow. The first two of these directions have already been discussed in other sections of this work. The need for conformational sampling appears most clearly from the observation that the results of the TS search (regardless of the degree of its automation) might depend on the conformations of the initial structures. The dependence is negligible or non-existent for smaller and/or rigid reactants and products, but becomes more problematic for larger and flexible molecules. Different conformations of the transition state are likely to exist, all satisfying the criteria for its Hessian and for its connection with the different conformers of the reactants and the products via methods like IRC. The TS barriers of the reactions presented in this work have a negligible or small dependence on input conformation. In our experience, the overall correlation between the predicted and the experimental reaction rates of the reactions presented in this work (when such a comparison is possible, as in Figure 5) is not affected substantially by the input conformations. However, we have seen that in demanding cases the preference of a particular conformation of the TS might lead to a qualitatively different decision about the most reactive site in the substrate. In such cases a manual, or semi-automated conformational search was necessary in order to achieve meaningful and trustworthy results. All this points to the necessity of integrating a conformational search in our workflow. A successful conformational search would have to be able to treat not only organic structures but also metal-containing ones. A greater degree of automation of TS search, such as that proposed in this work, offers attractive prospects of a more prominent role of modeling and virtual screening in application to reactive chemistry, in such areas as design of covalent inhibitors in drug discovery, toxicity studies, synthetic chemistry, catalysis, and chemical engineering.

40 ACS Paragon Plus Environment

Page 41 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

7 ACKNOWLEDGMENT The authors would like to thank Dr. H. Minoux and Mr. D. Papin (Sanofi), and Dr. M. Krier (Merck KGaA) for numerous fruitful discussions at various stages of the algorithm development and validation.

8 SUPPORTING INFORMATION The energetics of representative TSs located using the automated approach and mentioned in the text are provided.

41 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

References (1) Hratchian, H. P.; Schlegel, H. B. In Theory and Applications of Computational Chemistry; Dykstra, C. E., Frenking, G., Kim, K. S., Scuseria, G. E., Eds.; Elsevier: Amsterdam, 2005; pp 195–249. (2) Anslyn, E. V.; Dougherty, D. A. Modern Physical Organic Chemistry; University Science Books: Sausalito, CA, 2006. (3) Polanyi, J. C.; Zewail, A. H. Direct Observation of the Transition State. Acc. Chem. Res. 1995, 28, 119–132. (4) Singleton, D. A.; Merrigan, S. R.; Liu, J.; Houk, K. N. Experimental Geometry of the Epoxidation Transition State. J. Am. Chem. Soc. 1997, 119, 3385–3386. ¨ om, H.; Oberg, ¨ (5) Ostr¨ H.; Xin, H.; LaRue, J.; Beye, M.; Dell’Angela, M.; Gladh, J.; Ng, M. L.; Sellberg, J. A.; Kaya, S.; Mercurio, G.; Nordlund, D.; Hantschmann, M.; Hieke, F.; K¨uhn, D.; Schlotter, W. F.; Dakovski, G. L.; Turner, J. J.; Minitti, M. P.; Mitra, A.; Moeller, S. P.; F¨ohlisch, A.; Wolf, M.; Wurth, W.; Persson, M.; Nørskov, J. K.; Abild-Pedersen, F.; Ogasawara, H.; Pettersson, L. G. M.; Nilsson, A. Probing the transition state region in catalytic CO oxidation on Ru. Science 2015, 347, 978–982. (6) Laldler, K. J.; King, M. C. Development of transition-state theory. J. Phys. Chem. 1983, 87, 2657–2664. (7) Nitzan, A. Chemical dynamics in condensed phases; Oxford University Press: New York, 2009. (8) Schr¨odinger Release 2016-1: LigPrep, version 3.7. Schr¨odinger, LLC: New York, NY, 2016. (9) Madar´asz, A.; Berta, D.; Paton, R. S. Development of a True Transition State Force Field from Quantum Mechanical Calculations. J. Chem. Theory Comput. 2016, 12, 1833–1844.

42 ACS Paragon Plus Environment

Page 42 of 54

Page 43 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(10) Bochevarov, A. D.; Watson, M. A.; Greenwood, J. R.; Philipp, D. M. Multiconformation, Density Functional Theory-Based pKa Prediction in Application to Large, Flexible Organic Molecules with Diverse Functional Groups. J. Chem. Theory Comput. 2016, 12, 6001–6019. (11) Jaguar User Manual. Schr¨odinger, LLC: 120 West 45th Street, New York, NY 10036, 2016. (12) Bochevarov, A. D.; Harder, E.; Hughes, T. F.; Greenwood, J. R.; Braden, D. A.; Philipp, D. M.; Rinaldo, D.; Halls, M. D.; Zhang, Z.; Friesner, R. A. Jaguar: A high-performance quantum chemistry software program with strengths in life and materials sciences. Int. J. Quant. Chem. 2013, 113, 2110–2142. (13) Hall, M. L.; Goldfeld, D. A.; Bochevarov, A. D.; Friesner, R. A. Localized Orbital Corrections for the Barrier Heights in Density Functional Theory. J. Chem. Theory Comput. 2009, 5, 2996–3009. (14) Devi, K. J.; Chandra, A. K. Kinetics and thermochemistry of the gas-phase reactions of 4-ethylpyridine with OH radical: A DFT study. Comp. Theor. Chem. 2011, 965, 268–274. (15) Sun, Y.; Chen, H. Performance of Density Functionals for Activation Energies of Re-Catalyzed Organic Reactions. J. Chem. Theory Comput. 2014, 10, 579–588. (16) Yu, H. S.; He, X.; Li, S. L.; Truhlar, D. G. MN15: A KohnSham global-hybrid exchangecorrelation density functional with broad accuracy for multi-reference and single-reference systems and noncovalent interactions. Chem. Sci. 2016, 7, 5032–5051. (17) Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098–3100. (18) Becke, A. D. Densityfunctional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 1993, 98, 5648–5653. (19) Zhao, Y.; Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and 43 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 2008, 120, 215–241. (20) Maragakis, P.; Andreev, S. A.; Brumer, Y.; Reichman, D. R.; Kaxiras, E. Adaptive nudged elastic band approach for transition state calculation. J. Chem. Phys. 2002, 117, 4651–4658. (21) Peters, B.; Heyden, A.; Bell, A. T.; Chakraborty, A. A growing string method for determining transition states: Comparison to the nudged elastic band and string methods. J. Chem. Phys. 2004, 120, 7877–7886. (22) Behn, A.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Incorporating Linear Synchronous Transit Interpolation into the Growing String Method: Algorithm and Applications. J. Chem. Theory Comput. 2011, 7, 4019–4025. (23) Sharada, M. S.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Automated Transition State Searches without Evaluating the Hessian. J. Chem. Theory Comput. 2012, 8, 5166–5174. (24) May, J. W.; Lehner, J. D.; Frisch, M. J.; Li, X. Transition State Search Using a Guided Direct Inversion in the Iterative Subspace Method. J. Chem. Theory Comput. 2012, 8, 5175–5179. (25) Zimmerman, P. M. Single-ended transition state finding with the growing string method. J. Comp. Chem. 2015, 36, 601–611. (26) Melander, M.; Laasonen, K.; J´onsson, H. Removing External Degrees of Freedom from Transition-State Search Methods using Quaternions. J. Chem. Theory Comput. 2015, 11, 1055–1062. (27) Zimmerman, P. M. Growing string method with interpolation and optimization in internal coordinates: Method and examples. J. Chem. Phys. 2013, 138, 184102. (28) Zimmerman, P. M. Reliable Transition State Searches Integrated with the Growing String Method. J. Chem. Theory Comput. 2013, 9, 3043–3050. 44 ACS Paragon Plus Environment

Page 44 of 54

Page 45 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(29) Maeda, S.; Morokuma, K. Finding Reaction Pathways of Type A + B X: Toward Systematic Prediction of Reaction Mechanisms. J. Chem. Theory Comput. 2011, 7, 2335–2345. (30) Maeda, S.; Abe, E.; Hatanaka, M.; Taketsugu, T.; Morokuma, K. Exploring Potential Energy Surfaces of Large Systems with Artificial Force Induced Reaction Method in Combination with ONIOM and Microiteration. J. Chem. Theory Comput. 2012, 8, 5058–5063. (31) Gao, M.; Lyalin, A.; Maeda, S.; Taketsugu, T. Application of Automated Reaction Path Search Methods to a Systematic Search of Single-Bond Activation Pathways Catalyzed by Small Metal Clusters: A Case Study on HH Activation by Gold. J. Chem. Theory Comput. 2014, 10, 1623–1630. (32) Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259. (33) Bhoorasingh, P. L.; West, R. H. Transition state geometry prediction using molecular group contributions. Phys. Chem. Chem. Phys. 2015, 17, 32173–32182. (34) Bhoorasingh, P. L.; Slakman, B. L.; Khanshan, F. S.; Cain, J. Y.; West., R. H. Automated Transition State Theory Calculations for High-Throughput Kinetics. J. Phys. Chem. A 2017, 121, 6896–6904. (35) Mart´ınez-N´un˜ ez, E. An automated transition state search using classical trajectories initialized at multiple minima. Phys. Chem. Chem. Phys. 2015, 17, 14912–14921. (36) Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722. (37) Varela, J. A.; V´azquez, S. A.; Mart´ınez-N´un˜ ez, E. An automated method to find reaction mechanisms and solve the kinetics in organometallic catalysis. Chem. Sci. 2017, 8, 3843–3851. 45 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(38) Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms. Comp. Phys. Comm. 2016, 203, 212–225. (39) Goldsmith, C. F.; West, R. H. Automatic Generation of Microkinetic Mechanisms for Heterogeneous Catalysis. J. Phys. Chem. C 2017, 121, 9970–9981. (40) Z´ador, J.; Najm, H. N. Automated exploration of the mechanism of elementary reactions; Sandia National Laboratoies Report, 2012. (41) Peng, C.; Schlegel, H. B. Combining Synchronous Transit and Quasi-Newton Methods to Find Transition States. Isr. J. Chem. 1993, 33, 449–454. (42) Gonzalez, C.; Schlegel, H. B. An improved algorithm for reaction path following. J. Chem. Phys. 1989, 90, 2154–2161. (43) Galabov, B.; Koleva, G.; Simova, S.; Hadjieva, B.; Schaefer III, H. F.; Schleyer, P. V. R. Arenium ions are not obligatory intermediates in electrophilic aromatic substitution. Proc. Natl. Acad. Sci. 2014, 111, 10067–10072. (44) Galabov, B.; Nalbantova, D.; Schleyer, P. V. R.; Schaefer III, H. F. Electrophilic Aromatic Substitution: New Insights into an Old Class of Reactions. Acc. Chem. Res. 2016, 49, 1191–1199. (45) McGregor, J. J.; Willett, P. Use of a Maximal Common Subgraph Algorithm in the Automatic Identification of the Ostensible Bond Changes Occurring in Chemical Reactions. J. Chem. Inf. Comput. Sci. 1981, 21, 137–140. (46) Latendresse, M.; Malerich, J. P.; Travers, M.; Karp, P. D. Accurate Atom-Mapping Computation for Biochemical Reactions. J. Chem. Inf. Model. 2012, 52. (47) Fooshee, D.; Andronico, A.; Baldi, P. ReactionMap: An Efficient Atom-Mapping Algorithm for Chemical Reactions. J. Chem. Inf. Model. 2013, 53, 2812–2819. 46 ACS Paragon Plus Environment

Page 46 of 54

Page 47 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(48) K¨orner, R.; Apostolakis, J. Automatic Determination of Reaction Mappings and Reaction Center Information. 1. The Imaginary Transition State Approach. J. Chem. Inf. Model. 2008, 48, 1181–1189. (49) Crabtree, J. D.; Mehta, D. P. Automated Reaction Mapping. ACM J. Exp. Algorithmics 2009, 13, 1.15–1.29. (50) Mann, M.; Nahar, F.; Schnorr, N.; Backofen, R.; Stadler, P. F.; Flamm, C. Atom mapping with constraint programming. Algorithms Mol. Biol. 2014, 9, 23. (51) Fujita, S. Description of Organic Reactions Based on Imaginary Transition Structures. 1. Introduction of New Concepts. J. Chem. Inf. Comput. Sci. 1986, 26, 205–212. (52) Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for Generation of Unique SMILES Notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97–101. (53) James, C. A.; Weininger, D. Daylight Theory Manual, Version 4.9. Daylight Chemical Information Systems, Inc.: Laguna Niguel, CA, 2011. (54) Ullmann, J. R. An Algorithm for Subgraph Isomorphism. J. Assoc. Comput. Mach. 1976, 23, 31–42. (55) Halgren, T. A.; Lipscomb, W. N. The synchronous transit method for determining reaction pathways and locating molecular transition states. Chem. Phys. Lett. 1977, 49, 225–232. (56) Peters, B.; Heyden, A.; Bell, A. T.; Chakraborty, A. A growing string method for determining transition states: Comparison to the nudged elastic band and string methods. J. Chem. Phys. 2004, 120, 7877–7886. (57) Peng, C.; Schlegel, B. Combining Synchronous Transit and Quasi-Newton Methods to Find Transition States. Isr. J. Chem. 1993, 33, 449–454. (58) Gonzalez, C.; Schlegel, H. B. An improved algorithm for reaction path following. J. Chem. Phys. 1989, 90, 2154–2161. 47 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(59) Goodman, J. M.; Silva, M. A. QRC: a rapid method for connecting transition structures to reactants in the computational analysis of organic reactivity. Tetrahedron Letters 2003, 44, 8233–8236. (60) Jorgensen, W. L.; Tirado-Rives, J. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc. 1988, 110, 1657–1666. (61) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 1996, 118, 11225–11236. (62) Shivakumar, D.; Williams, J.; Wu, Y.; Damm, W.; Shelley, J.; Sherman, W. Prediction of Absolute Solvation Free Energies using Molecular Dynamics Free Energy Perturbation and the OPLS Force Field. J. Chem. Theory Comput. 2010, 6, 1509–1519. (63) Schr¨odinger Release 2016-2: Jaguar, version 9.2. Schr¨odinger, LLC: New York, NY, 2016. (64) Friesner, R. A. Solution of self-consistent field electronic structure equations by a pseudospectral method. Chem. Phys. Lett. 1985, 116, 39–43. (65) Friesner, R. A. Solution of the Hartree-Fock Equations for Polyatomic-Molecules by a Pseudospectral Method. J. Chem. Phys. 1987, 86, 3522–3531. (66) Ringnalda, M. N.; Belhadj, M.; Friesner, R. A. Pseudospectral Hartree-Fock Theory: Applications and Algorithmic Improvements. J. Chem. Phys. 1990, 93, 3397–3407. (67) Friesner, R. A. New Methods for Electronic Structure Calculations on Large Molecules. Ann. Rev. Phys. Chem. 1991, 42, 341–367. (68) Friesner, R. A.; Murphy, R. B.; Beachy, M. D.; Ringnalda, M. N.; Pollard, W. T.; Dunietz, B. D.; Cao, Y. Correlated ab Initio Electronic Structure Calculations for Large Molecules. J. Phys. Chem. A 1999, 103, 1913–1928. 48 ACS Paragon Plus Environment

Page 48 of 54

Page 49 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(69) Marten, B.; Kim, K.; Cortis, C.; Friesner, R. A.; Murphy, R. B.; Ringnalda, M. N.; Sitkoff, D.; Honig, B. New Model for Calculation of Solvation Free Energies: Correction of Self-Consistent Reaction Field Continuum Dielectric Theory for Short-Range Hydrogen-Bonding Effects. J. Phys. Chem. 1996, 100, 11775–11788. (70) Cortis, C. M.; Friesner, R. A. An Automatic 3D Finite Element Mesh Generation System for the Poisson-Boltzman Equation. J. Comp. Chem. 1997, 18, 1570–1590. (71) Cortis, C. M.; Friesner, R. A. Numerical Solution of the Poisson-Boltzmann Equation Using Tetrahedral Finite-Element Meshes. J. Comp. Chem. 1997, 18, 1591–1608. (72) Grimme, S.; Anthony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (73) Goerigk, L.; Grimme, S. A thorough benchmark of density functional methods for general main group thermochemistry, kinetics, and noncovalent interactions. Phys. Chem. Chem. Phys. 2011, 13, 6670–6688. (74) Silva, P. J.; Ramos, M. J. Successes and failures of DFT functionals in acid/base and redox reactions of organic and biochemical interest. Comp. Theor. Chem. 2011, 966, 120–126. ˇ acˇ , J.; Hobza, P. Benchmark database (75) Valdes, H.; Pluh´acˇ kov´a, K.; Piton´ak, M.; Rez´ on isolated small peptides containing an aromatic side chain: comparison between wave function and density functional theory methods and empirical force field. Phys. Chem. Chem. Phys. 2008, 10, 2747–2757. (76) Yu, H. S.; Zhang, W.; Verma, P.; He, X.; Truhlar, D. G. Nonseparable exchangecorrelation functional for molecules, including homogeneous catalysis involving transition metals. Phys. Chem. Chem. Phys. 2015, 17, 12146–12160.

49 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(77) Yu, H. S.; He, X.; Truhlar, D. G. MN15-L: A New Local Exchange-Correlation Functional for KohnSham Density Functional Theory with Broad Accuracy for Atoms, Molecules, and Solids. J. Chem. Theory Comput. 2016, 12, 1280–1293. (78) Flanagan, M. E.; Abramite, J. A.; Anderson, D. P.; Aulabaugh, A.; Dahal, U. P.; Gilbert, A. M.; Li, C.; Montgomery, J.; Oppenheimer, S. R.; Ryder, T.; Schuff, B. P.; Uccello, D. P.; Walker, G. S.; Wu, Y.; Brown, M. F.; Chen, J. M.; Hayward, M. M.; Noe, M. C.; Obach, R. S.; Philippe, L.; Shanmugasundaram, V.; Shapiro, M. J.; Starr, J.; Stroh, J.; Che, Y. Chemical and Computational Methods for the Characterization of Covalent Reactive Groups for the Prospective Design of Irreversible Inhibitors. J. Med. Chem. 2014, 57, 10072–10079. (79) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378–6396. (80) Schultz, T. W.; Yarbrough, J. W.; Hunter, R. S.; Aptula, A. O. Verification of the Structural Alerts for Michael Acceptors. Chem. Res. Toxicol. 2007, 20, 1359–1363. (81) Enoch, S. J.; Cronin, M. T. D.; Schultz, T. W.; Madden, J. C. Quantitative and Mechanistic Read Across for Predicting the Skin Sensitization Potential of Alkenes Acting via Michael Addition. Chem. Res. Toxicol. 2008, 21, 513–520. (82) Schwartz, P. A.; Kuzmic, P.; Solowiej, J.; Bergqvist, S.; Bolanos, B.; Almaden, C.; Nagata, A.; Ryan, K.; Feng, J.; Dalvie, D.; Kath, J. C.; Xu, M.; Wani, R.; Murray, B. W. Covalent EGFR inhibitor analysis reveals importance of reversible interactions to potency and mechanisms of drug resistance. Proc. Natl. Acad. Sci. 2014, 111, 173–178. (83) Mieusset, J.-L.; Brinker, U. H. The Carbene Reactivity Surface: A Classification. J. Org. Chem. 2008, 73, 1553–1558.

50 ACS Paragon Plus Environment

Page 50 of 54

Page 51 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(84) Zhang, H. R.; Eddings, E. G.; Sarofim, A. F. Olefin Chemistry in a Premixed n-Heptane Flame. Energy Fuels 2007, 21, 677–685. (85) Tan, T.; Krauter, C. M.; Yu, Y.; Carter, E. A. Ab Initio Kinetics of Hydrogen Abstraction from Methyl Acetate by Hydrogen, Methyl, Oxygen, Hydroxyl, and Hydroperoxy Radicals. J. Phys. Chem. A 2015, 119, 6377–6390. (86) Merchant, S. S.; Zanoelo, E. F.; Speth, R. L.; Harper, M. R.; Geem, K. M. V.; Green, W. H. Combustion and pyrolysis of iso-butanol: Experimental and chemical kinetic modeling study. Combust. Flame 2013, 160, 1907–1929. (87) Chenoweth, K.; van Duin, A. C. T.; Goddard III, W. A. ReaxFF Reactive Force Field for Molecular Dynamics Simulations of Hydrocarbon Oxidation. J. Phys. Chem. A 2008, 112, 1040–1053. (88) Iuga, C.; Campero, C.; Vivier-Bunge, A. Antioxidant vs. prooxidant action of phenothiazine in a biological environment in the presence of hydroxyl and hydroperoxyl radicals: a quantum chemistry study. RSC Adv. 2015, 5, 14678–14689. (89) Galano, A.; Alvarez-Idaboy, J. R. A computational methodology for accurate predictions of rate constants in solution: Application to the assessment of primary antioxidant activity. J. Comp. Chem. 2013, 34, 2430–2445. (90) P´erez-Gonz´alez, A.; Galano, A.; Alvarez-Idaboy, J. R. Dihydroxybenzoic acids as free radical scavengers: mechanisms, kinetics, and trends in activity. New J. Chem. 2014, 38, 2639–2652. (91) Galano, A.; Mazzone, G.; Alvarez-Diduk, R.; Marino, T.; Alvarez-Idaboy, J. R.; Russo, N. Food Antioxidants: Chemical Insights at the Molecular Level. Annu. Rev. Food Sci. Technol. 2016, 7, 335–352.

51 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(92) Alves, C. N.; Borges, R. S.; Da Silva, A. B. F. Density functional theory study of metabolic derivatives of the oxidation of paracetamol. Int. J. Quant. Chem. 2006, 106, 2617–2623. (93) Olsen, L.; Rydberg, R.; Rod, T. H.; Ryde, U. Prediction of Activation Energies for Hydrogen Abstraction by Cytochrome P450. J. Med. Chem. 2006, 49, 6489–6499. (94) Leth, R.; Rydberg, P.; Jørgensen, F. S.; Olsen, L. Density Functional Theory Study on the Formation of Reactive Benzoquinone Imines by Hydrogen Abstraction. J. Chem. Inf. Model. 2015, 55, 660–666. (95) Chopin, N.; G´erard, H.; Chataigner, I.; Piettre, S. R. Benzofurans as Efficient Dienophiles in Normal Electron Demand [4 + 2] Cycloadditions. J. Org. Chem. 2009, 74, 1237–1246. (96) Fringuelli, F.; Taticchi, A. The Diels-Alder Reaction; John Wiley & Sons: Chichester, UK, 2002. (97) Liang, Y.; Hong, X.; Yu, P.; Houk, K. N. Why Alkynyl Substituents Dramatically Accelerate Hexadehydro-DielsAlder (HDDA) Reactions: Stepwise Mechanisms of HDDA Cycloadditions. Org. Lett. 2014, 16, 5702–5705. (98) Yu, P.; Yang, Z.; Hong, X.; Li, Y.; Houk, K. N. Distortion-Controlled Reactivity and Molecular Dynamics of Dehydro-DielsAlder Reactions. J. Am. Chem. Soc. 2016, 138, 8247–8252. (99) Mazumder, S.; Crandell, D. W.; Lord, R. L.; Baik, M.-H. Switching the Enantioselectivity in Catalytic [4 + 1] Cycloadditions by Changing the Metal Center: Principles of Inverting the Stereochemical Preference of an Asymmetric Catalysis Revealed by DFT Calculations. J. Am. Chem. Soc. 2014, 136, 9414–9423. (100) Huang, Y.; Nielsen, R. J.; Goddard III, W. A.; Soriaga, M. P. The Reaction Mechanism with Free Energy Barriers for Electrochemical Dihydrogen Evolution on MoS2. J. Am. Chem. Soc. 2015, 137, 6692–6698. 52 ACS Paragon Plus Environment

Page 52 of 54

Page 53 of 54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(101) Alfaro, J. F.; Jones, J. P. Studies on the Mechanism of Aldehyde Oxidase and Xanthine Oxidase. J. Org. Chem. 2008, 73, 9469–9472. (102) Hu, L.; Chen, H. Assessment of DFT Methods for Computing Activation Energies of Mo/W-Mediated Reactions. J. Chem. Theory Comput. 2015, 11, 4601–4614. (103) Zhang, J.; Hughes, T. F.; Steigerwald, M.; Brus, L.; Friesner, R. A. Realistic Cluster Modeling of Electron Transport and Trapping in Solvated TiO2 Nanoparticles. J. Am. Chem. Soc. 2012, 134, 12028–12043.

53 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

202x102mm (100 x 100 DPI)

ACS Paragon Plus Environment

Page 54 of 54