Improving Vibrational Mode Interpretation Using Bayesian Regression

Dec 10, 2018 - To streamline the interpretation of vibrational spectra, this work introduces the use of Bayesian linear regression with automatic rele...
0 downloads 0 Views 1MB Size
Subscriber access provided by University of South Dakota

Spectroscopy and Excited States

Improving Vibrational Mode Interpretation Using Bayesian Regression Filipe Teixeira, and M. Natalia D.S. Cordeiro J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00439 • Publication Date (Web): 10 Dec 2018 Downloaded from http://pubs.acs.org on December 14, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Improving Vibrational Mode Interpretation Using Bayesian Regression Filipe Teixeira∗ and M. Nat´alia D. S. Cordeiro∗ LAQV-REQUIMTE, Faculty of Sciences of the University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal E-mail: [email protected]; [email protected] Abstract In order to streamline the interpretation of vibrational spectra, this work introduces the use of Bayesian linear regression with automatic relevance determination as a viable approach to decompose the atomic motions along any vibrational mode as a weighted combination of displacements along chemically meaningful internal coordinates. This novel approach denominated Vibrational Mode Automatic Relevance Determination (VMARD) is presented and compared with the well established Potential Energy Decomposition (PED) scheme. Good agreement is generally attained between the two methods. VMARD returns a decomposition of the atomic displacement using only a small number of internal coordinates, thus aiding the interpretation of the vibrational spectra. Moreover, the results show that the VMARD descriptions are resilient towards the addition of additional internal coordinates, achieving a concise description of the vibrational modes despite the use of redundant internal coordinates. Potential applications of VMARD involving the gathering of physical insights on the atomic motions along the reaction coordinate at Transition State structures, as well as the improvement of theoretically predicted vibrational frequencies, are also presented under a proof-of-concept perspective.

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Introduction

Alongside with NMR spectroscopy, analysis of vibrational spectra is one of the most important techniques for understanding the structure and behaviour of matter at the nanoand sub-nano scales. Indeed, vibrational spectroscopic techniques such as Fourier-Transform Infrared (FTIR) and Raman spectroscopy permeate the chemical literature, providing valuable insights for the elucidation of molecular structures 1,2 , the organisation of molecular clusters 3–6 , the role on inter-ion interactions in ionic liquids 7,8 , as well as insights on the organisation of novel nano-structured materials 9–12 . In latter years, theoretical predictions became essential tools, aiding in the interpretation of IR and Raman spectra 7 . In addition to the aforementioned applications, theoretical predictions are a valuable aid in identifying the product of chemical reactions out of several possible outcomes 13–15 , and also allow the calculation of the thermodynamic and kinetic parameters of these reactions 14–17 . In order to tap into this wealth of information, different theoretical methods are available for calculating the vibrational spectra. These include, for example, the direct assessment of the local curvature of the Potential Energy Surface (PES) around the equilibrium geometry 18 , or from a Fourier analysis of the system’s trajectory during Molecular Dynamics (MD) simulations 19–21 . In many cases, the simple match between predicted and observed IR spectra may suffice. However, the interpretation of the vibrational spectra in terms of the atomic motions related to each observed signal can also provide valuable information about the chemical systems, its structure and behaviour 22 . For example, the contribution of N−H stretching motion to the different high-frequency vibrational modes of [EMIN][FAP] ionic pairs has shown that the hydrogen bond formed between EMIN+ and FAP – is not strong enough to to explain the relatively large stability of the ion pairs 8 . More recently, the interpretation of predicted Raman spectra for a number of anthraquinone dyes and lakes allowed Pagliai et al. not only to confirm the structure of such compounds, but also to identify some signals in the experimental Raman spectra as overtones 23 . Moreover, the description of the atomic motions 2

ACS Paragon Plus Environment

Page 2 of 47

Page 3 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

along vibrational modes provides useful descriptors for the development of Quantitative Structure-Activity Relationship (QSAR) models 24,25 . In order to fully exploit such wealth of information in a chemically meaningful way, it is in many cases desirable to express such motions in terms of intuitive terms, such as the deformation along bond lengths, valence angles and dihedrals 26 , which are familiar to most chemists devoted to the study of molecular properties and reactivity. In their turn, these descriptions can contribute to the development of new predictive models regarding the properties and/or reactivity for a family of structurally related molecules. However, with the exception of a handful of highly localized vibrations, this task requires some degree of complexity reduction, which can be accomplished either by the human operator, or by some mathematical device. In most cases, assignment is carried out by direct visualisation of the atomic displacements on a computer screen 7 . This visual interpretation approach is very simple, but quickly becomes ineffective with the increasing complexity of the systems being studied. Indeed, it is commonly accepted that this technique can only provide qualitative assessments when handling large systems 7 . Moreover, since humans are more sensitive towards large displacements, assignments made only by visual inspection can be biased in favour of the movement of lighter atoms, such as hydrogen 7,26 . A more quantitative interpretation of the vibrational spectra can be pursued using the Potential Energy Decomposition (PED) analysis. Instead of relying on the atomic motions, PED expresses the variation of the potential energy of the molecule as a combination of internal coordinates 26 . This is accomplished by projecting the mass-weighted Hessian matrix (Hw ) from Cartessian space onto a space defined by a set of internal coordinates. The results from PED analysis are particularly attractive to chemists, as the PED can be expressed in terms of localized internal coordinates, such as changes in bond lengths, valence angles, as well as out-of-plane and torsional dihedrals 7,26 . Traditionally, PED analysis required the user to manually devise a set of non-linearly dependent internal coordinates for the system at hand. More recent implementations of PED, such as the one offered by the VEDA

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(Vibrational Energy Decomposition Analysis) software 26 is able to generate and optimise a set of internal coordinates adequate for PED analysis, given the geometry and Hw . VEDA is also capable of using the information in the non mass-weighted Atom Displacement Matrix (ADM) to aid the generation of the internal coordinates. Despite this, a poorly chosen initial set of internal coordinates may leave the optimisation process stagnated at a local maximum. As PED relies on the projection of the Hessian matrix into a set of internal coordinates, the vibrational frequencies derived by PED may also differ from those listed in the theoretically predicted spectrum. Moreover, PED analysis has important limitations arising from the ambiguity of its solutions, and require spectroscopic knowledge and thoroughness from the interpreter 26 . Linear bending motions, such as the ones found in alkyne and allene moieties are particularly difficult to handle without specialised human intervention. What is more, accounting for anharmonicity in PED is not easily achieved. Other approaches, such as the localisation of vibrational modes using adiabatic connection schemes 27 , have enjoyed only marginal popularity and their applicability, results and limitations remain relatively unexplored. Newer approaches involving the localisation of molecular vibrations in term of local fragment modes 28 , or a generalised subsystem vibrational analysis 29 have emerged in recent years. These sophisticated techniques either reorganise the nuclear Hessian in terms of localized fragments, or calculate an effective Hessian matrix for each of the fragments. However, human intervention is needed for defining the fragments, and their application is still limited. Indeed, in the case of the local fragment modes method, the inclusion of anharmonic effects remains a non-trivial task 28 . In this work, we propose a novel approach for the interpretation of vibrational spectra based on the decomposition of the atomic motions in terms of displacements along a set of internal coordinates. For this purpose, we explore the application of Bayesian linear regression to improve the interpretability of the results, with the ultimate aim of providing a viable alternative to the well established PED method, without the need for devising a set of linearly independent coordinates. Moreover, an open source implementation of the

4

ACS Paragon Plus Environment

Page 4 of 47

Page 5 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

methods described in this work is provided for independent auditing of the code, as well as expansion of the code’s capabilities, and a number of potential applications of the results of this method are briefly explored.

2

Methodology

2.1

Generation of Internal Coordinates

The core problem in the computer-aided interpretation of vibrational spectra is to translate the atomic motions associated with each vibrational mode in terms of changes in the internal coordinates, ζi . The geometry of a system of N atoms is defined by the 3N Cartesian coordinates, qi , of their nuclei. Moreover, a chemically intuitive set of internal coordinates must take into consideration the topology of the molecule. This may be inferred by comparing the inter-nuclear distances, li,j , with a table of atomic covalent radii. For the purpose of this work, a bond between two atoms should satisfy the condition

li,j ≤ (1 + α) · (ri + rj )

(1)

where ri , and rj are the covalent radii for atoms i, and j, respectively, and α is a tolerance factor. After obtaining a set of bonds, a quasi-redundant system of internal coordinates is trivially obtained by applying some set operations: 1. Two bonds with one atom in common form a valence angle, with the common atom at its apex. A movement along these coordinates will be referred to as a bending, and represented by the symbol δ. 2. Three bonds with one common atom define an angle between a bond and a plane, hereafter referred to as an out-of-plane angle. A movement along such coordinates will be referred to as an out-of-plane deformation, and represented by the symbol γ.

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 47

3. Two angles sharing one common bond, but not the same apex, define an improper dihedral, or torsion of the peripheral atoms. The movement along this internal coordinate will be referred to as a torsion, and represented by the symbol τ . These simple rules allow one to define a set of internal coordinates, which include a number of redundant coordinates for systems with more than 3 atoms. It should be noted that for the purpose of analysing vibrational motions, out-of-plane modes are usually considered for sets of four co-planar atoms, allowing for an additional (and optional) constraint on the application of rule 2. Although a relatively wide arsenal of methods to prune this set is available 30–33 (thus obtaining one or more sets of non-redundant internal coordinates), the following discussion will consider a set of internal coordinates in which at least one coordinate is redundant. Once all desired internal coordinates are defined, their values can be easily computed using simple Euclidean geometry. For the purpose of this work, bond lengths will be expressed in angstrom, while angle amplitudes will be given in radians, with the aim of keeping typical distances and amplitudes approximately within the same order of magnitude. The conversion between internal and Cartesian coordinates can then be carried out using the S matrix, which is defined as Si,j =

∂ζi ∂qj

(2)

were qj is the j th Cartesian coordinate of the molecular system. The derivatives in Equation 2 can be calculated analytically using Wilson’s rules for converting between atomic motion between Cartesian and internal coordinates 34,35 , or numerically, by measuring the variation of ζi over an infinitesimal displacement of qi . Indeed, although the current work and current implementation of our methodology considers four types of internal coordinates (bond lengths, as well as valence, out-of-plane and torsion angles), it should be noted that the following discussion can easily be extended in order to include other types of internal coordinates (including delocalised internal coordinates), as long as the movement along such coordinates is translatable into a vector of atomic displacements in Cartesian space.

6

ACS Paragon Plus Environment

Page 7 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

While PED requires Hw (with the optional use of the ADM, to devise a set of linearly independent internal coordinates), our approach defines a set of redundant internal coordinates directly from the geometry of the molecular system and requires the ADM as the main source of information regarding the atomic motions associated to each vibrational mode. Although the raw displacements in the ADM can be used, preliminary surveys shown that this approach tends to over-emphasise the displacement of lighter atoms (especially hydrogen). Hence, we define the matrix D as the collection of column vectors with the mass-weighted displacements obtained in the ADM. In addition to this, the first 6 displacements in the ADM (5 for linear systems) describing the translation of the centre-of-mass and the system’s rotation are discarded. Finally, each column of both S (hereafter notated as si ), and each column of D (hereafter, dj ) are normalised to unit length.

2.2

Vibrational Mode Automatic Relevance Determination

At this point, each vibrational mode (dj ) can be considered as a linear combination of all internal coordinates dj =

X

βi,j si

(3)

i

where βi,j represents the contribution of the ith internal coordinate when describing the j th vibrational mode. The weight of each internal coordinate describing each vibrational mode, can then be determined by: |βi,j | × 100 wi,j (%) = P i |βi,j |

(4)

Moreover, it is desirable to define each si , so that a displacement of the system’s geometry by δsi leads to an increase of the length (or amplitude) of the ith internal coordinate for δ > 0, and vice-versa. Assuming this condition, the sign of the βi,j for any given vibrational mode dj provides information regarding the phase of the relative movement along each internal

7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 47

coordinate. Such information can be incorporated in the results by adapting Equation 4: βi,j wi,j (%) = φ × P × 100 i |βi,j |

(5)

where φ = sgn(βk,j ), with k = arg maxi (|βi,j |), representing the signun function for the highest contribution describing the ith vibrational mode. This device will then provide positive weights for all internal coordinates which movement is in phase with that of the most preponderant internal coordinate describing the vibrational mode, and vice-versa. By presenting the atomic motions in terms of the most relevant changes in internal coordinates (according to the weights wi ), this procedure allows the end-user to classify each vibrational mode using the classical descriptors that gather the most important internal coordinates. Regarding the determination of βi,j , the simplest approach would be to use the minimum least squares method, since an analytical expression for expressing both dj and si is usually not readily available. Moreover, when S contains a large number of redundant internal coordinates, this approach is akin to performing a multi-variable regression using a set of highly correlated explanatory variables, which can negatively affect the results. Indeed, the interested reader might find additional information regarding such an approach in the Supplementary Information. In order to overcome this problem, we propose the use of Bayesian linear regression. For each vibrational mode dj , the Bayesian approach to linear regression considers a probability distribution for Bj = {β1,j , β2,j , . . . , βn,j }. In its initial state, this is described by a prior estimate based on general beliefs about the data 36 . Upon feeding the input data, the βi coefficients are found by Bayesian inference, with an associated precision, λi p(βi |λ) = N (βi |0, Λ−1 )

(6)

where diag(Λ) = {λ1 , λ2 , . . . , λn }). This takes place in an iterative manner until Bj converges within the desired tolerance. 8

ACS Paragon Plus Environment

Page 9 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Despite the robustness of Bayesian Linear Regression, an educated guess regarding the distribution of Bj is required. A normal prior distribution would correspond to the most common case found in the literature 37 , and tends to penalise large βi,j . As it will be discussed latter, the number of internal coordinates defined by the rules exposed above exceeds the number of vibrational modes for a molecular system with more than 3 atoms. Moreover, it is reasonable to assume that even a highly delocalised vibrational mode is describable by a finite number of internal coordinates much smaller than all those definable by the algorithm outlined in Section 2.1. Hence, a good educated guess for the prior distribution of Bj should conform to an elliptical distribution, which ensures that Bj will present a sparser distribution 38 . In addition to this, it is well known that the use of an elliptical prior mitigates potential over-fitting problems 39,40 without need for direct human intervention (i.e. selection of internal coordinates). This strategy is called Automatic Relevance Determination (ARD) 38 , and the application of ARD to the characterisation of vibrational motions described in this paper will be henceforth referred to as Vibrational Mode Automatic Relevance Determination (VMARD). Finally, it should be noted that although Equation 3 can be extended in order to handle all vibrational modes as an ensemble (D = BS), the current implementation of VMARD treats each vibrational mode independently. This is justifiable by the fact that the columns of D are almost always close to orthogonality (the orthogonality of the eigenvectors of Hw is only slightly offset in D by mass-weighting the displacements). Because of this, the benefits of applying a multi-targeted optimisation of the coefficients βi,j are minimal, and the current approach eases the parallelisation of the optimisation procedure.

3

Computational Details

The implementation of VMARD used in this work is this work is available from the GitHub account of one of the authors 41 , and relies on the Scikit-learn 42 for the linear regression

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

and Bayesian inference routines. The source code for this software at the time of writing is also given in the Electronic Supplementary Information. The software requires the ADM, the geometry and the atomic masses of the atoms in the system, which can be read from the output of popular ab initio computer chemistry codes, such as Orca 43 or Gaussian 44 , and can be easily extended to accept the output of other computational chemistry software packages with similar functionality, due to its open source license. Moreover, the code also allows the definition of other types of internal coordinates not implemented at the time of writing. All DFT calculations were performed using the Orca electronic structure software 43 version 3.0.2. In order to access the usability and applicability of the methods described above, four case studies are presented. The first study compared the performance of VMARD with that of PED. For this purpose, the vibrational spectra of methanol and 2-fluoropropane (2FP) were computed at the B3LYP/aug-cc-pVQZ level of theory. The results from the VMP, VMLD, and VMARD approaches were confronted by the reference assignments provided by Jamr´oz using PED, as implemented in the VEDA software 26 . The second study explored the general behaviour of VMARD. For this purpose, geometry optimisation and vibrational frequency calculations were carried out for the set of 25 organic compounds depicted in Figure 1, using DFT at the B3LYP/6-31G(d) 45–47 level of theory. The third study addressed the effect of hydrogen bonding and isotopic substitution on the interpretation of the vibrational spectra. For this purpose, the geometries of acetic acid and its dimer were optimised at the B3LYP/aug-cc-pVQZ level of theory, together with those of deuterium-substituted acetic acid and respective dimers. The infra-red spectra of these species was then calculated at the same level of theory and the results analysed using VMARD. The fourth and fifth study cases under reported in this work explored the application of VMARD as a way to characterise the motion along the reaction coordinate of the oxygen-

10

ACS Paragon Plus Environment

Page 10 of 47

Page 11 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 1: Structural formulae of the 25 organic compounds selected for studying the behaviour of VMARD.

11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

transfer step in vanadium-catalysed epoxidations, and also on different aza-Diels-Alder condensations. Both these studies relied on the data generated in previous studies by the authors 16,48 . Finally, the availability of experimental spectroscopic reference data on the NIST Standard Reference Database 49 for compounds 3 to 25 further allowed exploring the ability to use the results from VMARD to enhance the accuracy of harmonic vibrational frequencies. For this end, the equilibrium geometry and vibrational frequencies were also calculated at the B3LYP/cc-pVQZ 50 level of theory. Statistical treatment of the data (linear regression) was performed using the R 51 software package.

4

Results and Discussion

In order to evaluate the performance and utility of the VMARD method, we performed a number of tests, the results of which are presented and discussed in this section. The following discussion is thus organized in the following manner: in the first section, the vibrational analysis of methanol using both PED and VMARD is given. The second section contains a more comprehensive overview of the behaviour of VMARD, based on the results from the VMARD analysis for the set of compounds depicted in Figure 1. The remaining sections present example applications of VMARD, such as the vibrational analysis of the AcOH dimer including isotopic effects (Section 4.3) and the description of reaction coordinates at the transition state (TS) structure (Section 4.4). Finally the potential use of VMARD-generated descriptions as a way to improve the accuracy of the predicted vibrational frequencies is given in Section 4.5. Throughout this work, changes along bond lengths, valence angles, out-of-plane angles and dihedral angles will be referred to as stretching, bending, out-of-plane bending and torsions, and further abbreviated by the symbols ν, δ, γ, and τ , respectively. Such symbols are followed by a description of the atoms defining their respective internal coordinates, in

12

ACS Paragon Plus Environment

Page 12 of 47

Page 13 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the form A(i) , where A is the atom’s chemical symbol, and the number (i) corresponds to the atom’s index in the molecule. These indexes are omitted when discussing atomic motions in a more generalized manner. Although the notation used in this work is generally agnostic towards the type of bond formed between the atoms, explicit indication of the bond type may be used to distinguish between two different bond type situations, such as the distinction between νCO in carbonylic (νC−O) and non-carbonylic (νC−O) moieties.

4.1

Comparison between PED and VMARD

A direct comparison between PED and VMARD must take into account the fact that PED and VMD are fundamentally different approaches to the problem of interpreting the atomic motions associated with a vibrational mode. As a result, the output provided by these two approaches may differ considerably. In particular, the PED analysis expresses the potential energy function of a molecule in terms of a set of compound internal coordinates 26 , each of which are a combination of bond lengths, the amplitude of valence angles, or the amplitude of either proper or improper dihedral angles. Usually each of these compound coordinates are linear combinations of only one type of internal coordinates, easing the classification of the vibrational modes among the general categories of stretching, bending, out-of-plane or torsion movements. On the other hand, the VMD approach decomposes the atomic motions in terms of changes in a set of internal coordinates. Despite the different standpoints, the two approaches should, in principle, allow for a similar interpretation of the underlying physical reality they aim at interpreting. Table 1 presents the attributions for the 12 vibrational modes of methanol using VMARD as well as the reference attributions provided by PED 26 , following the numbering scheme depicted in Figure 2. The results displayed in Table 1 show that the attributions derived from the results of the VMARD approach are in excellent agreement with those obtained from PED. The interested reader may also find the results from a similar comparative study (yielding similar conclusions) for the predicted vibrational modes of 2-fluoropropane in Table 13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

S2 of the Electronic Supplementary Information. Nevertheless, some differences are worthy of mention and discussion. Indeed, the weight of the most prominent internal coordinates in VMARD is almost always smaller than the weights given by PED. This is observed even in the case of highly localised vibrations, such as the O−H stretching motion associated with mode 1 in Table 1. This behaviour emerges from the fact that, while PED requires the optimisation of the set of coordinates used to describe the molecule’s potential energy function, VMARD resorts to a more “brute force” strategy, where every internal coordinate generated by the rules described above can contribute to the description of the atomic motions.

Figure 2: Numbering scheme used for the vibrational interpretation of the methanol molecule reported in Table 2.

Moreover, the weights given by VMARD for the most prominent internal coordinates are usually different among themselves, while PED gives a more balanced description. This can be traced back to the fact that ARD linear regression aims at describing the largest possible variance in the data (i.e. the atomic motions) using the least number of descriptors (internal coordinates) 39,40 . Although the description of the atomic motions provided by VMARD highlight the changes on an usually small number of internal coordinates, the interpretations provided by PED usually consider a smaller number of internal coordinates, as observed in the attributions associated with the vibrational modes 6, 7 and 8 in Table 1. Again, this behaviour can be traced back to the optimisation of the set of internal coordinates (or lack of such a stage) in either methodologies. Although these results may at first glance disfavour the 14

ACS Paragon Plus Environment

Page 14 of 47

Page 15 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 1: Attributions for the 12 vibrational modes of methanol (calculated at the B3LYP/aug-cc-pvQz level of theory), according to PED and VMARD. For each mode and attribution strategy, the description of the atomic motions are given, together with their relevance (in percentage) in describing the atomic motions associated with each vibrational mode. In the description of each vibrational mode, stretching, bending and torsion motions are represented by the symbols ν, δ, and τ , respectively.

1 2 3

Wavenumber (cm – 1 ) 3833 3104 3034

4

2989

A’

5

1509

A’

6

1498

A”

7

1479

A’

8

1366

A’

9

1172

A”

10

1076

A’

11

1040

A’

12

314

A”

Mode

Symmetry A’ A’ A”

PED Description νO(5) H(6) νC(1) H(2) νC(1) H(3) νC(1) H(4) νC(1) H(3) νC(1) H(4) νC(1) H(2) δH(3) C(1) H(4) δH(2) C(1) O(5) δH(2) C(1) H(3)

(%) 100 98 50 –50 39 39 20 63 31 99

δH(2) C(1) O(5) δH(3) C(1) O(5) δH(4) C(1) O(5) δH(2) C(1) O(5) δH(6) O(5) C(1)

δH(2) C(1) H(4) δH(3) C(1) O(5) δH(4) C(1) O(5) νC(1) O(5) δH(6) O(5) C(1) δH(3) C(1) O(5) δH(4) C(1) O(5) δH(2) C(1) O(5) νC(1) O(5) δH(2) C(1) O(5) δH(3) C(1) O(5) δH(4) C(1) O(5) δH(6) O(5) C(1) τ H(2) C(1) O(5) H(6) τ H(3) C(1) O(5) H(6) τ H(4) C(1) O(5) H(6)

15

32 32 32 49 49

20 –40 –40 22 –18 –18 –18 18 75 6 –6 –6 –6 33 33 33

ACS Paragon Plus Environment

VMARD Description νO(5) H(6) νC(1) H(2) νC(1) H(3) νC(1) H(4) νC(1) H(3) νC(1) H(4) νC(1) H(2) δH(3) C(1) H(4) δH(2) C(1) O(5) δH(2) C(1) H(4) δH(2) C(1) H(3) δH(2) C(1) O(5) δH(3) C(1) O(5) δH(4) C(1) O(5) δC(1) O(5) H(6) δH(2) C(1) O(5) δH(3) C(1) H(4) δH(3) C(1) O(5) δH(4) C(1) O(5) δH(3) C(1) O(5) δH(4) C(1) O(5)

(%) 95 73 48 –48 39 39 17 60 21 40 –40 25 23 23 30 16 –13 –12 –12 37 –37

νC(1) O(5) δH(6) O(5) C(1) δH(3) C(1) O(5) δH(4) C(1) O(5) δH(2) C(1) O(5) νC(1) O(5) δH(2) C(1) O(5) δH(6) O(5) C(1)

35 –17 –14 –14 13 67 13 –11

τ H(2) C(1) O(5) H(6) τ H(3) C(1) O(5) H(6) τ H(4) C(1) O(5) H(6)

30 21 21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

application of VMARD for the interpretation of the vibrational spectra of small and highly symmetrical molecular systems, it is important to notice that the optimisation of a set of internal coordinates is the most time-consuming stage of PED analysis, often requiring expert human intervention. Moreover, as illustrated by the different attributions given to vibrational modes 6 and 9, the results provided by VMARD highlight the A” symetry of the atomic motions, while PED appears to consider the atomic motions on only one side of the molecular symmetry plane. On the other hand, VMARD analysis may be carried out in an unsupervised fashion, with the human intervention limited to the choice of the conventional term which better represents the most prominent internal coordinates describing each vibrational mode. Because of this, VMARD may find usefulness not only as a tool for the direct interpretation of vibrational spectra for larger systems from theoretical predictions per se, but also as a convenient preliminary analysis aiding the choice of relevant internal coordinates to be used on a subsequent PED analysis. These results, together with general good agreement between VMARD and PED warrants usefulness to the approach described in this work.

4.2

Weight distribution and choice of internal coordinates

Despite the good accordance between VMARD and PED shown previously, it is important to understand the overall behaviour of VMARD over a larger range of species. For this purpose, we studied the decomposition given by VMARD for the set of organic molecules depicted in Figure 1. The following discussion is grounded on the VMARD analysis of the 1458 vibrational modes calculated at the B3LYP/6.31G(d) level of theory, for the 25 molecular specied depicted in Figure 1. Figure 3 depicts the weights of the most abundant internal coordinates in the data, as a function of the position of the vibrational mode in the vibrational spectra. For simplicity, the weights were combined into generic categories, such as νCH, νCC, δCCC, or τ CCCC. The results shown in Figure 3a reveal a clear prominence of νCH in the region around 3000 cm – 1 , 16

ACS Paragon Plus Environment

Page 16 of 47

Page 17 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

while νCC motions are mostly scattered in the region below 1500 cm – 1 . Despite that, there is a number of vibrational modes with a high νCC content which corresponds to the stretching of the C−C bonds. As expected, νCO motions are also distributed around two regions: one concentrated around 1800 cm – 1 and another more dispersed region below 1500 cm – 1 . These two regions correspond to the stretching motions of C−O and C−O bonds, respectively. The presence of ring componds in the set used for this study also justifies the relatively high weight of νCC, νCO, and νCN in the region below 1000 cm – 1 , due to the ring deformations which usually take place at lower wavenumbers. The results shown in Figure 3b further depict the bending of the methylene groups (δCHH) to be an important contributor to the vibrational modes located at around 1500 cm – 1 . On the other hand, δCCH and δCOH motions are more prevalent in the range between 900 cm – 1 and 1400 cm – 1 , in accordance with the vibrational wavenumbers associated with the deformation of the C−C−H moiety observed in alkenes and aromatic compounds. In their turn, δCCC motions trend to be more prevalent in the lower-end of the spectrum. Torsion and out-of-plane deformations increase their overall importance with the decreasing wave number associated to the vibrational mode, as shown in Figure 3c. As expected, bending, torsion and out-of-plane deformations play an insignificant role describing high-wavenumber vibrational modes, as illustrated in Figures 3b, and 3c. Also, the importance of stretching motions decreases with the decrease of the associated wavenumber. Indeed, the distribution of the motion categories depicted in Figure 3 reveals an outstanding agreement between the traditional classification of the vibrational modes for organic compounds 52,53 , and the descriptions provided by VMARD. The results depicted in Figure 3, as well as those shown in Table 1, strongly suggest that VMARD is able to select a small set of internal coordinates to describe each vibrational mode. This corresponds a “tail-heavy” distribution of the weights used to describe each vibrational mode, thus validating the choice for an elliptical prior distribution in the Bayesian regression procedure outlined in Section 2.2.

17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

a) 100

νCH νCC νCO νCN

Weight (%)

80 60 40 20 0 0

500

1000

1500

2000

2500

3000

3500

Wavenumber (cm-1)

b)

100 δCCC δCCH δCHH δCOH

Weight (%)

80 60 40 20 0 0

500

1000

1500

2000

2500

3000

3500

Wavenumber (cm-1)

c)

100 τCCCC τCCCH τCCHH γCCCH

80 Weight (%)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

60 40 20 0 0

500

1000

1500

2000

2500

3000

3500

Wavenumber (cm-1)

Figure 3: Weight distribution of the internal coordinates as a function of the vibrational mode’s associated wavenumber: a) distribution for the stretching motions νCH, νCC, νCO, and νCN; b) distribution for the bending motions: δCCC, δCCH, δCHH, and δCOH; c) distribution for the torsion and out-of-plane deformations τ CCCC, τ CCCH, τ CCHH, and γCCCH.

18

ACS Paragon Plus Environment

Page 18 of 47

Page 19 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

One important caveat to bear in mind is the fact that some vibrations (specially those taking pace at low wavenumbers) can be highly delocalised. Indeed, Figure 3 shows that the weight associated with torsion and bending is usually lower than the one associated with stretching motions. In the case of the CH3 torsion in methanol, this delocalisation posed little trouble, since it can be described as the combination of motions along only three internal coordinates, as exemplified in Table 1. It is nevertheless important to understand the behaviour of VMARD when facing delocalised vibrations on larger systems, such as anthracene, (1, in Figure 1), or helicene (2, in Figure 1). For this purpose, the lowest wavenumber vibrational modes of anthracene and helicene (taking place at 93.7 cm – 1 and 48.8 cm – 1 , respectively) are depicted in Figure 4. The interested reader may also find video animations of both modes in the Electronic Supplementary Information. Visual inspection of both modes shows that the mode depicted in Figure 4a corresponds to the curling of anthracene, while the one depicted in Figure 4b corresponds to a closing of the gap between the two extremities of the helicene molecule.

Figure 4: Representation of the vibrational modes of anthracene (a) and helicene (b) with the lowest wavenumber, calculated at the B3LYP/6-31G(d) level of theory. Atomic motions along each vibrational mode are depicted as dark blue arrows.

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The description of both vibrational modes depicted in Figure 4 is summarised in Table 2. The results show that VMARD can achieve a description of the vibrational mode of anthracene located at 93.7 cm – 1 using 8 internal coordinates, while being able to describe about 70 % of the atomic motions involved in that mode. Indeed, considering only the four largest local contributions would cover 50 % of the atomic motions, and can be easily interpreted as a rising of the molecule’s extremes (atoms C(1) , C(2) , C(12) , and C(13) ) relative to the original plane of the molecule. On the other hand, the movement along the 48.8 cm – 1 mode of helicene is described by VMARD using 13 internal coordinates. Despite the increase in the number of descriptors, the VMARD covers only about 60 % of the atomic motions involved in that mode. Table 2: Description of the modes associated with the lowest wavenumber vibrations of anthracene (1) and helicene (2) provided by VMARD. The atomic numbering follows the scheme depicted in Figure 4.

1

Wavenumber (cm – 1 ) 93.7

2

(Total) 48.8

Compound

(Total)

Weight (%) 12.5 12.5 -12.5 -12.5 -5.1 -5.1 5.1 5.1 70.4 7.1 6.9 6.7 5.7 4.5 -4.2 4.0 3.9 3.8 3.7 3.4 3.3 3.2 60.4

20

Description τ C(2) C(1) C(6) C(5) τ C(9) C(14) C(13) C(12) τ C(8) C(11) C(12) C(13) τ C(1) C(2) C(3) C(4) τ C(10) C(4) C(5) C(6) τ C(7) C(8) C(9) C(14) τ C(11) C(8) C(9) C(10) τ C(3) C(4) C(5) C(7) τ C(12) C(11) C(15) C(20) τ C(9) C(8) C(12) C(16) τ C(9) C(14) C(18) C(19) τ C(18) C(23) C(25) C(26) τ C(13) C(17) C(22) C(23) τ C(15) C(20) C(21) C(16) τ C(23) C(25) C(26) C(24) γC(10) C(13) C(14) C(17) τ C(4) C(3) C(7) C(11) τ C(11) C(15) C(20) C(21) γC(8) C(12) C(11) C(16) γC(14) C(18) C(19) C(23) γC(1) C(4) C(3) C(8)

ACS Paragon Plus Environment

Page 20 of 47

Page 21 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The results displayed in Table 2 prompted the question of whether the use of additional (user-defined) internal coordinates would help describing these poorly localised modes. In the case of anthracene, two hypothesis were tested: in one case, we manually added the two torsions (τ C(2) C(10) C(7) C(12) and τ C(1) C(7) C(10) C(13) ), while in the second experiment two angle bendings were added (δC(2) C(10) C(13) and δC(1) C(7) C(12) ). The first case was mildly successful, with τ C(1) C(7) C(10) C(13) achieving a weight of 15 % in describing the atomic motions associated with the mode depicted in Figure 4a, with a corresponding decrease in the weights of the other internal coordinates listed in Table 2. On the other hand, the addition of the two additional angles to the pool of available internal coordinates did not grant any additional benefits. Regarding the case of helicene, three experiments were carried out involving: a) the addition of all the torsions involving atoms C(15) , C(20) , C(21) , C(24) , C(25) , and C(26) with the C(9) −C(14) bond acting as pivot; b) addition of all angles involving atoms C(15) , C(20) , C(21) , C(24) , C(25) , and C(26) , with atom C(9) functioning as apex; and c) addition of the interatomic distances (pseudo-bonds): C(15) ···C(24) , C(20) ···C(26) , and C(21) ···C(25) . In all cases, the description of the vibrational mode at 48.8 cm – 1 remained essentially unchanged. These results strongly suggest that helicene’s vibrational mode at 48.8 cm – 1 is highly delocalised and that any attempt to reduce this complexity might lead to a loss of information regarding the atomic motions involved. On the other hand, they also offer a possible diagnostic tool to ascertain the validity of VMARD. Indeed, most analysis carried out in this work present a profile similar to that shown in Table 2 for anthracene: there are a few important internal coordinates (with similar weights) followed by an abrupt drop in the weight of the subsequent internal coordinates (usually the most important of these lesser contributions has a relative weight of less than half of the least important of the major contributors). This scenario correspond to the elliptical distribution assumed by VMARD, as previously described. On the other hand, a steady decrease in importance of the major contributors does not correspond to the assumed prior distribution of the weights and can

21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

be associated with highly delocalised vibrations. Finally, it should be noted that VMARD treats the problem of describing vibrational motions, and not that of their localisation. Indeed, it is possible to introduce delocalised coordinates in the pool of available coordinates, as long as these can be translated to and from Cartesian coordinates using Equation 2, the subsequent methodology outlined in the work can then be applied. Taking into account the behaviour of VMARD explored in this work, it is plausible that such delocalised coordinates will be ranked with high weights by VMARD when describing highly delocalised vibrations, allowing the future application of VMARD to large systems such as carbon nanotubes and peptides.

4.3

AcOH dimer and isotopic substitution

In order to elucidate the relationships between physical properties and the organization of matter at the molecular level, one interesting strategy is to understand how hydrogen bonding and isotopic substitution may affect the vibrational spectra. In this regard, the acetic acid (AcOH) dimer is a particularly suitable candidate for demonstrating purposes, due to the large amount of information available 54,55 . For this purpose, the spectra of the AcOH molecule and its dimer were calculated at the B3LYP/aug-cc-pVQZ level of theory. Furthermore, the hydrogen atoms at the hydroxyl groups (H(5) and H(13) , following the numbering scheme presented in Figure 5) were systematically replaced by deuterium (henceforth denoted by D(5) and D(13) , respectively). The infrared spectra for the resulting five species is depicted in Figure 6. As expected, the substitution of H(5) for D(5) in AcOH shifts the signals associated with the motion of the replaced atom towards smaller wavenumbers. This is demonstrated by the relative positions of signals 1 and 7 , shown in Figure 6a. The results from the VMARD analysis for both species (summarised in Table 3) further confirms that these two signals correspond to νOH motions. On the other hand, the νC−O motions are mostly associated with signals 2 and 8 and are almost unaffected by the isotopic substitution between H(5) and D(5) . 22

ACS Paragon Plus Environment

Page 22 of 47

Page 23 of 47

Figure 5: Numbering scheme used for the vibrational interpretation of the acetic acid (AcOH) dimer. When referring to the interpretation of of the vibrational spectra of an isolated AcOH molecule, the numbering scheme used for the first monomer (atoms C(1) to H(8) ) is used.

Rel. Absorvance (%)

a) 100

ε2 ε8

AcOH AcOD

80 60

ε9

40 20 0 4000

ε1

ε10 ε11 ε4 ε5 ε6 ε12

ε7

3500

3000

ε3

2500

2000

1500

1000

500

Wavenumber (cm-1)

b) 100

Rel. Absorvance (%)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

η1 η5

η10

AcOH---AcOH AcOD---AcOH AcOD---AcOD

80 60 40

η6

η11 η7 η2

20 0 4000

3500

3000

2500

2000

η12 η13 η8 η9 η4 η3 1500

1000

500

Wavenumber (cm-1)

Figure 6: Infrared spectra of AcOH and AcOD (a), together with the infrared spectra for the dimers AcOH···AcOH, AcOD···AcOH, AcOD···AcOD (b). All data was calculated at the B3LYP/aug-cc-pVQZ level of theory. In the case of AcOD···AcOH, only the hydrogen atom at position 13 was replaced by deuterium (Cf. Figure 5).

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

A more interesting case can be made for some of the signals in the fingerprint region of the spectra shown in Figure 6a. The most intense signals in the 1200 cm – 1 to 1300 cm – 1 represent a sift of about +85 cm – 1 upon isotopic substitution (signals 3 and 10 in Figure 6a). The VMARD analysis shows that although sharing the same major contributor (νC(1) O(4) ), the movement of other atoms is distributed in a different manner: in the case of AcOH, this 3 is best described as an in-phase νCO (with a minor contribution from the bending of the C(1) −O(4) −H(5) moiety) while the corresponding vibration in AcOD appears to be more strongly localised on the stretching of the C(1) −O(4) bond, with lesser contributions from νC(1) C(2) and δO(3) C(1) O(4) . It could also be argued that 3 actually corresponds to 10 , taking into consideration the three largest contributions used to describe both vibrational modes, this would point towards a large wavenumber shift, which is justifiable given the relatively large contribution of δC(1) O(4) H(5) (or, δC(1) O(4) D(5) in the case of AcOD) when describing the atomic movement along the modes associate with 3 and 10 . Further comparison of the description for the vibrational modes 5 and 6 , with their counterparts in the spectrum of AcOD (11 and 12 , respectively), shows that the description of both modes relies heavily on movement along internal coordinates involving O(4) , which is coherent with the increase in mass for the hydroxyl group in AcOD. As suggested by the results depicted in Figure 6, dimer formation has a profound effect on the infra-red spectrum of acetic acid. A particularly interesting aspect is the fact that the intensity of signals associated with νO−H is boosted, while the signals in the fingerprint region become much less evident, as depicted in 6b. Isotopic substitution on the acetic acid dimer shows some interesting results, which are easily highlighted by VMARD and summarised in Table 4. As expected, hydrogen bonding between the two AcOH molecules causes a red shift in the signals associated with O−H stretching. This is evident when comparing the position of η1 to that of 1 (for AcOH), as well as when observing the positions of η10 and 7 . Both η1 and η10 correspond to an out-of-phase stretching of both O−H bonds. When only one of the hydrogen atoms involved in the hydrogen bonds of the dimer is replaced

24

ACS Paragon Plus Environment

Page 24 of 47

Page 25 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 3: Assignment of the 12 vibrational modes 1 to 12 highlighted in Figure 6a, according to VMARD. The numbering scheme follows the nomenclature established in Figure 5, and the description of each vibrational mode, stretching, bending and torsion motions are represented by the symbols ν, δ, and τ , respectively. Id. 1 2 3

Wavnumber (cm – 1 ) 3742 1811 1200

4

998

5

663

6

585

AcOH Weight (%) 95.0 74.1 22.0 21.5 -15.4 -10.8 24.2 11.9 -10.5 -10.5 38.5 -36.4 17.7 36.7 -22.7 17.2 -12.1 11.4

AcOD Weight (%) 97.6 76.6 31.7 -19.6 -12.5 -10.9 37.0 -12.9 11.3

Description

Id.

νO(4) H(5) νC(1) O(3) νC(1) O(4) νC(1) O(3) δC(1) O(4) H(5) νC(1) C(2) νC(1) O(4) δC(1) C(2) H(8) δC(1) C(2) H(7) δC(1) C(2) H(6) τ O(4) C(1) C(2) H(7) τ O(3) C(1) C(2) H(7) τ C(2) C(1) O(4) H(5) δO(3) C(1) O(4) νC(1) C(2) νC(1) O(4) δC(2) C(1) O(3) δC(1) O(4) H(5)

7 8 9

Wavenumber (cm – 1 ) 2722 1803 1282

10

969

11

615

50.1 -48.5

τ O(3) C(1) C(2) H(7) τ O(4) C(1) C(2) H(7)

12

549

31.4 20.2 -17.1

δO(3) C(1) O(4) νC(1) O(4) νC(1) C(2)

25

ACS Paragon Plus Environment

Description νO(4) D(5) νC(1) O(3) νC(1) O(4) νC(1) C(2) δO(3) C(1) O(4) δC(1) C(2) H(8) νC(1) O(4) δC(1) O(4) H(5) νC(1) O(3)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

by deuterium, νO−H and νO−D become independent resulting in two signals (η5 and η6 ) shown in Figure 6b. Further analysis of the VMARD results for these two signals shows that there is a minor νC−O contribution involving the oxygen atom functioning as the acceptor in the hydrogen bond. Although these lesser contributions are not high enough to mislead one to classify η5 and η6 as a combination of νO−H and νC−O, they may provide interesting descriptors for unravelling novel structural insights in further studies of hydrogen bonding involving the carbonyl group.

26

ACS Paragon Plus Environment

Page 26 of 47

Page 27 of 47

Table 4: Assignment of the 13 vibrational modes η1 to η13 highlighted in Figure 6b, according to VMARD. The numbering scheme follows the nomenclature established in Figure 5, and the description of each vibrational mode, stretching, bending and torsion motions are represented by the symbols ν, δ, and τ , respectively. Id. η1

AcOH···AcOH Wavnumber Weight Description (cm – 1 ) (%) 33.2 νO(11) H(13) 3164 -33.1 νO(4) H(5)

Id. η5 η6

η2

1753

η3

1471

η4

1458

27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Journal of Chemical Theory and Computation

33.1 -33.0 11.8 -11.7 8.8 -8.8 7.6 -7.5

νO(9) C(10) νC(1) O(3) δH(6) C(2) H(7) δH(14) C(12) H(16) νC(10) O(11) νC(1) O(4) δC(10) O(11) H(13) δC(1) O(4) H(5)

η7

12.5 -12.4 9.7 -9.6 -7.2 6.9

νC(1) O(4) νC(10) O(11) δC(1) O(4) H(5) δC(10) O(11) H(13) νC(1) O(3) νO(9) C(10)

η9

η8

AcOD···AcOH Wavnumber Weight Description (cm – 1 ) (%) 3112 61.9 νO(4) H(5) -6.7 νO(9) C(10) 2281 64.7 νO(11) D(13) -11.9 νC(1) O(3) 1749 35.2 νC(1) O(3) -30.8 νO(9) C(10) 1468 22.2 δH(14) C(12) H(16) -14.5 νO(9) C(10) -11.8 δH(15) C(12) H(16) -9.6 δH(14) C(12) H(15) 9.3 δC(10) C(12) H(15)

1418

20.6 12.1 11.9 -10.9 8.6 -6.0

νC(10) O(11) δC(10) C(12) H(16) δC(10) C(12) H(14) νC(10) C(12) δC(10) C(12) H(15) δO(9) C(10) O(11)

ACS Paragon Plus Environment

Id. η10

AcOD···AcOD Wavenumber Weight Description (cm – 1 ) (%) 2314 42.4 νO(11) D(13) -42.3 νO(4) D(5)

η11

1744

η12

1420

η13

1357

33.9 -33.9 12.9 -12.8 -7.4 7.1 6.8 6.8 -6.7 -6.5 16.0 -16.0 -6.0 6.0 -5.8 5.8

νO(9) C(10) νC(1) O(3) νC(10) O(11) νC(1) O(4) νC(10) C(12) νC(1) C(2) δC(10) C(12) H(16) δC(10) C(12) H(14) δC(1) C(2) H(6) δC(1) C(2) H(7) νC(1) O(4) νC(10) O(11) νC(1) C(2) νC(10) C(12) δC(1) C(2) H(8) δC(10) C(12) H(15)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Further analysis of the data depicted in Figure 6 is shown in Table 4. VMARD classified the carbonyl stretching signals at about 1750 cm – 1 as an out-of-phase stretching of both C(1) −O(3) and C(10) −O(9) , in accordance with the general consensus regarding this attribution. In addition to this, the two most intense signals in the 1400 cm – 1 to 1500 cm – 1 range attained similar descriptions for the three dimers, albeit with minor variations regarding the importance of the internal coordinates involving atoms in the vicinity of those replaced by deuterium. More interestingly, the results shown in Table 4 remained consistent upon manually adding H(5) ···O(9) and O(3) ···H(13) to the list of internal coordinates used by VMARD. This can be explained by the fact that the lengths corresponding to the H(5) ···O(9) and O(4) −H(5) bonds are highly correlated (the same applying for the O(11) −H(13) and O(3) ···H(13) pair). Because the Bayesian regression approach used in VMARD relies on an elliptical distribution, it will favour a sparser distribution regarding the weights of the internal coordinates. Hence, while simpler linear regression techniques would distribute the weight almost equally among these two redundant descriptors, VMARD will concentrate the weights on the most relevant descriptors, thus revealing the task of devising a set of internal coordinates to be used in the analysis of vibrational spectra, as explained earlier in this work.

4.4

Description of Reaction Coordinates

As mentioned before, VMARD relies solely on the ADM, making it an attractive technique for studying and describing atomic motions at locations other than the minima of the PES, such as transition state structures (TS) and conical intersections. Figure 7 depicts the TS for the oxygen-transfer reaction between VO(O2 )(OH) and two olefinic substrates: an alkene (Figure 7a), and an allylic alcohol (Figure 7b), together with a representation of the atomic motions associated with the reaction coordinate (i.e., the vibrational mode associated with an imaginary wavenumber), hereafter referred to by ξ. These structures were obtained from a previous study concerning the Sharpless epoxidation process 48 , using the X3LYP 28

ACS Paragon Plus Environment

Page 28 of 47

Page 29 of 47 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

functional 56 in conjunction with a mixed basis set of triple-ζ quality 57,58 . The vibrational analysis of these TS structures focused on the characterisation of the atomic motions along ξ. For this purpose, the lengths for the V−O(3) , V−O(4) , O(3) −O(4) , O(3) −C(7) , and O(3) −C(8) were considered in addition to the internal coordinates generated from the rules explained in the Methodology section. The characterisation of both reactions coordinates using VMARD is summarised in Table 5.

Figure 7: Transition state structures for the oxygen-transfer reaction between VO(O2 )(OH) and two olefinic substrates: an alkene (a), and an allylic alcohol (b). The atomic motions associated to the reaction coordinate of each reaction are represented as dark blue arrows.

As shown in Table 5, the reaction coordinate at the TS is mainly characterised by νO(3) O(4) and νV(1) O(4) . This corresponds to the elongation of the O(3) −O(4) bond as O(3) attacks the olefinic bond and also to the strengthening of the V−O(4) , which is in line with 29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 5: Characterisation of the reaction coordinates, ξ, at the transition state structures depicted in Figure 7 using VMARD, taking into consideration only the internal coordinates with an associated relative weight greater that 4%. Stretching and bending motions are represented by the symbols ν and δ, respectively. Substrate Alkene Wavenumber –409.1 cm – 1 νO(3) O(4) 21.0 % νV(1) O(4) –10.2 % νO(3) C(7) –7.8 % νO(3) C(8) –7.4 % δV(1) O(5) H(6) 6.4 % δC(7) O(3) C(8) 5.8 % νV(1) O(3) –4.9 % νV(1) O(19) NA

Alcohol –554.1 cm – 1 20.4 % –13.4 % –9.7 % –13.7 %