Development and Implementation of Advanced ... - ACS Publications

Apr 24, 2017 - been carried out, the set of experimental rotational constants and theoretical corrections, together with the nuclear masses of all iso...
2 downloads 14 Views 2MB Size
Subscriber access provided by HACETTEPE UNIVERSITESI KUTUPHANESI

Article

Development and implementation of advanced fitting methods for the calculation of accurate molecular structures Marco Mendolicchio, Emanuele Penocchio, Daniele Licari, Nicola Tasinato, and Vincenzo Barone J. Chem. Theory Comput., Just Accepted Manuscript • Publication Date (Web): 24 Apr 2017 Downloaded from http://pubs.acs.org on April 26, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Development and implementation of advanced fitting methods for the calculation of accurate molecular structures Marco Mendolicchio, Emanuele Penocchio, Daniele Licari, Nicola Tasinato, and Vincenzo Barone∗ Scuola Normale Superiore, Piazza dei Cavalieri 7, I-56126 Pisa, Italy E-mail: [email protected] Abstract The determination of accurate equilibrium molecular structures plays a fundamental role for understanding many physical-chemical properties of molecules, ranging from the precise evaluation of the electronic structure to the analysis of dynamical and environmental effects in tuning their overall behavior. For this purpose the so-called semi-experimental approach, based on a non-linear least-squares fit of the moments of inertia associated to a set of available isotopologues, allows one to obtain very accurate results, without the unfavorable computational cost characterizing high-level quantum chemical methods. In the present work the MSR (Molecular Structure Refinement) software for the determination of equilibrium structures by means of the semi-experimental approach is presented, and its implementation is discussed in some detail. The software, which is interfaced with a powerful graphical user interface, includes different optimization algorithms, an extended error analysis, and a number of advanced features, the most remarkable ones concerning the choice of internal coordinates and the method of predicate observations. In particular, a new black-box scheme for defining automatically a suitable set of non-redundant internal coordinates of A1 symmetry in place of the

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

customary Z-matrix has been designed and tested. Finally, the implementation of the method of the predicate observations is discussed and validated for a set of test molecules. As an original application, the method is employed for the determination of the semi-experimental structure for the most stable conformer of glycine.

1 INTRODUCTION The knowledge of the equilibrium structures of isolated molecular systems of chemical and biological interest is of fundamental relevance to gain a deeper understanding of many chemical-physical processes, in the framework of the so-called structure-property relationships. Moreover, accurate equilibrium geometries represent invaluable benchmarks for testing more or less approximate computational methods rooted into quantum or classical mechanics (e.g. see Refs. 1–3 ). Nowadays, the amount of experimental data is ever increasing, but these often depend on the experimental set-up and are influenced by vibrational and/or environmental effects. The molecular structures obtained through isotopic substitution are actually vibrationally averaged properties, but vibrational effects are usually not explicitly considered during the inversion of the spectroscopic data to obtain the required geometrical parameters. As a matter of fact, vibrationally averaged (r0 ) and substitution (rs ) structures 4 depend on the isotopic species under examination. 5,6 In order to overcome these issues, the determination of the equilibrium structure (re ), defined as the geometry related to the minimum of the Born-Oppenheimer (B-O) potential energy surface (PES), has shown to be the most appealing alternative. 5,6 Even though this kind of structure is more challenging to be inferred from the experimental point of view, its determination allows the inclusion of vibrational effects and, within the B-O approximation, 7,8 it is independent from isotopic substitutions. Furthermore, these data can be directly compared with equilibrium structures issuing from quantum chemical computations. In this work we present the new program MSR (Molecular Structure Refinement), specifically designed for computing equilibrium structures based on a non-linear least-squares (NLLS) fitting of the moments of inertia for a given set of isotopologues. 9 This procedure allows semi2

ACS Paragon Plus Environment

Page 2 of 49

Page 3 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

experimental molecular structures to be determined, by refining the experimental moments of inertia through vibrational corrections, computed, for instance, in the framework of vibrational second-order perturbation theory (VPT2). 10–12 The MSR software has been implemented as a userfriendly code, with a wide range of options and features suitable for different problems/scenarios paying special attention to robustness and generality. In order to achieve this goal, a flexible choice of the optimization algorithm and an extended error analysis have been included. Moreover, one of the most useful features regards the definition of the coordinates employed in the optimization step. The most common set of nuclear coordinates is represented by the Z-matrix internal coordinates (ZICs) which, however, are not unambiguous, inasmuch largely based on chemical intuition. However, as shown by Demaison, 5 the minimum number of coordinates required in order to fully describe the molecular configuration is given by the number of totally symmetric vibrational modes (A1 coordinates). Therefore, a more effective choice for the coordinate set underlying the geometry refinement procedure is represented by the set of A1 coordinates selected from a set of non-redundant internal symmetry coordinates. In this context, in order to eliminate the userdependent choice of the coordinates, a robust protocol for a black-box generation of a set of A1 internal coordinates has been devised. In addition, the method of predicate observations 13,14 has been implemented in the MSR program. This option allows, for example, structural parameters obtained through quantum chemical calculations to be included as additional information, a useful approach when the number of experimental data is not sufficient for defining all the structural parameters of a molecule. Finally, the program is interfaced with a powerful graphical user interface (GUI) from which it can be completely controlled.

2 THE SEMI-EXPERIMENTAL METHOD The analysis of rotational and vibro-rotational spectra provides, among other spectroscopic parameters, the rotational constants of the target molecule (B α ), which are inversely proportional to the principal moments of inertia (I α ). The moments of inertia of different isotopologues permit,

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 49

in turn, to determine the structural parameters, such as bond lengths and angles. However, the definition of a good reference molecular geometry is a far from trivial concept, since molecules are not rigid rotors, being affected by vibrational motion. A remarkable improvement has been achieved with the introduction of the so-called semi-experimental structures (reSE ) by Pulay and co-workers. 15 These are determined by a least-squares fit of experimental rotational constants (or the corresponding moments of inertia) of different isotopologues corrected by vibrational contributions calculated theoretically. 5,6 This methodology is currently considered the best approach to obtain accurate equilibrium molecular structures. 16 Reliable vibrational corrections to rotational constants are obtained from the anharmonic cubic force field calculated at a suitable level of theory. In this context, a number of studies has been carried out in our group, 17–19 which demonstrated that anharmonic vibrational corrections obtained by methods rooted into the density functional theory (DFT) provide reSE structures in agreement with the most accurate equilibrium structures reported in the literature, showing in addition a more favorable computational request than refined postHartree-Fock (e.g coupled-cluster) methods. According to the semi-experimental approach, experimental ground-state rotational constants (B0α ) are corrected by vibrational contributions obtained by quantum chemical methods (∆B0α )QM , thus obtaining the semi-experimental equilibrium rotational constants (Beα ): Beα = B0α − (∆B0α )QM

(1)

where β = a, b, c is one of the principal inertial axes. The term (∆B0α )QM is composed of a α α vibrational (∆Bvib ) and an electronic (∆Bele ) contribution:

α α + ∆Bele (∆B0α )QM = ∆Bvib

(2)

The vibrational contribution is introduced in order to take into account the influence of the vibrational motion on the molecular geometry and hence, on rotational constants. Within the VPT2 framework 10–12 the vibrational dependence of rotational constants is accounted for by the following 4

ACS Paragon Plus Environment

Page 5 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

equation: α B{ν}

=

Beα



X

αrα

r

  1 νr + + ... 2

(3)

α where B{ν} stands for the rotational constants in the vibrational state identified by the quantum

numbers {ν1 , ν2 . . . }, while αrα are the first-order vibro-rotational interaction constants: " αrα = −2(Beα )2

X 3(aαβ )2 r

β

4ωr Ieβ

+

X (ζ α )2 (3ω 2 + ω 2 ) rs

s

r

s

ωr (ωr2 − ωs2 )

 c  21 X k aαα rss r +π 3/2 h ωr s

# (4)

α is an element of the Coriwhere ωr is the harmonic wavenumber related to the rth normal mode, ζrs 10 olis matrix, 20 and aαβ For the ground vibrational r are first-order derivatives of the inertia tensor.

state, the expression for the vibrational contribution can be obtained in the following form: α ∆Bvib = Beα − B0α = # "  c  12 X k aαα X 3(aαβ )2 X (ζ α )2 (3ω 2 + ω 2 ) rss r s r r rs +π − (Beα )2 + β 3/2 2 − ω2) ω (ω h 4ωr Ie r ωr r s r,s r,s β

(5)

It is important to point out that, unlike the αrα terms, the sum of first-order vibro-rotational interaction constants is not affected by resonances (for more details see Ref. 5 ). Vibrational anharmonicity effects are included through the cubic force constants krss , whose calculation represents the computational bottleneck of the quantum chemical calculation. The electronic contribution, 5,10,21,22 needs to be considered when high-accuracy calculations are performed, although it is very often negligible. This correction is due to the influence of electrons on the moments of inertia. In the most used approach electronic corrections are obtained through the rotational g-tensor: 21 α ∆Bele =

me αα α g Be mp

(6)

where me and mp are the mass of the electron and proton, respectively. The rotational g-tensor is

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 49

composed of an electronic (ge ) and a nuclear (gn ) component, respectively defined as follows: 5  2  ∂ E h ge = − µn ∂B∂J e 1 X gn = Zγ (rγ · rγ 1 − rγ rγ )I−1 e µn γ

(7)

where Zγ is the charge of the nucleus γ, µn is the nuclear magneton, B and J are the magnetic and the rotational angular momenta evaluated in the principal axis system, while rγ are the Cartesian coordinates of the nucleus γ. The geometrical structure of a molecular system can be obtained from the rotational constants (or the inertia moments), exploiting the fact that they are in general non-linear functions of the internal degrees of freedom. Let us remark that in most cases the knowledge of the moments of inertia related to a single isotopologue is not sufficient to evaluate the whole set of internal degrees of freedom, whose number is generally larger than three above triatomic molecules. Therefore, the rotational constants of different isotopologues are employed in a least-squares fitting procedure, due to the fact that within the B-O approximation they have the same equilibrium geometry. It is noteworthy that this methodology can be used to determine both effective molecular structures (r0 ) and equilibrium molecular structures (re ), although the use of effective and substitution (rs ) structure should be strongly discouraged because of the limitations pointed out in the introduction. As reported by Demaison and co-workers, 5 the strategy underlying the evaluation of the semi-experimental structure can be rationalized through different steps. First, the goodness of experimental data is tested by means of the calculation of the r0 structure. This operation is followed by the computation of the cubic force field for all isotopic species, allowing one to obtain vibrational corrections, and then the semi-experimental structural parameters. Finally, different basis sets and methods of calculation can be used until a certain stability of results is reached.

6

ACS Paragon Plus Environment

Page 7 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3 IMPLEMENTATION AND DESCRIPTION OF THE MSR PROGRAM The previous section has presented the theory of the semi-experimental approach for the evaluation of equilibrium structures, based on a NLLS procedure. The present section illustrates how this theoretical framework has been implemented in the MSR package. The fit is generally preceded by a quantum chemical geometry optimization, the resulting structure being used as guess for the NLLS procedure. Anharmonic vibrational analysis is then required in order to compute the whole set of theoretical corrections required for the semi-experimental method. Once these operations have been carried out, the set of experimental rotational constants and theoretical corrections, together with the nuclear masses of all isotopologues represent the minimal set of input data needed for the optimization. The MSR software has been developed with the target of being as flexible and user-friendly as possible, through the following features: • availability of different optimization algorithms, supported by an appropriate treatment of geometrical constraints; • implementation of a detailed error analysis; • inclusion of different types of molecular coordinate sets, also exploiting molecular symmetry; • inclusion of predicate observations; • availability of an intuitive graphical interface. In addition to the optimization procedure, the software includes two other general procedures, namely detection and scan functions, which will be briefly discussed at the end of this section together with the available options. The MSR program has been included in the VMS software 23 as a standalone module. Therefore, it is supported by a complete graphical interface from which it can be easily controlled, and 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 49

the results of the optimization are fully displayed. A snapshot of the graphical interface is depicted in Figure 1.

Figure 1: Graphical interface supporting the MSR software: (a) input panel, (b), visualization panel, (c) results panel, (d) optimization trend panel, (e) correlation matrix panel.

3.1

Optimization algorithms

As anticipated above, the MSR software features a variety of optimization algorithms that will be (briefly) reviewed in this subsection. Due to the non-linearity and non-convexity of the residual function with respect to the variables x, the optimization has to be performed through an iterative procedure, in which the correction vector for the geometry at each iteration k (∆xk ) can be found by solving a linear system of the type:

Dk ∆xk = −gk 8

ACS Paragon Plus Environment

(8)

Page 9 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

where Dk is a general symmetric and positive definite matrix and gk is the gradient computed at the k th iteration. Before computing the updated solution (xk+1 ), the displacement vector can be scaled by a factor αk : xk+1 = xk + αk ∆xk

(9)

which ensures that the value of the function decreases at each iteration. This operation is commonly known as line-search, which in this work has been carried out through the Armijo rule. 24 Before moving toward the algorithms implemented within MSR, let us recall the definition of the residual function R(x), which corresponds to the objective function to be minimized during the NLLS procedure: R(x) =

NO X

ri (x)2

(10)

i=1

where ri (x) represent the residuals, whose expression is dependent on the model employed, and NO is the number of observations exploited. One of the simplest optimization algorithms is represented by the gradient descent method 25,26 (also known as steepest descent method), where the step is proportional to the negative of the gradient. Gradient descent is characterized by a relatively small computational cost, but besides being often subject to a relatively slow convergence near the minimum, it can be responsible of “zigzag” events. 26 From this point of view, Newton’s method 27 can be a valid alternative. In this context, the objective function is locally approximated as a quadratic function. The displacement step is then given by: ∆xk = −H−1 k gk

(11)

where Hk is the Hessian matrix computed at the k th iteration. Newton’s method is generally more efficient than steepest descent when regions about the minimum are taken into account, but it requires the computation of the analytical Hessian matrix at each iteration of the optimization, which may be too expensive in most cases. Moreover, the convergence properties are guaranteed only if the Hessian matrix is positive definite at each step, which is not a straightforward condition for highly non-linear and non-convex functions. These drawbacks can be partially avoided by quasi9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 49

Newton methods, 28 where an approximation of the Hessian matrix is exploited. For this reason, they can be used when the Hessian is unavailable or unaffordable due to the high computational cost. One of the most successful quasi-Newton methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) variant, 27 where an Hessian-like matrix Bk is used as an approximation to the true Hessian. There, the Sherman-Morrison formula 29,30 is commonly employed for updating directly the inverse of the Bk matrix, required to obtain the displacement step. MSR also features the Gauss-Newton and the Levenberg-Marquardt methods. In the Gauss-Newton method 25,26 the optimization step is obtained through the usual expression:  −1 T 12 ∆xk = − JTk Wk Jk Jk Wk Rk

(12)

where Jk , Wk and Rk are the Jacobian matrix, the weight matrix and the residual vector computed at the k th iteration. As specified in equation (12), weights of observations can be improved during the optimization, the resulting method being commonly referred to as iteratively reweighted leastsquares (IRLS). 31,32 In the MSR software, this procedure can be performed through three different updating schemes, namely (i) the biweight function, (ii) the Huber weights and (iii) the criterion proposed by Watson, 33 respectively. Within all these methods, weights are updated depending mainly on the values of residuals. It should be pointed out that the convergence of Gauss-Newton procedure is not always verified. In fact, the approximation exploited in this method is valid if residuals are relatively small about the minimum or if they are only slightly non-linear. Furthermore, the approximated form of the Hessian matrix is semidefinite positive, and this implies that singularities can occur. In order to avoid this possibility, the Levenberg-Marquardt algorithm 34 can be exploited, where equation (12) can be improved by means of the addition of a positive term Λk = λk diag(JTk Wk Jk ), where λk is called “damping factor”. 35 In the MSR software, the damping factor is updated according to the trust region approach. 35 All the algorithms discussed in this paragraph have been included in our program, since different variants can be advantageous in different situations.

10

ACS Paragon Plus Environment

Page 11 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

It is noteworthy that inclusion of constraints is often necessary, for example to guarantee molecules to maintain the symmetry point group during the whole optimization process. The constraints adopted for molecular geometry determinations are usually linear and then the elimination method 26 can be used to treat equality constraints. In this context, a sub-set of variables (xB ) is expressed in terms of the other ones (xR ) by means of a linear transformation:

xB = QxR + M

(13)

This expression leads to a new version of the problem, where the function F (xR ) to be minimized is different from the initial one, and it is dependent on only a sub-set of the variables. As a consequence of the elimination method, the objective function, gradient and Hessian matrix are dimensionally reduced, and the optimization can be carried out by employing one of the algorithms discussed above.

3.2

Error analysis

The MSR software is provided with an extended error analysis, which is fundamental to check the reliability of numerical results. The main features of error analysis implemented into the program are briefly presented in this section. An essential estimator of the quality of the fit is represented by the standard deviation s: 5 sP s=

n i=1

Wii ri2 (x∗ ) n−p

(14)

where n is the total number of observations, including also predicate observations, if present (see later), p is the number of optimized parameters and x∗ is the vector containing the optimized variables. It is noteworthy that this definition takes into the proper account the number of degrees of freedom. As a next step, the asymptotic variance-covariance matrix C(x∗ ) is determined,

C(x∗ ) = s2 (x∗ )H−1 (x∗ )

11

ACS Paragon Plus Environment

(15)

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 49

where H(x∗ ) is the Hessian matrix of the residual function computed at the optimized geometry. The standard deviations of optimized parameters are represented by the square roots of the diagonal elements of the variance-covariance matrix:

s(xj ) =

q

Cjj (x∗ )

(16)

Despite standard deviations are often employed as reliable estimations of uncertainties of parameters, a more robust estimation is offered by confidence intervals:

xj − ts(xj ) ≤ xj ≤ xj + ts(xj )

(17)

t being given by the Student distribution. 36 As sketched in Figure 1, the confidence intervals and the optimized values of all the internal coordinates employed in the fit are sketched in panel (c) of the graphical interface, while the output structure is graphically visualized in panel (a). The variance-covariance matrix is used for the calculation of the correlation matrix ρ(x∗ ), whose elements are given by:

ρij (x∗ ) = p

Cij (x∗ ) Cii (x∗ ) · Cjj (x∗ )

(18)

The correlation matrix is graphically visualized by means of a gray colorscale plot (see Figure 1d), where color saturation provides an intuitive indicator of the absolute value. The correlation matrix often plays an important role in analyzing results, allowing one to quantify correlations between different groups of parameters that, if strong, can lead to an ill-conditioned optimization problem. From this point of view, another powerful indicator of ill-conditioning is represented by the condition number κ(J), 37,38 where J is the Jacobian matrix whose columns have been normalized. First, the scaled Jacobian matrix J is transformed through a singular value decomposition (SVD): 39

J = UΣV

12

ACS Paragon Plus Environment

(19)

Page 13 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The condition number corresponds to the maximum of all condition indices ηk , defined as follows: 5,38 ηk =

µmax µk

(20)

where µk are the singular values of J, and µmax is their maximum. The decomposed form of the matrix J is also used to derive the variance-decomposition proportions πjk : 38 2 Vkj µj =  2 p P Vki µi i=1 

πjk

k, j = 1, . . . , p

(21)

Finally, the analysis of leverages is carried out, by identifying the diagonal elements of the hat matrix H, 38 H = J(JT J)−1 JT

(22)

connecting computed and measured observations. This kind of analysis is helpful for highlighting observations that have a foremost impact on the results of the fit.

3.3

Non-redundant internal coordinates

In Section 3.1, the optimization algorithms implemented in the MSR program have been described with reference to a generic variable set x, without any assumption about the coordinates. In this section, the coordinate sets used, which correspond to curvilinear coordinate sets, are briefly discussed. The first problem characterizing the use of internal coordinates arises from the fact that the transformation from the space of Cartesian coordinates to the space of internal coordinates is in general non-linear. The simplest set of internal coordinates is represented by the so-called primitive internal coordinates (PICs), which are based on the identification of a set of bonds between atoms. Next we can define: • the lengths rij , between bonded atoms i and j: rij = ||rij ||2 = ||ri − rj ||2 13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 49

• the bond angles αijk , between adjacent bonds i-j and k-j: cos (αijk ) = eij · ejk • the dihedral angles γijkl , between the planes identified by i-j-k and j-k-l: (eij × ejk ) · (ejk × ekl ) sin (αijk ) · sin (αjkl )

cos (γijkl ) =

where rij is the Cartesian vector joining the atomic centers i and j, and eij = rij /||rij ||. In order to find a simple way to define the internal coordinates in terms of the Cartesian ones, a Taylor series can be employed:

si =

 3N  X ∂si j=1

∂xj

eq

(xj −

xeq j )

 3N  1 X ∂ 2 si eq + (xj − xeq j )(xk − xk ) + . . . 2 j,k=1 ∂xj ∂xk eq

(23)

where s represents the vector containing the internal coordinates. The above expression can be truncated at the first order when small displacements from the reference configuration are concerned: si =

3N X

Bij (xj − xeq j )

(24)

j=1

where the Wilson B matrix 40 with elements,  Bij =

∂si ∂xj

 (25) eq

has been introduced. Basically, even though Cartesian coordinates are univocally defined, different choices of internal coordinates can be performed, and some of them can be affected by redundancies. A redundant set is generally characterized by a number of coordinates larger than the number of normal modes and it has to be properly built, i.e. able to span all internal degrees of freedom of the molecule under investigation. Conversely, there are as many non-redundant internal coordinates as normal modes. Therefore, in order to allow their use from a practical point of view, some criteria to extract non-redundant internal coordinates starting from a set of redundant ones must be provided. 14

ACS Paragon Plus Environment

Page 15 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Generally, non-redundant internal coordinates s are obtained through a linear transformation from the primitive internal coordinates sPIC : 41

s = AsPIC

(26)

where A is a Nv × NPIC matrix containing the contribution of each redundant internal coordinates for all non-redundant ones, Nv being the number of normal modes and NPIC the number of PICs. As already discussed, PICs include all bond lengths, bond angles and dihedral angles and hence they are very intuitive. However, before using them in the evaluation of non-redundant internal coordinates, some operations have to be performed. First of all, the coherence among dihedral angles must be checked, to ensure that the difference between any pair of them lies between 0 and π. In a second step, out-of-plane coordinates need to be included if planar centers are present. Schlegel and co-workers provided a convenient definition of out-of-plane coordinates, 42 as improper dihedrals between non-bonded atoms. Finally, linear bending angles require a specific treatment as shown by Wilson, 43 due to the fact that they generate divergent terms in the definition of the Wilson B matrix. Once these preliminary operations have been performed, the identification of redundancies can be carried out. Non-redundant internal coordinates have been first formulated by Pulay and co-workers, 44,45 through the definition of the natural internal coordinates (NICs). Within the definition of NICs, bond lengths are considered as individual coordinates, while bond and dihedral angles are linearly combined depending on the local symmetry of the molecular system. A very interesting property of NICs is that they can often minimize the coupling between different coordinates. 44 Besides, their definition can be standardized for typical functional groups/coordination numbers as suggested by Pulay et al. 44 On the other hand, automatized procedures for generating NICs are not straightforward, and they can still contain redundancies, making a posteriori modifications sometimes necessary. A second possible choice is represented by Z-matrix internal coordinates (ZICs) which are

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

widely used for determining equilibrium structures. These are represented by some sub-set of redundant internal coordinates. Nevertheless ZICs are not univocal and largely based on chemical intuition. As a consequence, different choices of ZICs could lead to different results, implying a strong user-dependency. In addition, ill-defined Z-matrices may cause the optimization procedure to not converge. One of the most important aspects characterizing a well-defined optimization process is represented by molecular symmetry. In fact, the symmetry of the molecular system has to be preserved during the whole optimization. When an optimization is performed by employing ZICs, the preservation of symmetry is generally guaranteed through the definition of appropriate geometrical constraints applied to the internal coordinates. Although this strategy has proven to be very efficient, several dummy atoms may be required when complex molecular topologies are investigated, implying the introduction of numerous artificial degrees of freedom. From this point of view, a possibly more convenient choice is represented by the set of totally symmetric coordinates (A1 coordinates) selected from a set of non-redundant internal symmetry coordinates. The so-called delocalized internal coordinates (DICs) 46–50 are particularly suitable for this purpose, since they are symmetry coordinates based on a relatively simple strategy for the identification of the redundancies. For the sake of completeness, the procedure necessary to compute DICs implemented into MSR is outlined in the following. As a first step, PICs are defined on the basis of the connectivity matrix. The identification of all bonded atoms is then performed, allowing one to define all bond lengths, bond angles and dihedrals. As stated above, out-of-plane coordinates can be included as well, and linear bends are replaced by two auxiliary bending coordinates. At this point, the Wilson B matrix in terms of redundant internal coordinates is computed. Linear dependencies of Wilson B matrix are in general diagnostic of the presence of redundancies among internal coordinates. As a consequence, by means of the computation of the matrix product G = BBT , redundancies can be identified as the eigenvectors of the G matrix corresponding to null eigenvalues. Therefore, the G matrix is

16

ACS Paragon Plus Environment

Page 16 of 49

Page 17 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

calculated and successively diagonalized:   Λ 0 BBT (UR) = (UR)   0 0

(27)

where U collects the non-redundant eigenvectors of G, while R contains redundant eigenvectors. In other words, U contains all coefficients by means of which DICs are expressed in terms of redundant internal coordinates. Once the set of symmetry coordinates has been defined, a robust strategy has been devised to detect A1 coordinates among the whole set. With this aim, the chain of steps is schematized in the following. First, DICs associated to the guess geometry are computed. Due to the fact that the expression of DICs is unavoidably dependent on the symmetry of the guess molecular structure, the latter has to be properly provided. In the second place, the symmetry point group and the representative matrices of all symmetry elements of the group are identified by following the algorithm proposed by Beruski and Vidal, 51 based on the determination of symmetry equivalent atoms (SEA). For each DIC, a displacement along a selected coordinate is carried out, and the resulting DICs sd are evaluated. Once the displaced geometry has been generated, it is converted to Cartesian coordinates. This step is requires an iterative procedure, 52,53 due to the fact that the relation between Cartesian and internal coordinates is non-linear and non-invertible. Finally, the representative matrices of all symmetry elements of the molecular point group are applied to the Cartesian coordinates related to sd vector, and only if the molecular geometry remains unchanged the internal coordinate is marked as totally symmetric. These steps are repeated systematically until all A1 coordinates have been identified. As a further check, all elements of G matrix (expressed in terms of DICs) coupling A1 coordinates to the other ones are checked. The identification of A1 coordinates is considered correct if all such couplings vanish. Hence, all coordinates not belonging to A1 irreducible representation are kept fixed at their initial values, while the A1 coordinates are free to vary during the optimization. Let us remark that the approach described above has two main advantages. The first one is that DICs allow for an automatized and transparent procedure for computing a set of non-redundant internal coordinates. The second advantage is that displace17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 49

ments of the geometry through this kind of coordinates do not affect the global symmetry of the molecule. Hence, unlike other sets of coordinates, no geometrical equality constraints are needed. Although the strategy based on the use of A1 coordinates is potentially applicable to molecules of any symmetry, it is particularly advantageous as the symmetry increases. On the other hand, for non-symmetric systems the selection of A1 coordinates becomes unnecessary. Furthermore, in some cases the use of localized internal coordinates such as NICs may be preferable. For this reason, the code can support any type of internal coordinates, through a tool that allows the user to provide as input both redundant and non-redundant internal coordinates. Because DICs mix internal coordinates, including rigid and floppy degrees of freedom, Swart et al. 54 proposed to compute them starting from a set of weighted redundant internal coordinates, in order to reduce the coupling between different degrees of freedom (e.g. bond lengths and bond angles). This procedure can be carried out by multiplying each row of the redundant Wilson B matrix by specific weight factors. The weight factors proposed by Swart are: ω(rij ) = ρij 1

ω(αijk ) = (ρij ρjk ) 2 [f + (1 − f ) sin αijk ] (28)

1 3

ω(τijkl ) = (ρij ρjk ρkl ) [f + (1 − f ) sin αijk ] [f + (1 − f ) sin αjkl ] 1

ω(θijkl ) = (ρij ρjk ρjl ) 3 [f + (1 − f ) sin αijk ] [f + (1 − f ) sin αijl ] where rij , αijk , τijkl and θijkl define bond lengths, bond angles, dihedral angles and out-of-plane coordinates, respectively; f is a scalar equal to 0.12 and ρij is the so-called screening function:  −

ρij = e

rij Cij

 −1

(29)

Cij being the sum of the covalent radii of atomic centers i and j. As a concluding remark, it should be pointed out that the symmetry of observations must be coherent with the symmetry of the system, no matter which set of internal coordinates is employed during the optimization. For example, it is not uncommon that the set of input data contains 18

ACS Paragon Plus Environment

Page 19 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the rotational constants related to a mono-substituted species, with the substituted atom being symmetrically equivalent to other ones. As a consequence, if a single set of nuclear masses is used in the optimization (where only one of the SEA is substituted), an artificial asymmetry is forced on the structural parameters during the optimization. This is due to the fact that at variance with calculations, where SEA are labeled (and then distinguishable), they are actually indistinguishable at the experimental level. In the MSR software the presence of equivalent substitutions that may be present in a molecule is automatically checked, and possible missing “equivalent isotopologues” are internally added to the set of observations, in order to have a well-conditioned problem in terms of symmetry.

3.4

Method of predicate observations

The determination of the molecular geometry through the semi-experimental method requires a number of observations sufficiently large to ensure all internal coordinates to be correctly determined, i.e. a relatively large number of isotopic substitutions. However, this condition may not be fulfilled. The consequent loss of information can seriously affect the quality of the fit, making it impracticable in some cases. In addition, the presence of strong correlation between variables can negatively affect the accuracy of structural parameters as well. 5 A first approach generally adopted to overcome this issue is fixing a subset of parameters to values obtained through quantum mechanical methods. Even though this strategy is simple and often allows satisfactory results to be achieved, a different approach, known as method of predicate observations, 55–57 has been proposed. This method is based on the use of estimates of structural parameters as additional input data in the fitting procedure weighted appropriately, usually according to their assigned uncertainties. This approach, in addition of being more flexible with respect to treating theoretical parameters as rigid constraints, should lead to better estimates if weights are carefully chosen. 55 With the introduction of the predicate observations approach, the NLLS theory presented in the previous sections remains mostly unchanged, except for the fact that inclusion of predicates leads to an augmented set of data for the fitting. As a consequence, an additional term has to be 19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 49

considered in the expression of the residual function:

R(s) =

NO X

2

Wii ri (s) +

i=1

Np X

(p)

wi (si − si )2

(30)

i=1

(p)

where Np is the number of predicates, s represents the set of internal coordinates employed, si is the ith predicate value and wi its associated weight. Weights are usually defined starting from their assigned uncertainties σi , wi = 1/σi2 . For the sake of readability, let us suppose that a predicate is associated to each internal coordinate. In this case, both residual vector and Jacobian matrix have to be modified as follows: 





r1     ..   .        rNO    Rp =   (p)  s1 − s1      .   ..     (p) sNp − sNp



 (∂r1 /∂s1 ) (∂r1 /∂s2 )  .. ..  . .    (∂rNv /∂s1 ) (∂rNv /∂s2 ) Jp =    −1 0   .. ..  . .   0 0

... ... ... ... .. . ...

(∂r1 /∂sNv )   ..  .    (∂rNv /∂sNv )    0   ..  .   −1

(31)

As expected, by using the above definitions of R and J in the context of Gauss-Newton method, additional contributions arise for the gradient and the Hessian matrix. Analytical expression for gradient and Hessian in matrix notation are reported in Table 1. Table 1: Expressions of gradient and Hessian with and without predicates. Without predicatesa Gradient Hessian

T

2J WR 2JT WJ

With predicates 2JT WR − 2wrp 2JT WJ + 2w

a) Gauss-Newton algorithm.

It follows that such modifications are not invasive, and further the contribution of w to the Hessian matrix can be mathematically useful during the inversion step. In the literature, 58–69 several studies of molecular structures exploiting this type of calculations 20

ACS Paragon Plus Environment

Page 21 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

have been performed, proving the validity of the methodology in this research field.

3.5

Overview of the program MSR

In the previous sections, technical details about the computational algorithms implemented in MSR have been presented. In this paragraph the features of the program are briefly discussed, in order to further clarify available options and types of calculations. In addition to the optimization function, which has been object of interest up to now, two other features are available. The first one is the detection modality, which performs a characterization of the geometry, by computing and analyzing the analytical gradient and Hessian matrix of the residual function. If the gradient vanishes, i.e. each component is (in absolute value) less than a specific threshold, the spectrum of eigenvalues of the Hessian matrix is calculated, and the stationary point is classified. The second feature is the scan option, which allows a one-dimensional scan of the residual function along a specific internal coordinate. This can be useful when the trend of the residual function has to be checked in more detail. It should be pointed out that the three features (optimization, detection and scan) can be used in the same calculation. The input data required in order to carry out a calculation are the guess geometry, the masses of all isotopologues, the optimization algorithm (selected among those described in Section 3.1), the internal coordinates (see Section 3.3) and the specfication regarding the type of calculation to be performed. Moreover, other options can be specified, such as the inclusion of predicate observations. All the data and options required to run the program can be specified from the input section (see Figure 1a) of the graphical interface of the MSR-module of VMS. The output structure depends on the program feature used, even though it is clear that, if detection or scan procedure are used, the associated output is quite simple. In the context of the optimization procedure, the output contains results of the optimization step-by-step, among which there are current internal and Cartesian coordinates and the standard deviation. Once the optimization is completed, the relevant output is shown in a dedicated panel (see Figure 1b) together with the values of the internal coordinates employed in the optimization and the corresponding 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

confidence intervals (Figure 1c). Furthermore, details about the optimization procedure are also summarized: the trend of the residual function is plotted (Figure 1d) and by clicking on the point corresponding to a given iteration, all the relevant information can be analyzed. A specific section for the symmetry analysis is also present, where the molecular point group and the class of rotor are reported. If A1 coordinates are employed, all results concerning the identification of these coordinates are reported as well.

4 RESULTS AND DISCUSSION Several studies have been carried out in order to confirm the robustness of our algorithm. The structural optimization procedure in terms of standard Z-matrix (as well as tests concerning the different optimization algorithms and error analysis) has been extensively tested in a previous work 19 where semi-experimental equilibrium structures have been obtained for oxygen- and sulfurcontaining molecules. Therefore, in the present work the attention will be focused on applications concerning the use of A1 coordinates, as well as on the inclusion of predicate observations. Three test-cases are presented for both kinds of calculations, and in addition, the predicate method is applied to determine the semi-experimental structure of the most stable conformer of glycine.

4.1

Symmetry optimization

Concerning the use of symmetry during the optimization process, the test-case molecules presented in the following have been chosen to span different classes of rotors. In particular diacetylene (see Figure 2a) has been selected as representative of linear molecules, while ketene (see Figure 2b) and benzene (see Figure 2c) as examples of asymmetric and symmetric tops, respectively.

22

ACS Paragon Plus Environment

Page 22 of 49

Page 23 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 2: Molecular structures of diacetylene (a), ketene (b) and benzene (c).

In order to confirm the validity of this methodology, the semi-experimental equilibrium structures presented in previous works 18,19 have been computed through the use of both ZICs and totally symmetric internal coordinates selected from weighted DICs (from now on referred to as A1 -WDICs). Only a brief description of the semi-experimental data used for the calculations will be given, since they are already present in the reference papers. 18,19 All experimental (ground-state) rotational constants related to the investigated systems 70–74 have been corrected through vibrational α and electronic corrections. Vibrational corrections ∆Bvib have been computed at DFT level, by

using the double-hybrid B2PLYP functional 75–77 in conjunction with a doubly polarized triple-ζ α have been obtained cc-pVTZ (hereafter VTZ) basis set. 78 Instead, electronic corrections ∆Bele

by using the hybrid B3LYP functional 79,80 coupled with the aug-cc-pVTZ (hereafter AVTZ) basis set. 78,81 The resulting semi-experimental rotational constants are converted internally to the respective moments of inertia, which are then employed as input data for the optimization procedure. The results of symmetry optimizations for the three molecules are summarized in Table 2, where the final values of A1 -WDICs for all compounds are reported. For the sake of completeness, the definitions of A1 -WDICs in terms of redundant internal coordinates are specified in Tables S1, S2 and S3 of the Supporting Information (SI).

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 49

Table 2: Semi-experimental A1 -WDICs (Bohr/radians) of C4 H2 , C2 H2 O and C6 H6 . C4 H2 internal coordinate s1 s2 s3

Value

Standard Deviationa

Confidence Intervala

1 2 1

2 4 2

4.9696 0.5559 0.438

C2 H2 O internal coordinate

Value

s1 s2 s3 s4

2.5304 −3.5328 −0.5253 −0.308

C6 H6 internal coordinate

Value

s1 s2

8.1348 −0.5551

Standard Deviationa

Confidence Intervala

1 2 2 1

2 4 9 2

Standard Deviationa

Confidence Intervala

1 2

4 5

a) All fits have been performed on moments of inertia equally-weighted. The standard deviations and confidence intervals are rounded to 1·10−4 if smaller than this value.

For comparison, the structural parameters have been also derived by means of ZICs. In this context, Z-matrices have been defined to retain molecular symmetry during the optimization (for more details see Tables S4, S5 and S6 of the SI). Finally, the whole set of data obtained in terms of A1 -WDICs has been converted to the space of ZICs. The two sets of structural parameters of each molecule are reported and compared in Table 3.

24

ACS Paragon Plus Environment

Page 25 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 3: Semi-experimental ZICs (distances in Å, angles in degrees) of C4 H2 , C2 H2 O and C6 H6 . A1 -WDICs converted to ZICsa Value C4 H2 r(C≡C) r(C−C) r(C−H) κ σ[uÅ2 ] r C2 H2 O r(C=O) r(C=C) r(C−H) a(C=C−H) κ σ[uÅ2 ] r C6 H6 r(C−C) r(C−H) κ σ[uÅ2 ] r

ZICsa

St. Dev.

Conf. Int.

1.2077 1.3734 1.0613 42 0.0007 10

1 4 3

2 8 6

1.1606 1.3124 1.0764 119.05 42 0.0007 10

3 4 1 1

1.3916 1.0799 16 0.0007 2

1 1

Value

St. Dev.

Conf. Int.

1.2077 1.3734 1.0613 331 0.0007 10

1 4 3

2 8 6

7 8 2 2

1.1606 1.3124 1.0764 119.05 194 0.0007 10

3 4 1 1

7 8 2 2

1 2

1.3916 1.0799 7 0.0007 2

1 1

1 2

a) All fits have been performed on moments of inertia equally-weighted. The standard deviations and confidence intervals are rounded to 1·10−4 for lengths and 1·10−2 for angles if smaller than these values. κ is the condition number of the fit, σ is the residual standard deviation and r is the number of degrees of freedom.

As evident from Table 3, results stemming from the use of A1 -WDICs exactly coincide with those obtained through ZICs, both in terms of structural parameters and standard deviations, validating the algorithm implemented within the MSR software for the automatic selection of A1 WDICs coordinates and the subsequent symmetry optimization. Furthermore, this protocol has proven to be as accurate as the ZICs one, but exploiting a more robust and black-box method for defining the minimal set of non-redundant internal coordinates. As stated above, the inclusion of symmetry directly in the definition of non-redundant internal coordinates has allowed to perform the optimization without the set-up of a user-defined Z-matrix, as well as avoiding the inclusion of equality constraints in the procedure. To the best of our knowledge, this is a unique feature of the MSR program.

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

4.2

Predicate observations

As anticipated, the method of predicate observations can be used whenever the number of isotopic substitutions is not sufficiently large to characterize all internal degrees of freedom. Test and validation of this feature of MSR has been performed selecting three molecules, fluorohydroxyborane (Figure 3a), methanimine (Figure 3b) and nitric acid (Figure 3c), for which detailed literature data are available. 58,67,69

Figure 3: Molecular structures of fluorohydroxyborane (a), methanimine (b) and nitric acid (c).

In all cases, the semi-experimental equilibrium structure has been first determined by employing only the semi-experimental moments of inertia. Next, the set of input data has been augmented through the inclusion of one or more predicate observations, thus improving the quality of the fits and hence of the retrieved structural parameters. The semi-experimental equilibrium structure of fluorohydroxyborane has been derived by Vogt and co-workers, 67 employing a set of ten isotopologues. The experimental rotational constants 82 have been refined through vibrational corrections computed through the second-order MøllerPlesset perturbation theory 83 (MP2) in conjunction with the cc-pVTZ basis set. As a starting point, the fit of structural parameters has been carried out in the absence of predicate observations. Next, due to the presence of internal coordinates affected by large uncertainties (in particular the a(H−B−F) angle), predicate observations for the bond length r(B−H2) and the bond angle a(H2−B−F) have been introduced. In this work, an analogous analysis has been carried out, where the weights of semi-experimental moments of inertia have been recomputed at each iteration in an 26

ACS Paragon Plus Environment

Page 26 of 49

Page 27 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

IRLS calculation, exploiting the biweight weighting. A comparison between two sets of structural parameters after the inclusion of predicate observations is presented in Table 4. Table 4: reSE geometry of fluorohydroxyborane. Distances in Å, angles in degrees. Present worka Coordinate

Fit 1c

Fit 2d

r(B−H2) r(B−F) r(B−O) r(O−H1) a(H2−B−F) a(F−B−O) a(B−O−H1)

1.1900(1) 1.3206(7) 1.3451(7) 0.9579(4) 118.7(4) 117.19(1) 112.67(3)

1.1899(1) 1.3192(3) 1.3463(3) 0.9585(3) 119.37(9) 117.20(1) 112.65(3)

Reference worka,b Fit 2d Conf. Int.e 2 6 6 6 19 1 6

Fit 2d 1.1900(1) 1.3200(3) 1.3464(3) 0.9584(3) 119.38(10) 117.20(1) 112.65(3)

a) All fits have been performed on moments of inertia, whose weights have been updated at each iteration. Standard deviations have been rounded to 1·10−4 for lengths and 1·10−2 for angles if smaller than these values. b) Ref. 67 c) Fit performed without using any predicate observation. d) r(B−H2) and a(H2−B−F) used as predicate observations (values reported in Table S7 of the SI). e) Confidence intervals at a 95% confidence level.

Detailed analyses of the semi-experimental structures of phosphaethene and methanimine have been carried out by Margulés and co-workers. 69 In this context, methanimine is characterized by a semi-experimental structure less accurate than that of phosphaethene, the condition number being sufficiently large and the two bond lengths r(C−H) strongly correlated. As specified in Ref., 69 this is basically due to the absence of experimental data associated to separate deuterations. Moreover, it has been pointed out that for this molecule ab initio values of the bond lengths r(C−H) and angles a(N−C−H) are more accurate than the semi-experimental ones. As a consequence, ab initio values of such coordinates have been included in the optimization as predicate observations with the associate weights. The semi-experimental rotational constants of six isotopologues used in the fit have been obtained by adding to the experimental values through vibrational and electronic corrections. For this purpose, the anharmonic force field has been computed at MP2/VTZ level, while elements of g tensor at B3LYP/AVTZ level. The same data set has been employed in the present work adopting the Huber weighting criterion. 5 A detailed comparison of the data obtained with those presented in Ref. 69 is given in Table 5.

27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 49

Table 5: reSE geometry of methanimine. Distances in Å, angles in degrees. Present worka Coordinate

Fit 1c

Fit 2d

r(C−N) r(N−H3) r(C−H2) r(C−H1) a(H−N−C) a(H2−C−N) a(H1−C−N)

1.2709(1) 1.0195(3) 1.084(3) 1.092(3) 110.35(4) 119.3(5) 123.7(5)

1.2709(1) 1.0193(2) 1.0861(7) 1.0896(7) 110.33(1) 118.8(1) 124.1(1)

Reference worka,b Fit 2d Conf. Int.e 1 4 13 13 3 2 2

Fit 2d 1.2709(1) 1.0192(2) 1.0861(4) 1.0897(4) 110.33(1) 118.76(7) 124.18(7)

a) All fits have been performed on moments of inertia, whose weights have been updated at each iteration. Standard deviations and confidence intervals have been rounded to 1·10−4 for lengths and 1·10−2 for angles if smaller than these values. b) Ref. 69 c) Fit performed without using any predicate observation. d) r(C−H1), r(C−H2), a(H1−C−N), a(H2−C−N) used as predicate observations (values reported in Table S7 of the SI). e) Confidence intervals at 95% confidence level.

The study of nitric acid, carried out by Gutle and co-workers, 58 has been accomplished by means of eight isotopologues, whose experimental rotational constants 84–86 have been refined through vibrational corrections computed at CCSD(T) 87,88 level of theory, i.e. the coupled-cluster approach including single and double excitations augmented by a perturbative treatment of triple excitations. In addition, electronic corrections have been determined with the experimental values of the rotational g tensor taken from Ref. 89 In order to overcome the issues related to the too large condition number and the too small r(O−H) bond length, a predicate for such coordinate has been included in the calculation. In a second place, an additional set of five predicate observations (for more details see Table S7 of the SI) has been used to demonstrate the stability of the results. In the present work a parallel study has been performed, with equally weighted moments of inertia. The results are reported in Table 6 together with the values given in Ref. 58

28

ACS Paragon Plus Environment

1.206(1) 1.1957(5) 1.3983(8) 0.952(7) 115.85(2) 113.9(1) 103.1(4)

r(N−O3) r(N−O2) r(N−O1) r(O1−H) a(O1−N−O3) a(O1−N−2) a(N−O1−H) 1.2084(3) 1.1948(3) 1.3965(3) 0.9683(4) 115.83(2) 114.14(2) 102.22(2)

Fit 2d 5 6 6 9 5 4 5

Fit 2d Conf. Int. 1.2085(3) 1.1944(3) 1.3968(4) 0.9683(5) 115.80(3) 114.15(2) 102.22(3)

Fit 3e 6 7 8 10 6 5 6

Fit 3e Conf. Int.

a) All fits have been performed on moments of inertia weighted according to their assigned uncertainties. b) Ref. 58 c) Fit performed without any predicate observation. d) r(O−H) used as predicate observation (value reported in Table S7 of the SI). e) Predicate observations used for all structural parameters except for r(N-O) (values reported in Table S7 of the SI).

Fit 1c

Coordinate

Present worka

1.2084(3) 1.1948(3) 1.3965(3) 0.9683(4) 115.83(2) 114.14(2) 102.22(2)

Fit 2d

1.2085(3) 1.1944(3) 1.3968(3) 0.9683(5) 115.80(3) 114.15(2) 102.22(3)

Fit 3e

Reference workb

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 6: reSE geometry of nitric acida . Distances in Å, angles in degrees.

Page 29 of 49 Journal of Chemical Theory and Computation

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 49

As it can be appreciated from Tables 4, 5 and 6 the results obtained by using the method of predicate observations implemented within MSR exactly reproduce the reference values taken from the literature. In fact, all geometrical parameters are equivalent within their standard deviations, the only exception is provided by the r(B−F) bond length of fluorohydroxyborane. Nevertheless, it should be noted that a more realistic estimate of the error comes from confidence intervals. For this reason, Tables 4-6 report also the 95% confidence intervals, as obtained from the extended error analysis performed by MSR. As expected, when confidence intervals are considered, also the r(B−F) bond lengths reconcile within their errors. Finally, a new application of the method of predicate observations has been performed, namely the Ip conformer of glycine, depicted in Figure 4.

Figure 4: Molecular structure of the Ip conformer of glycine

The determination of the equilibrium molecular structure of this system has been object of several studies. 90–92 In this context, the best-estimated ab initio geometry has been first derived by employing the bestCC scheme, 93,94 well known to provide highly accurate results. Within this method, the quantum-mechanical geometry optimization is carried out by using the following expression for the gradient, dECBS+CV dE ∞ (HF-SCF) d∆E ∞ (CCSD(T)) d∆E(CV) = + + dx dx dx dx

(32)

where dE ∞ (HF-SCF)/dx and d∆E ∞ (CCSD(T))/dx are the gradient terms associated with the e−Cn extrapolation scheme for HF-SCF, 95 and n−3 extrapolation expression for the CCSD/(T) 30

ACS Paragon Plus Environment

Page 31 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

contribution. 96 Core-correlation effects are taken into account by the term d∆E(CV)/dx. In the second place, a partial semi-experimental equilibrium geometry has been determined, non-determinable structural parameters being kept fixed at the corresponding best-estimated values. In fact, the number of available isotopic species is not sufficient to characterize all internal degrees of freedom, making the use of an alternative strategy mandatory. The most refined semiexperimental structure determined up to now 90 has been obtained through a three-step calculation exploiting an interplay between fixed and relaxed parameters. For the purpose, the experimental moments of inertia of five isotopologues 97 corrected through vibrational contributions evaluated by employing the B3LYP functional in conjunction with the double-ζ SNSD basis set 98,99 have been used. In the present work, the semi-experimental equilibrium structure has been determined by using both the same Z-matrix as Ref. 90 and predicate observations. For the latter, the starting set of semi-experimental moments of inertia has been augmented with the inclusion of the whole set of theoretical parameters as predicate observations. The SE structure has been obtained through a weighted least-squares calculation, in which the weights of the semi-experimental moments of inertia and predicate observations have been defined on the basis of experimental and assigned uncertainties, respectively. For predicates at bestCC level the uncertainties of bond lengths and valence bond angles have been taken to be 0.001 Å and 0.1 degrees on the basis of literature information. 90,91 With the aim to confirm the numerical stability of the structural parameters in terms of the guess geometry, an additional fit has been performed, where the SE structure determined by Kasalová et al. 100 has been used as a guess. It should be pointed out that these fits have been carried out by employing a very high level of electronic correlation for obtaining predicate values. Clearly, this is not always possible, due to the unfavorable computational cost of coupled cluster methods. With the aim to develop a protocol free from these restrictions, the analysis has also been performed by using predicate observations as well as guess geometrical parameters computed at both B2PLYP/VTZ (see Fits 4-6 of Table 7) and B3LYP/SNSD (see Fits 7-9 of Table 7) levels, the theoretical geometries being computed by means of the quantum chemistry package GAUSSIAN. 101 As it can be seen, the use of structural parameters at these levels of theory yield badly

31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

determined parameters, in particular at B3LYP/SNSD level. In order to improve the structural parameters employed in the refinement, the linear regression approach (LRA) 18,19 has been adopted for correcting the DFT geometries. Within this approach, the theoretical equilibrium geometry re is corrected through a contribution ∆r, obtained from the linear regression of the fitting of semiexperimental structural parameters versus the corresponding theoretical ones. As a first step, the application of the LRA has been limited to bond distances and then, it has been extended to to those bond angles for which literature data are available. 18 The groups of structural parameters are reported and compared in Table 7.

32

ACS Paragon Plus Environment

Page 32 of 49

1.348(3) 1.203(5) 0.9645f ix 1.513(4) 1.443(1) 1.0908(6) 1.0104(9) 123.00(3) 111.5(4) 115.3(1) [107.36] [110.10] 106.64f ix [123.21] [57.43]

Fit 1 1.349(1) 1.203(1) 0.965(2) 1.5133(7) 1.4429(9) 1.0907(3) 1.011(2) 123.03(6) 111.44(8) 115.28(5) 107.39(4) 109.9(2) 106.7(2) 123.20(4) 57.4(2)

Fit 3

1.349(1) 1.203(1) 0.965(2) 1.5133(7) 1.4429(9) 1.0907(3) 1.011(2) 123.03(6) 111.44(8) 115.28(5) 107.39(4) 109.9(2) 106.7(2) 123.20(4) 57.4(2)

Fit 2

d,f

d,e

1.350(3) 1.203(2) 0.968(7) 1.513(1) 1.442(2) 1.0907(3) 1.011(7) 123.1(1) 111.5(2) 115.3(1) 107.38(5) 109.7(6) 106.6(7) 123.21(4) 57.2(6)

Fit 4

g

1.349(2) 1.203(2) 0.964(3) 1.513(1) 1.443(2) 1.0907(3) 1.010(3) 123.0(1) 111.5(2) 115.3(1) 107.38(5) 109.9(6) 106.6(8) 123.20(4) 57.4(5)

Fit 5

h

Present worka

1.349(2) 1.203(2) 0.964(3) 1.513(1) 1.443(2) 1.0907(3) 1.010(3) 123.0(1) 111.5(2) 115.3(1) 107.38(5) 109.9(6) 106.7(8) 123.20(4) 57.4(5)

Fit 6

i

1.347(3) 1.203(3) 0.97(2) 1.513(1) 1.442(3) 1.0907(3) 1.02(1) 122.3(2) 111.5(2) 115.3(2) 107.37(5) 110.2(8) 107.1(8) 123.21(4) 57(1)

Fit 7

l

1.348(3) 1.204(2) 0.963(6) 1.513(1) 1.442(2) 1.0907(3) 1.011(6) 123.0(1) 111.5(2) 115.3(1) 107.37(5) 110.0(7) 107.1(7) 123.22(4) 57.4(6)

Fit 8

m

1.348(3) 1.204(2) 0.963(6) 1.513(1) 1.442(2) 1.0907(3) 1.011(6) 123.0(1) 111.5(2) 115.3(1) 107.37(5) 110.0(7) 106.6(8) 123.22(4) 57.4(6)

Fit 9

1.3476 1.2021 0.9645 1.5128 1.4430 1.0903 1.0109 123.05 111.35 115.25 107.49 109.86 106.64 123.18 57.93

bestCC

Reference workb

a) reSE geometry of the Ip conformer of glycine. The 95% confidence intervals on the geometrical parameters are reported within parentheses. b) Refs. 90,91 c) Fit performed by following the same protocol as Ref. 90 d) Fit performed through the inclusion of the bestCC structure as a set of predicate observations. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.001 Å, 0.1◦ , and 1.0◦ , respectively. e) Guess geometry set equal to the bestCC structure presented in Refs. 90,91 f) Guess geometry set equal to the SE structure presented in Ref. 100 g) Fit performed by using the B2PLYP/VTZ structure (see Table S8 of the SI) as guess geometry and predicate observations. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.005 Å, 0.5◦ and 2.0◦ , respectively. h) Fit performed by using the B2PLYP/VTZ structure (see Table S8 of the SI) as guess geometry and predicate observations. Bond distances corrected by LRA. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.002 Å, 0.5◦ and 2.0◦ , respectively. i) Fit performed by using the B2PLYP/VTZ structure (see Table S8 of the SI) as guess geometry and predicate observations. Bond distances and bond angles corrected by LRA. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.002 Å, 0.5◦ and 2.0◦ , respectively. l) Fit performed by using the B3LYP/SNSD structure (see Table S8 of the SI) as guess geometry and predicate observations. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.01 Å, 0.5◦ and 2.0◦ , respectively. m) Fit performed by using the B3LYP/SNSD structure (see Table S8 of the SI) as guess geometry and predicate observations. Bond distances corrected by LRA. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.004 Å, 0.5◦ and 2.0◦ , respectively. n) Fit performed by using the B3LYP/SNSD structure (see Table S8 of the SI) as guess geometry and predicate observations. Bond distances and bond angles corrected by LRA. The uncertainties assigned to bond distances, bond angles and dihedral angles are equal to 0.004 Å, 0.5◦ and 2.0◦ , respectively.

r(C2−O2) r(C2−O1) r(O3−H5) r(C1−C2) r(C1−N) r(C1−H3) r(N−H1) a(O1−C2−O2) a(C1−C2−O2) a(N−C1−C2) a(H3−C1−C2) a(H1−N−C1) a(H5−O2−C2) d(H4−C1−C2−O1) d(H2−N−C1−C2)

Coordinate

c

n

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 7: Equilibrium geometry of the Ip conformer of glycine. Distances in Å, angles in degrees.

Page 33 of 49 Journal of Chemical Theory and Computation

ACS Paragon Plus Environment

33

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The structure obtained from Fit 1 exactly reproduces the outcomes of Ref. 90 Despite the good results achieved, the r(O3−H5) bond length and a(H5−O2−C2) bond angle are fixed to the corresponding bestCC values and the angles within square brackets have been determined through fits performed in several consecutive steps. Conversely, in the fit performed exploiting predicates, all the structural parameters of glycine Ip have been refined in a single step on the basis of a very chemically intuitive Z-matrix. As it is evident from Table 7, the retrieved parameters are very well determined thus highlighting the good quality of the fit. Furthermore the confidence intervals are improved by use of predicate observations. The retrieved structural parameters are also in good agreement with those evaluated at the bestCC level which, as stated above, is expected to deliver bond lengths and bond angles accurate to 0.001 Å and 0.1 degrees, respectively. As a further remark, it should be noted that the SE structure is not affected by the choice of the guess moving from Fit 2 to Fit 3. The fits performed by employing both guess geometry and predicate observations at DFT level show a good agreement with the previous ones in terms of structural parameters. However, as it can be appreciated from Fits 4 and 7 the uncertainties are usually significantly larger than those of Fits 2 and 3. This behavior may be due to the reduced accuracy of DFT geometries. In fact, by adopting the LRA for correcting both the guess structure and the predicates, an improvement can be noticed (see Fits 5, 6, 8 and 9 of Table 7). At this point it is worth of comment the effect about the choice of bond angles. As reported in Ref., 18 the correction terms associated with bond angles are less accurate than those of bond lengths. For this reason, the refinements have been carried out by applying the LRA only to bond lengths obtaining, in general, the same results as those resulting from the use of the LRA corrections to both bond lengths and angles. Even though structures from Fits 2 and 3 are the most accurate ones, DFT geometries corrected by the LRA appear a valid alternative to the use of guess structures from computationally demanding coupled cluster computations. In fact, inspection of Table 7 reveals that the SE structures obtained in Fits 6 and 9 are in close agreement with that of Fit 2 with confidence intervals that, in general, are almost doubled. Furthermore, it should be noted that the LRA is employed in the context of predicate

34

ACS Paragon Plus Environment

Page 34 of 49

Page 35 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

observations, and hence a more systematic analysis is deserved in the near future. In any case, it is worth noticing that this is the first determination of glycine Ip structure performed without fixing any structural degree of freedom. A final comment concerning the use of predicate observations is probably required at this point. This method represents an invaluable tool of modern structural determinations that allows the lack of data due to missing isotopic substitutions to be overcame. However, it has to be applied judiciously. In fact it appears quite sensitive to the number and the weights of predicates, that if not correctly chosen may lead to bias in the retrieved structure. Under this point of view, a key issue is a coherent selection of weights for data that comes from different sources. For example, in the case of glycine the weights of dihedral angles were first set equal to those of bond angles. The resulting structure was characterized by some confidence intervals significantly larger than those from Fit 1. For this reason and due to the lack of sufficient information in the literature concerning uncertainties of dihedral angles, the uncertainties of d(H4−C1−C2−O1) and d(H2−N−C1−C2) were increased up to 1.0 degrees. This choice is well justified by the fact that uncertainties of dihedral angles are generally larger than those of bond angles. In addition, the tests carried out by using DFT geometries, and their LRA counterparts, have shown that while geometrical parameters converge to values in agreement with those from Fits 1-3, the same is not true for the uncertainties thus underlining that the method of predicates strongly relies on the proper choice of the additional data. For these reasons, the outcomes of a geometry optimization exploiting the method of predicate should be properly checked, for example by performing different tests changing the number, the value and the weights of the predicates.

5

CONCLUSION

In this work, the new MSR software for the computation of molecular equilibrium geometries through the semi-experimental approach has been presented and extensively validated. In the first section, a brief overview of the semi-experimental approach has been proposed, as

35

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

well as the strategy to obtain the structural parameters through a least-squares fitting procedure. Next, the implementation of this theoretical framework in a new module of the VMS software has been discussed in some detail. The program has been designed to be robust and user-friendly at different levels. The MSR software presents a number of features that make it a unique tool for molecular structure refinements. First, it is a three-function program, being it able to perform in addition to the structural optimization a characterization of the geometry or a monodimensional scan of the residual function along a specific internal coordinate. Alongside the semi-experimental approach, the program can be employed for determining effective (r0 ) or substitution (rs ) structures, though their use should be discouraged in modern times. The program has been equipped with a flexible choice of the optimization algorithm together with an appropriate treatment of geometrical constraints, and different reweighting schemes. It also features an extensive error treatment, including the analysis of leverages, ill-conditioning and variance-covariance matrix. In addition to the standard deviations, 95% confidence intervals are evaluated, since they represent more realistic estimates of the errors affecting geometrical parameters. Particular attention has been devoted to the protocol for defining automatically the minimum set of non-redundant internal coordinates employable in the fit. The standard methodology, based on the use of ZICs, has proven to be often disadvantageous when complex topologies are studied, in addition to being user-dependent. In order to overcome all the issues connected with this selection route of internal coordinates, a different procedure has been devised, based on the extraction of all A1 coordinates from a set of symmetry internal coordinates. In this respect, it has to be stressed that this approach, which is particularly useful when symmetric molecules are considered, is completely automatic and it does not require any intervention from the user. Our results suggest that this new black-box scheme compares favorably with properly built Z-matrices both in terms of accuracy and computational cost. To the best of our knowledge this is the first time that such a procedure is exploited for molecular structure refinement, and included among the available options of a dedicated software. In addition, the MSR program has been equipped with the possibility of including predicate

36

ACS Paragon Plus Environment

Page 36 of 49

Page 37 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

observations in the fit. This approach allows the set of observations to be augmented through estimates of structural parameters. This part of the code has been successfully validated by a direct comparison of our results with detailed literature studies, and a new application to glycine has been presented. In addition to the wide range of computational features, the user-friendliness of MSR is guaranteed by an intuitive graphical interface from which it can be completely controlled and where results are promptly displayed. Some final comments concerning possible future developments of the MSR module of VMS are worth of mention. Altough designed to deal with molecular structures obtained from the analysis of rotational spectra, its modularity and flexibility of implementation make it expandable to other spectroscopies (e.g. electron diffraction) with only a modest programming effort. Finally, the availability of a dedicated input interface makes it possible, for example, to import structures from available structural databases. In this respect, the method of predicate observations can be coupled to the template approach, based on the use of structural synthons, as a further step toward the determination of accurate molecular structures for systems of increasing size at affordable computational costs.

Supporting information The Supporting Information is available free of charge on the ACS Publications website at DOI: Tables S1, S2 and S3 report the definitions of A1 -WDICs of diacetylene, ketene and benzene in terms of redundant internal coordinates. Tables S4, S5 and S6 describe the structures of Zmatrices employed to derive the semi-experimental equilibrium structures of diacetylene, ketene and benzene. Table S7 reports the values of predicate observations and assigned uncertainties used for fluorohydroxyborane, methenammine and nitric acid. Table S8 reports the theoretical equilibrium geometry of the Ip conformer of glycine computed at both B2PLYP/VTZ and B3LYP/SNSD levels.

37

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Acknowledgement The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. [320951].

References (1) Brémond, E.; Savarese, M.; Su, N. Q.; Pérez-Jiménez, A. J.; Xu, X.; Sancho-García, J. C.; Adamo, C. Benchmarking density functionals on structural parameters of small-/mediumsized organic molecules. J. Chem. Theory Comput. 2016, 12, 459–465. (2) Peverati, R.; Truhlar, D. G. Quest for a universal density functional: the accuracy of density functionals across a broad spectrum of databases in chemistry and physics. Phil. Trans. R. Soc. A 2014, 372, 20120476. (3) Grimme, S.; Steinmetz, M. Effects of London dispersion correction in density functional theory on the structures of organic molecules in the gas phase. Phys. Chem. Chem. Phys. 2013, 15, 16031–16042. (4) D. Papousek, M. A. Molecular Structure; Elsevier Scientific Pub Co, 1982; pp 236–241. (5) Demaison, J.; Boggs, J.; Csaszar, A. Equilibrium Molecular Structures: From Spectroscopy to Quantum Chemistry; CRC Press, 2016. (6) Domenicano, A., Hargittai, I., Eds. Accurate molecular structures. Their determination and importance; Oxford University Press: New York, NY, 1992. (7) Born, M.; Oppenheimer, R. Zur quantentheorie der molekeln. Ann. Phys. (Berlin, Ger.) 1927, 389, 457–484.

38

ACS Paragon Plus Environment

Page 38 of 49

Page 39 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(8) Jensen, P.; Bunker, P. In Computational molecular spectroscopy; Jensen, P., Bunker, P., Eds.; Wiley: New York, NY, 2000; pp 3–12. (9) Demaison, J. Experimental, semi-experimental and ab initio equilibrium structures. Mol. Phys. 2007, 105, 3109–3138. (10) D. Papousek, M. A. Molecular vibrational-rotational spectra: theory and applications of high resolution infrared, microwave, and Raman spectroscopy of polyatomic molecules; Elsevier Scientific Pub Co, 1982. (11) Watson, J. K. Simplification of the molecular vibration-rotation Hamiltonian. Mol. Phys. 1968, 15, 479–490. (12) Aliev, M.; Watson, J.; Rao, K. N. Molecular spectroscopy: modern research, Vol. III; 1985; pp 1–67. (13) Bartell, L.; Romanesko, D.; Wong, T.; Sims, G.; Sutton, L. Augmented Analyses: Method of Predicate Observations. Chemical Society Specialist Periodical Report No. 20: Molecular Structure by Diffraction Methods 1975, 3, 72–79. (14) Demaison, J.;

Craig, N. C.;

Cocinero, E. J.;

Grabow, J.-U.;

Lesarri, A.;

Rudolph, H. D. Semiexperimental equilibrium structures for the equatorial conformers of N-methylpiperidone and tropinone by the mixed estimation method. J. Phys. Chem. A 2012, 116, 8684–8692. (15) Pulay, P.; Meyer, W.; Boggs, J. E. Cubic force constants and equilibrium geometry of methane from Hartree–Fock and correlated wavefunctions. J. Chem. Phys. 1978, 68, 5077– 5085. (16) Pawłowski, F.; Jørgensen, P.; Olsen, J.; Hegelund, F.; Helgaker, T.; Gauss, J.; Bak, K. L.; Stanton, J. F. Molecular equilibrium structures from experimental rotational constants and calculated vibration–rotation interaction constants. J. Chem. Phys. 2002, 116, 6482–6496. 39

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(17) Piccardo, M.; Penocchio, E.; Puzzarini, C.; Biczysko, M.; Barone, V. Semi-experimental equilibrium structure determinations by employing B3LYP/SNSD anharmonic force fields: validation and application to semirigid organic molecules. J. Phys. Chem. A 2015, 119, 2058–2082. (18) Penocchio, E.; Piccardo, M.; Barone, V. Semiexperimental Equilibrium Structures for Building Blocks of Organic and Biological Molecules: The B2PLYP Route. J. Chem. Theory Comput. 2015, 11, 4689–4707. (19) Penocchio, E.; Mendolicchio, M.; Tasinato, N.; Barone, V. Structural features of the carbon– sulfur chemical bond: a semi-experimental perspective. Can. J. Chem. 2016, 94, 1065–1076. (20) Allen, H. C.; Cross, P. C. Molecular vib-rotors; Wiley New York, 1963. (21) Flygare, W. Magnetic interactions in molecules and an analysis of molecular electronic charge distribution from magnetic parameters. Chem. Rev. 1974, 74, 653–687. (22) Gauss, J.; Ruud, K.; Helgaker, T. Perturbation-dependent atomic orbitals for the calculation of spin-rotation constants and rotational g tensors. J. Chem. Phys. 1996, 105, 2804–2812. (23) Licari, D.; Baiardi, A.; Biczysko, M.; Egidi, F.; Latouche, C.; Barone, V. Implementation of a graphical user interface for the virtual multifrequency spectrometer: The VMS-Draw tool. J. Comput. Chem. 2015, 36, 321–334. (24) Armijo, L. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 1966, 16, 1–3. (25) Strutz, T. Data fitting and uncertainty: A practical introduction to weighted least squares and beyond; Vieweg and Teubner, 2010. (26) Bertsekas, D. P. Nonlinear programming; Athena scientific Belmont, 1999. (27) Fletcher, R. Practical methods of optimization; John Wiley & Sons, 2013. 40

ACS Paragon Plus Environment

Page 40 of 49

Page 41 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(28) Dennis, J. E., Jr; Moré, J. J. Quasi-Newton methods, motivation and theory. SIAM Rev. 1977, 19, 46–89. (29) Sherman, J.; Morrison, W. J. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Stat. 1950, 21, 124–127. (30) Press, W. H. Numerical recipes 3rd edition: The art of scientific computing; Cambridge university press, 2007. (31) Hamilton, L. Regression with graphics: a second course in applied statistics. Belmont, CA: Wadsworth. 1992. (32) Bakri, B.; Demaison, J.; Kleiner, I.; Margulès, L.; Møllendal, H.; Petitprez, D.; Wlodarczak, G. Rotational Spectrum, Hyperfine Structure, and Internal Rotation of Methyl Carbamate. J. Mol. Spectrosc. 2002, 215, 312–316. (33) Watson, J. K. Robust weighting in least-squares fits. J. Mol. Spectrosc. 2003, 219, 326–328. (34) Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 1963, 11, 431–441. (35) Bazaraa, M. S.; Sherali, H. D.; Shetty, C. M. Nonlinear programming: theory and algorithms; John Wiley & Sons, 2013. (36) Sen, A.; Srivastava, M. Regression analysis: theory, methods, and applications; Springer Science & Business Media, 2012. (37) Belsley, D.; Kuh, E. a Welsch RE: Regression Diagnostics. New York 1980, (38) Liao, D.; Valliant, R. Condition indexes and variance decompositions for diagnosing collinearity in linear model analysis of survey data. Surv. Methodol. 2012, 38, 189–202. (39) Golub, G. H.; Reinsch, C. Singular value decomposition and least squares solutions. Numer. Math 1970, 14, 403–420. 41

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(40) Wilson, E. B. A Method of Obtaining the Expanded Secular Equation for the Vibration Frequencies of a Molecule. J. Chem. Phys. 1939, 7, 1047–1052. (41) Baiardi, A.; Bloino, J.; Barone, V. General formulation of vibronic spectroscopy in internal coordinates. The Journal of chemical physics 2016, 144, 084114. (42) Ayala, P. Y.; Schlegel, H. B. Identification and treatment of internal rotation in normal mode vibrational analysis. J. Chem. Phys. 1998, 108, 2314–2325. (43) Wilson, E. B.; Decius, J. C.; Cross, P. C. Molecular Vibrations: The Theory of Infrared and Raman Vibrational Spectra; Dover Publications, 1980. (44) Pulay, P.; Fogarasi, G.; Pang, F.; Boggs, J. E. Systematic ab initio gradient calculation of molecular geometries, force constants, and dipole moment derivatives. J. Am. Chem. Soc. 1979, 101, 2550–2560. (45) Fogarasi, G.; Zhou, X.; Taylor, P. W.; Pulay, P. The calculation of ab initio molecular geometries: efficient optimization by natural internal coordinates and empirical correction by offset forces. J. Am. Chem. Soc. 1992, 114, 8191–8201. (46) Baker, J.; Kessi, A.; Delley, B. The generation and use of delocalized internal coordinates in geometry optimization. J. Chem. Phys. 1996, 105, 192–212. (47) Baker, J. Constrained optimization in delocalized internal coordinates. J. Comput. Chem. 1997, 18, 1079–1095. (48) Bakken, V.; Helgaker, T. The efficient optimization of molecular geometries using redundant internal coordinates. J. Chem. Phys. 2002, 117, 9160–9174. (49) Baker, J.; Kinghorn, D.; Pulay, P. Geometry optimization in delocalized internal coordinates: An efficient quadratically scaling algorithm for large molecules. J. Chem. Phys. 1999, 110, 4986–4991.

42

ACS Paragon Plus Environment

Page 42 of 49

Page 43 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(50) Paizs, B.; Fogarasi, G.; Pulay, P. An efficient direct method for geometry optimization of large molecules in internal coordinates. J. Chem. Phys. 1998, 109, 6571–6576. (51) Beruski, O.; Vidal, L. N. Algorithms for computer detection of symmetry elements in molecular systems. J. Comput. Chem. 2014, 35, 290–299. (52) Peng, C.; Ayala, P. Y.; Schlegel, H. B.; Frisch, M. J. Using redundant internal coordinates to optimize equilibrium geometries and transition states. J. Comput. Chem. 1996, 17, 49–56. (53) Pulay, P.; Fogarasi, G. Geometry optimization in redundant internal coordinates. J. Chem. Phys. 1992, 96, 2856–2860. (54) Swart, M.; Matthias Bickelhaupt, F. Optimization of strong and weak coordinates. Int. J. Quantum Chem. 2006, 106, 2536–2544. (55) Sim, G.; Sutton, L.; Bartell, L.; Romenesko, D.; Wong, T. Augmented analyses: Method of predicate observations. 1975, (56) Belsley, D. A. Conditioning diagnostics; Wiley Online Library, 1991. (57) Champion, J.; Robiette, A. G.; Mills, I.; Graner, G. Simultaneous analysis of the ν1 , ν4 , 2ν2 , ν2 + ν5 , and 2ν5 infrared bands of 12 CH3 F. J. Mol. Spectrosc. 1982, 96, 422–441. (58) Gutle, C.; Demaison, J.; Rudolph, H. Anharmonic force field and equilibrium structure of nitric acid. J. Mol. Spectrosc. 2009, 254, 99–107. (59) Merke, I.; Heineking, N.; Demaison, J. Rotational spectra of isotopic species of SO2 F2 : experimental and ab initio structures. J. Mol. Spectrosc. 2004, 228, 308–313. (60) Demaison, J. F.; Craig, N. C. Semiexperimental Equilibrium Structure for cis, trans-1, 4Difluorobutadiene by the Mixed Estimation Method. J. Phys. Chem. A 2011, 115, 8049– 8054.

43

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 49

(61) Demaison, J.; Craig, N. C.; Conrad, A. R.; Tubergen, M. J.; Rudolph, H. D. Semiexperimental equilibrium structure of the lower energy conformer of glycidol by the mixed estimation method. J. Phys. Chem. A 2012, 116, 9116–9122. (62) Demaison, J.;

Craig, N. C.;

Cocinero, E. J.;

Grabow, J.-U.;

Lesarri, A.;

Rudolph, H. D. Semiexperimental equilibrium structures for the equatorial conformers of N-methylpiperidone and tropinone by the mixed estimation method. J. Phys. Chem. A 2012, 116, 8684–8692. (63) Craig, N. C.; Chen, Y.; Fuson, H. A.; Tian, H.; van Besien, H.; Conrad, A. R.; Tubergen, M. J.; Rudolph, H. D.; Demaison, J. Microwave spectra of the deuterium isotopologues of cis-hexatriene and a semiexperimental equilibrium structure. J. Phys. Chem. A 2013, 117, 9391–9400. (64) Demaison, J.; Rudolph, H. D.; Császár, A. G. Deformation of the benzene ring upon fluorination: equilibrium structures of all fluorobenzenes. Mol. Phys. 2013, 111, 1539–1562. (65) Demaison, J.; Jahn, M. K.; Cocinero, E. J.; Lesarri, A.; Grabow, J.-U.; Guillemin, J.-C.; Rudolph, H. D. Accurate semiexperimental structure of 1, 3, 4-oxadiazole by the mixed estimation method. J. Phys. Chem. A 2013, 117, 2278–2284. (66) Demaison, J. Experimental, semi-experimental and ab initio equilibrium structures. Mol. Phys. 2007, 105, 3109–3138. (67) Vogt, N.; Demaison, J.; Vogt, J.; Rudolph, H. D. Why it is sometimes difficult to determine the accurate position of a hydrogen atom by the semiexperimental method: Structure of molecules containing the OH or the CH3 group. J. Comput. Chem. 2014, 35, 2333–2342. (68) Demaison, J.; Craig, N. C.; Groner, P.; Écija, P.; Cocinero, E. J.; Lesarri, A.; Rudolph, H. D. Accurate Equilibrium Structures for Piperidine and Cyclohexane. J. Phys. Chem. A 2014, 119, 1486–1493.

44

ACS Paragon Plus Environment

Page 45 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(69) Margulès, L.; Demaison, J.; Sreeja, P.; Guillemin, J.-C. Submillimeterwave spectrum of CH2 PH and equilibrium structures of CH2 PH and CH2 NH. J. Mol. Spectrosc. 2006, 238, 234–240. (70) Thorwirth, S.; Harding, M. E.; Muders, D.; Gauss, J. The empirical equilibrium structure of diacetylene. J. Mol. Spectrosc. 2008, 251, 220–223. (71) East, A. L.; Allen, W. D.; Klippenstein, S. J. The anharmonic force field and equilibrium molecular structure of ketene. J. Chem. Phys. 1995, 102, 8506–8532. (72) Hollenstein, H.; Piccirillo, S.; Quack, M.; Snels, M. High-resolution infrared spectrum and analysis of the ν11 , A2u (B2 ) fundamental band of 12 C6 H6 and 13 C12 C5 H6 . Mol. Phys. 1990, 71, 759–768. (73) Pliva, J.; Johns, J.; Goodman, L. Infrared bands of isotopic benzenes: ν13 of 13 C6 D6 and ν12 of 13 C6 H6 . J. Mol. Spectrosc. 1990, 140, 214–225. (74) Pliva, J.; Johns, J.; Goodman, L. Infrared bands of isotopic benzenes: ν13 and ν14 of 13 C6 D6 . J. Mol. Spectrosc. 1991, 148, 427–435. (75) Grimme, S. Semiempirical hybrid density functional with perturbative second-order correlation. J. Chem. Phys. 2006, 124, 034108. (76) Neese, F.; Schwabe, T.; Grimme, S. Analytic derivatives for perturbatively corrected “double hybrid” density functionals: theory, implementation, and applications. J. Chem. Phys. 2007, 126, 124115. (77) Biczysko, M.; Panek, P.; Scalmani, G.; Bloino, J.; Barone, V. Harmonic and anharmonic vibrational frequency calculations with the double-hybrid B2PLYP method: analytic second derivatives and benchmark studies. J. Chem. Theory Comput. 2010, 6, 2115–2125. (78) Woon, D. E.; Dunning Jr, T. H. Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J. Chem. Phys. 1993, 98, 1358–1371. 45

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(79) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 1988, 37, 785. (80) Becke, A. D. Becke’s three parameter hybrid method using the LYP correlation functional. J. Chem. Phys. 1993, 98, 5648–5652. (81) Kendall, R. A.; Dunning Jr, T. H.; Harrison, R. J. Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 1992, 96, 6796–6806. (82) Kawashima, Y.; Takeo, H.; Matsumura, C. Structure of metastable molecules-the microwave-spectra of BH(OH)2 and BHF(OH). Nippon Kagaku Kaishi 1986, 1465–1475. (83) Møller, C.; Plesset, M. S. Note on an approximation treatment for many-electron systems. Phys. Rev. 1934, 46, 618–622. (84) Petkie, D. T.; Helminger, P.; Butler, R. A.; Albert, S.; De Lucia, F. C. The millimeter and submillimeter spectra of the ground state and excited ν9 , ν8 , ν7 , and ν6 vibrational states of HNO3 . J. Mol. Spectrosc. 2003, 218, 127–130. (85) Drouin, B. J.; Miller, C. E.; Fry, J. L.; Petkie, D. T.; Helminger, P.; Medvedev, I. R. Submillimeter measurements of isotopes of nitric acid. J. Mol. Spectrosc. 2006, 236, 29–34. (86) Cox, A.; Ellis, M.; Attfield, C.; Ferris, A. Microwave spectrum of DNO3 , and average structures of nitric and nitrous acids. J. Mol. Struct. 1994, 320, 91–106. (87) Raghavachari, K.; Trucks, G. W.; Pople, J. A.; Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 1989, 157, 479–483. (88) Purvis III, G. D.; Bartlett, R. J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. The Journal of Chemical Physics 1982, 76, 1910–1918. (89) Albinus, L.; Spieckermann, J.; Sutter, D. The electric field gradient tensor and the tensors of the molecular magnetic susceptibility and the molecular electric quadrupole moment in 46

ACS Paragon Plus Environment

Page 46 of 49

Page 47 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

nitric acid: a high-resolution rotational Zeeman effect study. J. Mol. Spectrosc. 1989, 133, 128–147. (90) Barone, V.; Biczysko, M.; Bloino, J.; Puzzarini, C. Glycine conformers: a never-ending story? Phys. Chem. Chem. Phys. 2013, 15, 1358–1363. (91) Barone, V.; Biczysko, M.; Bloino, J.; Puzzarini, C. The performance of composite schemes and hybrid CC/DFT model in predicting structure, thermodynamic and spectroscopic parameters: the challenge of the conformational equilibrium in glycine. Phys. Chem. Chem. Phys. 2013, 15, 10094–10111. (92) Barone, V.; Biczysko, M.; Bloino, J.; Puzzarini, C. Characterization of the elusive conformers of glycine from state-of-the-art structural, thermodynamic, and spectroscopic computations: Theory complements experiment. J. Chem. Theory Comput. 2013, 9, 1533–1547. (93) Puzzarini, C.; Stanton, J. F.; Gauss, J. Quantum-chemical calculation of spectroscopic parameters for rotational spectroscopy. Int. Rev. Phys. Chem. 2010, 29, 273–367. (94) Puzzarini, C. Accurate molecular structures of small-and medium-sized molecules. International Journal of Quantum Chemistry 2016, 116, 1513–1519. (95) Feller, D. The use of systematic sequences of wave functions for estimating the complete basis set, full configuration interaction limit in water. J. Chem. Phys. 1993, 98, 7059–7071. (96) Helgaker, T.; Klopper, W.; Koch, H.; Noga, J. Basis-set convergence of correlated calculations on water. J. Chem. Phys. 1997, 106, 9639–9646. (97) Godfrey, P. D.; Brown, R. D. Shape of Glycine. J. Am. Chem. Soc. 1995, 117, 2019–2023. (98) Carnimeo, I.; Puzzarini, C.; Tasinato, N.; Stoppa, P.; Charmet, A. P.; Biczysko, M.; Cappelli, C.; Barone, V. Anharmonic theoretical simulations of infrared spectra of halogenated organic compounds. J. Chem. Phys. 2013, 139, 074310.

47

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(99) Barone, V.; Biczysko, M.; Bloino, J. Fully anharmonic IR and Raman spectra of mediumsize molecular systems: accuracy and interpretation. Phys. Chem. Chem. Phys. 2014, 16, 1759–1787. (100) Kasalová, V.; Allen, W. D.; Schaefer, H. F.; Czinki, E.; Császár, A. G. Molecular structures of the two most stable conformers of free glycine. J. Comput. Chem. 2007, 28, 1373–1383. (101) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Bloino, J.; Janesko, B. G.; Izmaylov, A. F.; Lipparini, F.; Zheng, G.; Sonnenberg, J. L.; Liang, W.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Throssell, K.; Montgomery Jr., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Keith, T.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J. M.; Klene, M.; Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Parandekar, P. V.; Mayhall, N. J.; Daniels, A. D.; Farkas, O.; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian Development Version, Revision I.03; Gaussian, Inc, Wallingford CT, 2015.

48

ACS Paragon Plus Environment

Page 48 of 49

Page 49 of 49 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

TOC graphic

49

ACS Paragon Plus Environment