Estimating Systematic Error and Uncertainty in Ab Initio

Aug 8, 2019 - Estimating Systematic Error and Uncertainty in Ab Initio Thermochemistry: I. Atomization Energies of Hydrocarbons in the ATOMIC(hc) Prot...
0 downloads 0 Views 593KB Size
Subscriber access provided by UNIVERSITY OF WISCONSIN-MILWAUKEE

Quantum Electronic Structure

Estimating Systematic Error and Uncertainty in Ab Initio Thermochemistry: I. Atomization Energies of Hydrocarbons in the ATOMIC(hc) Protocol Dirk Bakowies J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.9b00343 • Publication Date (Web): 08 Aug 2019 Downloaded from pubs.acs.org on August 20, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Estimating Systematic Error and Uncertainty in Ab Initio Thermochemistry: I. Atomization Energies of Hydrocarbons in the ATOMIC(hc) Protocol Dirk Bakowies June 19, 2019

Institute of Physical Chemistry, Department of Chemistry, University of Basel, Klingelbergstr. 80, CH 4056 Basel, Switzerland, email: [email protected]

First revision

Keywords bond separation reaction, ATOMIC protocol, composite model, Lewis valence structure, ab initio thermochemistry, uncertainty

1 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 67

Abstract Research in ab initio quantum chemistry has produced an increasing number of thermochemistry protocols, serving different needs from benchmark-level accuracy for small molecules to “chemical accuracy” for larger molecules. While in experimental thermochemistry it is accepted standard to report results complete with intervals of 95% confidence, so far only few of the most advanced theoretical approaches have followed suit, based either on statistical comparison to well-established experimental data or careful assessment of high level theoretical results for individual molecules. Here we report on the development of intrinsic uncertainty estimates for the ATOMIC protocol in applications to hydrocarbons. ATOMIC is a theoretical procedure geared toward larger molecules and based on the ab initio implementation of bond separation reactions (BSRs) to reduce errors of mid-level composite approaches. Each of the components contributing to the bottom-of-the-well atomization energy (EA,e ) is scrutinized for possible error by comparison to a large number of very high-level results, including complete-basis-set estimates of CCSDT(Q) bond separation energies for 83 hydrocarbons up to the size of naphthalene. Some of the observations are: Post-CCSD(T) effects are sizeable even for saturated aliphatic compounds but well-represented in a BSR model summing over bond contributions, while conjugated systems pose more problems. The second most significant source of error are complete-basis-set extrapolations for all-electron CCSD(T) contributions, which still carry an uncertainty of a few tenths of a kcal/mol for mid-size molecules like benzene, even if based on large basis set calculations (cc-pCV5Z, cc-pCV6Z). Scalar relativistic terms and diagonal Born-Oppenheimer corrections are of less concern, the former, because they are well represented in a BSR model, the latter because they are small in general. Observations are cast into simple expressions that separate obvious bias from non-systematic error, formulating the former as correction to and the latter as uncertainty of an ATOMIC result. The updated protocol, complete with uncertainties and termed ATOMIC(hc) (“hc” for hydrocarbons), is validated in comparisons with both experimental data from the Active Thermochemical Tables and high-level theoretical data generated in this work. Analysis of lowerlevel ATOMIC models and of further components needed to convert EA,e into enthalpies of formation will be reported separately.

2 ACS Paragon Plus Environment

Page 3 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Graphical Abstract

Atomization energies (bottom-of-the-well, kcal/mol) 1368.16

1368.05 ± 0.63

ATOMIC/A*

ATOMIC(hc)/A* Corrections and Uncertainties CCSD(T) higher-order relativistic DBOC BSR prototype molecules bond separation energies

3 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1

Page 4 of 67

Introduction

Atomization energies relate the energy of a molecule to that of its constituent atoms. Providing a conceptually simple access to experimentally relevant and measurable enthalpies of formation they are a cornerstone of theoretical thermochemistry. Calculations at the ab initio level of wave function theory started out in the mid-1980s1 and were soon followed by the development of the still very popular Gaussian (G-n)2–5 and CBS model chemistries.6–8 Both implement composite approaches that reduce the errors of a base quantum-chemical procedure by incremental and independent corrections for various types of basis set augmentation or extrapolation and improvements in the level of electron-correlation. They all need some ’high-level-correction’ for acceptable accuracy, most often expressed in formulas that include a few empirically calibrated parameters. Fully ab initio approaches place much higher demands on the level of theory employed, nearly always including accurate complete-basis-set extrapolations for the “gold-standard” CCSD(T), corrections for even higher-order electron correlation, for scalar relativistic, non-Born-Oppenheimer, and spin-orbit coupling effects. The most advanced versions of Weizmann theory9–12 and of the HEAT protocol13–15 consider excitations beyond CCSDTQ and so achieve sub-kJ/mol accuracy in comparison to well-established experimental data for small molecules. Their computational complexity limits applications to fairly small molecules, however, and even the treatment of systems like benzene requires compromises in the level of theory.16 The correlation-consistent composite approach (ccCA)17, 18 was presented as parameter-free alternative to Gaussian, lower in complexity and accuracy than Weizmann theory and HEAT, but certainly applicable to larger molecules. All these protocols follow fixed recipes, but a number of authors have also presented solutions that are custom-tailored to the problem at hand. Already in 1993, East and Allen introduced focal point analysis,19–21 which probes for incremental convergence of both the one- and N -particle limits toward the focal point of an exact treatment. The protocol presented by Feller, Peterson, and Dixon (FPD)22, 23 likewise customizes the treatment to the desired accuracy. A few years ago Peterson et al. have reviewed all the aforementioned protocols and their numerous variants.24 New protocols continue to be reported, such as diet-HEAT25 and ANL.26 Many of the more recent developments have focused on strategies to reduce computational cost while maintaining overall 4 ACS Paragon Plus Environment

Page 5 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

accuracy, including the use of explicitly-correlated MP2 and CCSD components25, 27–31 to lower basis set requirements, of local coupled-cluster implementations for improved scaling with system size32 and of empirically calibrated extrapolations of post-CCSD(T) effects that allow for the use of very small basis sets.33–35 It is accepted standard in experimental thermochemistry to assign 95% confidence intervals to measured quantities, as was already proposed some 88 years ago in Rossini’s work on the enthalpy of formation of water.36 The same need arises also for theoretical determinations or “virtual measurements” in order to express the level of fidelity for a given number. Proposals to do so37, 38 are far less often followed in theory, however, due at least in part to the difficulty of estimating systematic error, which obviously affects quantum chemistry more than experiment. Uncertainties in the above-mentioned FPD approach,23, 39 for example, are estimated component by component and then summed up, not allowing for possible error cancellation. Such a procedure certainly requires significant experience and insight. The far more common fixed-recipe protocols are typically assessed in benchmarks that compare theoretical numbers to established experimental reference data.40, 41 The Active Thermochemical Tables (ATcT)42, 43 have emerged as the preferred source for such data; they replace the traditional sequential approach in experimental thermochemistry with the solution of complex thermochemical networks and so achieve a high level of accuracy and reliability otherwise unattainable. Statistical analyses with established reference data are not free from pitfalls,44, 45 but they may estimate 95% confidence intervals for theoretical models, using twice the root mean squared error, or, more properly, twice the standard deviation, and quoting the mean signed error as systematic bias.37, 38 Basing their statistical analysis on 35 exceptionally accurate ATcT reference data for small molecules,46 Karton et al have recently reported two hundred W411 atomization energies with impressively small error bars of ±0.25 kcal/mol of even larger coverage factor (3 standard deviations).16 The inherent assumption is of course that the test set is representative enough to allow for uncertainty prediction outside. Many thermochemistry protocols are simpler in design than W4 and geared toward applications to larger molecules. Uncertainties will not only be larger but also more difficult to assess. First, these protocols are expected to be less robust and are thus in need of more thorough testing across chemical space, certainly including also many larger molecules not covered by ATcT. Secondly, popular data com-

5 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 67

pilations47–49 and benchmarks (e.g. G3/9950 ) available for this purpose necessarily collect less accurately known experimental data (often up to ±1 kcal/mol),50 occasionally also values that are later found to be in serious error.51 This obviously impairs the significance of statistical analysis; recall that the observable (apparent) uncertainty reflects both, theoretical and experimental error.38 Finally, atomization energies are extensive quantities, and errors of quantum-chemical approaches are expected to scale with molecular size. Such scaling has rarely been analyzed,32 and only some recently proposed uncertainty models seem to consider scaling with either the number of bonds52 or atoms.25, 44, 53 Some time ago we have introduced the ATOMIC protocol,51, 54, 55 short for Ab initio Thermochemistry using Optimal-balance Models with Isodesmic Corrections, which is a fixed-recipe composite protocol for larger molecules and achieves high accuracy through consistent ab initio implementation of Pople’s bond separation reaction (BSR) scheme.56, 57 High level methods, comparable in rigor to the advanced versions of Weizmann and HEAT theories, are used only for the small molecules representing the separated bonds (parent molecules and simple hydrides57), collectively referred to as BSR prototype molecules, or, shorter, BSR prototypes. The bond-separation energy is computed at the composite CCSD(T) level, using smaller and carefully selected basis sets for each of the contributing components (HF, MP2-HF, CCSD-MP2, (T); valence-shell and all-electron) that provide an optimal balance between accuracy and computational cost. The protocol is applicable to every molecule for which a Lewis-valence structure can be drawn, and this structure uniquely defines the BSR and so the final ATOMIC result for a given composite model. Highlevel data computed for 37 BSR prototypes and stored away for later re-use cover applications to most neutral closed-shell molecules composed of H, C, N, O, and F atoms.51, 54 Here we present an intrinsic uncertainty model for the ATOMIC protocol that does not refer to experimental reference data and so provides an alternative to the usual statistical assessment. All components contributing to the bottom-of-the-well atomization energy will be scrutinized for possible error. High-level results for BSR prototypes and bond separation energies are considered separately, and obvious bias is separated from unspecific or seemingly random error. The goal is to provide realistic corrections and uncertainty estimates corresponding to intervals of 95% confidence that recognize the expected scaling of error with molecular size. We deliberately restrict the analysis to hydrocarbons in order to allow for in-depth analysis and refer to corrected numbers, complete with uncertainty estimate, as ATOMIC(hc)

6 ACS Paragon Plus Environment

Page 7 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

results to distinguish them from regular ATOMIC and to remind us of the restriction to hydrocarbons (“hc”). Corrections and uncertainties are formulated in simple models and do not require any quantumchemical calculations beyond standard ATOMIC. In a follow-up report we shall scrutinize errors and uncertainties related to further terms necessary for the evaluation of enthalpies of formation and apply ATOMIC(hc) to a large set of hydrocarbons.58 We shall first review the ATOMIC protocol and outline the uncertainty model (Sec. 2), provide computational details (Sec. 3) and then analyze contributions to the atomization energy and model bias and uncertainty (Sec. 4). The discussion (Sec. 5) focuses on a critical assessment of the ATOMIC(hc) protocol. This work analyzes a large body of high-level quantum chemical results, many of which are reported only in the Supporting Information. Labels of Tables and Figures included there are identified by a preceding “S” (Tables S1 - S21, Figures S1 - S10). Data are generally provided from analyses carried out at higher precision and rounded only for display in Tables; manual recalculation will thus often lead to round-off error in the last decimal place reported.

2

Theoretical concept

The ATOMIC protocol relies on the concept of bond-separation reactions (BSR)56, 57 used to cancel systematic errors of lower-level quantum-chemical procedures. The taxing task of obtaining high-level energy estimates for a given molecule is reduced to computing them only for the small molecules (BSR prototypes) that represent all constituent bonds of the molecule and combining them with bond-separation energies obtained at lower level.

2.1

The ATOMIC protocol

The formalism has been introduced earlier,54 however, we need to recap the essentials and elaborate on some details in order to make the report self-contained and to set the notation. Energies will generally be expressed with symbol E, using subscripts to specify their type (atomization, reaction), optional superscripts to indicate both contribution (CCSD(T), higher-order, relativistic, etc) and, in curly brackets, theory level, and arguments in square bracket to indicate the entity (molecule, bond) referred to. {m}

Following the ATOMIC protocol (model m), the bottom-of-the-well atomization energy EA,e [M ] of 7 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 67

molecule M (“A” for atomization, “e” for equilibrium geometry, implying exclusion of zero-point-energies) {L}

can conveniently be expressed as a low (L) level value EA,e [M ], corrected by a sum over bond (“b”) {H−L}

increment contributions ∆EA,e

{L}

{H}

[b] = EA,e [b] − EA,e [b],54 that define the difference between high (H)

and low (L) terms

{L}

{m}

EA,e [M ] = EA,e [M ] +

X

{H−L}

∆EA,e

[b]

b∈M

=

 

{L}

EA,e [M ] −



X

{L}

EA,e [b]

b∈M

{L} [M ] + = Ereac

X

 

{H}

EA,e [b]

+



X

{H}

EA,e [b]

(1)

b∈M

b∈M

The last line of Eq. (1) formulates the atomization energy as the sum of a low-level bond separation {L}

energy (BSE), i.e. a reaction energy (Ereac [M ]) for the process of bond separation, and high-level bond increments. We shall see in a moment that this expression indeed follows from the first line, provided that bond increments are defined properly from the energies of BSR prototypes, i.e. the small auxiliary molecules appearing on the left and right hand side of the BSR and representing the bonds of M . In the case of hydrocarbons we need to consider just four prototype molecules (methane, ethane, ethylene, acetylene), and, correspondingly, four bond types, each denoted by the atoms involved, the multiplicity, and the free valencies satisfied by adjacent bonds. The definitions hold equally for high and low levels, so we drop the superscripts L and H:

EA,e

h

h

3

EA,e

1 EA,e [ CH4 ] 4

i

=

C-C3

i

= EA,e [ C2 H6 ] − 6EA,e

h

C-H

3

C-H

3

i i

EA,e

h

C=C2

i

= EA,e [ C2 H4 ] − 4EA,e

h

C-H

EA,e

h

C≡C1

i

= EA,e [ C2 H2 ] − 2EA,e

h

C-H

2 1

3 3

(2)

i

Ereac [M ] (again with superscripts dropped for generality) is the bond separation energy of molecule M .

8 ACS Paragon Plus Environment

Page 9 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Taking benzene as an example,

C6 H6 + 6 CH4 → 3 C2 H6 + 3 C2 H4

(3)

Ereac [C6 H6 ] = −3EA,e [ C2 H6 ] − 3EA,e [ C2 H4 ] + EA,e [C6 H6 ] + 6EA,e [ CH4 ]

(4)

it reads

or, using Eq. (2),

Ereac [C6 H6 ] = EA,e [C6 H6 ] − 3EA,e

h

3

i

C-C3 − 3EA,e

h

2

i

C=C2 − 6EA,e

h

3

i

C-H

(5)

and, more generally,

Ereac [M ] = EA,e [M ] −

X

EA,e [b]

(6)

b∈M

which is what we used in the last line of Eq. (1). Note that atomization energies enter with negative sign in Eq. (4) as they subtract molecular from atomic energies. The general idea of the ATOMIC protocol is thus to perform only a low-level (L) calculation for a given molecule M, and take precomputed high-level (H ) bond increments to correct the result, following Eq. (1). The formalism is initially applied to establish the atomization energy at the complete basis set limit of CCSD(T){m}

CCSD(T),59–61 which we shall denote specifically as EA,e

[M ]. All models {m} defined so far54, 55

use the same reference level H: It is obtained from extrapolations of CCSD(T)(full)/cc-pCVX Z62 (X =Q, 5) energies to the complete basis set limit, corrected by higher-level (5,6) extrapolations for the valenceshell CCSD portions, using cc-pVX Z basis sets.63–65 ATOMIC models {m} differ in the composite model CCSD(T){L}

used for the low-level (L) CCSD(T) atomization energy EA,e

[M ], ranging from the most expensive,

quite accurate model A, which involves valence-shell calculations up to MP2/cc-pV5Z, CCSD/cc-pVQZ, and CCSD(T)/cc-pVTZ, to the much cheaper but less reliable model C that uses significantly smaller basis sets. Short-hand notations for correlation-consistent basis sets and for ATOMIC models have been introduced earlier54 and will be recapped in Sec. 4.1. The ATOMIC protocol then considers further contributions to atomization energies: higher order

9 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 67

(“h.o.”) electron correlation effects beyond CCSD(T), scalar relativistic (“rel.”) effects, diagonal BornOppenheimer (“DBOC”) and spin-orbit corrections (“SO”). The total “bottom-of-the-well” atomization energy is given as a sum of these contributions: CCSD(T){m}

ATOMIC EA,e [M ] = EA,e

h.o.{m}

[M ] + EA,e

rel.{m}

[M ] + EA,e

DBOC{m}

[M ] + EA,e

SO{m}

[M ] + EA,e

[M ]

(7)

Each of the contributions (generically termed “contrib”) is formally evaluated using bond separation reactions contrib{m}

EA,e

contrib{L} [M ] = Ereac [M ] +

X

contrib{H}

EA,e

[b]

(8)

b∈M

where the model designation {m}, defining both high (H) and low (L) level methods, is specific to the contribution. We have just skipped additional subscripts to m, L, and H to keep notation simple. CCSD(T){m}

ATOMIC models differ only in the EA,e

[M ] term, all other contributions are evaluated following

the same prescription, irrespective of the choice of ATOMIC model: Higher-order electron correlation66–69 h.o.{H}

terms EA,e

[b] include the valence-shell energy differences CCSDT-CCSD(T) (extrapolated from ccrel.{H}

pVDZ and cc-pVTZ) and CCSDTQ-CCSDT (cc-pVDZ). Scalar relativistic terms EA,e

[b] are obtained

from CCSD(T)(full)/aug-cc-pCVTZ one particle density matrices contracted with one-electron Darwin DBOC{H}

and mass-velocity terms.70–72 DBOC terms73–75 EA,e

[b] are evaluated at the Hartree-Fock (cc-

pVDZ) level. All molecules considered in this work are closed-shell, hence spin-orbit corrections arise only from the constituent atoms, are taken from experimental data76 as listed by Curtiss et al 4 and can SO{m}

conveniently be expressed in high-level bond increments (EA,e h.o.{m}

Ereac

rel.{m}

[M ], Ereac

DBOC{m}

[M ], and Ereac

[M ] =

P

SO{H}

b∈M

EA,e

[b]).77

[M ] are evaluated under the greatly simplifying assumption

that the corresponding bond-separation reactions are thermoneutral,

contrib{L} Ereac [M ] = 0

(9)

The evaluation of atomization energy contributions then proceeds trivially by summing up pre-computed high-level (H ) bond increments

10 ACS Paragon Plus Environment

Page 11 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

contrib{m}

EA,e

[M ] =

X

contrib{H}

EA,e

[b]

(10)

b∈M

2.2

Analysis of uncertainty and error

It is the purpose of this work to establish internal error estimates for ATOMIC calculations and provide an alternative to the usual comparison to experiment. To proceed we scrutinize each of the components contributing to the final ATOMIC atomization energy, Eq. (7), and build models to determine fair estimates of their expected error or uncertainty. A set of 83 hydrocarbons with three or more carbon atoms plus the four relevant BSR prototypes methane, ethane, ethylene, acetylene (Figs. S1-S10, Table S1) serves as the benchmark set. It includes some highly strained systems as well as larger molecules with up to 10 carbon atoms to allow for meaningful analyses and gives preference to molecules with high symmetry to ease calculations at high levels. Still, some of the reference calculations reported here are computationally so demanding that subsets need to be selected on a case-by-case basis. For each of the contributions to the atomization energy we shall try to identify systematic error or bias contrib{m}

and separate it from unspecific or random error. The former will be used to improve the energy E A,e contrib{m}

with a correction C A,e

contrib{m}

, the latter to estimate its uncertainty uA,e

. Like for energy components

we shall distinguish high- (“H”) and low-level (“L”) contributions resulting from the application of bond contrib{H}

separation reactions. At the high-level H, the correction C A,e

[b] will be obtained from extending

the level of theory for BSR prototypes to what we can currently afford and comparing the result to the contrib{H}

(somewhat inferior) reference implemented in standard ATOMIC. The uncertainty u A,e

[b] will be

based on convergence analyses and numerical experience gained that identifies possible further correction through potential exact or fully converged calculations that would be desirable but are out of reach contrib{H}

computationally. Both terms, C A,e

contrib{H}

[b] and u A,e

[b], are formulated as bond terms, but they

express corrections and uncertainties for BSR prototypes and so are defined in the same non-empirical contrib{H}

fashion as the corresponding energy terms E A,e contrib{L}

At low-level L, corrections Creac

[b] are in Eq. (2). contrib{L}

[M ] and uncertainties ureac

[M ] will be obtained from anal-

yses carried out for benchmark sets and cast into statistical or empirical models. Ideally, atomization contrib{m}

energy contributions EA,e

[M ] +

P

contrib{H}

b∈M

CA,e

[b] will be compared to fully corrected reference

11 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

contrib{H}

data EA,e

contrib{H}

[M ] + CA,e

Page 12 of 67

[M ], but less rigorous choices may need to be made for contributions contrib{H}

(notably the higher-order electron correlation terms), for which CA,e

[M ] terms are too taxing to

compute for a larger set of molecules. Overall corrections and uncertainties for particular contributions are then obtained as follows: contrib{m}

C A,e

[M ] = C contrib{L} [M ] + reac

X

contrib{H}

C A,e

[b]

b∈M



contrib{m}

u A,e

[M ]

2

=



2

u contrib{L} [M ] reac



+

X

b∈M

contrib{H}

u A,e

2

[b]

(11)

The use of straight summation for (unsigned) uncertainties of high-level (“H”) bond terms in Eq. (11) certainly deserves some comment. If, like in ordinary applications of BSR schemes, the final energy of molecule M were obtained from experimental energies for BSR prototypes and theoretical reaction energies, then regular error propagation from the uncertainties for the reaction energy and the uncertainties contrib{m}

for each of the BSR prototype energies would be the only reasonable choice to estimate u A,e

[M ],

because experimental measurement errors must be assumed to be random and uncorrelated. The situation is quite different if BSR prototype data come from theory, as residual errors will be highly correlated. The use of bond separation reaction schemes is so beneficial to theory because they conserve the number of bonds of each type (3 each of types 3 C-C3 and 2 C=C2 , 30 of type 3 C-H in the above example of benzene, Eq. (3)), and many of the errors even in fairly approximate quantum-chemical theory tend to be characteristic of bond type, so they largely cancel out for bond-separation reactions. Even better cancellation may be expected at benchmark levels of theory, and, while by construction the BSR scheme does not balance bonds in the high-level (H ) terms, there are still 24 bonds of type 3 C-H appearing on the left and right hand side of the BSR of benzene that cancel if high-level (H ) terms are expressed in bond terms rather than in energies for BSR prototype molecules (compare Eqs. (4) and (5)). Formulation in bond terms is merely a convenient representation of BSRs if energies are concerned, however, it has profound consequences if uncertainties are estimated because it cancels a significant portion of systematic error. In turn we should allow for likely positive correlation between the much smaller remaining bond term uncertainties, and favor the safer straight summation over error propagation. The resulting sum over contrib{H}

u A,e

[b] terms should be a conservative estimate of possible error relating to the reference energies 12 ACS Paragon Plus Environment

Page 13 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

contrib{L}

used for BSR prototypes. A separate uncertainty ureac

(M ) will finally be estimated for the low-level

(L) reaction energy itself, which is expected to be largely uncorrelated with the former, and so added in contrib{L}

quadrature (see Eq. (11)). We note that low-level uncertainties ureac

(M ) themselves will often be

expressed in terms of bond increments, which this time arise from calibrated additive empirical models and so demand straight summation by construction. In summary, uncertainty estimates for low- and high-level terms first need to be established for the entire molecule, employing straight summation in case of bond-increment-representations. Resulting contribution uncertainties for the molecule are then added in quadrature, following Eq. (11). Overall correction and uncertainty are given as ATOMIC(hc)

C A,e

CCSD(T){m}

[M ] = C A,e

h.o.{m}

+ C A,e 

ATOMIC(hc)

uA,e

2

[M ]

=



+



rel.{m}

[M ] + C A,e

CCSD(T){m}

uA,e

h.o.{m}

uA,e

o.s.a.{m}

[M ] + C A,e

[M ]

2

[M ]

2 

[M ]

(12)

DBOC{m}

[M ] + C A,e



o.s.a.{m}

+ uA,e rel.{m}

+ uA,e

2

[M ]

[M ]

2

[M ]



DBOC{m}

+ uA,e

2

[M ]

The revised protocol is referred to as ATOMIC(hc) for the sole purpose of distinguishing regular (ATOMIC) and corrected (ATOMIC(hc)) results, where the label “hc” serves as a reminder of the current limitation to hydrocarbons: ATOMIC(hc)

EA,e

ATOMIC(hc)

ATOMIC [M ] = EA,e [M ] + C A,e

ATOMIC(hc)

[M ] + u A,e

[M ]

(13)

Note that no corrections or uncertainties appear in Eq. (12) for spin-orbit terms (compare Eq. (7)) as o.s.a. [M ] they are derived from experimental data and assumed to have negligible error. One set of terms, C A,e o.s.a. [M ], on the other hand, has no counterpart in the energy expression of the ATOMIC protocol and u A,e

(Eq. (7)), it has been added to remove inconsistencies in the open-shell treatment of the atoms (“o.s.a.”, ROHF vs UHF) for CCSD(T) and higher-order electron correlation contributions.77 Eq. (12) obtains final estimates of uncertainty by summation in quadrature, assuming that errors of different contributions are uncorrelated. In practice we expect some correlation even after separating obvious bias and formulating it as a correction. Residual errors will still tend to be on one side of 13 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 67

the uncertainty interval for similar molecules that share larger common fragments. In the absence of better knowledge it seems justified, however, to proceed as if component errors were uncorrelated. In all likelihood, positive correlation will be balanced by negative correlation, and so summing up errors in quadrature should yield far more realistic final uncertainties than strict summation, which would of course cover worst-case scenarios but in general lead to unnecessarily inflated error bars. In the following sections we present very accurate calculations for all the components to the ATOMIC atomization energy and derive corrections and uncertainties following the principles outlined above. Final results are listed in Table 1. A number of corrections and uncertainties are found to be negligible or irrelevant and so are not considered in ATOMIC(hc) and skipped in Table 1. None of the remaining terms require any additional quantum-chemical calculation, hence corrections to, and uncertainties of, ATOMIC results are obtained as simple post-hoc estimates.

3

Computational details

Following the ATOMIC standard,54 geometries are optimized at the (RI)78 -MP279 (fc)/cc-pVTZ63 level of theory, and all energies reported here have been evaluated for those geometries. Although usually adequate for high-level atomization energies, RI-MP2/cc-pVTZ geometries are not without flaws; in a follow-up report58 we shall assess geometry-related energy errors and anticipate some of the results here when needed (Sec. 5). Force constant analyses are used to verify that optimized geometries are minima, in a few cases we observe very small imaginary frequencies (e.g. 2-butyne, 8i cm−1 ) that characterize numerical problems with essentially barrier-free rotations rather than genuine transition states.51 TURBOMOLE80, 81 (version 5.10, 2008) was used for geometry optimizations and to obtain force-constant matrices from gradients for displaced geometries. MOLPRO82, 83 was employed for correlated single-point calculations up to CCSD(T)59–61 and for relativistic Douglas-Kroll84 -Hess85 (DKH) calculations. MRCC,86, 87 at times in conjunction with ACES II88 was used for higher-order coupled-cluster calculations,66–69 CFOUR89 for evaluations of diagonal Born-Oppenheimer corrections90 and for relativistic calculations that evaluate the mass-velocity and one-electron Darwin terms.70–72 A limited number of data was taken from previous work.51, 54 Finally, MOLPRO was used to produce CCSD-F12(fc)/cc-pV5Z-F12(rev2)91 correlation energies as 14 ACS Paragon Plus Environment

Page 15 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

reference for complete basis set (CBS) estimates of CCSD(fc). We followed the procedure outlined by Peterson et al,91 using canonical HF orbitals and CCSD-F12b92, 93 calculations (CCSD(F12*)94 for comparison in some cases) that employ the aug-cc-pwCV5Z/MP2fit auxiliary basis set95 for resolution of the identity and density fitting treatments of F12 corrections only. In summary, correlation-consistent basis sets96–98 cc-pVX Z63–65 and cc-pCVX Z62, 99 were used throughout, the former for valence-shell correlation treatments, the latter for all-electron calculations (unless specified otherwise, Sec. 4.1), at times augmented with diffuse functions,100 revised for use in explicitly correlated calculations,91 or recontracted for use with relativistic hamiltonians.101 Complete-basis-set (CBS) extrapolations are performed as specified in Sec. 4.1 for CCSD(T) contributions and assume X −3 convergence for higher-order (post-CCSD(T)) effects. Consistent with the original publication,54 we shall use simplified basis-set notation for these contributions, specifying just the cardinal number in numerical form (X=3 instead of X=T, Secs. 4.14.8), augmented with a prime symbol (X’) to refer to mixed basis sets using X − 1 for hydrogen and X for carbon (Secs. 4.3-4.8), and boldface typesetting with top bar to indicate CBS extrapolation (X for(X − 1, X) and X′ for(X − 1, X ′ )). A truncated triple-ζ basis set X=3tr (highest-angular momentum functions deleted, second-highest taken from cc-pVDZ, denoted “PVTZ(nof 1d)” by Karton et al 33 ) is used in Sec. 4.5. MP2(full)/cc-pCVX Z (X =Q, 5) correlation energies are needed for a new model A* covering the entire benchmark set (Sec. 4.2); for larger molecules they have been obtained with TURBOMOLE using the RI approximation and auxiliary basis sets composed of those for cc-pV6Z, for carbon atoms supplemented with the core-valence part designed for cc-pwCV5Z95 and with two additional functions for each of the angular momenta of the latter, choosing exponents two- and four-times higher than the highest ones already included. Tested for more than 50 of all 87 hydrocarbons, this protocol reproduced MP2 correlation energies to within 0.004 kcal/mol (cc-pCVQZ) and 0.012 kcal/mol (cc-pCV5Z). This small residual inaccuracy is not considered any further in our error analysis.

4

The ATOMIC(hc) protocol

The following sections assess corrections to and uncertainties of ATOMIC estimates for each of the individual contributions to bottom-of-the-well atomization energies, first analyzing CCSD(T) contributions 15 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 67

(Secs. 4.1 and 4.2), then higher-order electron correlation contributions (Secs. 4.3 to 4.8), corrections relating to the open-shell treatment of the atoms (Sec. 4.9), scalar relativistic effects (Sec. 4.10), and finally diagonal Born-Oppenheimer corrections (Sec. 4.11).

4.1

CCSD(T) atomization energies at the reference level H CCSD(T){H}

The ATOMIC protocol obtains CCSD(T) atomization energies at the reference level (EA,e

[M ])

from extrapolations of CCSD(T)(full)/cc-pCVXZ (X=Q, 5) energies to the complete basis set limit, corrected by higher-level (5,6) extrapolations for the valence-shell CCSD portions. This level is denoted54 as [6 6 6 5 | 5 5 5 5 ] using the basis set notation introduced in Sec. 3 and listing from left to right the HF (EHF (X1 )), valence shell MP2-HF (∆MP2 (X2 )), CCSD-MP2 (∆CCSD (X3 )), and CCSD(T)-CCSD (∆CCSD(T) (X4 )) components and the increments toward corresponding cc-pCVX Z evaluations (∆core HF (X5 ), core core ∆core MP2 (X6 ), ∆CCSD (X7 ), ∆CCSD(T) (X8 )), where electron correlation considers all electrons.

Here we wish to increase the reference level to [6 6 6 6 | 6 6 6 5 ] and reconsider the precise formulas used for complete basis set extrapolation. The Hartree-Fock component appears to be uncritical, as extrapolation to the CBS limit using the simple exponential formula of Halkier et al 102 changes the atomization energy by 0.03 kcal/mol or less even for larger hydrocarbons with six carbon atoms (see 6,6 δHF = EHF (6) − EHF (6) in Table 2), indicating negligible extrapolation error of around 0.01 kcal/mol.

Note also that the addition of core functions to the basis set (cc-pCVX Z) changes results by less than core 0.01 kcal/mol (see ∆core HF (6) in Table 2), very close to the expected CBS limit ∆HF (∞)=0.

The situation is less favorable for correlation energies, owing to their slow convergence to the CBS limit. ATOMIC assumes X −β convergence with β adjusted to reproduce either large, but finite basis set results (EX+N with finite N ) or known CBS results (infinite N , here evaluated as documented in Sec. 3) from results obtained for X − 1 and X.103

EX+N = EX +

X −β − (X + N )−β (X − 1)−β − X −β

(EX − EX−1 )

(14)

β should approach 3 in the asymptotic limit, at least for singlet electron pairs,104, 105 but experience shows that even basis sets as large as cc-pV6Z are still far away from this limit, and that extrapolation of atomization energy contributions, i.e. of energy differences, poses additional challenges.103 Based on 16 ACS Paragon Plus Environment

Page 17 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

results for (34)→5, (34)→6, and (45)→6 extrapolations of valence-shell CCSD correlation contributions to atomization energies, a compromise value of β=3.15 had been suggested for (34), (45), and (56) extrapolations,103 which was then also used for all-electron CCSD and perturbational triples contributions without further testing.54 We have repeated the previous analysis103 but have now focused on hydrocarbons and considered allelectron CCSD(T) as well. Table 3 lists optimal extrapolation exponents that minimize the sum of squared errors between extrapolated and reference values for a set of 16 small hydrocarbons (see Tables S2-S8 for individual results). Valence-shell CCSD/cc-pVX Z contributions show much smaller optimal exponents for (45)→6 and (45)→∞ than for all other extrapolations, indicating anomalously slow convergence in these cases. This irregularity may well reflect some lack of radial flexibility of correlation-consistent polarized valence basis sets that Sylvetsky et al noted already31 for diffuse-augmented (aug-cc-pVX Z) versions and found to impair the accuracy of CBS extrapolations. The use of cc-pCVX Z basis sets indeed eliminates this irregularity (Table 3) and affords “optimal” extrapolation exponents for valence-shell CCSD contributions that vary fairly little (3.3-3.4) across all types of extrapolation. For accurate CBS reference data, Peterson et al recommend CCSD-F12 calculations with increased geminal exponents (1.4), paired with the “rev2” version of their cc-pV5Z-F12 basis set,91 which contains additional polarization functions on hydrogen. This protocol obviously produces slightly larger correlation contributions to atomization energies at the estimated CBS limit than other choices do (Table S2), and so makes convergence of regular valence-shell CCSD/cc-pCVX Z calculations appear somewhat slower as indicated by smaller optimal extrapolation exponents in Table 3. On the whole, however, optimal exponents are reassuringly similar for all four CBS choices and consistent also with optimal exponents found for finite basis set extrapolations. Finally we note that Kesharwani et al have recently favored CCSD(F12*)94 over the here employed CCSDF12b92, 93 to estimate CBS limits in particular for molecules with some multi-reference character.106 For the hydrocarbons examined here (including more difficult cases like bicyclo[1.1.0]but-1(3)-ene, cf. Sec. 4.5), we have found little difference between the two CCSD-F12 approximations (Table S2), however, and so expect no material shift for derived optimal extrapolation exponents. We have not attempted to produce CBS reference data for the more relevant all-electron models of CCSD and CCSD(T), because the F12 treatment does not affect perturbational triples (T) corrections

17 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 67

and, with currently available basis sets,107 has not shown better performance than conventional large-basis set extrapolation for inner-shell correlation at the CCSD level.108 Results for finite-basis set extrapolations in Table 3 indicate, however, that optimal exponents for CBS extrapolations of CCSD(T)(full) correlation contributions should be fairly similar to CCSD(fc), possibly slightly larger than the latter. In summary, we recommend a revised extrapolation exponent of β=3.4 for hydrocarbons and assign an uncertainty of ±0.25. The interval (3.15· · ·3.65) includes the original exponent (3.15) at the lower limit and almost all individual exponents determined for finite- and complete-basis extrapolations of CCSD(fc), CCSD(full), and CCSD(T)(full) contributions to atomization energies (Tables S6-S8). Notable exceptions include less relevant small-basis-set extrapolations, but only for hydrogen-rich hydrocarbons (methane, ethane) whose correlation energies are known to converge particularly fast.109 The smaller extrapolation exponent (3.15) suggested in the original work103 reflects the noted problems with cc-pVX Z basis sets (s.a.), the focus on smaller X used for extrapolation, and a number of molecules in the original data set that primarily contain oxygen and fluorine atoms and exhibit slow and often erratic convergence. To retain the original definition of the ATOMIC protocol, including the original CBS extrapolation procedure, we express the combined effect of both the change in reference (high, H ) level (from [6 6 6 5 | 5 5 5 5 ] to [6 6 6 6 | 6 6 6 5 ] ) and the change in β (from 3.15 to 3.4) for BSR prototype CCSD(T){H}

molecules (methane, ethane, ethylene, acetylene) in terms of simple bond increments (CA,e

[b])

correcting systematic error, to be added to regular ATOMIC results. Similar evaluation for the lower bound to the extrapolation exponent (β=3.4-0.25=3.15) for [6 6 6 6 | 6 6 6 5 ] yields an upper bound to the corrected CCSD(T) energy contribution, and, by difference, estimates for the uncertainty or random CCSD(T){H}

error (uA,e

[b]) . We accept the same difference to define also the lower bound to the corrected

energy contribution, this choice is somewhat more generous than evaluation with β=3.4+0.25=3.65 and CCSD(T){H}

retains symmetric uncertainty estimates. Note that bond increments (CA,e

CCSD(T){H}

[b], uA,e

[b])

are derived following Eq. (2). This decomposition ensures that bond increments represent evaluations of thermoneutral bond separation reactions. Results of this decomposition are collected in Table 1. Table 2 includes ATOMIC(hc) estimates to the CCSD(T) correlation contribution of atomization energies that are obtained from ATOMIC reference ( [6 6 6 5 | 5 5 5 5 ], β=3.15) calculations and BSR increments as detailed above (Table 1). For BSR prototypes (methane, ethane, ethylene, acetylene),

18 ACS Paragon Plus Environment

Page 19 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

ATOMIC(hc) matches “best estimates” by construction, for larger molecules ATOMIC(hc) is found to reproduce “best estimates” quite well, and uncertainties are always recovered to within 0.01 kcal/mol, in support of our strictly additive bond increment model for the error (compare Sec. 2.2).

4.2

CCSD(T) atomization energies at lower-level L

Errors and uncertainties for the standard ATOMIC models (A, B1 -B6 , C, E)54, 55 shall be discussed in a follow-up report,58 together with applications to large hydrocarbons. For the purposes of the current work, however, we still need a very accurate ATOMIC model that allows us to treat molecules with up to about 10 carbon atoms, for which calculations at the reference (high, H ) level would be unfeasible. None of the standard ATOMIC models meets the demands of accuracy, but one listed without further discussion in the original work54 turns out to be surprisingly accurate for hydrocarbons ( [5 5 5 4 | - 5 3 3 ]; see line 5 in Table I of Ref. 54). P

CCSD(T){H}

b∈M

C A,e

CCSD(T){A∗}

This model will be referred to as A*; results (EA,e

[M ] +

[b]) for 29 hydrocarbons are listed in Table S9 and visualized in Figure 1. The error

with respect to the ATOMIC(hc) reference is most conveniently expressed as percentage of the reference atomization energy; broken lines in Figure 1 indicate the mean signed error (+0.0062 %), plus and minus three times the standard deviation (±0.0164 %, derived corrections listed in Table 1). Atomization energies tend to be slightly overestimated for molecules with C=C double bonds and about right for those with C≡C triple bonds. The obviously systematic nature of the residual error, the relatively small size of the benchmark set, and the importance of ATOMIC/A* for the judgment of more approximate models58 have prompted us to choose a generous error estimate (3 rather than 2 standard deviations) to remain on the safe side and cover all cases studied here, including more “exotic” ones. Errors are still extremely small, so it is obviously immaterial whether they are estimated as percentage of reference or model atomization energies. This will be important for actual applications where reference energies are unavailable.

4.3

Higher-order electron correlation: General remarks

The ATOMIC protocol estimates higher-order electron correlation contributions toward Full CI from h.o.{H}

bond increments only (EA,e

[b]). These increments were derived from calculations for BSR proto-

type molecules, covering a valence-shell CCSDT-CCSD(T) term, extrapolated to the CBS limit from

19 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 67

cc-pVDZ and cc-pVTZ results and a CCSDTQ-CCSDT term evaluated with a cc-pVDZ basis set. The choice of theory level was largely guided by experience shared by other authors, and in particular by the implementation reported for the HEAT protocol,13 the difference being that the latter extrapolates CCSDT-CCSD(T) term from larger basis sets. Here we wish to test the validity of the protocol for hydrocarbons and report post-CCSD(T) results for all 87 molecules of the benchmark set to derive both h.o.{H}

corrections and uncertainties for the reference level (CA,e

h.o.{L}

related to the neglect of bond separation energies (ureac

h.o.{H}

[b], uA,e

[b]) and estimate uncertainties

[M ]).

Extended treatments of higher-order electron correlation66–69 are still possible only for the small prototypes of BSRs (methane, acetylene, ethylene and ethane) and very few other molecules with three or four carbon atoms, depending on symmetry and on the number of hydrogens present. This allows us h.o.{H}

to derive both CA,e

h.o.{L}

to estimate ureac

h.o.{H}

[b] and uA,e

[b] (Sec. 4.4), but we need to follow a more pragmatic approach

[M ]. First we shall settle on a reasonable and affordable level to compute bond

separation energies for all molecules and use a limited base of higher-level calculations to estimate their possible error (Sec. 4.5). Now, for computational reasons, neither ATOMIC nor ATOMIC(hc) considers h.o.{L}

bond separation energies explicitly (Ereac

h.o.{L}

[M ]=0, Creac

h.o.{L}

[M ]=0), hence the final model for ureac

[M ]

needs to cover both the computed values and their potential error in comparison to estimated Full CI (Sec. 4.6). This model will be applicable to most cases but fail for molecules that exhibit excessive higher-order contributions to bond separation energies. A quality diagnostic is then developed that tries to identify those cases based only on available CCSD(T) results. (Sec. 4.7). Scrutiny of higher order atomization energy contributions (Tables S10 and S11) permits a number of general conclusions to be drawn: (a) CCSDT-CCSD(T) contributions (henceforth termed [T-(T)]) and CCSDT(Q)-CCSDT ([(Q)-T]) contributions are normally of the same order of magnitude and of opposite sign. It is thus mandatory to include both of them in concert ([(Q)-(T)]) in order to improve upon CCSD(T) results; note that the failure to improve CCSD(T) by [T-(T)] corrections alone has been reported by several authors before.10, 110 (b) Similar remarks apply to CCSDTQ-CCSDT(Q) ([Q-(Q)]) and CCSDTQ(P)-CCSDTQ ([(P)-Q]) contributions, which are both generally much smaller than lowerorder contributions, but typically of similar magnitude among each other and often of opposite sign. (c) Even higher excitations have only been evaluated with double-ζ basis sets for the very smallest molecules

20 ACS Paragon Plus Environment

Page 21 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(methane and acetylene), their contribution to atomization energies is expectedly very small and should generally be negligible, at least for hydrocarbons with little or no non-dynamical correlation. (d) Innershell correlation effects need to be considered at the [(Q)-(T)]core level to achieve an accuracy of about a hundredth of a kcal/mol. Here [(Q)-(T)]core is defined as the difference between CCSDT(Q)(full)CCSD(T)(full) increments and their valence-shell equivalents [(Q)-(T)]. (e) The slowest convergence to the complete basis set limit is observed for the lowest-order contribution, [T-(T)], where a double-ζ evaluation generally suggests too little destabilization, occasionally even stabilization, at variance with accurate extrapolations to the complete basis set limit. Basis set extrapolation thus appears mandatory. Aiming at an accuracy of one or a few hundredths of a kcal/mol for high-level bond terms, however, one also needs to extrapolate the remaining contributions to their basis set limits, including [(P)-(Q)] and the inner-shell term [(Q)-(T)]core . Note, e.g., that double-ζ evaluations suggest a vanishingly small valenceshell [(P)-(Q)] contribution and a small negative inner-shell [(Q)-(T)]core contribution of -0.02 kcal/mol for acetylene, while the best complete basis set estimates suggest -0.03 and +0.06 kcal/mol, respectively.

4.4

Higher-order electron correlation at the reference level H

As mentioned in Sec. 2.2, we shall study the 1- and N -particle convergence for bond increments (Table 4) rather than for BSR prototypes (Tables S10 and S11) on which they are based. This offers the benefit of error compensation between the left- and right-hand sides of the BSR and, as we shall see, of reduced basis-set requirements for CC bond terms. 3 C-H

bond increments tend to be extremely small. [(Q)-(T)] contributions are still overestimated with

small basis sets, but reliable (34) extrapolations indicate a vanishingly small value at the CBS limit. The best available estimates for the [(P)-(Q)] and [(Q)-(T)]core terms are very close to zero as well. The 3 C-C3 bond cannot be dealt with in as much detail as the other relevant bond types because the parent molecule ethane is the largest by far and of fairly low symmetry, thus putting serious constraints on the types of calculations that are computationally feasible. Convergence is fairly rapid, however. The [(Q)-(T)] term appears well converged for X = 4, and the [(P)-(Q)] and[(Q)-(T)]core are so much smaller that their evaluation for X = 3′ and X = 3 seems more than adequate. Note that the use of smaller hydrogen basis sets (X = 4′ rather than X = 4, see Sec. 3 for notation) hardly affects [(Q)-(T)] results

21 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 67

for the 3 C-C3 bond (by only 0.001 kcal/mol) while it has a significant effect for the corresponding BSR prototype, ethane (0.033 kcal/mol, see Table S11). This supports our strategy to analyze derived BSR bond terms rather than energies of BSR prototype molecules, which could have saved the expense of an X = 4 evaluation for ethane at the CCSDT(Q) level (about 12 days on 16 cores of commodity hardware) in favor of the much cheaper X = 4′ evaluation (about 5 days on 8 cores). This strategy definitely helps in the analysis of the 2 C=C2 bond. Convergence of the [(Q)-(T)] term seems less rapid, calling for an X = 5′ calculation for ethylene, which proved to be just feasible, while a full quintuple-ζ-level calculation exceeded our possibilities. Again, results for X = 4′ and X = 4 match each other very well, consolidating our confidence that a full X = 5 evaluation would match the X = 5′ result to within a few cal/mol (0.001 kcal/mol). At this level, the [(Q)-(T)] term seems to be well-converged. The 2 C=C2 bond does show non-negligible [(P)-(Q)] and [(Q)-(T)]core terms, however, numbers listed in Table 4 indicate reasonable convergence for [(P)-(Q)], and, with some reservation, also for [(Q)-(T)]core . Both terms are of similar magnitude and opposite sign, so they nearly cancel each other. Not unexpectedly, energies are hardest to converge for 1 C≡C1 bonds. Luckily the BSR prototype (acetylene) is small and symmetric enough to permit very large basis set calculations. Reasonable convergence has been reached with the largest basis sets considered, that is at X = 5, X = 3, and X = 4, respectively, for [(Q)-(T)], [(P)-(Q)], and [(Q)-(T)]core terms. Again we note that basis set reduction on hydrogen hardly influences results, neither for [(Q)-(T)] nor for [(P)-(Q)] terms. Like for the 2 C=C2 bond, the [(P)-(Q)], and [(Q)-(T)]core terms are non-negligible and of opposite sign. Final corrections and their associated uncertainties (Table 5) are obtained as follows for each of the four bonds and each of the three terms, [(Q)-(T)], [(P)-(Q)], and [(Q)-(T)]core : The CBS extrapolation using the largest basis sets available, Xmax and Xmax -1, (the rightmost entry in Table 4) defines the correction, and the difference with respect to either the next-lower quality CBS extrapolation, from Xmax -1 and Xmax -2, or to the unextrapolated value (Xmax ), whichever is larger, serves as an estimate of the associated uncertainty. Note for completeness that results from truncated basis sets and their extrapolations are not mixed with those from full basis sets in the estimate of uncertainties. Modest error bars are assigned for the neglect of effects beyond [(P)-(Q)] and [(Q)-(T)]core on any of the CC bonds, 0.01 kcal/mol in the case of 3 C-C3 and 2 C=C2 , a value that matches the order-of-magnitude of

22 ACS Paragon Plus Environment

Page 23 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the highest-order effects still considered, and 0.02 kcal/mol for the 1 C≡C1 bond, which shows somewhat larger [(P)-(Q)], and [(Q)-(T)]core energies and a very small, but at this level non-negligible, [(6)-(P)] energy of 0.005 kcal/mol, computed for X=2 (see Table S10). For each of the four bonds, we obtain final estimates of “higher-order” electron correlation corrections by summation and uncertainties by error propagation. (right column in Table 5). The difference with respect to standard ATOMIC54 is fairly significant for 2 C=C2 and 1 C≡C1 bonds (Table 1), mainly reflecting basis set effects in the [(Q)-T] portion of the [Q-T] term, which has previously only been computed with a small double-ζ basis set.

4.5

Explicit calculation of [(Q)-(T)]terms h.o.{L}

The ATOMIC protocol neglects bond-separation energies Ereac.

[M ] in the evaluation of higher-order

correlation contributions, see Eq. (9), assuming that the contributions are very local in nature and thus well captured by a system of computed bond-increments (Sec. 4.4). The validity of this assumption can h.o.{L}

only be checked by explicit calculations of Ereac.

[M ], and the experience gained so far suggests [(Q)-(T)]

evaluations using X = 3′ as minimum level for such assessment. These calculations are so demanding, however, that they need to be limited to molecules the size of C6 H6 with at least C2v or C2h symmetry (e.g. fulvene, 3,3’-bicyclopropenyl), for which the perturbational (Q) term alone takes about 2 weeks to compute on state-of-the-art 16-core workstations with 256 GB of memory. In order to extend our test bed to the complete set of molecules considered here, we resort to the W3.2lite(b) protocol33 of Karton et al. which, which following our short-hand notation, combines CCSDT(Q) calculations using double-ζ basis sets and CCSDT calculations using truncated triple-ζ basis sets (3tr , denoted “PVTZ(nof 1d)” by Karton et al,33 see also Sec. 3) in the following way:

[(Q)-(T)] /(2, 3tr ) = c3 [T-(T)]/3tr + (1 − c3 ) [T-(T)]/2

(15)

+ c4 [(Q)-T]/2 This procedure is computationally much less expensive than [(Q)-(T)]/3′ as it uses a heavily truncated triple-ζ basis set for the [T-(T)] component and only a double-ζ basis set for the perturbational [(Q)-T] 23 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 67

component. The suggested parameters (c3 =2.6, c4 =1.1) had originally been obtained by calibration against accurate reference data for very small molecules. Here we recalibrate the model to reproduce available [(Q)-(T)] /3′ bond separation energies (c3 =2.0, c4 =1.25). Already a glimpse on the data collected in Table S11 shows that this calibration disappoints in the calculation of atomization energy contributions, however, it reproduces bond-separation energies extremely well and does so consistently (see Table S13), cutting the RMS deviation from 0.08 kcal/mol (using W3.2lite(b) parameters, data not shown but recoverable from Table S11 ) to only 0.02 kcal/mol. There is one outlier excluded in this analysis, spiropentane, whose [(Q)-(T)]/3′ value might well be an artifact, as it is the only species for which we see serious disagreement between [T-(T)]/3′ and [T-(T)]/3 (Table S13). Unfortunately CCSDT(Q) calculations for X = 3 have proved unfeasible, but the expectation is that [(Q)-(T)] /3 would be fairly close to the calibrated [(Q)-(T)] /(2, 3tr ). Two potential types of error contribute to the uncertainty of bond separation energies evaluated at the [(Q)-(T)] /(2, 3tr ) level. The first one concerns inaccuracies in the estimate of the CBS limit. Inspection of the few cases where X = 3, and X = 4′ data are available (Tables S13, S14) lets us expect that the error incurred at X = (2, 3tr ) is indeed negligibly small. This of course applies only to bond separation energies, not to “raw” atomization energy contributions (compare Table S11). Comparison of available X = 3′ and X = (2, 3tr ) data further shows that these two estimates agree to within 0.01 kcal/mol per carbon-carbon bond except for spiropentane (s.a.). The second potential source of error concerns neglect of contributions to bond-separation energies that go beyond valence-shell [(Q)-(T)] terms. While the sheer lack of data makes these errors hard to estimate, they are expected to be very small in general. Recall that the analysis is for bond separation energies which cancel out any local (bond-specific) contribution of presumably small terms. Hence inner-shell contributions [(Q)-(T)]core to bond separation energies are tiny indeed: From the data collected in Tables S12 and S13 we determine a maximum value of 0.006 kcal/mol per carbon-carbon bond at the X = 2 level (29 molecules, maximum found for hexapentaene), and of 0.005 kcal/mol at the X = 3 level (3 molecules, maximum found for 1,3-butadiyne). Even fewer data are available for valence-shell contributions beyond CCSDT(Q): [(P)-(Q)] terms, evaluated at the double-ζ level, amount to -0.003 (cyclopropene), -0.008 (allene), and -0.019 kcal/mol

24 ACS Paragon Plus Environment

Page 25 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(1,3-butadiyne) per carbon-carbon bond, respectively. Additional [Q-(Q)]/2 terms without perturbational pentuples are available for a few more molecules, and calculated bond separation energies here are somewhat larger (e.g. -0.024 kcal/mol per CC bond for cyclobutadiene), but disturbing only for the pathological case of bicyclo[1.1.0]but-1(3)-ene (-0.043 kcal/mol per CC bond), whose very strained geometry leads to an excessively large [(Q)-(T)] bond separation energy as well (see Table S14 and discussion below). While these small-basis-set evaluations of [(Q)-(T)]core terms and of [(P)-(Q)] terms, in particular, are not expected to yield accurate results they should be good enough to estimate their typical contribution to bond separation energies. We estimate an uncertainty of ±0.03 kcal/mol per CC bond, roughly 0.01 kcal/mol due to basis set effects in [(Q)-(T)] terms and 0.02 kcal/mol due to neglect of higher-order terms, if bond-separation energies are based on [(Q)-(T)]/(2,3tr ) (or [(Q)-(T)]/3′ ) calculations alone. The final column of Table S14 collects our best estimates for higher order corrections to atomization energies, using [(Q)-(T)]/(2,3tr ) bond separation energies with their estimated uncertainties and bond increments (Table 5) as derived in Sec. 4.4 from high-level calculations for BSR prototypes. Reported uncertainties are obtained by error propagation.

4.6

Higher-order electron correlation: Thermoneutral BSR approximation

We are now in a position to evaluate the ATOMIC(hc) error that results from neglect of the bond separation energy term (Eq. (9)). Figure 2 plots the value of this term (reported in column “(2, 3tr )” of Table S14) for all 83 molecules in the benchmark set, using color coding to distinguish between classes of molecules. Table S14 further compares evaluations without (column “Value”) and with (final column “Best est.”) inclusion of the bond separation energy term. A number of conclusions are obvious from Figure 2: Bond separation energies are essentially negligible for alkanes and for hydrocarbons containing only a single double or triple bond, bicyclo[1.1.0]but-1(3)-ene being the only exception (s.a.). None of the strained alkanes, including tetrahedrane and cubane, shows sizeable bond separation energies. Conjugated double bonds do not pose problems either, if they are limited to isolated 1,3-butadiene units (“conjugated double, simple” in Figure 2), but afford noticeable bond separation energies otherwise: Most benzenoid hydrocarbons show negative values, antiaromatic hydrocarbons and other extended double-bond systems

25 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 67

typically show positive values as do polyynes and cumulenes. The latter two types of molecules seem particularly problematic, with bond separation energies exceeding 1 kcal/mol in some cases. This analysis suggests a simple model to estimate ATOMIC(hc) error bars resulting from the neglect of bond separation energies. Pure alkanes and hydrocarbons with isolated double and triple bonds are assigned an uncertainty of 0.03 kcal/mol per bond, which is the same that we assume for actually calculated, but negligibly small, [(Q)-(T)]/(2,3tr ) values of bond separation energies. To represent the increased uncertainty for conjugated systems, we assign an additional 0.05 kcal/mol and 0.10 kcal/mol per formally sp2 and sp-hybridized carbon atom, respectively, excepting only isolated bonded pairs of sp2 and sp carbon atoms (Table 1). Table S14 reports uncertainty estimates in column “Unc.” and Figure 2 shows corresponding error bars as black lines. There are only a few molecules not covered by this uncertainty estimate, including the cyclic cumulene C8 H4 and all linear cumulenes larger than pentatetraene in addition to cyclobutadiene and bicyclo[1.1.0]but-1(3)-ene (see above). Other than these fairly exotic outliers, molecules important for organic chemistry are all covered, including benzenoid and other types of aromatic systems. Borderline cases are para-xylylene and pentatetraene where bond separation energies match estimated error bars to within 0.01 kcal/mol.

4.7

Higher-order electron correlation: Quality diagnostics

It would be of obvious benefit to have a simple predictor that warns of cases for which the above error estimate may be too small. We have initially considered the T1 diagnostic111 evaluated as a by-product of CCSD calculations. It is often used to identify potential multi-reference cases but at times performs poorly in doing so.11, 112, 113 This choice may be less than ideal also because the thermoneutral BSR model only fails to capture higher-order excitation contributions that are non-local in nature, while bond contributions are considered at high level (see above). Two alternative choices try to accommodate this: The first is the excess T1 diagnostic, which we define for molecule M as the difference between its T1 diagnostic and a weighted average of T1 diagnostics for the BSR prototypes ethane, ethylene, and acetylene,

∆T1 (X) [M ] = T1 (X) [M ] − c1 T1 (X) [C2 H6 ] − c2 T1 (X) [C2 H4 ] − c3 T1 (X) [C2 H2 ]

26 ACS Paragon Plus Environment

(16)

Page 27 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

where ci = ni / (n1 + n2 + n3 ), n1 , n2 , and n3 being the number of CC single, double, and triple bonds, respectively, and X is the cardinal number of the basis set used for evaluation. To give an example, the excess T1 diagnostic for benzene equals the difference between its T1 diagnostic and the average of those for ethane and ethylene. The second alternative is the percentage, denoted as %BSE[(T)] and evaluated for X = 4, that the bond separation energy contributes to the total (T) component of the atomization energy, the remainder of course being the sum over contributions from the respective BSR prototypes as they appear in the bond separation reaction. The expectation is that large non-local (T) contributions may well be an indicator for larger non-local contributions from even higher-order excitations. Note that this choice is different from Martin’s %TAE[(T)] diagnostic11 as it does not refer to contributions to the total atomization energy. Results collected in Table S15 indicate that T1 indeed is a poor predictor of excessive [(Q)-(T)] bond separation energy contributions. The largest observed T1 values are still quite a bit lower than the threshold of 0.02, commonly thought to indicate problematic multi-reference cases,111, 112 and the pathological case of bicyclo[1.1.0]but-1(3)-ene carries a low value (0.014) comparable to that of much less-problematic cases such as 1,3-butadiyne or butatriene. The %BSE[(T)] diagnostic reassuringly assigns one of the highest values to the former, but is also unable to clearly differentiate it from much less problematic cases such as [1.1.1]propellane, butalene, or ortho-benzyne. The data further indicate non-negligible scaling with system-size, thus precluding the use of %BSE[(T)] as diagnostic for larger molecules for which explicit [(Q)-(T)] evaluations are definitely unfeasible. The excess T1 diagnostic appears to be much better behaved: Setting a threshold of ∆T1 = 0.005, it correctly identifies bicyclo[1.1.0]but-1(3)-ene, linear cumulenes larger than pentatetraene, and 1,2,3,5,6,7cyclooctahexaene as cases not covered by our simple error estimate for the neglect of [(Q)-(T)] BSE contributions. The diagnostic shows little basis set dependence, and we choose X = 2sd (cc-pVDZ less p functions on H) for general use as this makes the evaluation very inexpensive and the preferred low-cost ATOMIC models B5 and B6 already include CCSD/cc-pVX Z evaluations with X = 2sd .55 Even more important is the observation that the diagnostic does not produce any false positives: Only pentatetraene, identified as a borderline case for the error model, shows a value of around 0.005, and the next lower value (butalene, 0.004) is significantly below the chosen threshold. Figure 2 indicates all cases of moderate to

27 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 67

large ∆T1 diagnostic. Note that the worst cases (∆T1 >0.005, solid circles and squares) all exhibit large positive - and no negative - [(Q)-(T)] BSE contributions. Eliminating these cases through application of the ∆T1 diagnostic criterion helps to reduce the bias observed for the BSE distribution, i.e. for the h.o.{L}

distribution of errors resulting from the assumption of thermoneutrality, Ereac.

[M ]=0. This is obviously

important as the overall uncertainty is obtained through error propagation (Sec. 2.2).

4.8

Higher-order electron correlation: Summary

In summary, ATOMIC(hc) estimates higher-order electron correlation corrections from BSR increments (Table 1) only and assigns uncertainties both to the values of these increments and to the neglect of [(Q)-(T)] BSE contributions (Table 1). The latter are believed to be fair estimates of possible error if ∆T1 0.005, bottom panel of Figure 5). Although ATOMIC(hc) is an improvement to standard ATOMIC, differences between the two are still fairly small overall, ranging from -0.34 kcal/mol (propane, MID 10) to +0.13 kcal/mol (1,3,5-hexatriyne, MID 29). For a start this is good news as the scrutiny undertaken in this work confirms the high quality of the ATOMIC protocol that was previously established only by comparison to experiment. Hence the key benefit of the additional ATOMIC(hc) treatment is the evaluation of a realistic intrinsic error estimate for an already performed ATOMIC calculation, and, in particular, one that does not require additional computer time. Table S21 shows, however, that the usually small difference between ATOMIC(hc) and ATOMIC reflects fairly large, but opposing differences for the CCSD(T) and higher-order correlation components to the atomization energy. Standard ATOMIC overestimates molecular stability at the CCSD(T) level as well as (potential) destabilization through higher-order correlation. As discussed in Secs. 4.1 and 4.4, the first effect results from smaller basis sets and smaller extrapolation exponents being used in the CBS extrapolation, and the second effect from neglecting the basis set dependence of correlation terms beyond CCSDT. CCSD(T){H}

Inspection of Table 1 indicates net corrections toward ATOMIC(hc) (CA,e o.s.a.{H}

CA,e

DBOC{L}

[a]+CA,e

h.o.{H}

[b]+CA,e

[b] +

[M ]) of -0.037, +0.010, +0.058, and 0.084 kcal/mol for 3 C-H, 3 C-C3 and 2 C=C2 , CCSD(T){L}

and 1 C≡C1 bonds, respectively. These numbers do not yet account for corrections to model A* (Creac

[M ]), which are included in the CCSD(T) term listed in Table S21. Still they show that the largest net negative corrections should be expected for acyclic alkanes and the largest net positive corrections for cumulenes and polyynes, which is exactly what we have observed above (see also Table S21, last column). ATOMIC is in fact not the only protocol that shows the somewhat unsatisfactory error compensation. Benzene serves as an illustrative example because it is a popular albeit compute-intensive benchmark for some of the most accurate thermochemistry procedures available at their time.9, 31, 33, 127, 130, 131 Let us focus on the two most recent, highest-level, studies: Harding et al report EA,e [M ]=1367.83 kcal/mol using HEAT345-(Q)127 and Sylvetsky et al present calculations based on a modified W4-F12 protocol (a procedure here referred to as SPKM) from which we obtain EA,e [M ]=1367.89 kcal/mol.31, 132 Both

35 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 67

numbers are obviously very close to our best estimate (1367.86 kcal/mol including the geometry correction, Table 6), but contributing components do show differences: SPKM affords a CCSD(T) value of 1369.79 kcal/mol, reasonably close to our best estimate (1369.61 kcal/mol after geometry correction), but at variance with the HEAT345-(Q) value (1370.29 kcal/mol). Likewise the higher-order electron-correlation contributions of SPKM (-0.49 kcal/mol) compare reasonably with ours (-0.42 kcal/mol) but not with those of HEAT345-(Q) (-1.05 kcal/mol). The numerical differences between SPKM and HEAT345-(Q) have already been pointed out by Sylvetsky et al.31 Here we note that they are nearly identical to those that we observe between the substantially simpler ATOMIC(hc)/A* and ATOMIC/A* protocols (CCSD(T):-0.63 kcal/mol, h.o.: +0.56 kcal/mol, Table S21) that consider the most accurate treatments only at the level of BSR prototypes. However, both W4-F12 and ATOMIC(hc) consider complete-basis-set extrapolation of terms beyond CCSDT, which HEAT345-(Q) and standard ATOMIC are lacking. The above analysis suggests that any upgrade of the ATOMIC protocol, as reported here for hydrocarbons, should always consider CCSD(T) and higher-order correlation terms in concert. Improving only one of these terms bears the risk of impairing the established quality of results. As far as the ATOMIC protocol is concerned, the demands of highest accuracy fortunately pertain only to fairly small molecules used as BSR prototypes since the calculation of bond separation energies benefits from error cancellation and the ATOMIC protocol is geared toward accurate - but not benchmark level - thermochemistry of larger molecules.

6

Conclusions

Atomization energy contributions beyond CCSD(T) are far from negligible even for hydrocarbons. Unfortunately, their evaluation is a formidable task and not feasible for any thermochemistry approach focused on efficiency and application to large molecules. ATOMIC separates contributions into those for BSR prototypes and bond separation energies, calculating the former and tabulating them for applications to larger molecules and neglecting the latter. In this work we have revisited the protocol to determine high-level contributions for BSR prototypes, finding that the previously adopted approach54 considering CBS-extrapolated [T-(T)]/3 corrections and [Q-T]/2 corrections is less than ideal, because even contributions beyond [T-(T)] need CBS extrapolation for accuracy. ATOMIC(hc) goes much further and derives 36 ACS Paragon Plus Environment

Page 37 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3 C-H, 3 C-C3 , 2 C=C2 ,

and 1 C≡C1 bond terms and associated uncertainties from CBS-extrapolated data

considering valence-shell and all-electron correlation at levels up to CCSDTQ(P) and CCSDT(Q), respectively. Higher-order correlation contributions turn out to be negligible for 3 C-H, small for 2 C=C2 , but sizeable for both 3 C-C3 (∼-0.1 kcal/mol) and 1 C≡C1 (∼+0.1 kcal/mol, Table 5). Bond separation energies have been evaluated at lower level ([(Q)-(T)]/(2,3tr ) and are generally found to be small for saturated systems and for hydrocarbons with isolated double or triple bonds. Using the bond terms listed above, this observation allows for the remarkable conclusion that higher-order or postCCSD(T) effects destabilize “simple” alkanes by about -0.1 kcal/mol per CC bond, summing up to -1.2 kcal/mol for cubane, and that “gold-standard” CCSD(T) shows sizeable systematic error. This was already noted in the original ATOMIC work,51, 54 recently studied in detail for linear n-alkanes,133 and is now confirmed more generally: With rare exceptions, the post-CCSD(T) contribution for any non-conjugated hydrocarbon may be estimated from simple sums over bond increments. Conjugation between sp2 -carbons, however, introduces sizeable bond separation energies, enhancing post-CCSD(T) destabilization by -0.2 to -0.3 kcal/mol for benzenoid hydrocarbons (e.g. benzene, naphthalene), and lowering it by often larger amounts for most others (e.g. cyclobutadiene, pentalene, para-xylylene). Conjugation between sp-carbons (cumulenes, polyynes) has an even larger, consistently stabilizing effect. ATOMIC(hc), like ATOMIC, neglects bond-separation energies for correlation effects beyond CCSD(T) and so introduces an uncertainty whose value is estimated based on the number of CC bonds and of conjugated sp- and sp2 centers. The uncertainty estimate disregards severe outliers deliberately, however, we have found that the excess T 1 diagnostic (Sec. 4.7) identifies most of those cases, advising of the necessity to perform explicit [(Q)-(T)] calculations. The refinement of BSR reference data for correlation beyond CCSD(T) leads to markedly larger h.o.{H}

atomization energies, in particular for molecules with double and triple bonds (CA,e

[b], Table 1).

At the same time more accurate CBS limits for CCSD(T), reflecting both refined extrapolation and the use of larger basis sets, lower ATOMIC(hc) atomization energies compared to standard ATOMIC CCSD(T){H}

(CA,e

[b]). Corrections are significant (+0.6 and -0.6 kcal/mol for benzene), underlining the dif-

ficulty encountered even by advanced ab initio thermochemistry protocols to reach “chemical accuracy”. Reassuringly, however, the fair estimates of residual uncertainty relating to BSR prototypes (here +0.1

37 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 67

and +0.3 kcal/mol after correction) are smaller. The original ATOMIC protocol obviously benefits from significant error compensation like even higher-level protocols have done in the past (Sec. 5). The remaining contributions to the bottom-of-the-well atomization energy are of less concern and their contribution to overall uncertainties is fairly small. We note in particular that BSR models reproduce highlevel calculations of scalar relativistic contributions to an accuracy of ±10%. Diagonal Born-Oppenheimer corrections are harder to reproduce in BSR models, but they are smaller overall, and evaluation at the correlated level lowers them even further. Thermoneutral BSR models tend to underestimate DBOCs as well, the use of Hartree-Fock reference data in ATOMIC thus leads to qualitatively reasonable, but DBOC{L}

quantitatively disappointing estimates. ATOMIC(hc) implements an empirical correction (Creac

[M ])

derived from correlated reference data for homologous series, which leads to marked improvement. ATOMIC is a fully ab initio thermochemistry approach. Improvements to mid-range composite models are implemented in the framework of bond-separation reactions, supplying high-level reference data for BSR prototypes. Corrections toward ATOMIC(hc) follow the same ab initio principle at the reference (high, H ) level, but use some empiricism at the low (L) level. All corrections and uncertainties are expressed in terms of bond increments or fractions of energies. Hence the evaluation of an ATOMIC(hc) atomization energy, complete with uncertainty estimate, does not require any quantum-chemical calculations beyond those already performed for ATOMIC. In a follow-up report,58 we shall scrutinize those components for error and uncertainty that are needed to convert bottom-of-the-well atomization energies into enthalpies of formation, and apply the ATOMIC(hc) protocol to a larger set of hydrocarbons.

Acknowledgments Work started in affiliation with ETH Z¨ urich, and ETH resources, in particular generous grants of computer time on the EULER and BRUTUS clusters, are gratefully acknowledged, as is additional support through a grant from the Swiss National Supercomputing Centre (CSCS) under project ID s152. The author also acknowledges support by O. Anatole von Lilienfeld (Univ. Basel) and thanks him as well as J. Missimer (PSI, Villigen) for helpful discussions.

38 ACS Paragon Plus Environment

Page 39 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Associated Content - Supporting Information The Supporting Information contains a list of all molecules considered, both in tabular and graphical form, correlation contributions to atomization energies and derived effective extrapolation exponents, CCSD(T) atomization energies, higher-order correlation contributions to atomization and to bond separation energies (non-extrapolated and CBS-extrapolated), including a summary of results, electron correlation diagnostics, differences between ROHF- and UHF- based total energies for the carbon atom, scalar relativistic and DBOC contributions to atomization energies, and high-level atomization energies of small hydrocarbons. This information is available free of charge via the internet at http://pubs.acs.org.

39 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 67

References [1] Pople, J. A.; Lulke, B. T.; Frisch, M. J.; Binkley, J. S. Theoretical thermochemistry. 1. Heats of formation of neutral AHn molecules (A = Li to Cl). J. Phys.Chem. 1985, 89, 2198–2203. [2] Pople, J. A.; Head-Gordon, M.; Fox, D. J.; Raghavachari, K.; Curtiss, L. A. Gaussian-1 theory: A general procedure for prediction of molecular energies. J. Chem. Phys. 1989, 90, 5622–5629. [3] Curtiss, L. A.; Raghavachari, K.; Trucks, G. W.; Pople, J. A. Gaussian-2 theory for molecular energies of first- and second-row compounds. J. Chem. Phys. 1991, 94, 7221–7230. [4] Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Rassolov, V.; Pople, J. A. Gaussian-3 (G3) theory for molecules containing first and second-row atoms. J. Chem. Phys. 1998, 109, 7764–7776. [5] Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-4 theory. J. Chem. Phys. 2007, 126, 084108. [6] Petersson, G. A.; Bennett, A.; Tensfeldt, T. G.; Al-Laham, M. A.; Shirley, W. A.; Mantzaris, J. A complete basis set model chemistry. I. The total energies of closed-shell atoms and hydrides of the first-row elements. J. Chem. Phys. 1988, 89, 2193–2218. [7] Montgomery Jr., J. A.; Ochterski, J. W.; Petersson, G. A. A complete basis set model chemistry. IV. An improved atomic pair natural orbital method. J. Chem. Phys. 1994, 101, 5900–5909. [8] Ochterski, J. W.; Petersson, G. A.; Montgomery Jr., J. A. A complete basis set model chemistry. V. Extensions to six or more heavy atoms. J. Chem. Phys. 1996, 104, 2598–2619. [9] Martin, J. M. L.; de Oliveira, G. Towards standard methods for benchmark quality ab initio thermochemistry - W1 and W2 theory. J. Chem. Phys. 1999, 111, 1843–1856. [10] Boese, A. D.; Oren, M.; Atasoylu, O.; Martin, J. M. L.; K´allay, M.; Gauss, J. W3 theory: Robust computational thermochemistry in the kJ/mol accuracy range. J. Chem. Phys. 2004, 120, 4129– 4141. [11] Karton, A.; Rabinovich, E.; Martin, J. M. L.; Ruscic, B. W4 theory for computational thermochemistry: In pursuit of confident sub-kJ/mol predictions. J. Chem. Phys. 2006, 125, 144108. 40 ACS Paragon Plus Environment

Page 41 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[12] Karton, A.; Taylor, P. R.; Martin, J. M. L. Basis set convergence of post-CCSD contributions to molecular atomization energies. J. Chem. Phys. 2007, 127, 064104. [13] Tajti, A.; Szalay, P. G.; Cs´asz´ ar, A. G.; K´ allay, M.; Gauss, J.; Valeev, E. F.; Flowers, B. A.; V´azquez, J.; Stanton, J. F. HEAT: High accuracy extrapolated ab initio thermochemistry. J. Chem. Phys. 2004, 121, 11599–11613. [14] Bomble, Y. J.; V´azquez, J.; K´ allay, M.; Michauk, C.; Szalay, P. G.; Cs´ asz´ ar, A. G.; Gauss, J.; Stanton, J. F. High-accuracy extrapolated ab initio thermochemistry. II. Minor improvements to the protocol and a vital simplification. J. Chem. Phys. 2006, 125, 064108. [15] Harding, M. E.; V´azquez, J.; Ruscic, B.; Wilson, A. K.; Gauss, J.; Stanton, J. F. High-accuracy extrapolated ab initio thermochemistry. III. Additional improvements and overview. J. Chem. Phys. 2008, 128, 114111. [16] Karton, A.; Sylvetsky, N.; Martin, J. M. L. W4-17: A diverse and high-confidence dataset of atomization energies for benchmarking high-level electronic structure methods. J. Comput. Chem. 2017, 38, 2063–2075. [17] DeYonker, N. J.; Cundari, T. R.; Wilson, A. K. The correlation consistent composite approach (ccCA): An alternative to the Gaussian-n methods. J. Chem. Phys. 2006, 124, 114104. [18] DeYonker, N. J.; Grimes, T.; Yockel, S.; Dinescu, A.; Mintz, B.; Cundari, T. R.; Wilson, A. K. The correlation-consistent composite approach: Application to the G3/99 test set. J. Chem. Phys. 2006, 125, 104111. [19] East, A. L. L.; Allen, W. D. The heat of formation of NCO. J. Chem. Phys. 1993, 99, 4638–4650. [20] Cs´asz´ar, A. G.; Allen, W. D.; Schaefer III, H. F. In pursuit of the ab initio limit for conformational energy prototypes. J. Chem. Phys. 1998, 108, 9751–9764. [21] Cs´asz´ar, A. G.; Szalay, P. G.; Leininger, M. L. The enthalpy of formation of 2 Π CH. Mol. Phys. 2002, 100, 3879–3883.

41 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 67

[22] Feller, D.; Dixon, D. A.; Peterson, K. A. Heats of formation of simple boron compounds. J. Phys. Chem. A 1998, 102, 7053–7059. [23] Feller, D.; Peterson, K. A.; Dixon, D. A. A survey of factors contributing to accurate theoretical predictions of atomization energies and molecular structures. J. Chem. Phys. 2008, 129, 204105. [24] Peterson, K. A.; Feller, D.; Dixon, D. A. Chemical accuracy in ab initio thermochemistry and spectroscopy: Current strategies and future challenges. Theor. Chem. Acc. 2012, 131, 1079. [25] Ganyecz, A.; K´ allay, M.; Csontos, J. Moderate-cost ab initio thermochemistry with chemical accuracy. J. Chem. Theory Comput. 2017, 13, 4193–4204. [26] Klippenstein, S. J.; Harding, L. B.; Ruscic, B. Ab initio computations and active thermochemical tables hand in hand: Heats of formation of core combustion species. J. Phys. Chem. A 2017, 121, 6580–6602. [27] Klopper, W.; Ruscic, B.; Tew, D. P.; Bischoff, F. A.; Wolfsegger, S. Atomization energies from coupled-cluster calculations augmented with explicitly-correlated perturbation theory. Chem. Phys. 2009, 356, 14–24. [28] Klopper, W.; Bachorz, R. A.; H¨attig, C.; Tew, D. P. Accurate computational thermochemistry from explicitly correlated coupled-cluster theory. Theor. Chem. Acc. 2010, 126, 289–304. [29] Vogiatzis, K. D.; Haunschild, R.; Klopper, W. Accurate atomization energies from combining coupled-cluster computations with interference-corrected explicitly correlated second-order perturbation theory. Theor. Chem. Acc. 2014, 133, 1446. [30] Karton, A.; Martin, J. M. L. Explicitly correlated Wn theory: W1-F12 and W2-F12. J. Chem. Phys. 2012, 136, 124114. [31] Sylvetsky, N.; Peterson, K. A.; Karton, A.; Martin, J. M. L. Toward a W4-F12 approach: Can explicitly correlated and orbital-based ab initio CCSD(T) limits be reconciled? J. Chem. Phys. 2016, 144, 214101.

42 ACS Paragon Plus Environment

Page 43 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[32] Paulechka, E.; Kazakov, A. Efficient DLPNO CCSD(T)-based estimation of formation enthalpies for C-, H-, O-, and N-containing closed-shell compounds validated against critically evaluated experimental data. J. Phys. Chem. A 2017, 121, 4379–4387. [33] Karton, A.; Kaminker, I.; Martin, J. M. L. Economical post-CCSD(T) computational thermochemistry protocol and applications to some aromatic compounds. J. Phys. Chem. A 2009, 113, 7610–7620. [34] Chan, B.; Radom, L. W3X: A cost-effective post-CCSD(T) composite procedure. J. Chem. Theory Comput. 2013, 9, 4769–4778. [35] Chan, B.; Radom, L. W2X and W3X-L: Cost-effective approximations to W2 and W4 with kJ mol−1 accuracy. J. Chem. Theory Comput. 2015, 11, 2109–2119. [36] Rossini, F. D. The heat of formation of water. J. Res. Natl. Bur. Stand. 1931, 6, 1–35. [37] Irikura, K. K.; Johnson III, R. D.; Kacker, R. N. Uncertainty associated with virtual measurements from computational quantum chemistry models. Metrologia 2004, 41, 369–375. [38] Ruscic, B. Uncertainty quantification in thermochemistry, benchmarking electronic structure computations, and active thermochemical tables. Int. J. Quantum Chem. 2014, 114, 1097–1101. [39] Feller, D.; Peterson, K. A.; Dixon, D. A. Refined theoretical estimates of the atomization energies and molecular structures of selected small oxygen fluorides. J. Phys. Chem. A 2010, 114, 613–623. [40] Simmie, J. M.; Somers, K. P. Benchmarking compound methods (CBS-QB3, CBS-APNO, G3, G4., W1BD) against the Active Thermochemical Tables: A litmus test for cost-effective molecular formation enthalpies J. Phys. Chem. A 2015, 119, 7235–7246. [41] Simmie, J. M.; Sheahan, J. N. Validation of a database of formation enthalpies and of mid-level model chemistries. J. Phys. Chem. A 2016, 120, 7370–7384. [42] Ruscic, B.; Pinzon, R. E.; Morton, M. L.; von Laszevski, G.; Bittner, S. J.; Nijsure, S. G.; Amin, K. A.; Minkoff, M.; Wagner, A. F. Introduction to active thermochemical tables: Several ”key” enthalpies of formation revisited. J. Phys. Chem. A 2004, 108, 9979–9997. 43 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 67

[43] Ruscic, B.; Pinzon, R. E.; von Laszewski, G.; Kodeboyina, D.; Burcat, A.; Leahy, D.; Montoya, D.; Wagner, A. F. Active Thermochemical Tables: Thermochemistry for the 21st century. J. Phys. Conf. Ser. 2005, 16, 561–570. [44] Savin, A.; Johnson, E. R. Judging density-functional approximations: Some pitfalls of statistics. Top. Curr. Chem. 2015, 365, 81–95. [45] Pernot, P.; Savin, A. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors. J. Chem. Phys. 2018, 148, 241707. [46] Karton, A.; Daon, S.; Martin, J. M. L. W4-11: A high-confidence benchmark dataset for computational thermochemistry derived from first-principles W4 data. Chem. Phys. Lett. 2011, 510, 165–178. [47] Pedley, J. B.; Naylor, R. D.; Kirby, S. P. Thermochemical data of organic compounds.; Chapman and Hall, London, 2nd ed., 1986. [48] Gurvich, L. V.; Veyts, I. V.; Alcock, C. B. Thermodynamic properties of individual substances., Vol. 1, part 2; Hemisphere Publishing Corp., New York, 1989. [49] Chase Jr., M. W. NIST-JANAF Thermochemical Tables - Fourth Edition. J. Phys. Chem. Ref. Data 1998, Monograph No 9, 1–1951. [50] Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. Assessment of Gaussian-3 and density functional theories for a larger experimental test set. J. Chem. Phys. 2000, 112, 7374–7383. [51] Bakowies, D. Ab initio thermochemistry with high-level isodesmic corrections: Validation of the ATOMIC protocol for a large set of compounds with first-row atoms (H, C, N, O, F). J. Phys. Chem. A 2009, 113, 11517–11534. [52] Zhang, W. J.; Truhlar, D. G.; Tang, M. S. Tests of exchange-correlation functional approximations against reliable experimental data for average bond energies of 3d transition metal compounds. J. Chem. Theory Comput. 2013, 9, 3965–3977.

44 ACS Paragon Plus Environment

Page 45 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[53] Perdew, J. P.; Sun, J.; Garza, A. J.; Scuseria, G. E. Intensive atomization energy: Re-thinking a metric for electronic structure theory methods. Z. Phys. Chem. 2016, 230, 737–742. [54] Bakowies, D. Ab initio thermochemistry using optimal-balance models with isodesmic corrections: The ATOMIC protocol. J. Chem. Phys. 2009, 130, 144113. [55] Bakowies, D. Simplified wave function models in thermochemical protocols based on bond separation reactions. J. Phys. Chem. A 2014, 118, 11811–11827. [56] Ditchfield, R.; Hehre, W. J.; Pople, J. A.; Radom, L. Molecular orbital theory of bond separation. Chem. Phys. Lett. 1970, 5, 13–14. [57] Hehre, W. J.; Ditchfield, R.; Radom, L.; Pople, J. A. Molecular orbital theory of the electronic structure of organic compounds. V. Molecular theory of bond separation. J. Am. Chem. Soc. 1970, 92, 4796–4801. [58] Bakowies, D.;

Estimating systematic error and uncertainty in ab initio thermochemistry: II.

ATOMIC(hc) enthalpies of formation for a large set of hydrocarbons.; to be submitted, 2019. [59] Purvis, G. D.; Bartlett, R. J. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples. J. Chem. Phys. 1982, 76, 1910–1918. [60] Raghavachari, K.; Trucks, G. W.; Pople, J. A.; Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 1989, 157, 479–483. [61] Hampel, C.; Peterson, K. A.; Werner, H.-J. A comparison of the efficiency and accuracy of the quadratic configuration interaction (QCISD), coupled cluster (CCSD), and Brueckner coupled cluster (BCCSD) methods. Chem. Phys. Lett. 1992, 190, 1–12. [62] Woon, D. E.; Dunning Jr., T. H. Gaussian basis sets for use in correlated molecular calculations. V. Core-valence basis sets for boron through neon. J. Chem. Phys. 1995, 103, 4572–4585. [63] Dunning Jr., T. H. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989, 90, 1007–1023.

45 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 67

[64] Peterson, K. A.; Woon, D. E.; Dunning Jr., T. H. Benchmark calculations with correlated molecular wave functions. IV. The classical barrier height of the H+H2 ->H2 +H reaction. J. Chem. Phys. 1994, 100, 7410–7415. [65] Wilson, A. K.; van Mourik, T.; Dunning Jr., T. H. Gaussian basis sets for use in correlated molecular calculations. VI. Sextuple zeta correlation consistent basis sets for boron through neon. J. Mol. Struct.: THEOCHEM 1996, 388, 339–349. [66] K´ allay, M.; Surj´ an, P. R. Higher excitations in coupled-cluster theory. J. Chem. Phys. 2001, 115, 2945–2954. [67] Bomble, Y. J.; Stanton, J. F.; K´allay, M.; Gauss, J. Coupled-cluster methods including noniterative corrections for quadruple excitations. J. Chem. Phys. 2005, 123, 054101. [68] K´ allay, M.; Gauss, J. Approximate treatment of higher excitations in coupled-cluster theory. J. Chem. Phys. 2005, 123, 214105. [69] K´ allay, M.; Gauss, J. Approximate treatment of higher excitations in coupled-cluster theory. II. Extension to general single-determinant reference functions and improved approaches for the canonical Hartree-Fock case. J. Chem. Phys. 2008, 129, 144101. [70] Cowan, R. D.; Griffin, D. C. Approximate relativistic corrections to atomic radial wave functions. J. Opt. Soc. Am. 1976, 66, 1010–1014. [71] Martin, R. L. All-electron relativistic calculations on AgH. An investigation of the Cowan-Griffin operator in a molecular species. J. Phys. Chem. 1983, 87, 750–754. [72] Helgaker, T.; Jørgensen, P.; Olsen, J. Molecular Electronic-Structure Theory.; Wiley, Chichester, UK, 2000. [73] Handy, N. C.; Yamaguchi, Y.; Schaefer III, H. F. The diagonal correction to the Born-Oppenheimer approximation: Its effect on the singlet-triplet splitting of CH2 and other molecular effects. J. Chem. Phys. 1986, 84, 4481–4484. [74] Handy, N. C.; Lee, A. M. The adiabatic approximation. Chem. Phys. Lett. 1996, 252, 425–430. 46 ACS Paragon Plus Environment

Page 47 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[75] Valeev, E. F.; Sherrill, C. D. The diagonal Born-Oppenheimer correction beyond the Hartree-Fock approximation. J. Chem. Phys. 2003, 118, 3921–3927. [76] Moore, C. E.; Atomic Energy Levels, Natl. Bur. Stand. (US), Circ 467 (1949). [77] Terms that only concern the constituent atoms of a molecule obviously involve only contributions at the high (H ) level but none at the low (L) level. They can be represented as bond terms if compact formats are sought that condense all bond contributions for a given bond type into a single number (see Table 1 of Ref. 51), but otherwise representations as atomic terms are preferred (label [a] for ”atom” in Table 1 ) . [78] Weigend, F.; H¨aser, M. RI-MP2: First derivatives and global consistency. Theor. Chem. Acc. 1997, 97, 331–340. [79] Møller, C.; Plesset, M. S. Note on an approximation treatment for many-electron systems. Phys. Rev. 1934, 46, 618–622. [80] Ahlrichs, R.; B¨ar, M.; H¨aser, M.; Horn, H.; K¨olmel, C. Electronic structure calculations on workstation computers: The program system Turbomole. Chem. Phys. Lett. 1989, 162, 165–169. [81] TURBOMOLE, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; available from http://www.turbomole.com. [82] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Sch¨ utz, M.; Celani, P.; Gy¨orffy, W.; Kats, D.; Korona, T.; Lindh, R.; Mitrushenkov, A.; Rauhut, G.; Shamasundar, K. R.; Adler, T. B.; Amos, R. D.; Bernhardsson, A.; Berning, A.; Cooper, D. L.; Deegan, M. J. O.; Dobbyn, A. J.; Eckert, F.; Goll, E.; Hampel, C.; Hesselmann, A.; Hetzer, G.; Hrenar, T.; Jansen, G.; K¨oppl, C.; Liu, Y.; Lloyd, A. W.; Mata, R. A.; May, A. J.; McNicholas, S. J.; Meyer, W.; Mura, M. E.; Nicklass, A.; O’Neill, D. P.; Palmieri, P.; Peng, D.; Pfl¨ uger, K.; Pitzer, R.; Reiher, M.; Shiozaki, T.; Stoll, H.; Stone, A. J.; Tarroni, R.; Thorsteinsson, T.; Wang, M.; MOLPRO, version 2015.1: A package of ab initio programs., 2015; see http://www.molpro.net. (Earlier versions 2002.6, 2006.1, 2009.1, and 2012.1 have been used as well).

47 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 67

[83] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Sch¨ utz, M. Molpro: A general-purpose quantum chemistry program package. WIREs Comput. Mol. Sci. 2012, 2, 242–253. [84] Douglas, M.; Kroll, N. M. Quantum electrodynamical corrections to the fine structure of helium. Ann. Phys. (N.Y.) 1974, 82, 89–155. [85] Hess, B. A. Relativistic electronic-structure calculations employing a two-component no-pair formalism with external-field projection operators. Phys. Rev. A 1986, 33, 3742–3748. [86] K´ allay, M.; Rolik, Z.; Csontos, J.; Ladj´anszki, I.; Szegedy, L.; Lado´ oczki, B.; Samu, G.; Petrov, K.; Farkas, M.; Nagy, P.; Mester, D.; H´egely, B.; MRCC, a quantum chemical program suite, see also www.mrcc.hu. [87] Rolik, Z.; Szegedy, L.; Ladj´anszki, I.; Lad´ oczki, B.; K´ allay, M. An efficient linear-scaling CCSD(T) method based on local natural orbitals. J. Chem. Phys. 2013, 139, 094105. [88] Stanton, J.; Gauss, J.; Watts, J.; Szalay, P.; Bartlett, R.; ACES II, Mainz-Austin-Budapest version, with contributions from others and the integral packages MOLECULE (J. Alml¨of and P. R. Taylor), PROPS (P. R. Taylor), and ABACUS (T. Helgaker, H. J. A. Jensen, P. Jørgensen, and J. Olsen). For the current version, see http://www.aces2.de. [89] Stanton, J. F.; Gauss, J.; Cheng, L.; Harding, M. E.; Matthews, D. A.; Szalay, P. G.; CFOUR, Coupled-Cluster techniques for computational chemistry, a quantum-chemical program package.; with contributions from A.A. Auer, R.J. Bartlett, U. Benedikt, C. Berger, D.E. Bernholdt, Y.J. Bomble, O. Christiansen, F. Engel, R. Faber, M. Heckert, O. Heun, M. Hilgenberg, C. Huber, T.-C. Jagau, D. Jonsson, J. Jus´elius, T. Kirsch, K. Klein, W.J. Lauderdale, F. Lipparini, T. Metzroth, L.A. M¨ uck, D.P. O’Neill, D.R. Price, E. Prochnow, C. Puzzarini, K. Ruud, F. Schiffmann, W. Schwalbach, C. Simmons, S. Stopkowicz, A. Tajti, J. V´azquez, F. Wang, J.D. Watts and the integral packages MOLECULE (J. Alml¨of and P.R. Taylor), PROPS (P.R. Taylor), ABACUS (T. Helgaker, H.J. Aa. Jensen, P. Jørgensen, and J. Olsen), and ECP routines by A. V. Mitin and C. van W¨ ullen. For the current version, see http://www.cfour.de.

48 ACS Paragon Plus Environment

Page 49 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[90] Gauss, J.; Tajti, A.; K´allay, M.; Stanton, J. F.; Szalay, P. G. Analytic calculation of the diagonal Born-Oppenheimer correction within configuration-interaction and coupled-cluster theory. J. Chem. Phys. 2006, 125, 144111. [91] Peterson, K. A.; Kesharwani, M. K.; Martin, J. M. L. The cc-pV5Z-F12 basis set: Reaching the basis set limit in explicitly correlated calculations. Mol. Phys. 2015, 113, 1551–1558. [92] Adler, T. B.; Knizia, G.; Werner, H.-J. A simple and efficient CCSD(T)-F12 approximation. J. Chem. Phys. 2007, 127, 221106. [93] Knizia, G.; Adler, T. B.; Werner, H.-J. Simplified CCSD(T)-F12 methods: Theory and benchmarks. J. Chem. Phys. 2009, 130, 054104. [94] H¨attig, C.; Tew, D. P.; K¨ohn, A. Communications: Accurate and efficient approximations to explicitly correlated coupled-cluster singles and doubles, CCSD-F12. J. Chem. Phys. 2010, 132, 231102. [95] H¨attig, C. Optimization of auxiliary basis sets for RI-MP2 and RI-CC2 calculations: Core-valence and quintuple-zeta basis sets for H to Ar and QZVPP basis sets for Li to Kr. Phys. Chem. Chem. Phys. 2005, 7, 59–66. [96] Some of the basis sets were obtained from the EMSL basis set library, see Refs. 97, 98. [97] Feller, D. The role of databases in support of computational chemistry calculations. J. Comput. Chem. 1996, 17, 1571–1586. [98] Schuchardt, K. L.; Didier, B. T.; Elsethagen, T.; Sun, L.; Gurumoorthi, V.; Chase, J.; Li, J.; Windus, T. L. Basis set exchange: A community database for computational sciences. J. Chem. Inf. Model. 2007, 47, 1045–1052. [99] Peterson, K. A.; Wilson, A. K.; Woon, D. E.; Dunning, T. H. Benchmark calculations with correlated molecular wave functions. XII. Core correlation effects on the homonuclear diatomic molecules B2 - F2 . Theor. Chem. Acc. 1997, 97, 251–259. [100] Woon, D. E.; Dunning Jr., T. H. Gaussian basis sets for use in correlated molecular calculations. IV. Calculation of static electrical response properties. J. Chem. Phys. 1994, 100, 2975–2988. 49 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 50 of 67

[101] de Jong, W. A.; Harrison, R. J.; Dixon, D. A. Parallel Douglas-Kroll energy and gradients in NWChem: Estimating scalar relativistic effects using Douglas-Kroll contracted basis sets. J. Chem. Phys. 2001, 114, 48–53. [102] Halkier, A.; Helgaker, T.; Jørgensen, P.; Klopper, W.; Olsen, J. Basis-set convergence of the energy in molecular Hartree-Fock calculations. Chem. Phys. Lett. 1999, 302, 437–446. [103] Bakowies, D. Extrapolation of electron correlation energies to finite and complete basis set targets. J. Chem. Phys. 2007, 127, 084105. [104] Kutzelnigg, W.; Morgan III, J. D. Rates of convergence of the partial-wave expansions of atomic correlation energies. J. Chem. Phys. 1992, 96, 4484–4508. [105] Klopper, W.; Bak, K. L.; Jørgensen, P.; Olsen, J.; Helgaker, T. Highly accurate calculations of molecular electronic structure. J. Phys. B 1999, 32, R103–R130. [106] Kesharwani, M. K.; Sylvetsky, N.; K¨ ohn, A.; Tew, D. P.; Martin, J. M. L. Do CCSD and approximate CCSD-F12 variants converge to the same basis set limits? The case of atomization energies. J. Chem. Phys. 2018, 149, 154109. [107] Hill, J. G.; Mazumder, S.; Peterson, K. A. Correlation consistent basis sets for molecular corevalence effects with explicitly correlated wave functions: The atoms B-Ne and Al-Ar. J. Chem. Phys. 2010, 132, 054108. [108] Sylvetsky, N.; Martin, J. M. L. Probing the basis set limit for thermochemical contributions of inner-shell correlation: Balance of core-core and core-valence contributions. Mol. Phys. 2019, 117, 1078–1087. [109] Bakowies, D. Accurate extrapolation of electron correlation energies from small basis sets. J. Chem. Phys. 2007, 127, 164109. [110] Bak, K. L.; Jørgensen, P.; Olsen, J.; Helgaker, T.; Gauss, J. Coupled-cluster singles, doubles and triples (CCSDT) calculations of atomization energies. Chem. Phys. Lett. 2000, 317, 116–122.

50 ACS Paragon Plus Environment

Page 51 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[111] Lee, T. J.; Taylor, P. R. A diagnostic for determining the quality of single-reference electron correlation methods. Int. J. Quantum Chem. 1989, pages 199–207. [112] Jiang, W.; DeYonker, N. J.; Wilson, A. K. Multireference character for 3d transition-metalcontaining molecules. J. Chem. Theory Comput. 2012, 8, 460–468. [113] Fogueri, U. R.; Kozuch, S.; Karton, A.; Martin, J. M. L. A simple DFT-based diagnostic for nondynamical correlation. Theor. Chem. Acc. 2013, 132, 1291. [114] Knowles, P. J.; Hampel, C.; Werner, H.-J. Coupled-cluster theory for high-spin, open-shell reference wave functions. J. Chem. Phys. 1993, 99, 5219–5227. [115] Knowles, P. J.; Hampel, C.; Werner, H.-J. Erratum: ”Coupled cluster theory for high spin, open shell reference wave functions” [J. Chem. Phys. 99, 5219 (1993)]. J. Chem. Phys. 2000, 112, 3106–3107. [116] Bauschlicher Jr, C. W. The scalar relativistic contribution to gallium halide bond energies. Theor. Chem. Acc. 1999, 101, 421–425. [117] Bauschlicher, C. W. The scalar relativistic contribution to the atomization energies of CF, CF4 , and SiF4 . J. Phys. Chem. A 2000, 104, 2281–2283. [118] Martin, J. M. L.; Parthiban, S. W1 and W2 theories, and their variants: Thermochemistry in the kJ/mol accuracy range. In Quantum-mechanical prediction of thermochemical data; Cioslowski, J., Ed., pages 31–65. Kluwer, Dordrecht, 2001. [119] Tajti, A.; Szalay, P. G.; Gauss, J. Perturbative treatment of the electron-correlation contribution to the diagonal Born-Oppenheimer correction. J. Chem. Phys. 2007, 127, 014102. [120] Mohallem, J. R. Semiempirical evaluation of post-Hartree-Fock diagonal-Born-Oppenheimer corrections for organic molecules. J. Chem. Phys. 2008, 128, 144113. [121] Ruscic, B.; Bross, D. H.; Active Thermochemical Tables (ATcT, Refs. 42, 43) values based on ver. 1.122d of the Thermochemical Network (2018); available at ATcT.anl.gov.

51 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 52 of 67

[122] Harding, L. B.; Georgievskii, Y.; Klippenstein, S. J. Accurate anharmonic zero-point energies for some combustion-related species from diffusion Monte Carlo. J. Phys. Chem. A 2017, 121, 4334– 4340. [123] Pfeiffer, F.; Rauhut, G.; Feller, D.; Peterson, K. A. Anharmonic zero point vibrational energies: Tipping the scales in accurate thermochemistry calculations? J. Chem. Phys. 2013, 138, 044311. [124] Karton, A.; Ruscic, B.; Martin, J. M. L. Benchmark atomization energy of ethane: Importance of accurate zero-point vibrational energies and diagonal Born-Oppenheimer corrections for a ’simple’ organic molecule. J. Mol. Struct.: THEOCHEM 2007, 811, 345–353. [125] Bloino, J.; Biczysko, M.; Barone, V. General perturbative approach for spectroscopy, thermodynamics, and kinetics: Methodological background and benchmark studies. J. Chem. Theory Comput. 2012, 8, 1015–1036. [126] Karton, A.; Gruzman, D.; Martin, J. M. L. Benchmark thermochemistry of the Cn H2n+2 alkane isomers (n=2-8) and performance of DFT and composite ab initio methods for dispersion-driven isomeric equilibria. J. Phys. Chem. A 2009, 113, 8434–8447. [127] Harding, M. E.; V´ azquez, J.; Gauss, J.; Stanton, J. F.; K´ allay, M. Towards highly accurate ab initio thermochemistry of larger systems: Benzene. J. Chem. Phys. 2011, 135, 044513. [128] Earlier reports suggest that harmonic C-H stretching frequencies, arguably the major contributor to zero-point energies of hydrocarbons, are slightly overestimated at the CBS limit of all-electron CCSD(T) and underestimated if inner-shell correlation effects are discarded (Refs. 124, 129). This may at least partially explain the difference in ZPEs for propane between Ref. 123 (64.02 kcal/mol, CCSD(T)-F12(fc) for harmonic part) and Ref. 126 (64.20 kcal/mol, CCSD(T)(full)), of which we take the average. Hence it is also possible that the ZPEs of allene, propene, and propyne, all based on CCSD(T)(full) (Ref. 126), are very slightly too high, possibly by a few hundredths of a kcal/mol. [129] Martin, J. M. L. Spectroscopic quality ab initio potential curves for CH, NH, OH and HF. A convergence study. Chem. Phys. Lett. 1998, 292, 411–420.

52 ACS Paragon Plus Environment

Page 53 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

[130] Feller, D.; Dixon, D. A. Predicting the heats of formation of model hydrocarbons up to benzene. J. Phys. Chem. A 2000, 104, 3048–3056. [131] Parthiban, S.; Martin, J. M. L. Fully ab initio atomization energy of benzene via Weizmann-2 theory. J. Chem. Phys. 2001, 115, 2051–2054. [132] The authors of Ref. 31 report all-electron CCSD(T), valence-shell CCSDT-CCSD(T), and CCSDT(Q)-CCSDT contributions and refer to Ref. 127 for scalar relativistic, DBOC, and atomic spin-orbit terms, which we have added up to arrive at EA,e [M ]=1367.89 kcal/mol. [133] Karton, A. How large are post-CCSD(T) contributions to the total atomization energies of mediumsized alkanes? Chem. Phys. Lett. 2016, 645, 118–122.

53 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Table 1: ATOMIC(hc) corrections and uncertaintiesa

Contributionb

Level

Notationc

Sectiond

3

C-H

3

C-C3

2

C=C2

1

C≡C1

other

Corrections CCSD(T)e

H

CCSD(T){H}

C A,e

[b]

CCSD(T){L} [M ] reac h.o.{H} [b] A,e o.s.a.{H} [a] A,e DBOC{L} [M ] reac

CCSD(T)

L (A*)

C

h.o.

H

C

o.s.a.

H

C

DBOC

L

C

CCSD(T)

H

CCSD(T)

L (A*)

CCSD(T){H} [b] uA,e CCSD(T){L} [M ] ureac h.o.{H} uA,e [b] h.o.{L} [M ] ureac o.s.a.{H} uA,e [a] rel.{L} ureac [M ] DBOC{L} [M ] ureac

4.1

-0.026

-0.047

-0.101

-0.163 CCSD(T){L}

4.2 4.4

-0.0062% of EA,e 0.002

0.037

0.144

4.9 4.11

[M ]

0.271 0.009 per C atom

-0.015

0.015

0.006

-0.038

±0.026

±0.043

±0.062

Uncertainties

54

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 54 of 67

h.o.

H

h.o.

L

o.s.a.

H

rel.

L

DBOC

L

4.1

±0.014

CCSD(T){L}

4.2

±0.0164% of EA,e

4.4

±0.004

±0.016

±0.020

±0.035

4.6

-

±0.030

±0.030

±0.030

4.9

[M ]

±0.100 (sp), ±0.050 (sp2 )f ±0.005 per C atom

4.10

±0.005

±0.010

±0.013

±0.018

4.11

±0.006

±0.003

±0.010

±0.013

a

Listed values constitute corrections of standard ATOMIC toward ATOMIC(hc). All values are given in kcal/mol unless specified otherwise CCSD(T){L} CCSD(T){L} (C reac [M ] and ureac [M ] for ATOMIC model A*). b Contributions are defined in Sec. 2.2. Note that higher-order (h.o.) and scalar relativistic (rel.) contributions have been renamed in this work for reasons of generality, they have previously been referred to as “CCSDTQCCSD(T)” and MVD (Mass-Velocity-Darwin) terms (Table III of Ref. 54). c Notation of terms as introduced in Sec. 2.2. High-level terms ({H}) cover just the bond ([b]) or atom ([a]) (Ref. 77) specified, low-level terms ({L}) cover the entire molecule ([M ]) and imply summation over values listed for all constituent bonds. d Text section that discusses the energy contribution. e Note that bond increment used for the last CCSD(T){H} [b] that relates to the correlation energy (-0.025, -0.047, -0.099, -0.160 kcal/mol in column of Table 2 include only the portion of C A,e the order listed). The slight deviation originates from the difference between the ∆core HF (X5 ) terms used in ATOMIC and ATOMIC(hc) references core f 2 2 (∆core HF (5) vs ∆HF (6)) Uncertainty increment per sp or sp carbon in conjugated systems, not counting isolated pairs of sp and / or sp carbon atoms.

ACS Paragon Plus Environment

Page 55 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 2: HF and CCSD(T) correlation contributions to atomization energies (kcal/mol)

Molecule

HF contributionsa

CCSD(T) correlation contributionsb ATOMIC

EHF (6)

6,6 δHF

∆core HF (6)

Best estimate

6,5 δCCSD(T)

regular

corrected

CH4

methane

331.59

0.005

0.001

88.78± 0.05

-0.001

88.88

88.78± 0.05

C2 H2

acetylene

299.52

0.007

0.003

105.72± 0.09

-0.002

105.93

105.72± 0.09

C2 H4

ethylene

435.09

0.011

0.003

128.89± 0.10

0.000

129.09

128.89± 0.10

C2 H6

ethane

558.10

0.009

0.002

154.99± 0.11

-0.001

155.19

154.99± 0.11

C3 H4

allene

532.75

0.013

0.004

171.19± 0.14

-0.004

171.51

171.22± 0.14

C3 H4

cyclopropene

507.19

0.014

0.004

174.79± 0.14

-0.002

175.09

174.80± 0.15

C3 H4

propyne

533.74

0.011

0.004

171.61± 0.14

-0.004

171.94

171.63± 0.14

C3 H6

cyclopropane

655.64

0.013

0.003

198.51± 0.16

0.000

198.80

198.51± 0.16

C3 H6

propene

665.30

0.014

0.004

196.18± 0.15

-0.001

196.48

196.18± 0.15

C3 H8

propane

785.43

0.013

0.003

222.49± 0.16

-0.002

222.79

222.49± 0.16

C4 H2

1,3-butadiyne

505.05

0.012

0.007

191.75± 0.17

-0.009

192.20

191.78± 0.18

C4 H4

bicyclo[1.1.0]but-1(3)-ene

558.54

0.023

0.005

228.60± 0.19

0.003

228.94

228.56± 0.20

C4 H4

butatriene

629.58

0.016

0.007

215.92± 0.18

-0.007

216.36

215.96± 0.18

C4 H4

cyclobutadiene

598.20

0.024

0.004

222.17± 0.19

-0.001

222.56

222.17± 0.19

C4 H4

ethynylethene

640.10

0.016

0.006

213.76± 0.18

214.19

213.78± 0.18

C4 H4

methylenecyclopropene

616.73

0.022

0.006

213.52± 0.19

-0.004

213.94

213.55± 0.19

-0.002

C4 H4

tetrahedrane

570.78

0.021

0.005

223.19± 0.20

223.61

223.23± 0.21

C4 H6

1,2-butadiene

761.92

0.017

0.006

238.49± 0.19

238.91

238.52± 0.19

C4 H6

1,3-butadiene

775.32

0.019

0.006

237.80± 0.19

238.21

237.81± 0.19

C4 H6

2-butyne

766.64

0.015

0.006

237.83± 0.19

238.27

237.86± 0.20

C4 H6

bicyclo[1.1.0]butane

741.93

0.018

0.004

244.83± 0.20

245.21

244.83± 0.21

C4 H6

cyclobutene

759.74

0.017

0.004

242.13± 0.20

242.53

242.14± 0.20

C4 H6

methylenecyclopropane

753.11

0.018

0.005

240.43± 0.20

240.84

240.45± 0.20

C4 H8

cyclobutane

883.32

0.015

0.004

267.39± 0.21

267.78

267.39± 0.21

C4 H8

isobutene

895.29

0.017

0.005

264.43± 0.20

264.84

264.45± 0.20

C5 H4

1,4-pentadiyne

731.57

0.017

0.008

256.07± 0.23

256.63

256.11± 0.23

C5 H4

pentatetraene

728.43

0.019

0.009

260.16± 0.22

260.73

260.23± 0.22

C5 H4

spiropentadiene

679.05

0.024

0.007

262.66± 0.23

263.16

262.68± 0.24

C6 H2

1,3,5-hexatriyne

710.40

0.017

0.011

279.11± 0.26

279.76

279.14± 0.27

C6 H4

butalene

780.95

0.029

0.008

320.45± 0.28

321.07

320.49± 0.28

C6 H4

hexapentaene

826.73

0.021

0.011

305.33± 0.26

306.02

305.43± 0.27

C6 H4

triafulvalene

781.30

0.033

0.009

303.32± 0.28

303.95

303.37± 0.28

C6 H6

benzene

1045.18

0.027

0.009

324.39± 0.29

324.99

324.40± 0.29

55 ACS Paragon Plus Environment

-0.001

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

a 6,6 δHF

Page 56 of 67

= EHF (6) − EHF (6) reports the difference between HF atomization energies evaluated with cc-pV(56)Z

extrapolated and standard cc-pV6Z basis sets, ∆core HF (6) the difference between HF atomization energies evaluated with cc-pCV(56)Z extrapolated and cc-pV(56)Z extrapolated basis sets.

b

The best estimate correlation

energy contribution refers to a [- ¯ 6¯ 6¯ 6|- ¯ 6¯ 6¯ 5] evaluation, using β = 3.4 for CBS extrapolation; the reported uncertainty covers the interval between this value and one evaluated with β − ∆β = 3.15 as exponent for 6,5 CBS extrapolation. δCCSD(T) is defined as the difference between [- ¯ 6¯ 6¯ 6|- ¯ 6¯ 6¯ 6] and [- ¯ 6¯ 6¯ 6|- ¯ 6¯ 6¯ 5], using ex-

trapolation exponent β = 3.4. The columns labeled ATOMIC report standard ATOMIC reference results ([CCSD(T){H} CCSD(T){H} ¯ 6¯ 6¯ 5|- ¯ 5¯ 5¯ 5], β = 3.15), without corrections or with C A,e [b] and uA,e [b] applied as defined in

footnote e to Table 1.

56 ACS Paragon Plus Environment

Page 57 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 3: Optimized extrapolation exponents for correlation contributions to the atomization energya

Correlation Methodb

CBS Referencec

CCSD(fc, VXZ)

V5Z-F12rev2, 1.4

(34)→5

(34)→6

(34)→∞

(45)→6

(45)→∞

(56)→∞

3.53

3.46

3.40

3.15

3.23

3.29

CCSD(fc, CVXZ)

V5Z-F12rev2, 1.4

3.43

3.42

3.39

3.36

3.32

3.29

CCSD(fc, CVXZ)

V5Z-F12rev2, 1.2

3.43

3.42

3.40

3.36

3.35

3.35

CCSD(fc, CVXZ)

V5Z-F12, 1.4

3.43

3.42

3.40

3.36

3.36

3.36

CCSD(fc, CVXZ)

V5Z-F12, 1.2

3.43

3.42

3.42

3.36

3.41

3.46

CCSD(full, CVXZ)

-

3.41

3.41

3.39

CCSD(T)(full, CVXZ)

-

3.42

3.42

3.43

a

Least squares fits have been performed using the 16 molecules shown in Tables S2-S8.b Basis sets (VXZ: cc-pVXZ; CVXZ: cc-pCVXZ) are defined by their cardinal number, here, as in the text given in numerical form: X =3· · ·6. c CCSD-F12b is used as complete basis set reference for the valence-shell CCSD contribution with basis sets and geminal exponents as shown.

57 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Table 4: Higher-order electron correlation contributions to bond terms: convergence to the complete basis set limit (in kcal/mol)a

2

3’

3

4’

4

5’

5

3′

3

4′

4

5′

5

[(Q)-(T)]

0.032

0.003

0.007

0.005

0.002

0.002

0.000

-0.009

-0.004

0.004

-0.001

0.002

-0.003

[(P)-(Q)]

0.000

-0.001

-0.001

-0.001

-0.001

Bond

Term

3

C-H

core

[(Q)-(T)] 3

58

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Page 58 of 67

2

C-C3

C=C2

0.029

-0.058

[(P)-(Q)]

-0.005

-0.008

[(Q)-(T)]core

-0.005

-0.060

0.002 -0.076

0.001

-0.076

-0.094

0.241

0.076

[(P)-(Q)]

-0.006

-0.014

-0.008

-0.098

0.004 -0.087

-0.088

0.025

0.023

0.017

0.098

0.096

0.078

-0.010 0.000

[(Q)-(T)] [(Q)-(T)]

C≡C1

0.000

[(Q)-(T)]

core

1

-0.001

0.072

0.002 0.045

0.044

0.031

0.007

0.001

-0.018 0.014

[(Q)-(T)]

0.421

0.162

0.160

[(P)-(Q)]

-0.002

-0.017

-0.018

[(Q)-(T)]core

-0.013

0.020

0.023 0.124

0.123

0.101

0.100

0.036

a

0.052

0.049

-0.024

-0.026 0.034

0.076

0.048

Shown are valence-shell CCSDT(Q)-CCSD(T) ([(Q)-(T)]) and CCSDTQ(P)-CCSDT(Q) ([(P)-(Q)]) contributions and inner-shell core CCSDT(Q)(full)-CCSD(T)(full)-CCSDT(Q)+CCSD(T) ([(Q)-(T)] ) contributions to atomization energies. See text for basis set designations.

ACS Paragon Plus Environment

Page 59 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 5: Higher-order electron correlation contributions to bond terms: complete basis set estimates (in kcal/mol)a

Bond

a

[(Q)-(T)]

[(P)-(Q)]

[(Q)-(T)]core

higher exc.b

sum

3

C-H

-0.003±0.003

-0.001±0.000

0.004±0.003

0.00±0.00

0.000±0.004

3

C-C3

-0.088±0.012

-0.010±0.002

0.002±0.002

0.00±0.01

-0.096±0.016

2

C=C2

0.017±0.014

-0.018±0.004

0.023±0.009

0.00±0.01

0.022±0.020

1

1

0.076±0.024

-0.026±0.008

0.048±0.014

0.00±0.02

0.098±0.035

C≡C

See text for details of evaluation.

b

core

Uncertainty estimated for neglect of effects beyond [(P)-(Q)] and [(Q)-(T)]

59 ACS Paragon Plus Environment

.

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 60 of 67

Table 6: Enthalpies of formation ∆Hf0 [M, 0] (kcal/mol): Comparison to ATcTa

Molecule M [6 - - - | 6 - - - ] [- 6 6 6 | - 6 6 5 ] CCSD(T) higher-order scalar relativistic DBOC open-shell (atoms) spin-orbit EA,e [M ] geometry correction ZPE [M ]

Patoms

∆Hf0 (k, 0)

methane

ethane

ethylene

acetylene

331.59± 0.00

558.10± 0.00

435.09± 0.01

299.52± 0.00

88.78± 0.05

154.99± 0.11

128.89± 0.10

105.72± 0.09

420.38± 0.05

713.09± 0.11

563.98± 0.10

405.24± 0.09

0.00± 0.02

-0.10± 0.04

0.02± 0.04

0.10± 0.04

-0.19± 0.00

-0.39± 0.00

-0.33± 0.00

-0.28± 0.00

0.06± 0.01

0.08± 0.01

0.06± 0.01

0.08± 0.01

0.01± 0.01

0.02± 0.01

0.02± 0.01

0.02± 0.01

-0.09± 0.00

-0.18± 0.00

-0.18± 0.00

-0.18± 0.00

420.17± 0.06

712.52± 0.11

563.57± 0.10

404.98± 0.10

0.00± 0.01

0.00± 0.01

0.00± 0.01

-0.09± 0.01

27.72± 0.02

46.34± 0.06

31.53± 0.07

16.42± 0.03

376.56± 0.01

649.86± 0.02

546.59± 0.02

443.33± 0.02

∆Hf0 [M, 0]

best est.

-15.88± 0.06

-16.33± 0.13

14.55± 0.13

54.68± 0.11

∆Hf0

ATcT

-15.91± 0.01

-16.33± 0.03

14.56± 0.03

54.69± 0.03

allene

propane

propene

propyne

benzene

[6 - - - | 6 - - - ]

532.75± 0.01

785.44± 0.01

665.30± 0.01

533.74± 0.00

1045.19± 0.01

[- 6 6 6 | - 6 6 5 ]

171.19± 0.14

222.49± 0.16

196.18± 0.15

171.61± 0.14

324.39± 0.29

CCSD(T)

703.94± 0.14

1007.93± 0.16

861.48± 0.15

705.35± 0.14

1369.59± 0.29

kǫM

[M, 0]

Molecule M

higher-order

prototypes

0.04± 0.06

-0.19± 0.06

-0.07± 0.06

0.00± 0.07

-0.22± 0.13

higher-order

BSE

0.04± 0.06

0.00± 0.06

0.01± 0.06

0.01± 0.06

-0.20± 0.18

-0.47± 0.00

-0.58± 0.01

-0.52± 0.00

-0.48± 0.00

-1.00± 0.01

DBOC

0.09± 0.01

0.11± 0.01

0.10± 0.01

0.11± 0.01

0.14± 0.01

open-shell (atoms)

0.03± 0.01

0.03± 0.01

0.03± 0.01

0.03± 0.01

0.05± 0.03

scalar relativistic

spin-orbit EA,e [M ] geometry correction

-0.26± 0.00

-0.26± 0.00

-0.26± 0.00

-0.26± 0.00

-0.53± 0.00

703.40± 0.16

1007.03± 0.18

860.75± 0.17

704.74± 0.17

1367.84± 0.37

0.00± 0.01

0.00± 0.01

0.00± 0.01

-0.11± 0.01

-0.02± 0.01

ZPE [M ]

34.08± 0.10

64.11± 0.09

49.36± 0.10

34.52± 0.10

62.14± 0.10

Patoms

716.62± 0.03

923.15± 0.03

819.89± 0.03

716.62± 0.03

1329.98± 0.07

kǫM

∆Hf0 (k, 0)

∆Hf0 [M, 0]

best est.

47.30± 0.19

-19.76± 0.21

8.49± 0.20

46.29± 0.20

24.26± 0.39

∆Hf0

ATcT

47.17± 0.06

-19.78± 0.05

8.36± 0.05

46.09± 0.06

24.04± 0.05

[M, 0]

a

Contributions to the atomization energy EA,e [M ] refer to calculations at the highest levels performed in this study. Uncertainties are reported to be “0.00” if they are below 0.005 kcal/mol or if they are neglected as is the case for spin-orbit terms. The final best estimate of ∆Hf0 [M, 0] is compared to the ATcT value taken from Ref. 121. Details are discussed in the text (Sec. 5), which also lists references for ZPEs. Note that individual contributions have been obtained, stored, and summed up to higher precision than reported here, hence summation of reported values may lead to results differing in the last decimal place.

60 ACS Paragon Plus Environment

Page 61 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

61 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Model A* Deviation of model from reference (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 62 of 67

0.2

0.1

0

CC bond types involved

-0.1

single double single, double

-0.2

single, double, triple single, triple

0

1000

500

1500

CCSD(T) atomization energy (kcal/mol)

Figure 1: Performance of ATOMIC(hc)/A* tested for a set of 29 hydrocarbons. Deviations between model atP CCSD(T){A∗} CCSD(T){H} [M ] + b∈M C A,e omization energies (EA,e [b])) and ATOMIC(hc) reference values CCSD(T){H} CCSD(T){H} ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ (EA,e [M ] + C A,e [M ], i.e. [6666|6665] with extrapolation exponent β=3.4) are plotted as function of the latter. Colors indicate types of CC bonds contained in a molecule, e.g. all cumulenes are shown in blue (CC double bonds only). Broken lines indicate the mean signed error plus and minus three times the standard deviation. Individual results are listed in Table S9.

62 ACS Paragon Plus Environment

Page 63 of 67

2 0.05 [(Q)-(T)] contribution to BSE (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1.5

0

1

-0.05

0.5

0

-0.5

0

20

40 Molecule index

60

80

CC bond types involved single only isolated double isolated triple isolated double and triple

conjugated double, simple conjugated double, benzenoid conjugated double, antiaromatic

conjugated double, other conjugated double and triple conjugated triple cumulenic double

Figure 2: [(Q)-(T)] contribution to bond separation energy using (2,3tr ) extrapolation to the complete basis set, see Eq. (15). Individual data for all 83 hydrocarbons are listed in Tables S13 and S14. The molecule index (cf. Table S14) sorts molecules for bond separation energy, and color coding classifies molecules according to structural features. Most classifications should be self-explanatory, see footnote b to Table S14 for “conjugated double, simple” and “conjugated double, other”. Black lines indicate error bars of ATOMIC(hc) estimated for the neglect of [(Q)-(T)] contributions to bond separation energies (listed in column “Unc.” of Table S14 and evaluated from data in Table 1). ∆T1 (4) diagnostics exceeding values of 0.003, 0.004, 0.005, and 0.006 (see Eq. (16) and Table S15) are indicated below colored bars by open circles, open squares, solid circles, and solid squares, respectively. The inset shows a magnified view for molecules with index running from 3 to 55.

63 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Relativistic corrections to atomization energies (kcal/mol)

0

ATOMIC(hc) bond increments

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 64 of 67

-0.5

-1

-1.5

-1

-1.5

-0.5

0

CCSD(T)/cc-pVTZ-DK

Figure 3: Relativistic corrections to atomization energies: Scatter plot comparing ATOMIC(hc) estimates to second-order DKH calculations at the CCSD(T)/cc-pVTZ-DK level (all 87 molecules). ATOMIC(hc) uses BSR bond terms as derived in the original paper from MVD calculations at the CCSD(T)(full)/augcc-pCVTZ level (-0.049, -0.101, -0.133, and -0.180 kcal/mol for 3 C-H, 3 C-C3 , 2 C=C2 , and 1 C≡C1 bond terms, respectively, see Table III of Ref. 54). The two solid lines indicate the 10% uncertainty interval. Individual data are listed in Table S18. See text for details.

64 ACS Paragon Plus Environment

Page 65 of 67

DBOC contributions to atomization energies (kcal/mol)

0.4

0.3 Model

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

0.2

0.1

0 0

0.1

0.2

0.3

0.4

CCSD(full)/cc-pCVDZ

Figure 4: DBOC contributions to the atomization energy: Scatter plots comparing various models to the reference CCSD(full)-ccpCVDZ (all 87 molecules): HF/cc-pVDZ (blue circles), BSR model derived from HF/ccpVDZ (green circles, used in ATOMIC), BSR model derived from CCSD(full)/cc-pCVDZ (red circles), and empirical model derived from CCSD(full)/cc-pCVDZ (black circles, used in ATOMIC(hc)). Ideal correlation is indicated by a black line.

65 ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

Atomization energies: Errors and uncertainties

Energy relative to best estimate (kcal/mol)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 66 of 67

1

Best estimate (and uncertainty) ATOMIC/A* ATOMIC(hc)/A* (and uncertainty)

0.5

0

-0.5

-1

0.005

∆T1(2sd)

0 0

10

20

30

Molecule index

Figure 5: Bottom-of-the-well atomization energies for RI-MP2/cc-pVTZ geometries. ATOMIC/A* (red) and ATOMIC (hc)/A* values (blue with cyan error bars) are shown relative to best estimates as defined in the text and detailed in Table 6. Solid black lines indicate error bars for best estimates. The lower panel shows excess T1 diagnostics ∆T1 (2sd ) as defined in Sec. 4.7 and listed in Table S15; the broken line indicates the threshold of ∆T1 (2sd )=0.005 above which ATOMIC(hc)/A* error estimates are expected to be unreliable. See Table S21 for identification of molecules and for raw data.

66 ACS Paragon Plus Environment

Page 67 of 67 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

67 ACS Paragon Plus Environment