Towards Accurate Conformational Energies of ... - ACS Publications

calculated data, we have not found any cheaper variant for the treatment of conformational energies, since ..... MPCONF196 set we use the domain based...
0 downloads 7 Views 1MB Size
Subscriber access provided by UNIV OF NEW ENGLAND ARMIDALE

Article

Towards Accurate Conformational Energies of Smaller Peptides and Medium-Sized Macrocycles: MPCONF196 Benchmark Energy Data Set Jan #ezá#, Daniel Bím, Ondrej Gutten, and Lubomír Rulíšek J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b01074 • Publication Date (Web): 20 Feb 2018 Downloaded from http://pubs.acs.org on February 22, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Towards Accurate Conformational Energies of Smaller Peptides and Medium-Sized Macrocycles: MPCONF196 Benchmark Energy Data Set Jan Řezáč,* Daniel Bím, Ondrej Gutten, Lubomír Rulíšek*

The Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Gilead Sciences Research Center & IOCB, Flemingovo náměstí 2, 166 10 Praha 6, Czech Republic

ABSTRACT. Carefully selected set of acyclic and cyclic model peptides and several other macrocycles comprising 13 compounds in total, has been used to calibrate accuracy of DFT(-D3) method for conformational energies, employing BP86, PBE0, PBE, B3LYP, BLYP, TPSS, TPSSh, M06-2X, B97-D, OLYP, revPBE, M06-L, SCAN, revTPSS, BH-LYP, and ωB97X-D3 functionals. Both high- and low-energy conformers, 15-16 for each compound adding to 196 in total, denoted as MPCONF196 data set, were included and the reference values were obtained by the composite protocol yielding the CCSD(T)/CBS extrapolated energies or their DLPNOCCSD(T)/CBS equivalents in the case of larger systems. The latter was shown to be in nearquantitative (~0.10-0.15 kcal.mol-1) agreement with the canonical CCSD(T) provided the

ACS Paragon Plus Environment

1

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 40

TightPNO setting is used and therefore, can be used as the reference for larger systems (likely up to 150-200 atoms) for the problem studied here. At the same time, it was found that many D3corrected DFT functionals provide results of ~1 kcal.mol-1 accuracy which we consider as quite encouraging. This result implies that DFT-D3 methods can be, for example, safely used in efficient conformational sampling algorithms. Specifically, DFT-D3/DZVP-DFT level of calculation seems to be the best trade-off between computational cost and accuracy. Based on the calculated data, we have not found any cheaper variant for the treatment of conformational energies, since the semiempirical methods (including DFTB) provide results of inferior accuracy (errors of 3-5 kcal.mol-1).

ACS Paragon Plus Environment

2

Page 3 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

1. Introduction To fully understand the structural determinants of complex biomolecules, such as DNA, proteins or some of their medium-sized interaction partners, such as macrocycle-containing inhibitors,1,2,3,4 a reliable and accurate computational protocol for exhaustive mapping of their conformational space is needed. The ideal conformational sampling protocol comprises two key components: sampling algorithm itself and accurate energies computed for every new conformation found by the algorithm. Since most, if not all, biomolecules operate in liquid phase, the free energy value corresponding to a particular conformer can be conveniently expressed as: GS = Eel + ∆Gsolv + EZPVE - RTln(qtransqrotqvib) + pV

(1),

where Eel is the electrostatic potential energy of the molecule in vacuo (gas-phase molecular energy), ∆Gsolv is the solvation energy, EZPVE is the zero-point vibrational energy whereas RTln(qtransqrotqvib) are the entropic terms obtained from the rigid-rotor/harmonic oscillator (RRHO) or RRFRHO (FR, free rotor applied for low-lying vibrational modes) approximations. The pV (= RT) term is then a constant for each conformer. It has been shown in many applications that the accuracy in computed GS is primarily determined by the Eel and ∆Gsolv, while the contribution originating in the gas-phase thermodynamics (last three terms in Eq. 1) is not negligible, but usually not dominant either in the overall GS. This is especially true in processes where the number of species remains constant. In a recent study,5 we presented a comprehensive survey of the performance of several forcefield based conformational sampling methods on an extensive sample of more than 10 000

ACS Paragon Plus Environment

3

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 40

conformers of 8 medium-sized systems (~100 atoms), mostly macrocycles. Ranking the obtained local minima by the GS values (calculated by the composite QM/COSMO-RS protocol according to the Eq. 1 above) we could, for the first time, present a rigorous comparison/benchmarking of the conformational sampling methods. These included plain molecular dynamics, or molecular dynamics with the enhanced sampling along the low-lying vibrational modes (MD/LLMOD). It was argued, however, that the only quantum-chemical alternative to compute Eel for huge sets of conformers of medium-sized system in a realistic time frame (putting aside less accurate semiempirical QM methods or density functional tight binding, DFTB) is the density functional theory (DFT), preferably employing the non-hybrid functionals with the resolution-of-theidentity (RI-J) or density-fitting (DF) approximations. At the same time, it was shown5 that deviations between various respectable DFT functionals for relative conformational energies of medium-sized systems can be no less than few tens of kJ.mol-1 (5-10 kcal.mol-1), significantly greater values than can be inferred from studies of smaller systems. This was especially true for the DFT functionals without the inclusion of the dispersion correction. Therefore, it is highly desirable to calibrate the popular and frequently used DFT functionals on a well-balanced set of molecules representing constituents of proteins or their ligands, such as macrocyclic compounds. We assume that combination of the “best” DFT(-D) functional with the state-of-the-art solvation model (e.g. SMD or COSMO-RS) can ultimately lead to a better than 2-3 kcal.mol-1 accuracy in the computed GS values which can be considered as near-quantitative in many practical applications. While the latter (Gsolv) has been the subject of extensive method development and parametrization efforts in that area, the aim of this work is to critically evaluate the in vacuo component (Eel) of the overall free energy in solution (GS).

ACS Paragon Plus Environment

4

Page 5 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Systematic efforts to benchmark cheaper computational methods on various data sets comprising smaller model systems have been numerous. This has been to some extent enabled by complete basis set (CBS) extrapolation schemes employing the “gold standard” coupled cluster method with singles, doubles and perturbative triples, CCSD(T). For weak intermolecular interactions, this approach gave rise to S22 and S66 databases6,7 which soon became widely used as the solid reference for method development and calibration. Other data sets of interaction energies - developed by us and others – followed soon and were recently reviewed.8 Somewhat less attention was paid to accuracy of various quantum chemical methods for conformational energies. These are governed by the genuine interplay and competition between the intramolecular non-covalent interactions and covalent and steric effects. As a consequence, even seemingly simple compounds - alkanes, amino acids, peptides, sugars –– exhibit a surprisingly rich conformational behavior and numerous conformations may exist in an energetically accessible part of the conformational space.5 These are, in turn, the key determinants of the structure of larger biomolecular blocks or whole biomolecules. In addition to simpler organic molecules,9,10,11 conformer energetics of amino acids had received the most attention.12,13,14,15,16,17 To make the model systems relevant to conformational behavior of proteins, longer peptide chains have to be considered, with a tetrapeptide being the first model that can represent all structural motifs of the protein backbone.18 Similarly, conformers of the respective building blocks are studied as models for understanding the structure of nucleic acids.19,20 Of a special interest for the present work are the studies related to conformations of small tripeptides (FGG, GFA, GGF, WGG) and dipeptides (WG) carried out by Hobza and coworkers. The goal of these studies was to identify the lowest-energy conformers in gas phase and assign

ACS Paragon Plus Environment

5

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 40

the structures to conformer-resolved IR spectra measured in a molecular beam.21,22 Subsequently, this database of conformers was used for calibrating more approximate QM methods23 and some part of it (FGG) has also become a part of the Grimme’s GMTKN database24 under the name PCONF (next to alkane, ACONF,9 cysteine, CYCONF,13 sugar, SCONF,25 conformational sets). However, the accuracy of the PCONF reference data, obtained using state of the art methods available more than a decade ago, may not match contemporary requirements on solid benchmark data (better reference data for this set were published recently as a part of the GMTKN55 database).26 However, we feel that more extensive data set comprising also larger molecular systems and greater span of energies from the global minimum of the particular compound is needed to critically evaluate the general performance of cheaper methods (DFT or even SQM/DFTB). It should be also emphasized that the “CCSD(T)-based benchmark methodology” and related calibration of cheaper, mostly DFT, methods, is conveniently applicable only to small systems (< 50 atoms) and the transferability of the results into medium and larger systems may not be a trivial problem. It may require an intermediate step – such as calibration of cheaper correlated quantum chemical methods on small systems – and its subsequent usage as the reference on the medium sized systems. In an area of non-covalent complexes, the problem was solved by using data sets of medium-sized complexes (100-150 atoms) for which interaction energies can still be computed by highly-correlated QM methods in smaller basis sets and approximately extrapolated to CBS (L7 data set),27 or derived from experimental data (S12L and S30L data sets).28,29 In the present work, we explore this “small to medium-size” transition in an area of conformational energetics on a reasonably large set of several hundreds of conformers of 13 different compounds – representing constituents of proteins or their ligands (Figure 1). This is

ACS Paragon Plus Environment

6

Page 7 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

achieved by adding seven medium-sized macrocyclic systems and one modified peptidic compound to our original benchmark data set of oligopeptides (four tripeptides and one dipeptide). The data set is further denoted as MPCONF196. Especially larger macrocyclic compounds are considered to be excellent test systems since their quantum chemical description must include well balanced description of both the covalent part (bonding terms in the language of molecular mechanics) and non-covalent – through-space – interactions (non-bonding terms). This imposes fairly stringent requirements on the performance of the particular quantum chemical method. As an “intermediate” benchmark method for medium-sized systems in the MPCONF196 set we use the domain based local pair-natural orbital coupled-cluster [DLPNOCCSD(T)] method.30 Besides a new data base of presumably accurate computational data that can be used for many purposes (such as force field parametrization or the development of the SQM/DFTB methods), the primary outcome of the presented study is the critical assessment of the performance of popular DFT(-D3) functionals and semiempirical QM methods that are currently still representing the only QM alternative for large-scale computations (e.g., exhaustive conformational sampling) of medium to large-sized systems. Ultimately, the combination of efficient sampling algorithms with fast and accurate evaluation of the conformational (free) energies may lead to understanding the fundamental structural determinants encoded in nature’s building blocks (such as peptides, nucleotides, and sugars.

ACS Paragon Plus Environment

7

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

FGG

CAMVES

Cpd_A

GFA

WG

GGF

CHPSAR

POXTRD

COHVAW

SANGLI

Page 8 of 40

WGG

Cpd_B

YIVNOG

Figure 1. Model systems studied in this work.

2. Computational Details Peptide Conformer Data Sets. The sets of peptide conformers were taken from our previous work; they comprise fifteen (FGG, GGF, WG, WGG) or sixteen (GFA) low-lying conformers.23 The conformers were identified using a multi-step protocol based on MD/quenching followed by refinement using increasingly accurate QM methods.23 The final geometries were obtained using MP2/cc-pVTZ optimization. The original benchmark energies were calculated using a composite CCSD(T)/CBS scheme built from MP2/CBS extrapolated from cc-pVTZ and cc-pVQZ basis sets and CCSD(T) correction. The latter has been calculated using relatively small 6-31G* basis

ACS Paragon Plus Environment

8

Page 9 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

which measured by current standards may not be considered as truly benchmark value (for details of the CBS extrapolations, see below). The geometries, as well as the original benchmark data are available online in the BEGDB database.31 All five peptides were considered in their neutral forms which are expected to be the stable in vacuo structures whereas zwitterionic forms will prevail in the solution. We expect that the original motivation behind this choice – comparison with the experimental gas-phase data21 – does not hinder in any respect the comparison of the performance and accuracy of studied quantum chemical methods for their conformational energies. The difference between the two forms in solution (∆GS) should be, in principle, treated by the solvation model used. Macrocycle Conformer Data Sets. Five macrocyclic compounds of varying ring size were taken from the recent study of Shelley and coworkers32 and can be found in the Cambridge Structural Database (CSD). They are further denoted by their CSD codes: POXTRD, CAMVES, COHVAW, CHPSAR, YIVNOG. Other two macrocycles included in the MPCONF196 data set are inhibitors of human cyclophilin A (to some extent inspired by natural product and immunosuppressant cyclosporine): sanglifehrin A analogue (denoted here as SANGLI which differs from the sanglifehrin A by truncation of the R1; i.e. R1 = H),33 and recently published developmental compound denoted here as Cpd_A (6 in Ref. 34) with KD = 64 nM in TR-FRET Cyp A binding assay. The last compound in MPCONF196 data set is the acyclic synthetic precursor of Cpd_A, denoted as Cpd_B (KD > 1 µM). The conformers’ geometries (15 for each compound)

were

obtained

by

geometry

optimization

of

snapshots

selected

from

SQM(PM6+D3H4)35,36 molecular dynamics (MD) trajectories,5 employing the Becke-Perdew 86 (BP86) exchange-correlation functional,37 Ahlrichs’ def-TZVP basis set,38 the empirical

ACS Paragon Plus Environment

9

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 40

correction to dispersion39 (denoted as D3) with zero damping and conductor-like screening model (COSMO) for implicit solvation (εr = 80). Density Functional Theory Calculations. The single-point energies were calculated using the BP86,37 BLYP,37a,40 PBE,41 B97-D,42 OLYP,

40,43,44

revPBE (GGA),45 M06-L,46 TPSS,47

SCAN,48 revTPSS (meta-GGA),49 B3LYP,37a-b,40,50 PBE0,51 TPSSH,52 M06-2X,53 BHLYP,54 and ωB97X-D3 (hybrid)55 DFT functionals and the def2-QZVP,38 def2-TZVPD,38 and DZVP-DFT56 basis sets. All the DFT calculations were coupled with the D3 dispersion correction39,57 and we report the results obtained with two commonly used damping functions, the zero and Becke-Johnson (BJ) damping.58 We have also tested the updated parametrization of the BJ damping referred to as D3M59 and the optimized power (OP) damping function which adds one more parameter.60 We have performed all the D3 calculations both with and without the 3-body term. The D3 correction was originally parameterized for the def2-QZVP basis set; however, by using the same parameter set also for calculations in the def2-TZVPD basis the error (RMSE in the S66 data set, BLYP functional) increases only by 0.06 kcal.mol-1. In practice, it is desirable to reduce the size of the basis set further. Recently, we have found that the DZVP-DFT basis set yields excellent results when coupled with specific reparametrization of the D3 correction61 and, therefore, we test this seemingly very efficient setup here as well. CCSD(T) Calculations. The extrapolated CCSD(T)/CBS benchmark energies for the peptide data sets and POXTRD macrocycle have been calculated using the same composite scheme as the original one,23 updating the size of the basis set sizes to current standards. The final energy is

ACS Paragon Plus Environment

10

Page 11 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

built from the Hartree-Fock energy EHF/aug-cc-pVQZ, MP2 correlation energy extrapolated to the CBS limit EMP2/CBS and the ∆CCSD(T) correction: ECCSD(T)/CBS = EHF/aug-cc-pVQZ + EMP2/CBS + ∆CCSD(T)

(2)

where the correction ∆CCSD(T) = ECCSD(T) - EMP2

(3)

is calculated in smaller basis set, in this case aug-cc-pVDZ. The MP2/CBS term is obtained using the Helgaker extrapolation formula62 from correlation energies calculated in aug-cc-pVTZ and aug-cc-pVQZ basis sets: / =

/  / 

(4)

The Hartree-Fock energy is not extrapolated but taken from the calculation in the larger basis set used for the extrapolation (in our case aug-cc-pVQZ). This composite scheme was shown to yield interaction energies within 2% from true CCSD(T)/CBS limit63 in the A24 data set.64 DLPNO-CCSD(T) Calculations. In order to obtain accurate reference energies for larger systems in the data set (60-120 atoms), DLPNO-CCSD(T) calculations30 were employed instead of canonical CCSD(T) since the latter is unfeasible for systems of this size. Using local pair natural orbitals, the computational costs are greatly diminished but the accuracy becomes dependent on the cutoff values. The default "NormalPNO" and “TightPNO” setups (as defined in Ref. 65) were tested. However, it has been shown previously that the TightPNO thresholds are necessary for accurate description of non-covalent interactions65,66 as well as for the atomization energies.67 All DLPNO-CCSD(T) calculations are performed using aug-cc-pVDZ basis set. When DLPNOCCSD(T) method is used to calculate a ∆CCSD(T) term in a composite CCSD(T)/CBS scheme,

ACS Paragon Plus Environment

11

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 40

a question arises which MP2 energy should be used in Eq. 3. A localized MP2 is calculated as a part of the DLPNO-CCSD(T) calculation, but using this energy leads to inaccurate value of the ∆CCSD(T) correction.68 Better results are achieved if canonical MP2 energy is used, but this requires an additional calculation. In this work, we use the latter approach, but we have briefly tested both. It turned out that using the local MP2 increases the error of the final conformation energies with respect to canonical CCSD(T) by the factor of two. Approaching complete basis set limit in larger systems. In the larger systems studied, also the size of basis set applicable to the MP2 calculations is limited to triple-ζ, which makes it impossible to use reliable extrapolation to the CBS limit. More accurate estimate of the CBS can be obtained using explicitly correlated MP2-F12 method69,70,71 which is computationally more demanding than plain MP2 but converges faster with basis set size. For the medium-sized complexes in the set, MP2-F12 calculations in cc-pVTZ-F12 basis set72 were possible and for the largest systems, we were limited to the cc-pVDZ-F12 basis. The benchmark CCSD(T)/CBS energy is then estimated as a sum of MP2-F12 energy and the ∆CCSD(T) correction. Other correlated methods tested. Besides various CCSD(T) setups, we have tested also some more economical correlated methods that could be potentially used as a benchmark for DFT. The performance of MP2 can be improved by scaling its spin components; here we have tested the original spin-component-scaled method SCS-MP273 as well as its variant parametrized for noncovalent interactions, SCS-MI-MP2.74 In both cases, the results were extrapolated to the CBS limit using Eq. 4. The next step towards including higher-order correlation is the MP3 perturbation theory. Itself, it does not bring any improvement over MP2, but when the MP3 contribution is scaled in the so

ACS Paragon Plus Environment

12

Page 13 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

called MP2.5 approach,75 the results become significantly better than any other methods of similar complexity.76 Here, we estimate MP3/CBS and MP2.5/CBS energies from the MP2/CBS energy and a higher-order correction (∆MP3, ∆MP2.5) constructed analogously to the ∆CCSD(T) correction (Eq. 3) and calculated in the aug-cc-pVDZ basis set. Semiempirical QM methods. We have also tested multiple efficient semiempirical or empirically corrected computational methods. Out of classical MNDO-based SQM methods, we tested PM6 and RM1 with D3H4 corrections for dispersion and hydrogen bonding (PM6-D3H4, RM1D3H4),36 PM777 and OM2 and OM3 methods which include additional orthogonalization correction,78 again with empirical dispersion correction (OM2-D3, OM3-D3).79 Another class of tested methods is based on the tight binding approach. The third-order selfconsistent-charges density-functional tight binding80 (3rd-order SCC-DFTB, abbreviated as DFTB3) is less empirical as it uses integrals precalculated at DFT level. Here, we couple this method with two versions of corrections for non-covalent interactions, D3H481 and D3H5.82 We include also an older parametrization of second-order DFTB83 with a simpler dispersion correction84 (DFTB-D) which was found to yield very good results for small peptides.23 The GFN-xTB method85 is also a self-consistent-charges tight binding scheme but it uses more approximate Hamiltonian relying on empirical parameters; it includes a dispersion correction and it was parameterized specifically for geometries, frequencies and non-covalent interactions (hence the GFN in the name). Finally, we have also tested the HF-3c method86 which is a full HF calculation in a minimal basis set with a dispersion correction and two additional empirical corrections for the artifacts caused by limited size of the basis set, and two analogous DFT-based methods PBEh-3c87 and B97-3c88 which use double- and triple-zeta basis sets, respectively.

ACS Paragon Plus Environment

13

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 40

Software used. All MP2, MP2-F12, MP3, CCSD(T) and density functional theory calculations reported in this work were carried out using the Turbomole 7.0 program.89 Cuby framework90 was used to automate the calculations of the data sets; the peptide data sets are already distributed with the Cuby package and the complete MPCONF196 data set will be released along with this publication. DLPNO-CCSD(T) and HF-3c calculations were carried out by ORCA v. 4.0 program.91 PM6, PM7 and RM1 calculations were performed using MOPAC2016.92 The OM2 and OM3 calculations were carried out in the MNDO program93 provided by the authors of the method. The GFN-xTB implementation was also provided by the authors.85 Statistical processing of the results. The conformation energies are relative quantities that have to be defined with respect to some reference state. An obvious choice is to express the conformer energies relative to lowest-lying conformer. This will, however, distort the overall statistics if the energy of the reference conformer was calculated with larger error than the remainder of the set. We thus prefer to express all the conformer energies relative to the average conformer energy in the set. These relative conformer energies (denoted simply as Econf) can be then easily compared between the tested methods. To evaluate the accuracy of the tested methods, we use the root mean square error (RMSE) as a primary error measure, whereas in some cases, we also report the mean unsigned error (MUE). It turned out to be advantageous to put the studied errors into the context of the magnitude of the studied variable. The average magnitude of the conformational energy Econf (which is already a relative value) is calculated for the N conformers of each compound as 

〈〉 = ∑# "#$%&' ".

(5)

ACS Paragon Plus Environment

14

Page 15 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

3. Results and Discussion 3.1. CCSD(T)/CBS conformational energies for the small systems (oligopeptides and POXTRD macrocycle). The five linear peptides (FGG, GFA, GGF, WG, WGG) were taken from previous studies in which the benchmark conformational energies had been calculated using composite CCSD(T)/CBS scheme based on MP2 extrapolated from cc-pVTZ and ccpVQZ basis sets whereas the CCSD(T) correction was calculated using fairly small 6-31G* basis23 - state-of-the-art setup available a decade ago. Employing larger basis sets, aug-cc-pVTZ and aug-cc-pVQZ in the MP2/CBS extrapolation, and aug-cc-pVDZ in the CCSD(T) correction, values of presumably “benchmark quality” were obtained in this study, shown previously to yield accuracy of ~2% in non-covalent interactions in the A24 data set.63,64 The new MP2/CBS and CCSD(T)/CBS conformation energies, along with their original counterparts, are provided in the Supporting Information in Table S1. The difference between the old23 and new (this work) CCSD(T)/CBS calculations measured as RMSE averaged over the five systems is 0.26 kcal.mol-1 which is considered as non-negligible provided that the average magnitude of the conformational energy (see Eq. 5) in this set is 0.72 kcal.mol-1. When the MP2/CBS and ∆CCSD(T) components are compared separately (new vs original data), they contribute almost equally to this error, which shows that it was necessary to update both terms. There exists another update of the PCONF data set (FGG tripeptide) published recently as a part of the GMTKN55 database.26 There, the benchmark conformation energies were obtained by a direct extrapolation of DLPNO-CCSD(T) correlation energies from aug-cc-pV(T,Q)Z basis sets. Compared to our approach, this scheme avoids the use of smaller basis sets, but it introduces the

ACS Paragon Plus Environment

15

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 40

error due to the DPLNO approximation (which is non-negligible despite the use of tight PNO cutoffs). The accuracy of these benchmarks cannot be compared on the systems studied here, but some insight can be obtained from small model systems where much more accurate CCSD(T)/CBS data are available. Because non-covalent interactions are an important contribution to the conformation energies, and because their description is strongly basis set dependent, we use the A24 data set64 as a model. When counterpoise correction is used, the two schemes perform comparably with errors slightly over 2%. However, for the intramolecular interactions involved in the conformation energies, the counterpoise correction cannot be applied. In that case, MP2/CBS + ∆CCSD(T)/aug-cc-pVDZ scheme yields average relative error of 2.2% while DLPNO-CCSD(T)/CBS yields error of 4.9%. In addition to better accuracy, a benchmark based on full CCSD(T) calculations, albeit in smaller basis set, allows testing the accuracy of approximate CCSD(T) approaches needed for the calculations of the larger systems. Next, the six systems for which the accurate CCSD(T)/CBS benchmark is available are used for validation of cheaper ab initio methods that can be subsequently used as (albeit less accurate) benchmarks in the larger systems (remaining seven compounds comprising 34-116 atoms). To obtain more approximate CCSD(T)/CBS estimates, both MP2/CBS and ∆CCSD(T) terms should be handled more efficiently. Concerning the former, it should be mentioned that the MP2/CBS extrapolation from triple- and quadruple-ζ basis sets (Eq. 4) becomes impractical for systems greater than 80-100 atoms and extrapolation from smaller basis sets is not reliable. Therefore, an alternative approach to obtain MP2/CBS estimates consists in employing the explicitly correlated MP2-F12 method, yielding results presumably closer to the CBS limit even in smaller basis sets, cc-pVDZ-F12 or cc-pVTZ-F12. The ∆CCSD(T) correction is then obtained by using the DLPNO-CCSD(T)/aug-cc-pVDZ method. Two recommended cutoffs for correlation of electron

ACS Paragon Plus Environment

16

Page 17 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

pairs, denoted ‘NormalPNO’ and ‘TightPNO’ in ORCA software, were used. It should be emphasized that the calculations with the ‘TightPNO’ option are approximately one order of magnitude more demanding with respect to the ‘NormalPNO’ option. In the medium-sized systems, it would be possible to calculate the DLPNO-CCSD(T) term in larger basis set such as aug-cc-pVTZ. We have, however, found that the difference between the conformation energies using ∆CCSD(T) correction in aug-cc-pVDZ and aug-cc-pVTZ basis sets for the FGG peptide is only 0.09 kcal.mol-1, what is less than the error due to the DLPNO approximation which is 0.15 kcal.mol-1. For the sake of consistency and simplicity, we thus use the aug-cc-pVDZ basis for the whole data set. Overall, eight combinations of the MP2 and ∆CCSD(T) terms are tested against the CCSD(T)/CBS benchmark. Also, the MP2/CBS value without further corrections is included in the comparison. The full set of data is available in Table S1 (Supporting information) whereas the errors averaged over the six test sets is summarized in Table 1. First, it can be seen that the MP2 as standalone method yields surprisingly large errors (compared to the magnitude of the energies in the set) and cannot be recommended for any calibration purposes. Use of the spin component scaling in SCS-MP2 does bring some improvement but the method is still far from being sufficiently accurate. The SCS-MI-MP2 approach which was optimized for interaction energies in non-covalent complexes apparently does not work well for the problem studied here. Similarly to non-covalent interactions, MP3 is even slightly worse than MP2, but MP2.5 brings in a substantial improvement.

ACS Paragon Plus Environment

17

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 40

Table 1: The errors of MP2, MP3, MP2.5 and the approximate CCSD(T)/CBS estimates relative to the full CCSD(T)/CBS benchmark, averaged over the six small systems (with the exception of MP3 and MP2.5 where only the five peptides were calculated). Errors (RMSE and MUE) are calculated separately in each set and then averaged. All values are in kcal.mol-1.

MP2

higher-order correction

MP2/CBS SCS-MP2/CBS SCS-MI-MP2/CBS MP2/CBS MP2/CBS MP2-F12/cc-pVDZ-F12 MP2-F12/cc-pVTZ-F12 MP2-F12/cc-pVDZ-F12 MP2/CBS MP2-F12/cc-pVTZ-F12 MP2-F12/cc-pVDZ-F12 MP2/CBS MP2-F12/cc-pVTZ-F12 MP2-F12/cc-pVDZ-F12

None None None MP3/ aug-cc-pVDZ MP2.5/aug-cc-pVDZ MP2.5/aug-cc-pVDZ CCSD(T)/aug-cc-pVDZ CCSD(T)/aug-cc-pVDZ DLPNO-CCSD(T)/aug-cc-pVDZ, normal DLPNO-CCSD(T)/aug-cc-pVDZ, normal DLPNO-CCSD(T)/aug-cc-pVDZ, normal DLPNO-CCSD(T)/aug-cc-pVDZ, tight DLPNO-CCSD(T)/aug-cc-pVDZ, tight DLPNO-CCSD(T)/aug-cc-pVDZ, tight

RMSE

MUE

0.41 0.34 0.43 0.50 0.20 0.22 0.04 0.04 0.33 0.35 0.35 0.14 0.15 0.15

0.35 0.29 0.31 0.28 0.14 0.15 0.03 0.03 0.26 0.29 0.28 0.12 0.13 0.13

Second, if we replace the extrapolated MP2/CBS term in the benchmark composite scheme with explicitly correlated MP2-F12, there results change only marginally (0.04 kcal.mol-1). Interestingly, the use of both cc-pVDZ-F12 and cc-pVTZ-F12 basis sets leads to very similar results with the same overall error (RMSE) of only 0.04 kcal.mol-1. This confirms our previous conclusions made for non-covalent complexes where the MP2-F12 can be used instead of the MP2/CBS extrapolation in the composite CCSD(T)/CBS calculations with almost no loss of accuracy8,63 Previous application of MP2-F12 to peptide conformers also confirms that MP2-F12 results calculated already in the cc-pVDZ-F12 basis set are very close to the CBS limit extrapolated from canonical MP2 calculations.18 The small difference between the results

ACS Paragon Plus Environment

18

Page 19 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

obtained with cc-pVDZ-F12 and cc-pVTZ-F12 basis sets (which is due to an error compensation with the ∆CCSD(T) term) makes it possible to use the more economical cc-pVDZ-F12 basis in calculations of the larger systems. Third, the accuracy of the schemes employing the DLPNO-CCSD(T) calculations is determined by the threshold applied in the DLPNO approximation. When the ‘NormalPNO’ cutoff is used, the errors are quite large (RMSE of ~0.35 kcal.mol-1) and only after employing the ‘TightPNO’ cutoff the error decreases to ~0.15 kcal.mol-1. Note that the MP2.5 method stands in between with RMSE of ~0.2 kcal.mol-1. The different treatments of the MP2 term in the composite scheme yield almost identical results. These tests allow us to select the best methods to be used as a benchmark for the larger systems. It is clear that the DLPNO-CCSD(T) must be used with the Tight cutoff in order to preserve the desired accuracy. To approach the complete basis set limit, we use the composite scheme based on MP2-F12/cc-pVDZ-F12 and DLPNO-CCSD(T). Calculating the largest systems in this set is rather demanding but tractable due to efficient parallelization of the DLPNO-CCSD(T) method in ORCA software. 3.2. System size. Based on the size of the systems and the applied benchmark method, the data set is divided into three groups. The group of ‘small’ systems comprise the five peptides for which true composite CCSD(T)/CBS calculations were feasible (vide supra). Five systems belong to the ‘medium’ size group (up to 70 atoms). Among them, full CCSD(T)/CBS benchmark is available only for the POXTRD macrocyle. For the remaining four systems (CAMVES, CHPSAR, COHVAW and Cpd_B), the composite scheme based on DLPNOCCSD(T) with the ‘TightPNO’ cutoff is used. The remaining three systems (SANGLI, YIVNOG

ACS Paragon Plus Environment

19

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 40

and Cpd_A) are substantially larger (92 – 116 atoms) and comprise the group of ‘large’ systems. Here, the benchmark results were also obtained with the composite scheme based on DLPNOCCSD(T) with the ‘TightPNO’ cutoff. 3.3. Performance of DFT and DFT-D3. DFT calculations were carried out for all the thirteen systems (196 conformers) included in the study. The results are, however, analyzed separately in the groups of systems of varying size. There are several choices in the DFT-D3 methodology that should be discussed first. The use of the three-body term in the D3 correction provides a significant improvement of the description of the larger systems (average improvement of RMSE 0.19 kcal.mol-1 in the group large systems) at a cost of a negligible increase of RMSE in the small ones (0.01 kcal.mol-1). We will thus report and discuss only the results obtained with the three-body correction. We have also tested two alternative versions of the damping function in the dispersion correction. The first approach, D3M, is a reparametrization of the Becke-Johnson damping without change in its form.59 In the MPCONF data set, it yields slightly larger error in the small systems (average increase of RMSE is 0.04 kcal.mol-1) and practically the same results as the original version of the BJ damping. The optimized power (OP) damping60 adds one more parameter to the same formalism what makes the damping function more flexible. However, in this application, it yields results slightly worse than BJ damping in all the three groups of systems. Because these methods do not bring any practical improvement, we will list and discuss in only the results obtained with the original version of the BJ and zero damping. It was also shown that using the van der Walls (non-local) DFT functionals (vdW-DFT, DFT-NL) instead of the D3 dispersion correction provides slightly more accurate description of conformation energies.94,95 We do not consider this approach here because in the practical application to conformational analysis of larger systems (which includes

ACS Paragon Plus Environment

20

Page 21 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

also geometry optimizations), the efficiency of the DFT-D3 approach outweighs its slightly lower accuracy. The full set of results is available in Table S1 (Supporting information), whereas the RMSE with respect to the benchmark values are summarized in Table 2 and depicted in Figure 2. It is obvious from these results that the dispersion correction is necessary for obtaining meaningful results, and we will not discuss the uncorrected DFT energies except in special cases. System-size independent trends will be discussed first. It can be seen that there is only a small difference between the results obtained with the def2-QZVP and def2-TZVPD basis sets. The ranking (accuracy) of the DFT functionals in these series is conserved and the calculations in the smaller basis yield virtually identical results. DFT-D3 in the small DZVP-DFT yields excellent results comparable to the much more expensive calculations in larger basis sets. Interestingly, all the tested functionals perform very similarly in the DZVP-DFT basis. This may indicate that the differences among the functionals are caused by the dispersion correction and its transferability from the realm of interaction energies (which it was parametrized for) to the present problem. In the case of the DZVP-DFT basis, the dispersion correction was fitted to larger and more diverse training set, which may lead to its better transferability.

ACS Paragon Plus Environment

21

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 40

Figure 2. Root mean square errors of conformer energies averaged over groups of small, medium-sized and large systems of the MPCONF196 data set evaluated against the best benchmark available for each system. DFT results are plotted uncorrected (gray) and with D3 dispersion correction using the zero (yellow) and BJ (red) damping function.

ACS Paragon Plus Environment

22

Page 23 of 40

Journal of Chemical Theory and Computation

1 2 3 Table 1. Root mean square errors of conformer energies averaged over groups of small, 4 5 medium-sized and large systems of the MPCONF196 data set evaluated against the best 6 benchmark available for each system. Both size-dependent and overall RMSE’s are listed. 7 8 9 10 No dispersion D3(BJ) D3(zero) Functional 11 Basis set Small Medium Large All Small Medium Large All Small Medium Large 12 def2-QZVP B-LYP 1.85 2.67 4.46 3.04 0.60 0.72 1.22 0.85 0.42 0.87 1.64 13 B-P 1.76 2.24 3.58 2.54 0.89 0.85 1.61 1.14 0.77 1.04 2.05 14 B97-D 1.94 3.20 5.15 3.51 0.68 0.86 1.12 0.89 0.53 0.83 1.15 15 OLYP 2.48 4.46 7.41 4.94 0.79 1.07 1.07 0.99 0.61 0.98 1.03 16 PBE 1.46 1.83 2.82 2.03 0.90 0.76 0.96 0.87 0.78 0.70 1.04 17 revPBE 1.87 3.14 5.15 3.47 0.69 0.95 1.02 0.90 0.56 0.89 1.13 18 M06-L 0.37 0.91 1.81 1.10 0.37 0.98 2.08 19 SCAN 0.55 0.73 0.97 0.74 0.41 0.49 1.05 0.66 0.39 0.49 1.10 20 revTPSS 1.28 1.78 3.04 2.06 0.64 0.69 0.89 0.74 0.55 0.65 0.95 21 22 TPSS 1.64 2.15 3.67 2.51 0.87 0.75 0.83 0.83 0.71 0.68 0.92 23 B3-LYP 1.57 2.28 3.70 2.56 0.41 0.46 1.04 0.65 0.27 0.58 1.27 24 BH-LYP 1.30 1.95 3.00 2.12 0.30 0.58 1.01 0.66 0.32 0.71 1.24 25 M06-2X 1.76 0.63 1.35 1.35 1.77 0.68 1.57 26 PBE0 1.23 1.72 2.60 1.86 0.60 0.52 0.84 0.65 0.50 0.50 0.93 27 TPSSH 1.52 2.08 3.50 2.40 0.72 0.63 0.76 0.71 0.58 0.57 0.80 28 wB97X-D3 0.33 0.48 1.31 29 def2-TZVPD B-LYP 1.83 2.66 4.39 3.01 0.59 0.74 1.27 0.88 0.41 0.89 1.69 30 B-P 1.73 2.21 3.48 2.48 0.87 0.88 1.71 1.17 0.76 1.07 2.15 31 B97-D 1.93 3.18 5.09 3.48 0.66 0.88 1.17 0.91 0.51 0.84 1.20 32 OLYP 2.46 4.44 7.33 4.90 0.75 1.10 1.11 1.00 0.58 1.00 1.08 33 PBE 1.43 1.80 2.73 1.98 0.86 0.77 0.99 0.87 0.75 0.71 1.08 34 revPBE 1.85 3.12 5.07 3.42 0.66 0.97 1.08 0.92 0.55 0.92 1.19 35 M06-L 0.34 0.95 1.93 1.16 0.35 1.03 2.21 36 SCAN 0.53 0.72 0.96 0.73 0.40 0.50 1.09 0.68 0.38 0.50 1.14 37 revTPSS 1.25 1.76 2.97 2.01 0.62 0.71 0.94 0.76 0.53 0.68 1.01 38 39 TPSS 1.60 2.14 3.62 2.48 0.82 0.76 0.86 0.82 0.66 0.69 0.95 40 B3-LYP 1.55 2.26 3.63 2.52 0.40 0.48 1.09 0.68 0.26 0.60 1.32 41 BH-LYP 1.28 1.93 2.95 2.09 0.29 0.58 1.04 0.67 0.31 0.72 1.27 42 M06-2X 1.73 0.66 1.39 1.35 1.73 0.71 1.61 43 PBE0 1.20 1.68 2.51 1.81 0.57 0.52 0.88 0.65 0.48 0.51 0.98 44 TPSSH 1.49 2.06 3.46 2.37 0.67 0.64 0.79 0.70 0.54 0.58 0.84 45 0.36 0.65 2.87 ωB97X-D3 46 DZVP-DFT B-LYP 1.54 2.30 4.03 2.68 0.45 0.79 0.97 0.74 0.38 0.98 1.35 47 B-P 1.37 1.88 3.24 2.20 0.62 0.84 1.08 0.84 0.53 0.97 1.35 48 B97-D 1.64 2.79 4.76 3.17 0.46 0.79 0.89 0.74 0.51 0.84 1.05 49 PBE 1.08 1.47 2.45 1.68 0.62 0.78 0.99 0.79 0.51 0.78 1.10 50 B3-LYP 1.29 1.97 3.32 2.25 0.42 0.72 0.92 0.70 0.43 0.88 1.25 51 PBE0 0.87 1.34 2.16 1.50 0.46 0.70 0.96 0.71 0.49 0.71 1.08 52 TPSS 1.23 1.76 3.22 2.11 0.50 0.77 0.79 0.69 0.45 0.88 0.98 53 54 55 56 57 58 59 ACS Paragon Plus Environment 60

23

All 1.02 1.33 0.85 0.89 0.83 0.87 1.25 0.69 0.72 0.77 0.75 0.79 1.42 0.64 0.64 0.74 1.05 1.38 0.86 0.91 0.83 0.90 1.32 0.71 0.75 0.77 0.78 0.81 1.42 0.66 0.65 1.47 0.93 0.96 0.81 0.80 0.87 0.77 0.78

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 40

As can be seen in Figure 2 and Table 2, the results and conclusions differ somewhat between the small, medium-sized and large systems. As expected, the errors against the reference CCSD(T)/CBS values grow with the system size from few tenths of kcal.mol-1 to about 1 kcal.mol-1 for the best dispersion-corrected functionals. Expectedly, this correlates with the span of the conformation energies. The small peptide conformers were selected in relatively narrow energy window of less than 3 kcal.mol-1 (as they come from a study searching for the lowestlying minima), whereas all other systems spanned the range of about 20 kcal.mol-1 (here the conformers were selected randomly from the DFT-D3/def-TZVP/COSMO optimized snapshots of MD/SQM simulations). This does not, however, explain the increase of the errors between medium-sized and large systems. Moreover, the energy range itself does not reflect the magnitude of the energy contributions that add up to the total conformational energy, but only their final balance. The magnitudes of the individual components (energy of intramolecular noncovalent interactions and the strain energy of the molecule – non-bonding vs. covalent terms in the force-field terminology) are both likely to be proportional to the size of the molecules. What is more interesting than the changes of the magnitude of the error is the change of its distribution with system size. In the smallest systems, there are more pronounced (relative) differences between various functionals, while in the largest systems, they significantly diminish. It may be partly due to the lower accuracy of the DLPNO-CCSD(T) benchmark used for the medium-sized and larger systems (assuming that the DLPNO-CCSD(T) and DFT errors were uncorrelated), but the error of this benchmark (discussed above) is probably smaller than the observed effects. It is thus likely that the error of the DFT-D3 calculations becomes not only larger but also more random as we move away from small molecules the DFT-D3 was parametrized on.

ACS Paragon Plus Environment

24

Page 25 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Next, we compare the damping functions used in the DFT-D3 calculations. For all of the smallest systems, better results are obtained with the zero damping. This is somewhat surprising as it was shown that the Becke-Johnson damping yield better interaction energies on noncovalent complexes.96 The situation changes for the larger systems. Already in the medium-sized group the BJ damping performs better in most cases, and for the large systems it is unambiguously the better option. When we move to comparing individual dispersion-corrected DFT functionals in the larger basis sets, the results again differ in systems of different size (unless specifically noted, we will discuss only the calculations in the def2-QZVP basis, the difference between this basis an def2TZVP is usually negligible compared to the differences between the functionals). For the small peptides, B3LYP-D3 yields the best results in both def2-QZVP and def2-TZVPD basis sets. The lowest error of 0.26 kcal.mol-1 is achieved by employing the B3LYP-D3(zero)/def2-QZVP level of calculations. The BHLYP-D3 functionals scores second with RMSE of 0.30 kcal.mol-1. Among non-hybrid functionals (which are computationally more efficient), the best results in this basis set are obtained with meta-GGA functionals M06L-D3 and SCAN-D3 (RMSE 0.37 and 0.39 kcal.mol-1). The first pure GGA functional is BLYP-D3, with error of 0.42 kcal.mol-1. For the medium-sized molecules, B3LYP-D3/def2-QZVP yields the smallest error again (RMSE 0.46 kcal.mol-1) followed by SCAN-D3 (0.49 kcal.mol-1) and PBE0-D3 (0.50 kcal.mol-1). The best pure GGA functional is PBE-D3, but its error is rather large, 0.70 kcal/mol. For the large molecules, the order of the best functionals changes significantly. The best results are obtained with TPSSh-D3 and TPSS-D3 functionals (RMSE 0.76 and 0.83 kcal.mol-1), followed by PBE0-D3 (0.84 kcal.mol-1) and B3LYP-D3 (0.86 kcal.mol-1). The differences between most of the functionals are, however, less pronounced than in the smaller systems. The

ACS Paragon Plus Environment

25

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 40

large error of the ωB97X-D3/def2-TZVPD setup in this group is suspicious as the same functional performs much better in the def2-QZVP basis. These calculations were difficult to converge, but close inspection of the results did not reveal any obvious problems, the larger errors are scattered among multiple structures and systems. It may be caused by a poor transferability of the parametrization of the method to a smaller basis set, but we cannot exclude a technical problem in the calculations (which were carried out in ORCA). Overall, the ranking of the functionals is not very different from previous studies such as the recent broad survey on the GMTKN55 database.26 Hybrid functionals were shown to yield the best results, but the optimal accuracy to cost ratio is achieved with meta-GGA functionals. Pure GGA functionals perform well in the smaller systems, but lose their accuracy somewhat in the large ones. On the other hand, it is likely that the differences between the functionals are often caused by the dispersion correction and its transferability to systems of different sizes. The calculations in the DZVP-DFT basis use a different parametrization of the dispersion correction and here we do not see any pronounced differences between the functionals. Here, the dispersion correction was parametrized on a larger training set, which also included larger molecular complexes, and is presumably more robust. This is probably also the reason why in the large systems these calculations become more accurate than DFT in larger basis sets. Another remarkable case is the M06-2X functional which fails to describe the small systems, regardless of using the dispersion correction or not, but yields results comparable to DFT-D3 in the larger systems. This is probably just an error cancellation specific to this data set as there are many sources reporting that this functional and related variants are not reliable for the treatment of noncovalent intractions97

ACS Paragon Plus Environment

26

Page 27 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Comparison of the DFT-D3 calculations to the correlated wave function methods shows that obtaining any significant improvement over DFT requires a large increase in the computational cost. Especially in the larger systems, DFT-D3 yields substantially smaller errors (the best result has RMSE of 0.93 kcal.mol-1) than MP2-F12/cc-pVDZ (RMSE 1.64 kcal.mol-1), and the results are only slightly worse than those from DLPNO-CCSD(T) with the normal cutoff (RMSE 0.77 kcal.mol-1). These are very encouraging results for DFT-D3 if we take into account that the benchmark itself is based on the MP2-F12 (used for DLPNO-CCSD(T)/MP2-F12 composite scheme) and DLPNO-CCSD(T) with the only difference in the cutoff used. From the practical point of view, the DFT-D3 calculations in the DZVP-DFT basis set are most remarkable. For the large systems, they are as good as DFT-D3 in large basis. For the smaller systems, the difference is small enough to be neglected in many applications. Moreover, the parametrization of the dispersion correction for this basis set levels the differences between the functionals, so it is safe to use the most economical pure GGA ones, such as BLYP. We may provide only working hypothesis explaining this interesting finding; i.e. what distinguishes the DZVP-DFT basis set among other popular basis sets of double-zeta quality. The DZVP-DFT was specifically parametrized to provide accurate electron densities for the DFT calculations, unlike many other basis set parametrizations which are primarily focused on energies provided by wave function methods. As a consequence, one may expect more accurate results provided by various DFT functionals. Last but not least, the DFT-DZVP basis set is less prone to BSSE which also provide indirect supportive evidence for its overall very good performance. At this point, we may emphasize that the accuracy of the overall composite Gibbs free energies in solution (Eq. 1) may depend on the parametrization of the particular solvation model used as found previously for the theoretical predictions of stability constants.98 In this respect we find

ACS Paragon Plus Environment

27

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 40

quite important that there are no “underperformers” among the tested DFT-D3 methods which may ensure the accuracy of the computed Gibbs free energies. 3.4. Semiempirical and empirically corrected methods. We tested a range of modern semiempirical QM methods on the MPCONF196 data set. The results are summarized in Table 3. We include here also the empirically corrected methods which consist of ab initio or DFT calculation and three corrections (hence the name 3c): the D3 dispersion correction, the gCP correction for basis set superposition error and another correction for basis set size. All the tested QSM methods include dispersion correction, often based on the D3 correction for DFT. Some of the methods feature also corrections for hydrogen bonding, either standalone (labeled as H4 and H5) or included in the method (PM7). With the use of these corrections, some of these methods yield rather accurate interaction energies in non-covalent complexes.99 However, in the case of conformation energies studied here, none of the method yields satisfactory results. In the smaller systems, several methods approach the accuracy of 1 kcal.mol-1 (DFTB3-D3H4/5, OM3-D3, OM3-D3H4 and HF-3c) but even this is not acceptable as this error is very large when compared to the range of the conformer energies in these systems (3 kcal.mol-1). The only method that yields better results is DFTB-D, but it is just a result of an error cancellation observed earlier.23 It does not hold true when we pass to other quantities such as interaction energies, or when we move to the larger systems of the present data set. In the medium-sized and large systems, the gap between DFT-D3 and SQM methods becomes even wider. The errors become so large that none of these methods can be recommended to be used in practice. The HF-3c method performs similarly to the more approximate approaches; such a behavior was observed in studies focused solely on non-covalent interactions.61

ACS Paragon Plus Environment

28

Page 29 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

This is an important result for the future development of the SQM methods. Recently, reasonably high accuracy was reached for non-covalent interactions, but to make these methods universally applicable, it would now be necessary to focus also on the energy terms arising from the covalent structure, most likely on torsional potential energy profiles. The HF-3c method yields very large errors because it is based on a HF calculation in minimal basis set where the BSSE is too large to be corrected empirically. The PBEh-3c setup87 uses a double-zeta basis (modified def2-SVP basis set), but the results (overall RMSE 1.36 kcal/mol) are still considerably worse than those of the similarly expensive DFT-D3/DZVP-DFT setup discussed above (the sets is TPSS-D3/DZVP-DFT with RMSE of 0.69 kcal/mol). Moreover, PBEh-3c uses a hybrid functional, what makes it less efficient than GGA calculations in basis sets of similar size. Finally, the B97-3c method performs very well (RMSE 0.73 kcal/mol), comparably to the better DFT-D3 calculations in larger basis sets or to the best DFT-D3/DZVPDFT setups. It should be noted, however, that the B97-3c method uses a triple-zeta basis set (albeit a smaller one).

Table 3: The errors (RMSE in kcal.mol-1) of semiempirical and empirically corrected calculations averaged over groups of systems of different size and total averages. Method PM6-D3H4 PM6-D3H4(MM) PM7 DFTB-D DFTB3-D3H4 DFTB3-D3H5 GFN-xTB OM2-D3 OM3-D3 OM3-D3H4

Small

Med-size

Large

Overall

1.62 1.58 1.62 0.73 1.11 1.09 1.48 1.29 1.13 1.02

3.58 3.58 3.06 2.39 2.62 2.93 3.02 2.56 2.87 2.60

4.82 4.79 4.69 3.08 3.65 3.31 3.33 4.56 4.70 4.50

3.11 3.09 2.88 1.91 2.28 2.31 2.50 2.53 2.63 2.43

ACS Paragon Plus Environment

29

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

RM1-D3H4 HF-3c

1.87 1.11

2.75 2.89

5.44 4.16

3.04 2.50

PBEh-3c

0.88

1.43

1.75

1.36

B97-3c

0.66

0.69

0.85

0.73

Page 30 of 40

4. Conclusions Conformational energies for the set of 196 conformers of 13 small- to medium-sized compounds were calibrated by the CCSD(T)/CBS and DLPNO-CCSD(T)/CBS methods. The latter was shown to be in near-quantitative (~0.10-0.15 kcal.mol-1) agreement with the canonical CCSD(T), provided the TightPNO setting is used, and therefore, it can be used as the reference for larger systems (likely up to 150-200 atoms) for the problem studied here. At the same time, it was found that many D3-corrected DFT functionals provide results of ~1 kcal.mol-1 accuracy which we consider as quite encouraging. DFT-D3 methods can be safely used for large-scale calculations, for instance in efficient conformational sampling algorithms. Specifically, DFTD3/DZVP-DFT level of calculation seems to offer the best trade-off between computational cost and accuracy. Based on the calculated data, we have not found any cheaper variant for the treatment of conformational energies, since the semiempirical methods (including DFTB) provide results of inferior accuracy (errors of 3-5 kcal.mol-1). In this and our earlier work,5 we showed that the full DFT-D3/COSMO-RS sampling of the conformational space of ~100 atom systems (typically 500-1000 conformers) can be done within ~2 days on a mid-sized computer cluster. In summary, combining the 1 kcal.mol-1 (in)accuracy in the calculated Eel (this work) with the expected (in)accuracy of ~2 kcal.mol-1 found in many studies calibrating the state-ofthe-art solvation models, the ultimate goal of achieving near-quantitative accuracy (2-3 kcal.mol-1) in conformational free energies in solution for medium- to larger systems is realistic.

ACS Paragon Plus Environment

30

Page 31 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

ASSOCIATED CONTENT Supporting Information. Equilibrium geometries of all studied conformers (zipped .xyz files), all primary energetic data (Tables S1 and S2). This material is available free of charge via the Internet at http://pubs.acs.org. AUTHOR INFORMATION Corresponding Authors *E-mail: rulisek@uochb.cas.cz; rezac@uochb.cas.cz Notes The authors declare no competing financial interest. Author Contributions The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Funding Sources Grant Agency of the Czech Republic (grant no. 17-24155S - LR, 16-11321Y – JŘ), Czech Academy of Sciences (RVO 61388963) ACKNOWLEDGMENT The financial support of the Grant Agency of the Czech Republic (Grant No. 17-24155S) and Czech Academy of Sciences (RVO 61388963) is gratefully acknowledged.

ACS Paragon Plus Environment

31

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 40

REFERENCES (1) Giordanetto, F.; Kihlberg, J. Macrocyclic drugs and clinical candidates: what can medicinal chemists learn from their properties? J. Med. Chem. 2014, 57, 278-295. (2) Canales, A.; Nieto, L.; Rodriguez-Salarichs, J.; Sanchez-Murcia, P. A.; Coderch, C.; Cabrera, A. C.; Paterson, I.; Carlomagno, T.; Gago, F.; Andreu, J. M.; Altmann, K.-H.; Jimenez-Barbero, J.; Diaz, J. F. The molecular recognition of epothilones by microtubules and tubulin dimers revealed by biochemical and NMR approaches. ACS Chemical Biology 2014, 9, 1033-1043. (3) Schneider, G.; Reker, D.; Chen, T.; Hauenstein, K.; Schneider, P.; Altmann, K.-H. Deorphaning the Macromolecular Targets of the Natural Anticancer Compound Doliculide. Angew. Chem. Int. Ed. 2016, 55, 12408-12411. (4) Steinmetz, H.; Li, J.; Fu, C.; Zaburannyi, N.; Kunze, B.; Harmrolfs, K.; Schmitt, V.; Herrmann, J.; Reichenbach, H.; Höfle, G.; Kalesse, M.; Müller, R. Isolation, Structure Elucidation, and (Bio)Synthesis of Haprolid, a Cell-Type-Specific Myxobacterial Cytotoxin. Angew. Chem. Int. Ed. 2016, 55, 10113-10117. (5) Gutten, O.; Bím, D.; Řezáč, J.; Rulíšek, L.: Macrocycle Conformational Sampling by DFTD3/COSMO-RS Methodology. J. Chem. Inf. Model. 2018, 58, 48-60. (6) Jurečka, P.; Šponer, J.; Černý, J.; Hobza, P. Benchmark Database of Accurate (MP2 and CCSD(T) Complete Basis Set Limit) Interaction Energies of Small Model Complexes, DNA Base Pairs, and Amino Acid Pairs. Phys. Chem. Chem. Phys. 2006, 8, 1985-1993. (7) Řezáč, J.; Riley, K. E.; Hobza, P. S66: A Well-Balanced Database of Benchmark Interaction Energies Relevant to Biomolecular Structures. J. Chem. Theory Comput. 2011, 7, 2427–2438. (8) Řezáč, J.; Hobza, P. Benchmark Calculations of Interaction Energies in Noncovalent Complexes and Their Applications. Chem. Rev. 2016, 116, 5038–5071. (9) Gruzman, D.; Karton, A. Martin, J. M. L. Performance of Ab Initio and Density Functional Methods for Conformational Equilibria of CnH2n+2 Alkane Isomers (n = 4−8). J. Phys. Chem. A 2009, 113, 11974–11983. (10) Csontos, J.; Nagy, B.; Gyevi-Nagy, L.; Kállay, M.; Tasi, G. Enthalpy Differences of the nPentane Conformers. J. Chem. Theory Comput. 2016, 12, 2679–2688. (11) Kozuch, S.; Bachrach, S. M.; Martin, J. M. L. Conformational Equilibria in Butane-1,4-diol: A Benchmark of a Prototypical System with Strong Intramolecular H-bonds. J. Phys. Chem. A 2014, 118, 293–303.

ACS Paragon Plus Environment

32

Page 33 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(12) Császár, A. G.; Perczelb, A. Ab initio characterization of building units in peptides and proteins. Prog. Biophys. Mol. Biol. 1999, 71, 243–309. (13) Wilke, J. J.; Lind, M. C.; Schaefer III, H. F.; Császár, A. G.; Allen, W. D. Conformers of Gaseous Cysteine. J. Chem. Theory Comput. 2009, 5, 1511–1523. (14) Barone, V.; Biczysko, M.; Bloino, J.; Puzzarini, C. Characterization of the Elusive Conformers of Glycine from State-of-the-Art Structural, Thermodynamic, and Spectroscopic Computations: Theory Complements Experiment. J. Chem. Theory Comput. 2013, 9, 1533– 1547. (15) Riffet, V.; Bouchoux, G. Gas-phase structures and thermochemistry of neutral histidine and its conjugated acid and base. Phys. Chem. Chem. Phys. 2013, 15, 6097–6106. (16) Fogueri, U. R.; Kozuch, S.; Karton, A.; Martin, J. M. L. The Melatonin Conformer Space: Benchmark and Assessment of Wave Function and DFT Methods for a Paradigmatic Biological and Pharmacological Molecule. J. Phys. Chem. A 2013, 117, 2269–2277. (17) Kesharwani, M. K.; Karton, A.; Martin, J. M. L. Benchmark ab Initio Conformational Energies for the Proteinogenic Amino Acids through Explicitly Correlated Methods. Assessment of Density Functional Methods. J. Chem. Theory Comput. 2016, 12, 444–454. (18) Goerigk, L.; Karton, A.; Martin, J. M. L.; Radom, L. Accurate quantum chemical energies for tetrapeptide conformations: why MP2 data with an insufficient basis set should be handled with caution. Phys. Chem. Chem. Phys. 2013, 15, 7028–7031. (19) Mládek, A.; Banáš, P.; Jurečka, P.; Otyepka, M.; Zgarbová, M.; Šponer, J. Energies and 2′Hydroxyl Group Orientations of RNA Backbone Conformations. Benchmark CCSD(T)/CBS Database, Electronic Analysis, and Assessment of DFT Methods and MD Simulations. J. Chem. Theory Comput. 2014, 10, 463–480. (20) Kruse, H.; Mladek, A.; Gkionis, K.; Hansen, A.; Grimme, S.; Sponer, J. Quantum Chemical Benchmark Study on 46 RNA Backbone Families Using a Dinucleotide Unit. J. Chem. Theory Comput. 2015, 11, 4972–4991. (21) Řeha, D.; Valdés, H.; Vondrášek, J.; Hobza, P.; Abu-Riziq, A.; Crews, B.; de Vries, M. S. Structure and IR Spectrum of Phenylalanyl–Glycyl–Glycine Tripetide in the Gas-Phase: IR/UV Experiments, Ab Initio Quantum Chemical Calculations, and Molecular Dynamic Simulations. Chem. Eur. J. 2005, 11, 6803–6817. (22) Valdés, H.; Řeha, D.; Hobza, P. Structure of Isolated Tryptophyl-Glycine Dipeptide and Tryptophyl-Glycyl-Glycine Tripeptide:  Ab Initio SCC-DFTB-D Molecular Dynamics

ACS Paragon Plus Environment

33

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 40

Simulations and High-Level Correlated Ab Initio Quantum Chemical Calculations. J. Phys. Chem. B 2006, 110, 6385–6396. (23) Valdes, H.; Pluháčková, K.; Pitonák, M.; Řezáč, J.; Hobza, P. Benchmark Database on Isolated Small Peptides Containing an Aromatic Side Chain: Comparison between Wave Function and Density Functional Theory Methods and Empirical Force Field. Phys. Chem. Chem. Phys. 2008, 10, 2747–2757. (24) Goerigk, L.; Grimme, S. A General Database for Main Group Thermochemistry, Kinetics, and Noncovalent Interactions − Assessment of Common and Reparameterized (Meta-)GGA Density Functionals. J. Chem. Theory Comput. 2010, 6, 107–126. (25) Csonka, G. I.; French, A. D.; Johnson, G. P.; Stortz, C. A. Evaluation of Density Functionals and Basis Sets for Carbohydrates. J. Chem. Theory Comput. 2009, 5, 679–692. (26) Goerigk, L.; Hansen, A.; Bauer, C.; Ehrlich, S; Najibi, A; Grimme, S. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Phys. Chem. Chem. Phys., 2017,19, 32184–32215. (27) Sedlák, R.; Janowski, T.; Pitoňák, M.; Řezáč, J.; Pulay, P.; Hobza, P. Accuracy of Quantum Chemical Methods for Large Noncovalent Complexes. J. Chem. Theory Comput. 2013, 9, 3364– 3374. (28) Grimme, S. Supramolecular Binding Thermodynamics by Dispersion-Corrected Density Functional Theory. Chem. Eur. J. 2012, 18, 9955–9964. (29) Sure, R.; Grimme, S. Comprehensive Benchmark of Association (Free) Energies of Realistic Host–Guest Complexes. J. Chem. Theory Comput. 2015, 11, 3785–3801. (30) Sparta, M.; Neese, F. Chemical Applications Carried out by Local Pair Natural Orbital Based Coupled-Cluster Methods. Chemical Society Reviews 2014, 43, 5032-5041. (31) Řezáč, J.; Jurečka, P.; Riley, K. E.; Černý, J.; Valdes, H.; Pluháčková, K.; Berka, K.; Řezáč, T.; Pitoňák, M.; Vondrášek, J.; Hobza, P. Quantum Chemical Benchmark Energy and Geometry Database for Molecular Clusters and Complex Molecular Systems (Www.Begdb.Com): A Users Manual and Examples. Collect. Czech. Chem. Commun. 2008, 73, 1261–1270. (32) Watts, K. S.; Dalal, P.; Tebben, A. J.; Cheney, D. L.; Shelley, J. C. Macrocycle conformational sampling with MacroModel. J. Chem. Inf. Model. 2014, 54, 2680–2696. (33) Zenke, G.; Strittmatter, U.; Fuchs, S.; Quesniaux, V. F. J.; Brinkmann, V.; Schuler, W.; Zurini, M.; Enz, A.; Billich, A.; Sanglier, J.-J.; Fehr, T. Sanglifehrin A, a novel cyclophilin-

ACS Paragon Plus Environment

34

Page 35 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

binding compound showing immunosuppressive activity with a new mechanism of action. J. Immunol. 2001, 166, 7165-7171. (34) Steadman, V. A.; Pettit, S. B.; Poullennec, K. B.; Lazarides, L.; Keats, A. J.; Dean, D. K.; Stanway, S. J.; Austin, C. A.; Sanvoisin, J. A.; Watt, G. M.; Fliri, H. G.; Liclican, A. C.; Jin, D.; Wong, M. H.; Leavitt, S. A.; Lee, Y.-J.; Tian, Y.; Frey, C. R.; Appleby, T. C.; Schmitz, U.; Jansa, P.; Mackman, R. L.; Schultz, B. E. Discovery of Potent Cyclophilin Inhibitors Based on the Structural Simplification of Sanglifehrin A. J. Med. Chem. 2017, 60, 1000-1017. (35) Stewart, J. J. P. Optimization of parameters for semiempirical methods V: modification of NDDO approximations and application to 70 elements. J. Mol. Model. 2007, 13, 1173-1213. (36) Řezáč, J.; Hobza, P. Advanced Corrections of Hydrogen Bonding and Dispersion for Semiempirical Quantum Mechanical Methods. J. Chem. Theory Comput. 2012, 8, 141-151. (37) (a) Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098-3100; (b) Vosko, S. H.; Wilk, L.; Nusair, M. Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis. Can. J. Phys. 1980, 58, 1200-1211; (c) Perdew, J. P. Density-functional approximation for the correlation energy of the inhomogeneous electron gas. Phys. Rev. B 1986, 33, 8822-8824. (38) Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design an assessment of accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297-3305. (39) Grimme, S.; Antony, J., Ehrlich, S., Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (40) Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 1988, 37, 785-789. (41) Perdew, J. P.; Burke, K.; Ernzerhof, M. Generalized Gradient Approximation Made Simple. Phys. Rev. Lett., 1996, 77, 3865-3868. (42) Grimme, S. Semiempirical GGA-type density functional constructed with a long-range dispersion correction. J. Comput. Chem. 2006, 27, 1787–1799. (43) Miehlich, B.; Savin, A.; Stoll, H.; Preuss, H. Results obtained with the correlation energy density functionals of becke and Lee, Yang and Parr. Chem. Phys. Lett. 1989, 157, 200–206. (44) Handy, N. C.; Cohen, A. J. Left-right correlation energy. Mol. Phys. 2001, 99, 403–412.

ACS Paragon Plus Environment

35

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 40

(45) Zhang, Y.; Yang, W. Comment on “Generalized Gradient Approximation Made Simple”. Phys. Rev. Lett. 1998, 80, 890. (46) Zhao, Y.; Truhlar, D. G. A new local density functional for main-group thermochemistry, transition metal bonding, thermochemical kinetics, and noncovalent interactions. J. Chem. Phys. 2006, 125, 194101. (47) Tao, J. M.; Perdew, J. P.; Staroverov, V. N.; Scuseria, G. E. Climbing the density functional ladder: Nonempirical meta-generalized gradient approximation designed for molecules and solids. Phys. Rev. Lett. 2003, 91, 146401. (48) Sun, J.; Ruzsinszky, A.; Perdew, J. P. Strongly Constrained and Appropriately Normed Semilocal Density Functional. Phys. Rev. Lett. 2015, 115, 036402. (49) Perdew, J. P.; Ruzsinszky, A.; Csonka, G. I.; Constantin, L. A.; Sun, J. Workhorse Semilocal Density Functional for Condensed Matter Physics and Quantum Chemistry. Phys. Rev. Lett. 2009, 103, 026403. (50) Becke, A. D. Density-functional thermochemistry. III. The role of exact exchange. J. Chem. Phys. 1993, 98, 5648-5652. (51) Adamo, C.; Barone, V. Toward reliable density functional methods without adjustable parameters: The PBE0 model. J. Chem. Phys. 1999, 110, 6158-6169. (52) Staroverov, V. N.; Scuseria, G. E.; Tao, J.; Perdew, J. P. Comparative assessment of a new nonempirical density functional: Molecules and hydrogen-bonded complexes. J. Chem. Phys. 2003, 119, 12129-12137. (53) Zhao, Y.; Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 2008, 120, 215-241. (54) Becke, A. D. A new mixing of Hartree–Fock and local density‐functional theories. J. Chem. Phys. 1993, 98, 1372–1377. (55) Lin, Y.-S.; Li, G.-D.; Mao S.-P.; Chai, J.-D. Long-Range Corrected Hybrid Density Functionals with Improved Dispersion Corrections. J. Chem. Theory Comput. 2013, 9, 263–272. (56) Godbout, N.; Salahub, D. R.; Andzelm, J.; Wimmer, E. Optimization of Gaussian-Type Basis Sets for Local Spin Density Functional Calculations. Part I. Boron through Neon, Optimization Technique and Validation. Can. J. Chem. 1992, 70, 560–571.

ACS Paragon Plus Environment

36

Page 37 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(57) Grimme, S.; Hansen, A.; Brandenburg, J. G.; Bannwarth, C.: Dispersion-Corrected MeanField Electronic Structure Methods. Chem. Rev., 2016, 116, 5105–5154. (58) Johnson, E. R.; Becke, A. D. A post-Hartree-Fock model of intermolecular interactions: Inclusion of higher-order corrections. J. Chem. Phys. 2006, 124, 174104. (59) Smith, D. G. A.; Burns, L. A.; Patkowski, K.; Sherrill, C. D. Revised Damping Parameters for the D3 Dispersion Correction to Density Functional Theory. J. Phys. Chem. Lett. 2016, 7, 2197–2203. (60) Witte, J.; Mardirossian, N.; Neaton, J. B.; Head-Gordon, M. Assessing DFT-D3 Damping Functions Across Widely Used Density Functionals: Can We Do Better? J. Chem. Theory Comput. 2017, 13, 2043–2052. (61) Hostaš, J.; Řezáč, J. Accurate DFT-D3 Calculations in a Small Basis Set. J. Chem. Theory Comput. 2017, 13, 3575–3585. (62) Helgaker, T.; Klopper, W.; Koch, H.; Noga, J. Basis-Set Convergence of Correlated Calculations on Water. J. Chem. Phys. 1997, 106, 9639–9646. (63) Řezáč, J.; Dubecký, M.; Jurečka, P.; Hobza, P. Extensions and Applications of the A24 Data Set of Accurate Interaction Energies. Phys. Chem. Chem. Phys. 2015, 17 (29), 19268–19277. (64) Řezáč, J.; Hobza, P. Describing Noncovalent Interactions beyond the Common Approximations: How Accurate Is the “Gold Standard,” CCSD(T) at the Complete Basis Set Limit? J. Chem. Theory Comput. 2013, 9, 2151–2155. (65) Liakos, D. G.; Sparta, M.; Kesharwani, M. K.; Martin, J. M. L.; Neese, F. Exploring the Accuracy Limits of Local Pair Natural Orbital Coupled-Cluster Theory. J. Chem. Theory Comput. 2015, 11, 1525–1539. (66) Liakos, D. G.; Neese, F. Is It Possible To Obtain Coupled Cluster Quality Energies at near Density Functional Theory Cost? Domain-Based Local Pair Natural Orbital Coupled Cluster vs Modern Density Functional Theory. J. Chem. Theory Comput. 2015, 11, 4054–4063. (67) Minenkov, Y.; Bistoni, G.; Riplinger, C; Auer, A. A.; Neese, F.; Cavallo, L. Pair natural orbital and canonical coupled cluster reaction enthalpies involving light to heavy alkali and alkaline earth metals: the importance of sub-valence correlation. Phys. Chem. Chem. Phys. 2017, 19, 9374-9391. (68) Dimitrios Liakos, private communication (69) May, A. J.; Manby, F. R. An Explicitly Correlated Second Order Møller-Plesset Theory Using a Frozen Gaussian Geminal. J. Chem. Phys. 2004, 121, 4479–4485.

ACS Paragon Plus Environment

37

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 40

(70) Tew, D. P.; Klopper, W. New Correlation Factors for Explicitly Correlated Electronic Wave Functions. J. Chem. Phys. 2005, 123, 074101. (71) Bachorz, R. A.; Bischoff, F. A.; Glöß, A.; Hättig, C.; Höfener, S.; Klopper, W.; Tew, D. P. The MP2-F12 Method in the TURBOMOLE Program Package. J. Comput. Chem. 2011, 32, 2492–2513. (72) Peterson, K. A.; Adler, T. B.; Werner, H.-J. Systematically Convergent Basis Sets for Explicitly Correlated Wavefunctions: The Atoms H, He, B–Ne, and Al–Ar. J. Chem. Phys. 2008, 128, 084102. (73) Grimme, S. Improved Second-Order Møller–Plesset Perturbation Theory by Separate Scaling of Parallel- and Antiparallel-Spin Pair Correlation Energies. J. Chem. Phys. 2003, 118, 9095-9102. (74) DiStasio, R.; Head-Gordon, M. Optimized Spin-Component Scaled Second-Order MollerPlesset Perturbation Theory for Intermolecular Interaction Energies. Mol. Phys. 2007, 105, 1073–1083. (75) Pitoňák, M.; Neogrády, P.; Černý, J.; Grimme, S.; Hobza, P. Scaled MP3 Non‐Covalent Interaction Energies Agree Closely with Accurate CCSD(T) Benchmark Data. ChemPhysChem 2009, 10, 282–289. (76) Sedlák, R.; Riley, K. E.; Řezáč, J.; Pitoňák, M.; Hobza, P. MP2.5 and MP2.X: Approaching CCSD(T) Quality Description of Noncovalent Interaction at the Cost of a Single CCSD Iteration. ChemPhysChem 2013, 14, 698–707. (77) Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and Re-Optimization of Parameters. J Mol Model 2013, 19, 1–32. (78) Weber, W.; Thiel, W. Orthogonalization Corrections for Semiempirical Methods. Theor Chem Acc 2000, 103, 495–506. (79) Risthaus, T.; Grimme, S. Benchmarking of London Dispersion-Accounting Density Functional Theory Methods on Very Large Molecular Complexes. J. Chem. Theory Comput. 2013, 9, 1580–1591. (80) Gaus, M.; Cui, Q.; Elstner, M. DFTB3: Extension of the Self-Consistent-Charge DensityFunctional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput. 2011, 7, 931–948. (81) Miriyala, V. M.; Řezáč, J. Description of Non-Covalent Interactions in SCC-DFTB Methods. J. Comput. Chem. 2017, 38, 688–697.

ACS Paragon Plus Environment

38

Page 39 of 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(82) Řezáč, J. Empirical Self-Consistent Correction for the Description of Hydrogen Bonds in DFTB3. J. Chem. Theory Comput. 2017, 13, 4804–4817. (83) Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Self-Consistent-Charge Density-Functional Tight-Binding Method for Simulations of Complex Materials Properties. Phys. Rev. B 1998, 58, 7260-7268. (84) Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. Hydrogen Bonding and Stacking Interactions of Nucleic Acid Base Pairs: A Density-Functional-Theory Based Treatment. J. Chem. Phys. 2001, 114, 5149-5155. (85) Grimme, S.; Bannwarth, C.; Shushkov, P. A Robust and Accurate Tight-Binding Quantum Chemical Method for Structures, Vibrational Frequencies, and Noncovalent Interactions of Large Molecular Systems Parametrized for All Spd-Block Elements (Z = 1–86). J. Chem. Theory Comput. 2017, 13, 1989–2009. (86) Sure, R.; Grimme, S. Corrected Small Basis Set Hartree-Fock Method for Large Systems. J. Comput. Chem. 2013, 34, 1672–1685. (87) Grimme, S.; Brandenburg, J. G.; Bannwarth, C.; Hansen, A. Consistent structures and interactions by density functional theory with small atomic orbital basis sets. J. Chem. Phys. 2015, 143, 054107. (88) Unpublished method, described only in ORCA manual. (89) TURBOMOLE V7.0 2015, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; available from http://www.turbomole.com. (90) Řezáč, J. Cuby: An integrative framework for computational chemistry. J. Comput. Chem. 2016, 37, 1230; http://cuby4.molecular.cz. (91) Neese, F. The ORCA program system. Wiley Interdiscip. Rev.-Comput. Mol. Sci. 2012, 2, 73-78. (92) Stewart, J. J. P. MOPAC 2016; Stewart Computational Chemistry, Colorado Springs, CO, USA, 2016. (93) Thiel, W. MNDO 2005; Max Planck Institute for Coal Research, Mülheim, Germany, 2005. (94) Hujo, W.; Grimme, S. Performance of the van der Waals Density Functional VV10 and (hybrid)GGA Variants for Thermochemistry and Noncovalent Interactions. J. Chem. Theory Comput. 2011, 7, 3866–3871.

ACS Paragon Plus Environment

39

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 40

(95) Goerigk, L. How Do DFT-DCP, DFT-NL, and DFT-D3 Compare for the Description of London-Dispersion Effects in Conformers and General Thermochemistry? J. Chem. Theory Comput. 2014, 10, 968–980. (96) Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the Damping Function in Dispersion Corrected Density Functional Theory. J. Comput. Chem. 2011, 32, 1456–1465. (97) Goerigk, L. Treating London-Dispersion Effects with the Latest Minnesota Density Functionals: Problems and Possible Solutions. J. Phys. Chem. Lett. 2015, 6, 3891–3896. (98) Gutten, O.; Rulíšek, L. Predicting the Stability Constants of Metal-Ion Complexes from First Principles. Inorg. Chem. 2013, 52, 10347-10355. (99) Christensen, A. S.; Kubař, T.; Cui, Q.; Elstner, M. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications. Chem. Rev. 2016, 116, 5301–5337.

TOC GRAPHIC

ACS Paragon Plus Environment

40