Testing Semiempirical Quantum Mechanical Methods on a Data Set of

Feb 23, 2018 - systems. It was, however, noted that the accuracy of these methods deteriorates at intermolecular distances shorter than equilibrium. I...
0 downloads 9 Views 1MB Size
Subscriber access provided by - Access paid by the | UCSB Libraries

Article

Testing Semiempirical QM Methods on a Data Set of Interaction Energies Mapping Repulsive Contacts in Organic Molecules Vijay M. Miriyala, and Jan #ezá# J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.8b00260 • Publication Date (Web): 23 Feb 2018 Downloaded from http://pubs.acs.org on February 24, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

The Journal of Physical Chemistry A is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Testing Semiempirical QM Methods on a Data Set of Interaction Energies Mapping Repulsive Contacts in Organic Molecules V. M. Miriyala1 and J. Řezáč1, * 1

Department of Computational Chemistry, Institute of Organic Chemistry and Biochemistry,

Czech Academy of Sciences, Flemingovo Náměstí 542/2, 16610 Prague, Czech Republic *Email: [email protected]

Abstract Semiempirical QM methods with corrections for non-covalent interactions provide a favorable combination of accuracy and computational efficiency what makes them useful tool for a study of large molecular systems. It was, however, noted that the accuracy of these method deteriorates at intermolecular distances shorter than equilibrium. In this work, we explore this issue systematically using a newly developed data set of benchmark interaction energies named R160x6. This data set maps repulsive contacts in organic molecules, and it consists of 160 model complexes for which six points along the dissociation curve are provided. Testing a wide range of semiempirical QM methods against the CCSD(T)/CBS benchmark revealed that most methods, and all the dispersion-corrected ones, underestimate the repulsion systematically. The worst cases are usually hydrogen-hydrogen contacts. The best results were obtained with PM6D3H4 and DFTB3-D3H4, as these methods already contain a correction for the H-H repulsion, but the errors are still about twice as large as in equilibrium geometries.

Introduction The introduction of corrections for non-covalent interactions (dispersion, hydrogen bonds, etc.) into semiempirical quantum-mechanical (SQM) methods, including density-functional tight binding (DFTB), extended the applicability of these methods to new chemical problems. 1 With the latest version of the corrections,2–4 interaction energies in small molecular complexes are predicted with accuracy better than 1 kcal/mol, and the computational efficiency of these methods make them useful for studying large systems such as biomolecules. We have 1

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

successfully applied them to calculations of protein-ligand interactions in computer-aided drug design.5,6 Our experience with these methods shows that they yield rather good results in equilibrium geometries or at longer distances. On the other hand, we have encountered multiple failures in systems with close contacts where the overall interaction energy is determined mainly by the Pauli repulsion.2,3,7 Such structures do occur naturally (when other attractive interactions, or external pressure force molecules into a close contact) or they can be a result of inaccurate prediction of the geometry on which the calculation is performed. The latter occurs often in the drug design studies where docking is used to generate the geometry of the protein-ligand complex. The potential energy functions used in docking often underestimate the repulsion on purpose in order to allow easier fitting of the ligand into the active site which is kept rigid. The description of intramolecular repulsion in the SQM methods is thus of great practical importance, yet so far it was not studied systematically. The data sets of benchmark interaction energies on which the methods are parameterized and tested usually contain only equilibrium geometries or dissociation curves not extending significantly below the equilibrium distance. 8,9 The difficulties in the description of the repulsion in SQM methods has multiple sources. In the MNDO-type SQM methods, they stem from the approximations made in the formulation of the methods such as using minimal valence basis, non-orthogonal orbitals, approximate treatment of two-electron integrals etc. The missing repulsion can be added back in the empirical core-core potentials, but their transferability may be limited. The original formulation of the core-core potentials in the MNDO method led to exaggerated repulsion at van der Waals distances, what was corrected by additional fully empirical terms in all the subsequent methods (from AM1 on). Moreover, even if the method is capable of describing the repulsion properly, relevant model systems may not be included in the training set the method is fitted on. Additionally, some methods were intentionally designed to underestimate the repulsion in order to compensate for missing London dispersion. This can be demonstrated well on PM6 10 where an empirical modification of the core-core terms was introduced (Eq. 6 in Ref. 10) that allowed attractive interaction in rare gas dimers even without proper dispersion correction. This modification complicated the addition of a standalone dispersion correction. 3,11 The same formalism is used also in PM7 despite having a separate dispersion correction built into the method.12 2

ACS Paragon Plus Environment

Page 2 of 20

Page 3 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Another possible source of underestimated repulsion is the use of a basis set constructed from very compact atomic orbitals. In MNDO-type methods, the orbital exponents are free parameters, while in DFTB, the reference atomic electron densities are optimized under a radial constraint. While this is a valid approach to building a basis better suited for atoms in molecules, its impact on intramolecular interactions may have not been considered in detail. We have already reported problems with hydrogen-hydrogen repulsion in hydrocarbons that was found in multiple methods including PM6 and DFTB.2 Both these methods also underestimate repulsion between halogens and other atoms that led us to developing a correction for halogen bonding which have a form of additional repulsive potential. 13,14 Besides these specific cases, the problem has not been systematically studied. In this work, we have developed a large data set of model systems mapping repulsive contacts between H, C, N and O elements in small molecules. For each system, a dissociation curve starting in the repulsive region was calculated using a composite scheme estimating CCSD(T) interaction energy at the complete basis set (CBS) limit. This data set, named R160x6, is then used for testing wide range of common SQM and DFTB methods. We also explore the possibility of introducing a simple pairwise repulsive correction fitted to these data. Among the classical MNDO-based SQM methods, we have tested MNDO, 15 AM1,16 RM1,17 PM610 (and its variant that incorporate the corrections for dispersion and the hydrogen bonding, PM6-D3H418 and PM7.12 We also include the OM219,20 and OM321 methods which feature orthogonalization corrections that are supposed to improve the description of Pauli repulsion. Among DFTB methods we test variants of both the original second-order DFTB 22 and the thirdorder expansion DFTB3,23 both with dispersion correction24 and possibly also with H-bonding corrections.2,3 We have tested also the recently introduced GFN-xTB empirical tight binding approach.25 All the tested methods exhibited rather large errors in the description of the repulsive contact data set, in most cases underestimating the repulsion systematically. Therefore, we attempted to parameterize an additional correction that would add the missing repulsion to PM6-D3H4 and DFTB3-D3H4. This correction takes the form of a pairwise exponential repulsive term damped

3

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

at covalent distances. This correction improved the description of the strong repulsive contacts, but it was not transferable to geometries closer to equilibrium. If the same correction was parameterized on the S66x8 data set, the results turned out to be similar to the original D3H4 approach which includes a repulsive correction for the element pair where the error is worst, hydrogen-hydrogen contacts. It became clear that further improvements in this direction would require modifications of the SQM methods themselves, e.g. by including repulsive non-covalent contacts in the training set used in the parameterization of these methods. The R160x6 data set would be well suited for this purpose.

Description of the R160x6 data set The data set consists of 160 complexes formed by combining 12 optimized monomers in various configurations. The monomers are small to medium-sized molecules they represent the motifs that are most commonly found in organic molecules. The 12 monomers we have used in this work are CO, H2, N2, acetylene, ammonia, benzene, ethylene, formaldehyde, methane, methanol, pyridine and water. The complexes are designed so that they feature a distinct atomatom contact, regardless of whether such an orientation leads to favorable or unfavorable interaction. For the C, H, N and O elements present in the set, all possible combinations give rise to ten element pairs, C-C, C-H, C-O, H-H, N-C, N-H, N-N, N-O, O-H and O-O. We have taken care that the dataset contains sufficient number of representation of each element pair. The number of complexes representing each element pair are 15 for C-C, 30 for C-H, 12 for C-O, 21 for H-H, 9 for N-C, 8 for N-H, 10 for N-N, 22 for N-O, 12 for O-H, 21 for O-O. A detailed list of the complexes and their structures are provided as the supporting information. To make sure that all the complexes contribute to the data set with a comparable weight, we set the intermolecular distance in each system so that the interaction energy is around 5 kcal/mol. This geometry is a starting point for generating five more points along the dissociation curve of each complex. The resulting data set thus contains a total of 960 points. Details on the construction of the geometries and dissociation curves are provided in the Methods section below. The benchmark interaction energies of all the dissociation curves are calculated at CCSD(T)/CBS level.

4

ACS Paragon Plus Environment

Page 4 of 20

Page 5 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Methods Construction of the R160x8 data set 1) Orientation of the monomers in the complexes: The monomers were optimized at MP2/ccpVTZ level. These optimized monomers were used in the preparation of complexes by aligning them in different configurations. For each monomer we assign one or more connectors, pairs of a selected interacting atom and a vector defining the direction in which it will approach its partner in the complex. Different orientations of the connectors with respect to the local geometry are labeled as perpendicular, linear and axial. For each molecule, the connectors are chosen so that the selected atom will dominate the interaction and the contribution of the rest of the molecule is minimized. The complexes are then built by aligning the connector vectors of two molecules against each other. For example, if we need to build a benzene-water complex with C-H element pair interaction, the first monomer benzene would have a connector on C atom perpendicular to the plane of the molecule and the second monomer should have a connector on H atom in the linear extension of the O-H bond. The remaining degree of freedom defining the mutual orientation of the molecules is the torsion around this intermolecular vector. We orient the molecules in such a way that they point in opposite directions so that their residual interaction is minimized. Using different combinations on the 12 monomers we generated a balanced set of 160 complexes that contains sufficient diversity of element pair interactions. 2) Determination of the intermolecular distance. The distance in the starting geometries is set so that there will be a repulsion of approximately 5 kcal/mol. This is achieved by running a preliminary scan along the intermolecular vector at BLYP-D3(BJ)/def2-QZVP level. From this scan, two points with interaction energies around 5 kcal/mol are taken and the distance corresponding to interaction energy of 5 kcal/mol is obtained using a linear interpolation. This was repeated for all the 160 complexes; these geometries are used as the starting point of the dissociation curves. 3) Construction of the dissociation curves. The dissociation curves are constructed by scaling the initial intermolecular distance by factors 1.0, 1.05, 1.10, 1.15, 1.25 and 1.65. The scaling factors were chosen so that the steep repulsive part of the curve is sampled in more detail. This

5

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

allows reasonably accurate reconstruction of a smooth curve from these six points. The data set hence generated contains 960 geometries. 4) Interaction energy calculations. The benchmark CCSD(T)/CBS interaction energies along the dissociation curves are estimated using a following composite scheme built from MP2 energy and a CCSD(T) correction. Counterpoise correction 26 is employed to minimize the effects of the basis set superposition error (BSSE). Instead of extrapolation to the CBS, we use explicitly correlated MP2-F12 calculation in cc-pVQZ-F12 basis. We have previously shown that it yields results comparable to extrapolation from aug-cc-pVTZ and aug-cc-pVQZ basis sets. The CCSD(T) correction is calculated in aug-cc-pVDZ basis. The final CCSD(T)/CBS estimate is constructed as ECCSD(T)/CBS = EMP2-F12 + ECCSD(T)/aug-cc-pVDZ – EMP2/aug-cc-pVDZ This setup yields accuracy of about 2% with respect to true CCSD(T)/CBS limit 27 and it is similar to the one used in other data sets we have produced such as S66x8. The accuracy of this setup was verified by additional calculations in larger basis set and by a separate evaluation of the BSSE in smaller set of the systems. These results suggested that the accuracy of the current setup is more than sufficient for testing of SQM methods. The average of the CCSD(T)/CBS interaction energies in the first points of the curves is 4.1 kcal/mol and 100 of them are within +/- 1 kcal/mol from this value. There are, however, several outliers where the CCSD(T)/CBS interaction energy differs from the DFT-D3 estimate more significantly, with maximum and maximum of 6.8 and -2.2 kcal/mol.

Methods tested The generated benchmark data set was used to analyze the performance of several SQM methods. MNDO,15 AM1,16 RM117 and PM712 were used without any modifications. For PM6, we use two variants of the D3H4 correction.2,7 Both include a repulsive correction applied to H-H atom pairs. It was found in the original work 2 that without such a correction, it is not possible to obtain accurate description of geometries and interaction energies of non-covalent complexes of

6

ACS Paragon Plus Environment

Page 6 of 20

Page 7 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

aliphatic hydrocarbons. The original correction has a form of a sigmoidal function (Eq. 1) which captures the onset of the repulsion but then turn to a constant value so that there is no gradient applied at covalent distances.

(1)

This H-H repulsion correction was later updated to an exponential form (see Eq. 2 below) damped at short distances,7 we denote this version of the method PM6-D3H4'. This is the version of the method we now use in the applications. For the OM219,20 and OM321 methods, we use the D3 correction as parameterized by Grimme, 24 as well as the parameterization used in the OM3-D3H4 approach.2 The second-order DFTB calculations use the MIO parameter set 23 and they are optionally coupled with the corresponding dispersion correction (DFTB-D method). 28 Third-order DFTB (DFTB3) uses the 3OB parameter set 29 developed for it. For DFTB3, the recent reparameterization of the D3H4 corrections is used.3 The GFN-xTB method25 includes a dispersion correction.

Exponential repulsion correction for PM6 and DFTB3 To test whether the description of the repulsion can be improved by a standalone empirical correction, we have implemented simple pairwise exponential repulsive potential. It will be added to PM6-D3H4 and DFTB3-D3H4 which were chosen as the most widely used approaches from both families. Such setup will be denoted as D3H4R. When this repulsive correction is used, the original H-H repulsion term is removed from the D3H4 correction. The repulsive correction between two atoms at distance r takes the form of a simple exponential term with two parameters c1 and c2 (Eq. 2). The parameters are fitted for each combination of elements.

7

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 20

Erep = c1 exp[-c2 r] (2)

To apply the correction only in the non-covalent region, the function is damped so that it becomes constant at short distances (this damping has similar properties as the approach used in the D3H4' correction but is formulated differently). Damping to a constant value is important for avoiding the introduction of a large gradient on covalent bonds. Two distance thresholds are defined in terms of sum of covalent radii of the atoms: r0 = 1.2 (rcov,A + rcov,B) and r1 = 1.5(rcov,A + rcov,B). The damped repulsion correction is then constructed as Erep'(r) =

Erep(r)

if r > r1

Erep(r0)

if r < r0

(3)

fsw(r, r0, r1) Erep(r) + (1 - fsw(r, r0, r1)) Erep(r0) otherwise where fsw is a smooth switching function fsw(r, r0, r1) = -20 x7 + 70 x6 – 84 x5 + 35 x4

and x = (r - r0) / (r1 – r0).

(4)

For comparison of the original and new form of the correction, we plot the energy of the repulsion energy between two hydrogen atoms, as parameterized for PM6, in Figure 1. Note that the original correction is much weaker; it was developed as a smallest possible correction that improves the description of equilibrium intramolecular distance in hydrocarbons. 2 Later, it was found that in some applications, stronger correction is needed that is capable of making the potential more repulsive at shorter distances, that led to the second generation of the correction.

Computational Details The DFT and MP2-F12 calculations were performed in Turbomole version 6.6 30 using the resolution of identity approximation. The CCSD(T) calculations were carried out in the Psi4 program.31

8

ACS Paragon Plus Environment

Page 9 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

All DFT-D3 calculations use the Becke-Johnson (BJ) damping. This damping function makes the dispersion correction converge to a constant value at short distances, so it does not affect the slope of the DFT potential in the repulsive region. All DFTB, DFTB-D and DFTB3 calculations were performed using the DFTB+ program, 32 the PM6, PM7, MNDO, AM1 and RM1 calculations using the MOPAC program. 33 The OM2 and OM3 results have been calculated using the MNDO program. 34 The Grimme’s GFN-xTB method has been calculated using the xtb program supplied by the authors of the method.25 The D3H4 corrections as well as the repulsion correction described above were added to the SQM calculations using the Cuby framework,35 which was used also for the automation of the calculations of the data sets. The R160x6 data set will be available as a part of the Cuby framework.

Results and Discussion Validation of the benchmark calculations All the benchmark interaction energies are provided in Table S4 of the Supporting Information. In few systems, the CCSD(T)/CBS interaction energies deviated significantly from the preliminary DFT-D3 calculations. It was found that is is due to the error of the DFT-D3 calculation themselves, but we performed two additional tests to verify the accuracy of our benchmark interaction energies. First, we analyzed the effect of the basis set superposition error. The merits of counterpoise correction has been a long debated topic36 but it was not clear if the prior experience is applicable also to complexes with such a short intermolecular distances. The benchmark results were obtained using counterpoise correction and we thus repeated some of them without the correction. This was performed on a selection of systems covering all the element pairs (systems 01, 08, 09, 10, 11, 17, 18, 22, 27, 47, 49, 61, 62, 65, 89, 101, 107, 113, 116, 145 and 157). Our results show that in these systems, the mean absolute deviation between the interaction energies with and without counterpoise corrections is only 0.017 kcal/mol. It is thus clear that the BSSE does not affect the quality of our benchmark calculations.

9

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Second, we tested the effect of the size of the basis set used in the ΔCCSD(T) correction. For the same subset of the systems, we calculated this term also in aug-cc-pVTZ basis set. The mean absolute deviation between these results and the original benchmark where this term is calculated in aug-cc-pVDZ basis is only 0.021 kcal/mol. Again, such an error is negligible, especially when we consider that the data set will be used for testing of methods that yield errors several orders of magnitude larger.

Performance of SQM methods Several available semiempirical quantum mechanical methods have been tested on the R160x6 benchmark data set and their performance has been analyzed. The results are presented in several Tables and Figures. Before going into more details of the individual element pair interactions and imposing repulsion correction, we analyze the overall errors of the tested methods evaluated as a root mean square error (RMSE), mean unsigned error (MUE) and mean signed error (MSE) over the R160x6 data set. These errors are listed in Table 1 and RMSE is plotted in Figure 2. It is clear that all the methods yield rather large errors in repulsive geometries (RMSE between 1.5 and 2 kcal/mol) what is in strong contrast with their performance at equilibrium distances (some of the methods with corrections for non-covalent interactions yield errors as low as 0.7 kcal/mol in the S66 data set). However, these errors should be also compared to the results obtained with better methods, most importantly DFT-D3. The B-LYP-D3 method with the Becke-Johnson (BJ) damping yields excellent results in the S66x8 data set (RMSE of 0.22 kcal/mol), yet it exhibits substantially larger error of 0.94 kcal/mol in the R160x6 data set. It is known that the B-LYP functional is overly repulsive, so it is not surprising that it yields a positive MSE of 0.30 kca/mol (even worse systematic error would be observed with the zero damping). On the other hand, the PBE functional underestimates the repulsion, and in the R160x6 data set PBE-D3(BJ) method yields similar RMSE of 0.96 kcal/mol but the MSE is negative at -0.11 kcal/mol. DFT-D3 thus clearly outperforms the SQM methods, but the relative loss of accuracy in the repulsive systems is similar.

10

ACS Paragon Plus Environment

Page 10 of 20

Page 11 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

The simplest method tested, MNDO, yields very large errors (RMSE 5.4 kcal/mol) which are systematically positive (MSE 3.7 kcal/mol) what indicates overestimated repulsion. This is a known issue that was corrected in all the following SQM methods (starting with AM1) by additional empirical terms (formulated as a sum of Gaussian functions) to the core-core potentials. In all remaining cases except AM1, the MSE is negative what indicates that the repulsion is underestimated. The comparison of calculations without and with dispersion corrections shows that the addition of dispersion correction, which is attractive, makes the results only worse (see e.g. DFTB vs. DFTB-D, DFTB3 vs. DFTB3-D3H4, OM2 vs. OM2-D3 etc.). The only exception is PM6-D3H’ which is not only better than PM6 itself, but it produces the lowest RMSE at 1.57 kcal/mol. This can be attributed to the fact that it includes partial correction for the repulsion that works well even at the short distances. The performance of the OM2 and OM3 is somewhat disappointing as from theoretical point of view, these methods should be able to describe the repulsion better. The RM1 (which is a reparameterization of AM1) provides the smallest systematic error (MSE close to zero) but the overall accuracy is still low because of the remaining random error. The systematic error (MSE) was then analyzed separately for each element pair. The results are summarized in Table 2 where a color scale is used to indicate the sign and magnitude of the errors. Analogous table reporting the overall error expressed as RMSE is available in Supporting Information as Table S1. It is clear that the largest systematic error is always the hydrogenhydrogen repulsion. However, also the other element pairs usually exhibit underestimated repulsion. What is interesting that practically all the methods underestimate repulsion in the O-H, and many of them also in the N-H element pairs, although all the methods do not yield enough attraction in H-bonds featuring these elements.

11

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Parameterization of the repulsion correction The systematic underestimation of the repulsion observed in the tested method suggests that their accuracy could be improved by addition of a repulsive correction. We have parameterized the correction described above for the two methods we use the most, PM6-D3H4 and DFTB3-D3H4. The parameterization was performed on the R160x6 data set by the means of gradient optimization minimizing the RMSE in the data set. The derivatives of this objective function with respect to the parameters were calculated using finite differences. To ensure convergence to physically sound set of parameters, the optimization was started from parameters estimated for each pair of elements separately. The resulting parameters are listed in Tables S2 and S3 in the Supporting Information. To test the transferability of the parameterization from strongly repulsive contacts to distances closer to equilibrium and vice versa, we have repeated the same parameterization procedure also on the S66x8 data set (this setup will be referred to as D3H4R(s66x8)).

Testing the repulsion correction The repulsion correction reduces the error in the R160x6 data set as expected. In the case of PM6-D3H4R, the error drops to 1.41 kcal/mol what is only slightly less than the error of PM6D3H4' (1.57 kcal/mol). This is because in the case of PM6, the error is dominated by the H-H interactions (Figure 3) that are handled efficiently in PM6-D3H4'. However, despite being parameterized on RMSE, the correction removes the systematic part of the error very well, the MSE is only 0.06 kcal/mol. More significant improvement is seen in the case of DFTB3-D3H4R. The overall RMSE of 1.32 kcal/mol is the lowest among all the tested methods and the systematic part of the error is also very small (MSE = -0.1 kcal/mol). However, when this parameterization of the correction is applied to less extreme geometries, the results are significantly worse than those from the uncorrected methods. In the S66 data set which contains only equilibrium geometries, the RMSE grows to 3.3 kcal/mol in the case of DFTB3-D3H4R (compared to 0.65 kcal/mol for original DFTB3-D3H4) and to 3.37 kcal/mol in the case of PM6-D3H4R. Apparently, a simple correction of this type can improve the 12

ACS Paragon Plus Environment

Page 12 of 20

Page 13 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

description of artificial geometries with very large repulsion, but it is not transferable to geometries closer to the equilibrium. Adding another parameter to make the correction more flexible in term of extent and slope did not bring any substantial improvement, so it seems that the accuracy of the correction is limited by the fact that it is not dependent on the electronic structure of the system. To approach the problem from the other side, we have reoptimized the repulsion correction on the S66x8 data set. This yields results practically identical to the original PM6-D3H4' and DFTB3-D3H4 – the error in the S66 data set is 0.67 kcal/mol for both PM6-D3H4R S66x8 and DFTB3-D3H4RS66x8. This parameterization, however, provides even worse description of the R160x6 data set (RMSE of 2.19 and 2.40 kcal/mol, respectively) than the original methods including only H-H correction.

Conclusions Our study shows that all semiempirical QM methods have difficulties with description of intramolecular distances. The errors are several times larger than these ones observed at equilibrium distances. The SQM methods must be thus used with caution in systems where close repulsive contacts may occur, either because of artifacts in the preparation of the geometry, or because of external forces. Analysis of the origins of these large errors shows that the most important source is the underestimated repulsion between hydrogen atoms, but the repulsion is underestimated in practically all element pairs when dispersion correction is added. Our attempt to correct this problem by adding a standalone repulsive correction to both PM6D3H4 and DFTB-D3H4 was not successful, the correction was not transferable between strong repulsive contacts and geometries close to the equilibrium. For now, the correction for H-H repulsion that is a part of the D3H4 corrections is the best solution available. In future, better results can be probably achieved by including molecular complexes at repulsive distances in the training sets used in the development of the SQM methods. For this purpose, the R160x6 data set presented here would be a useful source of accurate reference data.

13

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Supporting Information The supporting information contains Table S1 that presents a heat map of RMSE in kcal/mol for different SQM methods tested over R160x6 benchmark dataset. Tables S2 and S3 contain element pairwise parameters optimized over R160x6 and S66x8 benchmark datasets respectively for SCC-DFTB and PM6 methods. Table S4 contains list of benchmark energies for the 960 complexes corresponding to different element pair contacts. We have also included the geometries of all the 960 complexes as a separate zip file.

Acknowledgements This work has supported by the Czech Science Foundation by grant No. P208/16-11321Y and is a part of the Research Project RVO 61388963 of the Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic.

References: (1) (2) (3) (4) (5)

(6)

Christensen, A. S.; Kubař, T.; Cui, Q.; Elstner, M. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications. Chemical Reviews. 2016, 5301–5337. Řezáč, J.; Hobza, P. Advanced Corrections of Hydrogen Bonding and Dispersion for Semiempirical Quantum Mechanical Methods. J. Chem. Theory Comput. 2012, 8, 141–151. Miriyala, V. M.; Řezáč, J. Description of Non-Covalent Interactions in SCC-DFTB Methods. J. Comput. Chem. 2017, 38, 688–697. Řezáč, J. Empirical Self-Consistent Correction for the Description of Hydrogen Bonds in DFTB3. J. Chem. Theory Comput. 2017, 13, 4804– 4817. Pecina, A.; Haldar, S.; Fanfrlík, J.; Meier, R.; Řezáč, J.; Lepšík, M.; Hobza, P. SQM/COSMO Scoring Function at the DFTB3-D3H4 Level: Unique Identification of Native Protein-Ligand Poses. J. Chem. Inf. Model. 2017, 57, 127–132. Pecina, A.; Meier, R.; Fanfrlík, J.; Lepšík, M.; Řezáč, J.; Hobza, P.; Baldauf, C. The SQM/COSMO Filter: Reliable Native Pose Identification Based on the Quantum-Mechanical Description of Protein–ligand Interactions and Implicit COSMO Solvation. Chem. Commun. 2016, 52, 3312–3315. 14

ACS Paragon Plus Environment

Page 14 of 20

Page 15 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(7)

(8) (9) (10) (11) (12) (13)

(14)

(15) (16) (17) (18)

(19)

Vorlová, B.; Nachtigallová, D.; Jirásková-Vaníčková, J.; Ajani, H.; Jansa, P.; Řezáč, J.; Fanfrlík, J.; Otyepka, M.; Hobza, P.; Konvalinka, J.; et al. Malonate-Based Inhibitors of Mammalian Serine Racemase: Kinetic Characterization and Structure-Based Computational Study. Eur. J. Med. Chem. 2015, 89, 189–197. Řezáč, J.; Hobza, P. Benchmark Calculations of Interaction Energies in Noncovalent Complexes and Their Applications. Chemical Reviews. 2016, 5038–5071. Řezáč, J.; Riley, K. E.; Hobza, P. S66: A Well-Balanced Database of Benchmark Interaction Energies Relevant to Biomolecular Structures. J. Chem. Theory Comput. 2011, 7, 2427–2438. Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods V: Modification of NDDO Approximations and Application to 70 Elements. J. Mol. Model. 2007, 13, 1173–1213. Korth, M.; Pitoňák, M.; Řezáč, J.; Hobza, P. A Transferable H-Bonding Correction for Semiempirical Quantum-Chemical Methods. J. Chem. Theory Comput. 2010, 6, 344–352. Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and ReOptimization of Parameters. J. Mol. Model. 2013, 19, 1–32. Maximilian Kubillus, Tomáš Kubař, Michael Gaus, J. Ř. and M. E. Parameterization of the DFTB3 Method for Br, Ca, Cl, F, I, K, and Na in Organic and Biological Systems. J. Chem. Theory Comput. 2015, 11, 332–342. Brahmkshatriya, P. S.; Dobeš, P.; Fanfrlik, J.; Rezáç, J.; Paruch, K.; Bronowska, A.; Lepšík, M.; Hobza, P. Quantum Mechanical Scoring: Structural and Energetic Insights into Cyclin-Dependent Kinase 2 Inhibition by pyrazolo[1,5-A]pyrimidines. Curr. Comput. Aided. Drug Des. 2013, 9, 118–129. Dewar, M. J. S.; Thiel, W. Ground States of Molecules. 38. The MNDO Method. Approximations and Parameters. J. Am. Chem. Soc. 1977, 99, 4899–4907. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. AM1: A New General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc. 1985, 107, 3902–3909. Rocha, G. B.; Freire, R. O.; Simas, A. M.; Stewart, J. J. P. RM1: A Reparameterization of AM1 for H, C, N, O, P, S, F, Cl, Br, and I. J. Comput. Chem. 2006, 27, 1101–1111. Bryantsev, V. S.; Diallo, M. S.; Van Duin, A. C. T.; Goddard, W. A. Evaluation of B3LYP, X3LYP, and M06-Class Density Functionals for Predicting the Binding Energies of Neutral, Protonated, and Deprotonated Water Clusters. J. Chem. Theory Comput. 2009, 5, 1016– 1026. Weber, W.; Thiel, W. Orthogonalization Corrections for Semiempirical 15

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(20) (21) (22)

(23) (24) (25)

(26) (27) (28) (29) (30) (31)

(32) (33) (34)

Methods. Theor. Chem. Accounts Theory, Comput. Model. (Theoretica Chim. Acta) 2000, 103, 495–506. Weber, W. PhD Thesis., Universitat Zurich, 1996. Scholten, M. PhD Thesis., Universitat Dusseldorf, 2003. Yang, Y.; Yu, H.; York, D.; Cui, Q.; Elstner, M. Extension of the SelfConsistent-Charge Density-Functional Tight-Binding Method: Third-Order Expansion of the Density Functional Theory Total Energy and Introduction of a Modified Effective Coulomb Interaction. J. Phys. Chem. A 2007, 111, 10861–10873. Gaus, M.; Cui, Q.; Elstner, M. DFTB3: Extension of the Self-ConsistentCharge Density-Functional Tight-Binding Method (SCC-DFTB). J. Chem. Theory Comput. 2011, 7, 931–948. Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate Ab Initio Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104. Grimme, S.; Bannwarth, C.; Shushkov, P. A Robust and Accurate TightBinding Quantum Chemical Method for Structures, Vibrational Frequencies, and Noncovalent Interactions of Large Molecular Systems Parametrized for All Spd-Block Elements (Z = 1-86). J. Chem. Theory Comput. 2017, 13, 1989–2009. Boys, S. F.; Bernardi, F. The Calculation of Small Molecular Interactions by the Differences of Separate Total Energies. Some Procedures with Reduced Errors. Mol. Phys. 1970, 19, 553–566. Řezáč, J.; Dubecký, M.; Jurečka, P.; Hobza, P. Extensions and Applications of the A24 Data Set of Accurate Interaction Energies. Phys. Chem. Chem. Phys. 2015, 17, 19268–19277. Elstner, M.; Hobza, P.; Frauenheim, T.; Suhai, S.; Kaxiras, E. Hydrogen Bonding and Stacking Interactions of Nucleic Acid Base Pairs: A DensityFunctional-Theory Based Treatment. J. Chem. Phys. 2001, 114, 5149. Gaus, M.; Goez, A.; Elstner, M. Parametrization and Benchmark of DFTB3 for Organic Molecules. J. Chem. Theory Comput. 2013, 9, 338– 354. Furche, F.; Ahlrichs, R.; Hättig, C.; Klopper, W.; Sierka, M.; Weigend, F. Turbomole. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 91–100. Turney, J. M.; Simmonett, A. C.; Parrish, R. M.; Hohenstein, E. G.; Evangelista, F. A.; Fermann, J. T.; Mintz, B. J.; Burns, L. A.; Wilke, J. J.; Abrams, M. L.; et al. Psi4: An Open-Source Ab Initio Electronic Structure Program. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2, 556–565. Aradi, B.; Hourahine, B.; Frauenheim, T. DFTB+, a Sparse Matrix-Based Implementation of the DFTB Method. J. Phys. Chem. A 2007, 111, 5678– 5684. Stewart, J. J. P. MOPAC2016. Stewart Computational Chemistry. 2016. Thiel, W. (Max-Planck-Institut für Kohlenforschung: Mülheim an der Ruhr, G. MNDO2005, Version 7.0. 2005. 16

ACS Paragon Plus Environment

Page 16 of 20

Page 17 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(35) Řezáč , J. Cuby: An Integrative Framework for Computational Chemistry. J. Comput. Chem. 2016, 37, 1230–1237. (36) van Duijneveldt, F. B.; van de Rijdt, J. G. C. M. va. D.; van Lenthe, J. H. State of the Art in Counterpoise Theory. Chem. Rev. 1994, 94, 1873– 1885. Table 1: RMSE, MSE and MUE in kcal/mol for various SQM methods tested on the R160x6 benchmark dataset and RMSE in kcal/mol for S66x8 benchmark dataset. Method

RMSE

MSE

MUE

RMSE (S66x8)

BLYP-D3

0.94

0.30

0.59

0.22

PBE-D3

0.96

-0.11

0.63

0.47

MNDO

5.40

3.74

3.80

13.08

AM1

1.99

0.23

1.30

5.27

RM1

1.92

-0.05

1.39

4.38

PM6

2.06

-0.42

1.41

2.50

PM6-D3H4

2.08

-0.38

1.05

0.66

PM6-D3H4’

1.57

-0.87

1.37

0.66

PM7

1.85

-0.74

1.23

0.95

OM2

1.83

-0.61

1.34

2.32

OM2-D3 (Grimme)

2.04

-1.26

1.51

1.19

OM3

1.64

-0.32

1.18

2.70

OM3-D3 (Grimme)

1.86

-1.15

1.35

2.48

OM3-D3H4

1.72

-0.95

1.21

0.60

DFTB

1.81

-0.83

1.40

2.68

DFTB-D

1.88

-1.19

1.47

1.78

DFTB3

1.69

-0.29

1.24

2.93

DFTB3-D3H4

1.86

-1.05

1.37

0.93

GFN-xTB

1.82

-1.01

1.32

1.06

PM6-D3H4R

1.41

0.06

0.95

1.63

DFTB3-D3H4R

1.32

-0.11

0.94

3.04

17

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2: Heat Map of MSE in kcal/mol for different SQM methods tested over R160x6 benchmark dataset. The colour scheme is blue for minimum value, white for zero and red for maximum value.

Figure 1: Plot comparing the original D3H4, D3H4’ and the new D3H4R correction. We plot the repulsion energy between two hydrogen atoms, as parameterized for PM6.

18

ACS Paragon Plus Environment

Page 18 of 20

Page 19 of 20

Figure 2: Plot of RMSE error (kcal/mol) for various SQM methods tested on the R160x6 benchmark dataset.

0

1

2

3

4

5

6

BLYP-D3 PBE-D3 MNDO AM1 RM1 PM6 PM6-D3H4 SQM Methods

PM6-D3H4’ PM7 OM2 OM2-D3 (Grimme) OM3 OM3-D3 (Grimme) OM3-D3H4 DFTB DFTB-D DFTB3 DFTB3-D3H4 GFN-xTB PM6-D3H4R DFTB3-D3H4R RMSE (kcal/mol)

Figure 3: Plot of RMSE error (kcal/mol) of different element pairs for SQM variants of PM6 tested on the R160x6 benchmark dataset. H-H O-H O-O C-H Element Pairs

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

C-O C-C N-H N-O N-C N-N 0.00

0.50

1.00

1.50

2.00

2.50

3.00

RMSE (kcal/mol)

PM6

PM6-D3H4

PM6-D3H4’

PM6-D3H4R

19

ACS Paragon Plus Environment

3.50

4.00

4.50

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

TOC Graphic

20

ACS Paragon Plus Environment

Page 20 of 20