Improved Polarizable Dipole–Dipole Interaction Model for Hydrogen

May 10, 2017 - The polarizable dipole–dipole interaction model was formulated in our laboratory to rapidly simulate hydrogen bonding in biosystems...
1 downloads 0 Views 1MB Size
Subscriber access provided by NEW YORK UNIV

Article

Improved Polarizable Dipole-dipole Interaction Model for Hydrogen Bonding, Stacking, T-shaped and X-H···# Interactions Xi-Chan Gao, Qiang Hao, and Chang-Sheng Wang J. Chem. Theory Comput., Just Accepted Manuscript • Publication Date (Web): 10 May 2017 Downloaded from http://pubs.acs.org on May 18, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Improved Polarizable Dipole-dipole Interaction Model for Hydrogen Bonding, Stacking, T-shaped and X-H···π Interactions Xi-Chan Gao,# Qiang Hao,# Chang-Sheng Wang* School of Chemistry and Chemical Engineering, Liaoning Normal University, Dalian 116029, P. R. China

ABSTRACT: The polarizable dipole-dipole interaction model was formulated in our laboratory to rapidly simulate hydrogen bonding in biosystems. In this paper, this model is improved and further parameterized for stacking, T-shaped and X-H···π interactions by adding the orbital overlap term and fitting to 19 CCSD(T)/CBS interaction energy curves of training dimers. The performance of our model is assessed through its application to more than 100 complexes, including hydrogen-bonded, stacked, T-shaped, and X-H···π complexes. For 124 relatively small testing complexes, our model reproduces benchmark equilibrium intermolecular distances with a root-mean-square deviation (RMSD) of 0.08 Å, and it reproduces benchmark interaction energies with a 0.64 kcal/mol RMSD. For 14 large noncovalent complexes, our model reproduces benchmark equilibrium intermolecular distances with a RMSD of 0.05 Å, and it reproduces benchmark interaction energies with a 0.80 kcal/mol RMSD. Extensive comparisons are made to interaction energies calculated via the M06-2X and M06-2X-D3 methods, via the well-known non-polarizable AMBER99 force field method, via the popular polarizable AMOEBA force field method, and via semi-empirical quantum mechanical (SQM) methods. Our statistical evaluations show that our model outperforms the AMBER99, AMOEBA and SQM

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

methods and is as accurate as the M06-2X and M06-2X-D3 methods. In summary, the model developed in this work is reasonable and the newly introduced orbital overlap term is effective in the accurate modeling of the noncovalent interactions. Our testing results also indicate that the polarization interaction term is important in the evaluation of hydrogen bonding whereas the orbital overlap is important in examining short hydrogen bonding, T-shaped and X-H···π interactions. Our model may serve as a new tool for modeling biological systems where hydrogen bonding, stacking, T-shaped and X-H···π interactions are of general importance.

1. INTRODUCTION Hydrogen bonding, stacking, T-shaped and X-H···π interactions play critical roles in determining the structures and properties of nucleic acids and proteins.1-3 Hydrogen bonding interactions play important roles in gene expression, protein folding and protein-nucleic acid recognition.4-7 Stacking interactions play key roles in phenomena as diverse as base-pair stacking in DNA8, side-chain interactions in proteins9, and host-guest complexes10. Černý and coworkers investigated the role of the electrostatic and dispersion forces on the stabilization of the B-DNA helix.11 They found that the balance of electrostatic and dispersion forces is important in the stabilization of the duplex DNA structure, and the absence of dispersion force which modifies mainly the stacking interaction leads to an increase of the vertical separation and also to a loss of helicity. T-shaped and X-H···π interactions involving protein and DNA components are also common in nature.12-16 As hydrogen bonding, stacking, T-shaped and X-H···π interactions play such important roles in biosystems, the ability to accurately and efficiently simulate these noncovalent interactions is critical for the computer simulation of biological processes such as protein folding and protein-nucleic acid recognition.

ACS Paragon Plus Environment

Page 2 of 38

Page 3 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

In past decades, several computational methods had been developed to accurately represent noncovalent interaction.17-20 Among these computational methods, the coupled-cluster with single, double, and perturbative triple excitations at complete basis set limit [CCSD(T)/CBS] method outperforms other methods.21-24 The CCSD(T)/CBS method can produce noncovalent interaction energy accurately enough (with errors not exceeding 1 kcal/mol),25 however it is limited to complexes containing dozens of atoms.26 More recently, several density functional theory (DFT) methods have grown increasingly popular.27 Although DFT methods provide a good ratio in terms of costs compared to the CCSD(T) method, they can only feasibly be applied to systems smaller than a few hundred atoms.28 Semiempirical quantum mechanical methods (SQM) present significant cost advantages and can be applied to complexes containing thousands of atoms. However, they are less accurate at describing noncovalent interactions.29 Recently, SQM methods have been improved through the inclusion of dispersion and hydrogen-bond corrections.30-32 Molecular mechanical (MM) force field methods constitute another major tool for the application of theoretical approaches to simulate structural and energetic properties of molecular systems. In contrast to quantum mechanical methods (QM), MM techniques are efficient because they are dependent upon an empirically parameterized function of atomic coordinates. AMBER,33 CHARMM,34 OPLS,35 GROMOS,36 Merck Molecular Force Field,37 and AMOEBA38 are some of the most widely used MM force fields for treating biological molecules. Toward increasing the computational accuracy39,40 and in contemplation of the fluctuations experienced in charge distribution by a molecule when approaching to another,41 polarizable force fields have been developed in recent years. Usually the charge polarization is taken into account by using site point dipole,38,42-44 fluctuating charge,45-47 charge response kernel,48-51 and Drude oscillator52-55 methods. Paton and Goodman assessed the performance of

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

various widely used force fields and found that all of the tested force fields show large mean errors for hydrogen bonding interactions.56 Valdes and coworkers studied the dispersion interaction between peptide backbone and aromatic side chains using CCSD(T), MP2, MP3, DFT, DFT-D, and AMBER99 methods and found that neither structures nor relative energies produced through the AMBER99 method are reliable.57

As noted above, even though enormous efforts have been made to carry out accurate evaluations of noncovalent interactions in proteins and nucleic acids, we are still far from satisfied. Focusing on the rapid evaluation of interaction energies of proteins and nucleic acid systems, we recently developed a polarizable dipole-dipole interaction method for estimating the hydrogen bonding strength58-61 based on the bond dipole. One of the advantages of this model is the polarization contributions are taken into consideration by using a clear physical meaning bond dipole term. In this article, this method has been improved by adding the overlap integral to accounting the noncovalent-bond charge transfer and has also been further parameterized for simulating stacking, X-H···π and T-shaped interactions. The results show that the model developed in this work efficiently reproduces benchmark interaction energies, proving the model’s utility.

2. COMPUTATIONAL METHOD

All the AMBER and AMOEBA force field based calculations were performed using the TINKER molecular modeling package.62 All ab initio calculations were carried out by using Gaussian 09 package suite.63 Quantum mechanical calculations of single-point energy were performed on benchmark geometries using the M06-2X and M06-2X-D3 methods paired with the aug-cc-pVDZ basis set. All the calculations based on our newly developed polarizable

ACS Paragon Plus Environment

Page 4 of 38

Page 5 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

dipole-dipole interaction model were done by using the in-house developed PBFF code.64 The geometry optimizations at the molecular mechanical level were performed by using a revised OPTKING program, which implemented in PSI3 computational package.65

3. THE IMPROVED POLARIZABLE DIPOLE-DIPOLE INTERACTION MODEL

Our newly developed polarizable dipole-dipole interaction energy (IE) function comprises electrostatic interaction term Ees, van der Waals interaction term Evdw, polarization contribution term Epol, and new orbital overlap term Eorb, as demonstrated in eq. (1). IE = Ees + Evdw + E pol + Eorb

(1)

As described elsewhere,58-61 the electrostatic interaction term Ees is estimated by calculating the permanent bond dipole-dipole interaction. Also, Lennard-Jones 12-6 potential eq. (3) is used to calculate the van der Waals interaction term Evdw, and the polarization contribution term Epol is estimated by all the interactions between permanent bond dipole moment and induced bond dipole moment or between two induced bond dipole moments. To be noted, all the permanent dipole-dipole and induced dipole-dipole interactions between two species (no matter if they are directly or indirectly interacted) are counted.58-61 Ees =∑

µi µ j

i∈I j∈J

r3

( 2cosθ cosθ ′ + sin θ sin θ ′ cos ζ )

 Aij Bij  Evdw = ∑  12 − 6  i∈I   Rij Rij  j∈J

E pol = ∑ i∈I j∈J

( µ δµ i

j

(2)

(3)

+ µ jδµi + δµiδµ j ) rij 3

( 2 cos θ cos θ ′ + sin θ sin θ ′ cos ζ )

(4)

To accounting the charge transfer issue in the noncovalent-bonding,66,67 Eorb term is also

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 38

employed to account for orbital overlapping concept. n

Eorb = ∑ 2 Dm f m {exp[−2am ( Rm − Req ,m )] −2exp[−am ( Rm − Req ,m )]}

(5)

m =1

Assuming that total n X-H···Y hydrogen bonds exist between two monomers. For the mth hydrogen bond, Rm of eq. (5) denotes the distance between the proton acceptor Y atom and proton donor H, and Req,m denotes the equilibrium hydrogen bond distance. Dm and am are parameters, and fm of eq. (5) is calculated via eq. (6).

  S − 0.5⋅( p m + q m)   f m = 1+ exp  −2α m⋅  (S − p m)(q m − S )     where S = nlp σ *X − H

−1

(6)

is the overlap integral between the X-H antibonding orbital σ* of the

proton donor and the nonbonding orbital nlp of the proton acceptor. αm, pm, and qm are parameters that are dependent on the hydrogen bond type. Here, hydrogen-like wave functions are used to calculate the overlap integral S.68

The parameter optimization performed in this paper targets the water and organic molecules, such as alcohol, acid, alkane, benzene, nucleic acid base etc. Given our goal of accurately predicting equilibrium intermolecular distances and interaction energies between these organic molecules, as presented in Figure 1, 19 complexes are chosen as the training dimers to determine the parameters related to the chemical bond dipoles, atom types, and orbital overlap. Detailed information on the 19 training dimers can be found in Table S1 in the supporting information. The overall parameterization strategy is presented as follows.

First of all, the optimized structures at the counterpoise-corrected (CP-corrected) MP2/ccpVTZ level are extracted from literature.69 Then, the one-dimensional scan is performed by

ACS Paragon Plus Environment

Page 7 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

varying the key distances from 1.0 to 5.5 Å, and the value of Ees+Evdw is calculated by extracting the initial chemical bond dipole moments from Lange’s Handbook of Chemistry70 and setting the initial van der Waals parameters ε and R* according to the AMBER99 force field71 for each distance. The initial energy curve is thus obtained from Ees+Evdw and the initial equilibrium intermolecular distance is also determined accordingly. Thirdly, the Epol term is calculated by using the atomic partial charge in the initial equilibrium structure determined by Ees+Evdw. The induced bond dipole moment δµ in eq. (4) is estimated via the function δµ = d·(q-q0) where d is the bond length, q and q0 are the atomic partial charges in dimer and in monomers, respectively.72 The new energy curve is thus obtained from Ees+Evdw+Epol and the equilibrium intermolecular distance is determined according to the new energy curve. Then, the permanent dipole moments and van der Waals parameters ε and R* are iterated until the difference between the equilibrium intermolecular distances determined by the new energy curve and the corresponding QM distances below 0.1 Å. Finally, the orbital overlap term is employed and parameterized by matching the CCSD(T)/CBS interaction energies curves from the literature. It is notable that all the parameters in the Ees, Evdw, Epol, and Eorb terms are optimized iteratively, and determined at the same time. The parameters derived through this work as well as those determined in pervious works61,70,71 are listed in Table 1 and Table S2.

Table 2 shows benchmark equilibrium intermolecular distances Rref produced from CPcorrected MP2/cc-pVTZ calculations69 and equilibrium intermolecular distances R produced through our model for the training dimers. Table 2 shows that our model satisfactorily reproduces the benchmark equilibrium intermolecular distances. In comparison to the benchmark MP2/cc-pVTZ equilibrium intermolecular distances, our model gives 0.05 Å for the root-meansquare deviation (RMSD). Benchmark interaction energies IEref produced via the CCSD(T)/CBS

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

method69 and interaction energies IE produced through our model for the training dimers are also listed in Table 2. It is shown that our model can reproduce benchmark CCSD(T)/CBS interaction energies very well. Compared to benchmark interaction energy levels, our model gives 0.23 kcal/mol for RMSD. Figure S1 shows the interaction potential energy curves obtained via the CCSD(T)/CBS method69 and through our model. Figure S1 shows that our model can effectively reproduce CCSD(T)/CBS potential energy curves for the studied training dimers. These results suggest that the parameters derived here are reasonable.

4. VALIDATION OF OUR POLARIZABLE DIPOLE-DIPOLE INTERACTION METHOD

To validate our new polarizable dipole-dipole interaction method, we examined the optimized structures and also the interaction energies of 124 small dimers containing 35 hydrogen-bonded complexes, 27 stacked complexes, 32 T-shaped complexes, 30 X-H···π & other complexes, as shown in Figures S2-S5. All the benchmark high-level QM results of the 124 testing dimers were drawn from the literatures69,72-76, and detailed information on them can be found in Tables S3-S6.

4.1 Structural Properties of 124 Small Testing Dimers

For the 124 small testing dimers, the structures were optimized by using the newly determined parameters as listed in Table 1 and Table S2. As shown in Table 3 and Tables S7S10, our model can nicely reproduce the intermolecular distances of all 124 dimers. Here we select the RMSD, maximum absolute deviation (MAD), and mean relative deviation (MRD) as the indicators to assess the prediction accuracy.

ACS Paragon Plus Environment

Page 8 of 38

Page 9 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

The hydrogen bond distances evaluated by our method, together with their corresponding benchmark values, for 35 hydrogen-bonded dimers are listed in Table S7, the equilibrium distances errors are collected in Table 3. As listed in Table 3, for all the hydrogen bonded dimers, our model reproduces benchmark hydrogen bond distances with a 0.04 Å RMSD and 1.68% MRD. The MAD for the lengths of these relatively strong N-H···O, N-H···N, O-H···O, and O-H···N type hydrogen bonds evaluated under our model is 0.10 Å, found for the dimers 26, 29, and 33, however, their relative deviations are only 5.02%, 4.97%, and 5.13%. For the relative weaker stacked, T-shaped, and other type of dimers, our model reproduces the benchmark distances with the RMSD (MAD, MRD) of 0.08 Å (0.18 Å, 1.87%) for stacked dimers, 0.10 Å (0.20 Å, 3.14%) for T-shaped dimers, and 0.09 Å (0.18 Å, 2.36%) for other type of dimers, respectively. Further details can be found in Table S7-S10 in the supplementary materials. These results suggest that our model can produce accurate equilibrium intermolecular distances for these noncovalent complexes.

4.2 Interaction Energies of 124 Small Testing Dimers

The interaction energies produced through our model, as well as some other popular methods, on the 124 tested dimers are presented in Table 4 and Tables S11-S18. The corresponding high-level QM results drawn from the literatures are also listed for comparison.

4.2.1 Interaction Energies of 35 Hydrogen-bonded Dimers

To validate the reliability of our method on evaluating the hydrogen bonding interaction energies in biomolecular, 35 hydrogen-bonded dimers from JSCH-2005, S22 and S66 sets69,72 (Figure S2 in the supporting information) are chosen as testing set.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

As presented in Table 4 and Table S11, our estimated interaction energies exhibit a RMSD of 0.94 kcal/mol, with a MAD of 2.04 kcal/mol observed in complex 18 (G···C WC). The CPcorrected and non CP-corrected M06-2X methods exhibit RMSDs of 1.41 and 0.65 kcal/mol, with MADs of 2.74 and 1.83 kcal/mol, respectively. The RMSDs (MADs) are still 1.02 (2.15) and 0.50 (1.24) kcal/mol, respectively, after applied the D3 dispersion corrections.77 The RMSD of the non-polarizable AMBER99 force field is 5.78 kcal/mol with a MAD of 11.20 kcal/mol observed in the complex 19 (mG···mC WC). The RMSD of the polarizable AMOEBA force field is 1.37 kcal/mol with a MAD of 4.09 kcal/mol found in the complex 18 (G···C WC). These comparisons show that our model outperforms the two force fields and is almost as accurate as the M06-2X and M06-2X-D3 methods. By comparing the results obtained from AMBER99, AMOEBA, as well as our model, we are able to highlight that the Epol term is important even in these neutral hydrogen-bonded dimers, and the newly introduced Eorb term is also important and contributes to the accurate evaluation of the interaction energies, as listed in Table S12.

4.2.2 Interaction Energies of 27 Stacked Complexes

We here apply our polarizable model to 27 π···π stacked complexes (Figure S3) picked from JSCH-2005 set and other references to check our method and also newly determined parameters. As presented in Table 4 and Table S13, our estimated interaction energies exhibit a RMSD of 0.50 kcal/mol, with a MAD of 1.14 kcal/mol observed in complex 60 (A TRP st), while the RMSDs of the non-polarizable AMBER99 force field and polarizable AMOEBA force field are 2.64 and 0.59 kcal/mol, respectively. These results indicate that our polarizable model performs as well as the well-known polarizable AMOEBA force field for π···π stack interactions. Both our model and the AMOEBA force field give marginally better results comparing with non-

ACS Paragon Plus Environment

Page 10 of 38

Page 11 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

polarizable AMBER99 force field, suggesting that the including of polarization effect in force field methods might also be important in getting accurate π···π interaction energies, consistent with the findings concluded by MacKerell et al.78 This phenomenon is also consistent with our energy decomposition results listed in Table S14, which shows a relative large interaction energy contribution from the Epol term in some stacked complexes. The fact that the Eorb term is nearly zero-energy contribution to the total interaction energy (Table S14) may be the reason why our polarizable model only delivers similar performance compared to the polarizable AMOEBA force field.

4.2.3 Interaction Energies of 32 T-shaped Aromatic-aromatic Complexes

As described in Figure S4, 32 T-shaped aromatic-aromatic complexes from literatures are also employed to validate our model. As presented in Table 4 and Table S15, our estimated interaction energies exhibit a RMSD of 0.58 kcal/mol, with a MAD of 1.36 kcal/mol observed also in complex 84 (C HIS face), comparing with the RMSDs of the CP-corrected and non CPcorrected M06-2X methods are 0.78 and 0.39 kcal/mol, respectively. The RMSD is slightly reduced to 0.30 kcal/mol after the consideration of D3 dispersion corrections and also the basis set superposition error. The RMSDs of the non-polarizable AMBER99 force field and polarizable AMOEBA force field are 2.84 and 0.91 kcal/mol, respectively. Both our polarizable model and the AMOEBA force field give much better results comparing with non-polarizable AMBER99 force field, also intimating that the polarization effect is apparently considerable in getting accurate interaction energies of these T-shaped aromatic-aromatic complexes. This phenomenon is also consistent with our energy decomposition results listed in Table S16. The

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

small contribution from Eorb term in our model may be counted for slightly improvements compared with AMOEBA force field.

4.2.4 Interaction Energies of 30 X-H···π & Other Complexes

As listed in Figure S5, 30 X-H···π aliphatic-aromatic complexes and aliphatic-aliphatic complexes from S66 and other reference are also employed to test our method. As presented in Table 4 and Table S17, our estimated interaction energies exhibit a RMSD of 0.28 kcal/mol, with a MAD of 0.75 kcal/mol observed in complex 108 (U···Cyclopentane). Comparing with the RMSDs of the CP-corrected and non CP-corrected M06-2X methods are 0.29 and 0.83 kcal/mol, respectively. The RMSD is changed to 0.40 kcal/mol after the consideration of D3 dispersion corrections and also the basis set superposition error. The RMSD of the polarizable AMOEBA force field is 0.57 kcal/mol. Due to the absence of parameters, AMBER99 force field is unable to simulate most of the complexes in this testing set. Similar to the previous results for the Tshaped aromatic-aromatic complexes, our model can provide M06 comparable interaction energies, which are slightly better than AMOEBA force field, may be also due to the contribution from Eorb term. These results indicate again that our method is powerful and promising for the tasks of analyzing the interactions in these kinds of aliphatic-aromatic complexes and aliphatic-aliphatic complexes.

4.2.5 The Accuracy According to the Simulations on 124 Testing Dimers

For all the 124 testing dimers, as compared to the benchmark high-level QM results, our model produces interaction energy values of 0.64 kcal/mol RMSD, whereas the non CPcorrected M06-2X method gives the corresponding value of 0.68 kcal/mol, the CP-corrected

ACS Paragon Plus Environment

Page 12 of 38

Page 13 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

M06-2X method gives the corresponding value of 0.91 kcal/mol, the non CP-corrected M06-2XD3 method gives the corresponding value of 1.05 kcal/mol, the CP-corrected M06-2X-D3 method gives the corresponding value of 0.61 kcal/mol, the AMBER99 method gives the corresponding value of 3.95 kcal/mol, and the AMOBEA method gives the corresponding value of 0.96 kcal/mol, as listed in Table 4. These results indicate that our model can predict the interaction energies for noncovalent complexes in good agreement with those yielded by CCSD(T)/CBS or MP2/CBS method.

To further determine the accuracy of our method, we reproduced several literature available interaction potential energy curves. As shown in Figure S6, 51 interaction potential energy curves obtained via the expensive CCSD(T)/CBS method and from our model are presented. The interaction potential energy curves obtained from our model agreed well with those obtained via high-level quantum mechanical method, further illustrating that our model is reasonable.

4.3 The Comparison with Semiempirical Quantum Mechanical (SQM) Methods

The accuracy of some SQM methods has recently been improved by including hydrogenbond and dispersion corrections and tested on the famous S66 set. SQM interaction energies were taken from the literatures79-81 and are listed in Table S19. Table 5 presents collected interaction energy errors produced through our model and from 15 SQM methods for the S66 set. Table 5 shows that the four PM6 methods (PM6-D3H4, PM6-D3H+, PM6-DH+, and PM6DH2), which include both dispersion and hydrogen-bond corrections, outperform other SQM methods. Table 5 shows that in comparison to the benchmark CCSD(T)/CBS interaction energies, PM6-D3H4 gives 0.63 kcal/mol and 11.01% for RMSD and MRD, PM6-D3H+ gives 0.81 kcal/mol and 12.93% for RMSD and MRD, PM6-DH+ gives 0.81 kcal/mol and 14.29% for

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

RMSD and MRD, PM6-DH2 gives 0.91 kcal/mol and 14.49% for RMSD and MRD, and our model gives 0.39 kcal/mol and 6.19% for RMSD and MRD. These statistical evaluations show that our model outperforms all of the 15 SQM methods examined.

4.4 Bulk Dimers

We further test our method by applying it to 14 larger literature available biological or nonbiological noncovalent complexes. All the quantum mechanical results were extracted from published papers.58,82-84 The structures, together with typical intermolecular bond lengths, are presented in Figure 2. More detailed information for these complexes is listed in Table S20. Compared with the ab initio benchmark results, our estimated donor-acceptor distances exhibit a MAD of 0.10 Å and a RMSD of 0.05 Å. The MRD is only 1.56%. Table 6 presents the benchmark interaction energies58,82-84 and the interaction energies obtained through our model, M06-2X, M06-2X-D3, AMBER99, and AMOEBA methods. Our model produces interaction energy values of 0.80 kcal/mol RMSD and 4.82% MRD with a MAD of 1.68 kcal/mol observed in complex 138 composed of two octadecanes. The CP-corrected M06-2X method gives 3.34 kcal/mol and 20.98% for RMSD and MRD, respectively, with a MAD of 6.09 kcal/mol observed in stacked complex 130 composed of two coronenes. The non CP-corrected M06-2X method gives 2.36 kcal/mol and 11.27% for RMSD and MRD, respectively, with a MAD of 6.27 kcal/mol observed in hydrogen-bonded complex 125 composed of two glycine pentapeptides. The CP-corrected M06-2X-D3 method gives 1.90 kcal/mol and 9.63% for RMSD and MRD, respectively, with a MAD of 4.79 kcal/mol observed in hydrogen-bonded complex 125 composed of two glycine pentapeptides. The non CP-corrected M06-2X-D3 method gives 4.14 kcal/mol and 28.08% for RMSD and MRD, respectively, with a

ACS Paragon Plus Environment

Page 14 of 38

Page 15 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

MAD of 8.60 kcal/mol observed also in hydrogen-bonded complex 125. The AMBER99 method gives 3.82 kcal/mol and 20.97% for RMSD and MRD, respectively, with a MAD of as large as 6.40 kcal/mol observed in stacked complex 130 containing two coronene molecules. The polarizable AMOEBA force field gives 3.02 kcal/mol and 18.07% for RMSD and MRD, respectively, with a MAD of as large as 5.36 kcal/mol observed also in stacked complex 130. These comparisons demonstrate that our model outperforms not only the AMBER99 or AMOEBA force field method but also the M06-2X or M06-2X-D3 methods for large noncovalent complexes examined.

The decomposed interaction energies of these 14 larger testing dimers are listed in Table S21. For the uncharged π-π stacked dimers 128-131, which exhibit little polarization or orbital overlap contributions in the interaction energies, our method also provides much better results may due to better parameterization. As sharing the same expression on van der Waals interaction term, our van der Waals parameters were extracted and injected into AMBER99 force field calculations. The results show that the deviations of the AMBER99 calculations on these 4 stacked dimers reduced to 2.54, 3.72, 2.68, 1.62 kcal/mol; comparing with the former 3.84, 5.76, 6.40, 4.22 kcal/mol. These results also indicate the transferability of our parameters to other similar force fields. More importantly, our newly implemented orbital overlap term seems play an important role in short hydrogen-bonded complexes and also T-shaped complexes. For example, comparing to the literature benchmark results,58 the deviations of interaction energies evaluated by our method are 1.38 and 0.04 kcal/mol for the two short range hydrogen-bonded dimers (125 and 127), whereas, -2.41 and 3.37 kcal/mol for AMOEBA polarizable force field. The interaction energy errors are also reduced to -0.35 and -0.64 kcal/mol for the two T-shaped

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

complexes (132 and 133). These results also provide evidence that the Eorb term is importance and may help to improve the accuracy of force field methods.

5. EFFICIENCY

Our model is not only accurate but is also very efficient. Timing studies were performed on an Intel i7 quad core server, and some of our results are summarized in Table S22. Table S22 shows that for the relatively small dimer 45 containing only 24 atoms, our model completed the interaction energy calculation in less than 0.68 seconds CPU time, which is more than 2,600 times faster than the computation time of the non CP-corrected M06-2X (1,796 seconds CPU time) and roughly 6,500 times faster than that of the CP-corrected M06-2X-D3 (4,427 seconds CPU time). For the large dimer 126 containing 80 atoms, our model completed the interaction energy calculation in less than 1.05 seconds CPU time, making it over 36,000 times faster than the non CP-corrected M06-2X (38,839 seconds CPU time) method and more than 94,000 times faster than the CP-corrected M06-2X-D3(99,325 seconds CPU time) method. For the even larger dimer 138 containing 112 atoms, our model completed the interaction energy calculation in less than 2.55 seconds CPU time, making it over 39,000 times faster than the non CP-corrected M062X (100,231 seconds CPU time) method and more than 140,000 times faster than CP-corrected M06-2X-D3 (370,322 seconds CPU time) method. These comparisons show that our model is very efficient.

6. CONCLUSION The polarizable dipole-dipole interaction method has been further developed and elaborated. The parameters were primarily derived based on the experimental bond dipole moment values and also optimized according to the high-level quantum mechanical results. Extensive

ACS Paragon Plus Environment

Page 16 of 38

Page 17 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

comparisons drawn with interaction energies obtained from the M06-2X, M06-2X-D3, AMBER99, AMOEBA, and SQM methods show that for small noncovalent systems, our model outperforms the AMBER99, AMOEBA and SQM methods and is as accurate as the M06-2X and M06-2X-D3 methods; for large noncovalent systems, our model generates more accurate results than the AMBER99, AMOBEA, M06-2X and M06-2X-D3 methods. Our timing studies show that our model is very efficient. We expect the model developed in this work will serve as a new tool for biological process simulations. It is encouraging that we succeeded at improving our polarizable dipole-dipole interaction model by introducing the Eorb term as a function of the overlap integral between the X-H antibonding orbital of proton donors and the nonbonding orbital of proton acceptors. This term is found to be especially useful in increasing the accuracy of the predicted interaction energies for hydrogen bonding, T-shaped and X-H···π interactions, while appears little impact on π-π stacking interactions. As this orbital overlap term is well separated from the other terms in our model, this concept and also the equations could be easily transferred to other force field methods. Overall, the current polarizable dipole-dipole interaction model could be viewed as a preliminary version of force field, which can be used for the evaluation of the interactions among the peptides (composed by neutral amino acids except cysteine and methionine), nucleosides, and small organic molecules. The comparison with other force fields, which have been parameterized and tested for tens of years, suggests that further improvement of this model on the ion···π interactions, the intramolecular forces, and also the simulations of biological systems in solution phase should be our focuses. The studies on these directions are in progressing.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

AUTHOR INFORMATION Corresponding Author * E-mail: [email protected]. Author Contributions The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. #These authors contributed equally. Notes The authors declare no competing financial interest. ACKNOWLEDGMENTS We gratefully acknowledge the financial support from National Natural Science Foundation of China (21573098, 21173109), Program for Leading Figures in Dalian, and the Starting up Foundation of Liaoning Normal University (202633), China. ASSOCIATED CONTENT Supporting Information. Potential energy curves for 19 training dimers; the parameters used in this work; pictures, equilibrium intermolecular distances, interaction energies and decomposition of interaction energies for 124 small testing dimers; potential energy curves for 51 testing dimers; information and decomposition of interaction energies for 14 large noncovalent complexes; interaction energies for the S66 dataset for different theoretical levels; the comparisons of computational times; the description about the dipole-dipole interactions; the optimization process of parameters in orbital overlap term; coordinates for all 157 noncovalent complexes. This information is available free of charge via the Internet at http://pubs.acs.org.

ACS Paragon Plus Environment

Page 18 of 38

Page 19 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

REFERENCES (1) Butcher, S.E.; Pyle, A.M. The Molecular Interactions That Stabilize RNA Tertiary Structure: RNA Motifs, Patterns, and Networks. Acc. Chem. Res. 2011, 44, 1302-1311. (2) Meyer, E.A.; Castellano, R.K.; Diederich, F. Interactions with Aromatic Rings in Chemical and Biological Recognition. Angew. Chem., Int. Ed. 2003, 42, 1210-1250. (3) Černý, J.; Hobza, P. Non-Covalent Interactions in Biomacromolecules. Phys. Chem. Chem. Phys. 2007, 9, 5291-5303. (4) Westhof, E. Isostericity and Tautomerism of Base Pairs in Nucleic Acids. FEBS Lett. 2014, 588, 2464-2469. (5) Li, P.T.X.; Vieregg, J.; Tinoco, I., Jr. How RNA Unfolds and Refolds. Annu. Rev. Biochem. 2008, 77, 77-100. (6) Dill, K.A.; MacCallum, J.L. The Protein-Folding Problem, 50 Years On. Science 2012, 338, 1042-1046. (7) Gromiha, M.M.; Siebers, J.G.; Selvaraj, S.; Kono, H.; Sarai, A. Role of Inter and Intramolecular Interactions in Protein-DNA Recognition. Gene 2005, 364, 108-113. (8) Riley, K.E.; Hobza, P. Noncovalent Interactions in Biochemistry. WIREs Comput. Mol. Sci. 2011, 1, 3-17. (9) Burley, S.K.; Petsko, G.A. Aromatic-Aromatic Interaction: A mechanism of Protein Structure Stabilization. Science 1985, 229, 23-28. (10) Hunter, C.A. The Role of Aromatic Interactions in Molecular Recognition. Chem. Soc. Rev. 1994, 23, 101-109. (11) Černý, J.; Kabeláč, M.; Hobza, P. Double-Helical→Ladder Structural Transition in the BDNA is Induced by a Loss of Dispersion Energy. J. Am. Chem. Soc. 2008, 130, 16055-16059.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(12) Davey, C.A.; Sargent, D.F.; Luger, K.; Maeder, A.W.; Richmond, T.J. Solvent Mediated Interactions in the Structure of the Nucleosome Core Particle at 1.9 Å Resolution. J. Mol. Biol. 2002, 319, 1097-1113. (13) Wilson, K.A.; Kellie, J.L.; Wetmore, S.D. DNA-Protein π-Interactions in Nature: Abundance, Structure, Composition and Strength of Contacts between Aromatic Amino Acids and DNA Nucleobases or Deoxyribose Sugar. Nucleic Acids Res. 2014, 42, 6726-6741. (14) Pace, C.J.; Kim, D.; Gao, J. Experimental Evaluation of CH-π Interactions in a Protein Core. Chem-Eur. J. 2012, 18, 5832-5836. (15) Umezawa, Y.; Nishio, M. CH/π Interactions in the Crystal Structure of TATA-Box Binding Protein/DNA Complexes. Bioorg. Med. Chem. 2000, 8, 2643-2650. (16) Kryger, G.; Silman, I.; Sussman, J.L. Structure of Acetylcholinesterase Complexed with E2020 (Aricept®): Implications for the Design of New Anti-Alzheimer Drugs. Structure 1999, 7, 297-307. (17) Foster, M.E.; Sohlberg, K. Empirically Corrected DFT and Semi-Empirical Methods for Non-Bonding Interactions. Phys. Chem. Chem. Phys. 2010, 12, 307-322. (18) Zhao, Y.; Truhlar, D.G. Density Functionals for Noncovalent Interaction Energies of Biological Importance. J. Chem. Theory Comput. 2007, 3, 289-300. (19) Schneebeli, S.T.; Bochevarov, A.D.; Friesner, R.A. Parameterization of a B3LYP Specific Correction for Noncovalent Interactions and Basis Set Superposition Error on a Gigantic Data Set of CCSD(T) Quality Noncovalent Interaction Energies. J. Chem. Theory Comput. 2011, 7, 658-668.

ACS Paragon Plus Environment

Page 20 of 38

Page 21 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(20) Stewart, J.J.P. Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and Re-Optimization of Parameters. J. Mol. Model. 2013, 19, 1-32. (21) Riley, K.E.; Pitoňák, M.; Jurečka, P.; Hobza, P. Stabilization and Structure Calculations for Noncovalent Interactions in Extended Molecular Systems Based on Wave Function and Density Functional Theories. Chem. Rev. 2010, 110, 5023-5063. (22) Helgaker, T.; Klopper, W.; Koch, H.; Noga, J. Basis-Set Convergence of Correlated Calculations on Water. J. Chem. Phys. 1997, 106, 9639-9646. (23) Feller, D. The Use of Systematic Sequences of Wave Functions for Estimating the Complete Basis Set, Full Configuration Interaction Limit in Water. J. Chem. Phys. 1993, 98, 7059-7071. (24) Hao, Q.; Simmonett, A.C.; Yamaguchi, Y.; Fang, D.-C.; Schaefer, H.F. Structures and Energetics of H6+ Clusters. J. Phys. Chem. A 2009, 113, 13608-13620. (25) Hobza, P. Calculations on Noncovalent Interactions and Databases of Benchmark Interaction Energies. Acc. Chem. Res. 2012, 45, 663-672. (26) Pitoňák, M.; Neogrády, P.; Černý, J.; Grimme, S.; Hobza, P. Scaled MP3 Non-Covalent Interaction Energies Agree Closely with Accurate CCSD(T) Benchmark Data. ChemPhysChem 2009, 10, 282-289. (27) Zhao, Y.; Truhlar, D.G. Density Functionals with Broad Applicability in Chemistry. Acc. Chem. Res. 2008, 41, 157-167. (28) Antony, J.; Sure, R.; Grimme, S. Using Dispersion-Corrected Density Functional Theory to Understand Supramolecular Binding Thermodynamics. Chem. Commun. 2015, 51, 1764-1774. (29) Korth, M. Third-Generation Hydrogen-Bonding Corrections for Semiempirical QM Methods and Force Fields. J. Chem. Theory Comput. 2010, 6, 3808-3816.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(30) Korth, M.; Pitoňák, M.; Řezáč, J.; Hobza, P. A Transferable H-Bonding Correction for Semiempirical Quantum-Chemical Methods. J. Chem. Theory Comput. 2010, 6, 344-352. (31) Řezáč, J.; Fanfrlík, J.; Salahub, D.; Hobza, P. Semiempirical Quantum Chemical PM6 Method Augmented by Dispersion and H-Bonding Correction Terms Reliably Describes Various Types of Noncovalent Complexes. J. Chem. Theory Comput. 2009, 5, 1749-1760. (32) Řezáč, J.; Hobza, P. Advanced Corrections of Hydrogen Bonding and Dispersion for Semiempirical Quantum Mechanical Methods. J. Chem. Theory Comput. 2012, 8, 141-151. (33) Cornell, W.D.; Cieplak, P.; Bayly, C.I.; Gould, I.R.; Merz, K.M., Jr.; Ferguson, D.M.; Spellmeyer, D.C.; Fox, T.; Caldwell, J.W.; Kollman, P.A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179-5197. (34) MacKerell, A.D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R.L., Jr.; Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586-3616. (35) Jorgensen, W.L.; Maxwell, D.S.; Tirado-Rives, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 1996, 118, 11225-11236. (36) Oostenbrink, C.; Villa, A.; Mark, A.E.; Van Gunsteren, W.F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656-1676.

ACS Paragon Plus Environment

Page 22 of 38

Page 23 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(37) Halgren, T.A. Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94. J. Comput. Chem. 1996, 17, 490-519. (38) Shi, Y.; Xia, Z.; Zhang, J.; Best, R.; Wu, C.; Ponder, J.W.; Ren, P. Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins. J. Chem. Theory Comput. 2013, 9, 40464063. (39) Mackerell, A.D., Jr. Empirical Force Fields for Biological Macromolecules: Overview and Issues. J. Comput. Chem. 2004, 25, 1584-1604. (40) Banáš, P.; Hollas, D.; Zgarbová, M.; Jurečka, P.; Orozco, M.; Cheatham, T.E.; Šponer, J.; Otyepka, M. Performance of Molecular Mechanics Force Fields for RNA Simulations: Stability of UUCG and GNRA Hairpins. J. Chem. Theory Comput. 2010, 6, 3836-3849. (41) Keaveney, S.T.; Harper, J.B.; Croft, A.K. Computational Approaches to Understanding Reaction Outcomes of Organic Processes in Ionic Liquids. RSC Adv. 2015, 5, 35709-35729. (42) Dang, L.X.; Rice, J.E.; Caldwell, J.; Kollman, P.A. Ion Solvation in Polarizable Water: Molecular Dynamics Simulations. J. Am. Chem. Soc. 1991, 113, 2481-2486. (43) Cieplak, P.; Caldwell, J.; Kollman, P. Molecular Mechanical Models for Organic and Biological Systems Going Beyond the Atom Centered Two Body Additive Approximation: Aqueous Solution Free Energies of Methanol and N-Methyl Acetamide, Nucleic Acid Base, and Amide Hydrogen Bonding and Chloroform/Water Partition Coefficients of the Nucleic Acid Bases. J. Comput. Chem. 2001, 22, 1048-1057. (44) Kaminski, G.A.; Stern, H.A.; Berne, B.J.; Friesner, R.A.; Cao, Y.X.; Murphy, R.B.; Zhou, R.; Halgren, T.A. Development of a Polarizable Force Field for Proteins via Ab initio Quantum Chemistry: First Generation Model and Gas Phase Tests. J. Comput. Chem. 2002, 23, 1515-1531.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(45) Rick, S.W.; Stuart, S.J.; Berne, B.J. Dynamical Fluctuating Charge Force Fields: Application to Liquid Water. J. Chem. Phys. 1994, 101, 6141-6156. (46) Patel, S.; Brooks, C.L. CHARMM Fluctuating Charge Force Field for Proteins: I Parameterization and Application to Bulk Organic Liquid Simulations. J. Comput. Chem. 2004, 25, 1-16. (47) Patel, S.; Mackerell, A.D.; Brooks, C.L. CHARMM Fluctuating Charge Force Field for Proteins: II Protein/Solvent Properties from Molecular Dynamics Simulations Using a Nonadditive Electrostatic Model. J. Comput. Chem. 2004, 25, 1504-1514. (48) Isegawa, M.; Kato, S. Polarizable Force Field for Protein with Charge Response Kernel. J. Chem. Theory Comput. 2009, 5, 2809-2821. (49) Morita, A.; Kato, S. Ab initio Molecular Orbital Theory on Intramolecular Charge Polarization:  Effect of Hydrogen Abstraction on the Charge Sensitivity of Aromatic and Nonaromatic Species. J. Am. Chem. Soc. 1997, 119, 4021-4032. (50) Lu, Z.; Yang, W. Reaction Path Potential for Complex Systems Derived from Combined Ab initio Quantum Mechanical and Molecular Mechanical Calculations. J. Chem. Phys. 2004, 121, 89-100. (51) Isegawa, M.; Kato, S. Electronic Polarization Effect on Low-Frequency Infrared and Raman Spectra of Aprotic Solvent: Molecular Dynamics Simulation Study with Charge Response Kernel by Second Order Møller-Plesset Perturbation Method. J. Chem. Phys. 2007, 127, 244502. (52) Lamoureux, G.; Roux, B. Modeling Induced Polarization with Classical Drude Oscillators: Theory and Molecular Dynamics Simulation Algorithm. J. Chem. Phys. 2003, 119, 3025-3039.

ACS Paragon Plus Environment

Page 24 of 38

Page 25 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(53) Anisimov, V.M.; Lamoureux, G.; Vorobyov, I.V.; Huang, N.; Roux, B.; MacKerell, A.D., Jr. Determination of Electrostatic Parameters for a Polarizable Force Field Based on the Classical Drude Oscillator. J. Chem. Theory Comput. 2005, 1, 153-168. (54) Baker, C.M.; Lopes, P.E.M.; Zhu, X.; Roux, B.; MacKerell, A.D., Jr. Accurate Calculation of Hydration Free Energies Using Pair-Specific Lennard-Jones Parameters in the CHARMM Drude Polarizable Force Field. J. Chem. Theory Comput. 2010, 6, 1181-1198. (55) Lemkul, J.A.; Huang, J.; Roux, B.; MacKerell, A.D., Jr. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983-5013. (56) Paton, R.S.; Goodman, J.M. Hydrogen Bonding and π-Stacking: How Reliable are Force Aields? A Critical Evaluation of Force Field Descriptions of Nonbonded Interactions. J. Chem. Inf. Model. 2009, 49, 944-955. (57) Valdes, H.; Pluháčková, K.; Pitonák, M.; Řezáč, J.; Hobza, P. Benchmark Database on Isolated Small Peptides Containing an Aromatic Side Chain: Comparison between Wave Function and Density Functional Theory Methods and Empirical Force Field. Phys. Chem. Chem. Phys. 2008, 10, 2747-2757. (58) Li, S.-S.; Huang, C.-Y.; Hao, J.-J.; Wang, C.-S. A Polarizable Dipole-Dipole Interaction Model for Evaluation of the Interaction Energies for N-H···O=C and C-H···O=C HydrogenBonded Complexes. J. Comput. Chem. 2014, 35, 415-426. (59) Gao, X.-C.; Huang, C.-Y.; Wang, C.-S. Rapid Evaluation of the Interaction Energies for Hydrogen-Bonded Uracil and Thymine Dimers,Trimers and Tetramers. Comput. Theor. Chem. 2014, 1048, 46-53.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 38

(60) Hao, J.-J.; Li, S.-S.; Jiang, X.-N.; Li, X.-L.; Wang, C.-S. Rapid Evaluation of the Interaction Energies for O-H···O Hydrogen-Bonded Complexes. Theor. Chem. Acc. 2014, 133, 1516-1527. (61) Hao, J.-J.; Wang, C.-S. Rapid Evaluation of the Interaction Energies for CarbohydrateContaining Hydrogen-Bonded Complexes via the Polarizable Dipole-Dipole Interaction Model Combined with NBO or AM1 Charge. RSC Adv. 2015, 5, 6452-6461. (62) Ponder, J.W. TINKER Molecular Modeling Package, V7.1; Washington University Medical School: St. Louis, MO, 2015. (63) Frisch, M.J.; Trucks, G.W.; Schlegel, H.B.; Scuseria, G.E.; Robb, M.A.; Cheeseman, J.R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G.A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H.P.; Izmaylov, A.F.; Bloino, J.; Zheng, G.; Sonnenberg, J.L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, J.A., Jr.; Peralta, J.E.; Ogliaro, F.; Bearpark, M.; Heyd, J.J.; Brothers, E.; Kudin, K.N.; Staroverov, V.N.; Keith, T.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J.C.; Iyengar, S.S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J.M.; Klene, M.; Knox, J.E.; Cross, J.B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R.E.; Yazyev, O.; Austin, A.J.; Cammi, R.; Pomelli, C.; Ochterski, J.W.; Martin, R.L.; Morokuma, K.; Zakrzewski, V.G.; Voth, G.A.; Salvador, P.; Dannenberg, J.J.; Dapprich, S.; Daniels, A.D.; Farkas, O.; Foresman, J.B.; Ortiz, J.V.; Cioslowski, J.; Fox, D.J. Gaussian 09, Revision D.01; Gaussian, Inc.: Wallingford, CT, 2013. (64) PBFF (Polarizable Bond-dipole-based Force Field Package) is a fortran code for the computation

of

non-covalent

interactions.

The

package

http://cswang.home.lnnu.edu.cn.

ACS Paragon Plus Environment

is

publicly

available

at

Page 27 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(65) Crawford, T.D.; Sherrill, C.D.; Valeev, E.F.; Fermann, J.T.; King, R.A.; Leininger, M.L.; Brown, S.T.; Janssen, C.L.; Seidl, E.T.; Kenny, J.P.; Allen, W.D. Software News and Update PSI3: An Open-Source Ab initio Electronic Structure Package. J. Comput. Chem. 2007, 28, 1610-1616. (66) Hobza, P.; Havlas, Z. Blue-Shifting Hydrogen Bonds. Chem. Rev. 2000, 100, 4253-4264. (67) Morozov, A.V.; Kortemme, T.; Tsemekhman, K.; Baker, D. Close Agreement between the Orientation Dependence of Hydrogen Bonds Observed in Protein Structures and Quantum Mechanical Calculations. PNAS 2004, 101, 6946-6951. (68) Mulliken, R.S.; Rieke, C.A.; Orloff, D.; Orloff, H. Formulas and Numerical Tables for Overlap Integrals. J. Chem. Phys. 1949, 17, 1248-1267. (69) Řezáč, J.; Riley, K.E.; Hobza, P. S66: A Well-Balanced Database of Benchmark Interaction Energies Relevant to Biomolecular Structures. J. Chem. Theory Comput. 2011, 7, 2427-2438. (70) Dean, J. A. Properties of Atoms, Padicals, and Bonds. Lange's Handbook of Chemistry, fifteenth edition; McGraw-Hill, Inc.: New York, 1999; 4.53. (71) Wang, J.; Cieplak, P.; Kollman, P.A. How Well Does a Restrained Electrostatic Potential (RESP) Model Perform in Calculating Conformational Energies of Organic and Biological Molecules? J. Comput. Chem. 2000, 21, 1049-1074. (72) Jurečka, P.; Šponer, J.; Černý, J.; Hobza, P. Benchmark Database of Accurate (MP2 and CCSD(T) Complete Basis Set Limit) Interaction Energies of Small Model Complexes, DNA Base Pairs, and Amino Acid Pairs. Phys. Chem. Chem. Phys. 2006, 8, 1985-1993. (73) Šponer, J.; Jurečka, P.; Hobza, P. Accurate Interaction Energies of Hydrogen-Bonded Nucleic Acid Base Pairs. J. Am. Chem. Soc. 2004, 126, 10142-10151.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(74) Jurečka, P.; Hobza, P. True Stabilization Energies for the Optimal Planar Hydrogen-Bonded and Stacked Structures of Guanine···Cytosine, Adenine···Thymine, and Their 9- and 1-Methyl Derivatives:  Complete Basis Set Calculations at the MP2 and CCSD(T) Levels and Comparison with Experiment. J. Am. Chem. Soc. 2003, 125, 15608-15613. (75) Rutledge, L.R.; Durst, H.F.; Wetmore, S.D. Evidence for Stabilization of DNA/RNAProtein Complexes Arising from Nucleobase-Amino Acid Stacking and T-Shaped Interactions. J. Chem. Theory Comput. 2009, 5, 1400-1410. (76) Tsuzuki, S.; Honda, K.; Fujii, A.; Uchimaru, T.; Mikami, M. CH/π Interactions in Methane Clusters with Polycyclic Aromatic Hydrocarbons. Phys. Chem. Chem. Phys. 2008, 10, 28602865. (77) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate Ab initio Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94 Elements H-Pu. J. Chem. Phys. 2010, 132, 154104. (78) Vorobyov, I.V.; Anisimov, V.M.; MacKerell, A.D., Jr. Polarizable Empirical Force Field for Alkanes Based on the Classical Drude Oscillator Model. J. Phys. Chem. B 2005, 109, 1898818999. (79) Dral, P.O.; Wu, X.; Spörkel, L.; Koslowski, A.; Thiel, W. Semiempirical QuantumChemical Orthogonalization-Corrected Methods: Benchmarks for Ground-State Properties. J. Chem. Theory Comput. 2016, 12, 1097-1120. (80) Kromann, J.C.; Christensen, A.S.; Steinmann, C.; Korth, M.; Jensen, J.H. A ThirdGeneration Dispersion and Third-Generation Hydrogen Bonding Corrected PM6 Method: PM6D3H+. PeerJ 2014, 2, e449.

ACS Paragon Plus Environment

Page 28 of 38

Page 29 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(81) Christensen, A.S.; Elstner, M.; Cui, Q. Improving Intermolecular Interactions in DFTB3 Using Extended Polarization from Chemical-Potential Equalization. J. Chem. Phys. 2015, 143, 084123. (82) Grimme, S. Do Special Noncovalent π-π Stacking Interactions Really Exist? Angew. Chem., Int. Ed. 2008, 47, 3430-3434. (83) Sedlak, R.; Janowski, T.; Pitoňák, M.; Řezáč, J.; Pulay, P.; Hobza, P. Accuracy of Quantum Chemical Methods for Large Noncovalent Complexes. J. Chem. Theory Comput. 2013, 9, 33643374. (84) Wagner, J.P.; Schreiner, P.R. Nature Utilizes Unusual High London Dispersion Interactions for Compact Membranes Composed of Molecular Ladders. J. Chem. Theory Comput. 2014, 10, 1353-1358.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 38

Table 1. The Parameters Determined in This Work Description Parameters for bond dipole moments µ0 (Debye) µ0 (N-H) sp3 nitrogen µ0 (C-H) carbon in ethyne and ethene µ0 (O-H) oxygen in carboxylic acid Parameters for lone pair dipole moments µ0 (Debye) µ0 (N-lp) sp2 nitrogen with a lone pair in 5-membered ring of adenine sp2 nitrogen with a lone pair in 5-membered ring of histidine and guanine sp2 nitrogen with a lone pair in 6-membered ring of adenine, guanine and pyridine sp2 nitrogen with a lone pair in 6-membered ring of cytosine Parameters for van der Waals interactions carbon sp carbon in ethyne sp2 carbon in ethene sp2 carbon in histidine sp2 carbon in 6-membered ring of indole except that at junction of two rings sp2 carbon at junction of two 6-membered rings sp2 carbon at junction of three 6-membered rings sp2 carbon in carbonyl group C4, C5 and C8 of adenine and guanine sp2 carbon in base except that at junction of two rings and in carbonyl group nitrogen sp2 nitrogen with a lone pair sp3 nitrogen oxygen sp2 oxygen in carboxylic acid sp3 oxygen in water sp3 oxygen in carboxylic acid hydrogen H in water H attached to oxygen in carboxylic acid H attached to nitrogen in base, histidine, indole H attached to sp3 nitrogen H attached to guanine N1 H in alkane H in cyclohexane H attached to aliphatic carbon except carbon in alkane and α-carbon in peptide H attached to aromatic carbon with electronegative neighbor H attached to aromatic carbon with no electronegative neighbor αm Parameters for orbital overlap term Dm pm N-H···O(sp2) (HB) 4.60 0.093 0.06 N-H···O(sp3) (HB) 0.47 0.008 0.05 N-H···N(sp2) (HB) 1.95 0.085 0.02 N-H···N(sp3) (HB) 1.55 0.03 0.01 O-H···O(sp2) (HB) 3.27 0.008 0.01 O-H···O(sp3) (HB) 1.47 0.01 0.13 O-H···N(sp2) (HB) 0.55 0.01 0.02 O-H···N(sp3) (HB) 3.15 0.008 0.05 C-H···O (HB) 0.65 0.005 0.04 C-H···N (HB) 1.05 0.01 0.005 O-H···C (X-H···π) 0.19 0.006 0.04 N-H···N (X-H···π) 1.52 0.11 0.011 N-H···C (X-H···π) 0.31 0.13 0.031 C-H···N (X-H···π) 0.75 0.09 0.061 C-H···C (X-H···π) 0.13 0.15 0.026

ACS Paragon Plus Environment

Value 1.51 0.70 1.71

R* (Å) 1.8080 1.8580 1.9080 1.9080 1.9080 1.9080 1.8080 1.7080 1.9080 1.8740 1.7740 1.5612 1.7210 1.6737 0.6500 0.4000 0.5650 1.0000 0.4500 1.4370 1.3000 1.1000 1.1000 1.3000 qm 0.171 0.35 0.16 0.25 0.15 0.30 0.16 0.30 0.14 0.07 0.20 0.21 0.23 0.14 0.14

1.43 1.80 1.30 1.95 ε (kcal/mol) 0.2060 0.1200 0.1260 0.1260 0.1260 0.1060 0.1660 0.0860 0.1660 0.1800 0.1700 0.2100 0.1804 0.2104 0.0157 0.0157 0.0157 0.0157 0.0157 0.0157 0.0157 0.0157 0.0157 0.0250 am 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.5 1.0 1.5 1.0

Page 31 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Table 2. Equilibrium Intermolecular Distances (Å) and Interaction Energies (kcal/mol) for 19 Training Dimersa Dimers Rref U···U (BP) 1.78 Water···Water 2.01 MeOH···MeNH2 1.97 MeNH2···MeOH 2.26 MeNH2···MeNH2 2.28 Water···Pyridine 2.00 AcOH···AcOH 1.69 AcNH2···AcNH2 1.86 PHE PHE st 3.75 PHE Pyridine st 3.64 U U st 3.30 PHE PHE TS 2.50 PHE Pyridine TS 2.48 Pyridine Pyridine TS 2.59 PHE···Water (OH···π) 2.53 PHE···MeNH2 (NH···π) 2.49 U···Ethyne 3.31 Pyridine···Ethene 3.41 Neopentane···Pentane 4.57 Root-mean-square deviation (RMSD) Maximum absolute deviation (MAD) Mean relative deviation (MRD) a

R R−Rref IEref IE IE−IEref 1.79 0.01 -17.18 -16.99 0.19 1.97 -0.04 -4.92 -5.16 -0.24 1.91 -0.06 -7.55 -8.16 -0.61 2.20 -0.06 -3.06 -2.88 0.18 2.34 0.06 -4.16 -3.88 0.28 1.96 -0.04 -6.86 -6.77 0.09 1.67 -0.02 -19.09 -19.42 -0.33 1.86 0.00 -16.26 -16.61 -0.35 3.78 0.03 -2.82 -3.03 -0.21 3.72 0.08 -3.44 -3.33 0.11 3.39 0.09 -9.83 -9.93 -0.10 2.58 0.08 -2.88 -2.73 0.15 2.42 -0.06 -3.33 -3.48 -0.15 2.63 0.04 -3.54 -3.52 0.02 2.47 -0.06 -3.28 -3.30 -0.02 2.49 0.00 -3.23 -3.41 -0.18 3.37 0.06 -3.74 -3.48 0.26 3.46 0.05 -1.87 -1.85 0.02 4.54 -0.03 -2.61 -2.63 -0.02 0.05 0.23 0.09 0.61 1.75% 3.62%

Rref and IEref are the benchmark equilibrium intermolecular distances and interaction energies

taken from reference 69. R and IE are the equilibrium intermolecular distances and interaction energies produced by our model.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 3. Errors (Å) of Our Model with Respect to the Benchmark Equilibrium Intermolecular Distances on the 124 Small Testing Dimers

H-bonded dimers (35) Stacked dimers (27) T-shaped dimers (32) X-H···π & other dimers (30) All the 124 dimers

RMSD MAD MRD 0.04 0.10 1.68% 0.08 0.18 1.87% 0.10 0.20 3.14% 0.09 0.18 2.36% 0.08 0.20 2.17%

ACS Paragon Plus Environment

Page 32 of 38

Page 33 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Journal of Chemical Theory and Computation

Table 4. Errors (kcal/mol) of Different Methods with Respect to the Benchmark CCSD(T)/CBS Interaction Energies on the 124 Small Testing Dimers

Methods M06-2X M06-2X CP M06-2X-D3 M06-2X-D3 CP AMBER99 AMOEBA Our model

H-bonded dimers (35) RMSD MAD MRD 0.65 1.83 4.55% 1.41 2.74 7.45% 0.50 1.24 4.20% 1.02 2.15 5.17% 5.78 11.20 30.11% 1.37 4.09 7.02% 0.94 2.04 6.23%

Stacked dimers (27) RMSD MAD MRD 0.78 1.58 9.91% 0.64 1.31 6.80% 1.48 2.50 19.34% 0.30 0.68 3.43% 2.64 7.19 22.38% 0.59 1.17 6.26% 0.50 1.14 5.72%

T-shaped dimers (32) RMSD MAD MRD 0.39 0.97 5.35% 0.78 1.46 13.23% 0.80 1.49 12.65% 0.30 0.68 4.54% 2.84 5.97 38.01% 0.91 2.27 13.34% 0.58 1.36 7.86%

X-H···π & other dimers (30) RMSD MAD MRD 0.83 1.60 23.63% 0.29 0.73 7.52% 1.28 2.33 36.87% 0.40 0.74 11.06% 0.57 1.26 12.80% 0.28 0.75 6.99%

ACS Paragon Plus Environment

All the 124 dimers RMSD MAD MRD 0.68 1.83 10.54% 0.91 2.74 8.82% 1.05 2.50 17.58% 0.61 2.15 6.05% 3.95 11.20 30.17% 0.96 4.09 9.74% 0.64 2.04 6.72%

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 38

Table 5. Errors of Different Methods with Respect to the Benchmark CCSD(T)/CBS Calculations on the S66 Dataset Methods AM1 PM3 PM6 PM6-DH2 PM6-DH+ PM6-D3H+ PM6-D3H4 PM7 OM1 OM2 OM2-D2 OM2-D3 OM3 OM3-D2 OM3-D3 Our model

Ref 79 79 80 80 80 80 81 79 79 79 79 79 79 79 79

RMSD 6.2 5.1 3.02 0.91 0.81 0.81 0.63 0.9 4.7 2.9 1.3 1.1 3.3 1.4 1.2 0.39

ACS Paragon Plus Environment

MAD 20.0 16.1 7.99 3.53 2.47 2.09 2.37 2.6 12.8 5.4 4.0 4.0 7.2 5.7 5.3 1.18

MRD 101.2% 86.2% 56.73% 14.49% 14.29% 12.93% 11.01% 18.5% 87.5% 60.4% 19.5% 14.5% 71.2% 18.5% 12.0% 6.19%

Page 35 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Journal of Chemical Theory and Computation

Table 6. Interaction Energies (kcal/mol) of the 14 Large Noncovalent Complexesa Our model Complex 125 antiparallel Gly pentapeptide dimer 126 parallel Gly pentapeptide dimer

M06-2X without CP with CP IE IE−IEref IE IE−IEref -39.96 -6.27 -36.16 -2.47

M06-2X-D3 without CP with CP IE IE−IEref IE IE−IEref -42.29 -8.60 -38.48 -4.79

IE -36.08

IE−IEref -2.39

IE -36.10

IE−IEref -2.41

-33.89

-3.23

-25.34

1.84

-26.42

0.76

Ref 58

IEref -33.69

IE -33.11

IE−IEref 0.58

58

-27.18

-27.51

-0.33

-31.56

-4.38

-28.08

-0.90

-6.71

-30.41

AMBER99

AMOEBA

127

parallel Ala tripeptide dimer

58

-24.54

-25.06

-0.52

-26.39

-1.85

-23.46

1.08

-28.47

-3.93

-25.33

-0.79

-22.92

1.62

-21.17

3.37

128

Anthracene dimer ST

82

-11.46

-10.85

0.61

-13.10

-1.64

-10.25

1.21

-14.86

-3.40

-12.00

-0.54

-7.62

3.84

-9.20

2.26

129

Tetracene dimer ST

82

-16.33

-15.12

1.21

-17.87

-1.54

-14.10

2.23

-20.31

-3.98

-16.53

-0.20

-10.57

5.76

-12.43

3.90 5.36

130

Coronene dimer ST

83

-24.36

-24.57

-0.21

-23.30

1.06

-18.27

6.09

-27.00

-2.64

-21.97

2.39

-17.96

6.40

-19.00

131

Adenine···Circumcoronene ST

83

-18.19

-19.60

-1.41

-17.39

0.80

-13.58

4.61

-20.49

-2.30

-16.68

1.51

-13.97

4.22

-15.48

2.71

132

Anthracene dimer TS

82

-8.25

-8.84

-0.59

-8.40

-0.15

-6.29

1.96

-9.84

-1.59

-7.73

0.52

-6.37

1.88

-4.99

3.26

133

Tetracene dimer TS

82

-11.12

-12.07

-0.95

-10.76

0.36

-8.25

2.87

-12.75

-1.63

-10.23

0.89

-8.09

3.03

-6.64

4.48

134

Tetradecahydroanthracene dimer

82

-8.88

-9.08

-0.20

-8.85

0.03

-4.98

3.90

-11.92

-3.04

-8.04

0.84

-

-

-7.59

1.29

135

Octadecahydrotetracene dimer

82

-11.83

-12.20

-0.37

-11.44

0.39

-6.55

5.28

-15.64

-3.81

-10.74

1.09

-

-

-10.18

1.65

136

[4]-ladderane dimer (Cs)

84

-5.5

-5.4

0.1

-7.4

-1.9

-4.9

0.6

-8.8

-3.3

-6.3

-0.8

-

-

-

-

137

[5]-ladderane dimer (Ci)

84

-6.6

-6.8

-0.2

-8.9

-2.3

-6.1

0.5

-10.7

-4.1

-7.8

-1.2

-

-

-

-

138

Octadecane dimer

83

-11.06

-12.74

-1.68

-10.51

0.55

-5.80

5.26

-14.03

-2.97

-9.32

1.74

-

-

-11.54

-0.48

RMSD

0.80

2.36

3.34

4.14

1.90

3.82

MAD

1.68

6.27

6.09

8.60

4.79

6.40

5.36

MRD

4.82%

11.27%

20.98%

28.08%

9.63%

20.97%

18.07%

a

Aug-cc-pVDZ basis set was employed for these DFT calculations.

ACS Paragon Plus Environment

3.02

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 1. The 19 training dimers used in this work.

ACS Paragon Plus Environment

Page 36 of 38

Page 37 of 38 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 2. The 14 large noncovalent complexes. The plain data are the equilibrium intermolecular distances predicted by our model. The bold data in parentheses are the benchmark equilibrium intermolecular distances taken from references. All distances are in Å.

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

for Table of Contents use only:

ACS Paragon Plus Environment

Page 38 of 38