Structures and Energies of Hydrogen-Bonded DNA Base Pairs. A

Feb 1, 1996 - Basis Set Convergence of the Post-CCSD(T) Contribution to Noncovalent Interaction Energies. Daniel G. A. Smith , Piotr Jankowski , Micha...
25 downloads 12 Views 417KB Size
J. Phys. Chem. 1996, 100, 1965-1974

1965

Structures and Energies of Hydrogen-Bonded DNA Base Pairs. A Nonempirical Study with Inclusion of Electron Correlation Jirˇ´ı Sˇ poner,*,†,‡ Jerzy Leszczynski,§ and Pavel Hobza† J. HeyroVsky´ Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, DolejsˇkoVa 3, 182 23 Prague 8, Czech Republic, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kra´ loVopolska´ 135, 612 65 Brno, Czech Republic, and Department of Chemistry, Jackson State UniVersity, Jackson, 39217 Mississippi ReceiVed: September 15, 1995X

Hydrogen bonding of DNA bases was investigated by reliable nonempirical ab initio calculations. Gradient optimization was carried out on 30 DNA base pairs using the Hartree-Fock (HF) approximation and the 6-31G** basis set of atomic orbitals. The optimizations were performed within Cs symmetry. However, the harmonic vibrational analysis indicates that 13 of the studied base pairs are intrinsically nonplanar. Interaction energies of base pairs were then evaluated at the planar optimized geometries with inclusion of the electron correlation energy using the second-order Møller-Plesset (MP2) method. The stabilization energies of the studied base pairs range from -24 to -9 kcal/mol, and the calculated gas phase interaction enthalpies agree well (within 2 kcal/mol) with the available experimental values. The binding energies and molecular structures of the base pairs are not determined solely by the hydrogen bonds, but they are also strongly influenced by the polarity of the monomers and by a wide variety of secondary long-range electrostatic interactions that also involve the hydrogen atoms bonded to ring carbon atoms. The stabilization of the base pairs is dominated by the Hartree-Fock interaction energy. This result confirms that the stability of the base pairs originates in the electrostatic interactions. For weakly bonded base pairs, the correlation interaction energy amounts to as much as 30-40% of the stabilization. For some other base pairs, however, repulsive correlation interaction energy was found. The latter fact is explained as a result of a reduction of the electrostatic attraction upon inclusion of the electron correlation. The empirical London dispersion energy does not reproduce the correlation interaction energy. For the sake of comparison, results of a first gradient optimization for a DNA base pair at a correlated level (CC base pair, MP2/6-31G** level) are reported. In addition, the ability of the economical density functional theory (DFT) method to reproduce the ab initio data was investigated. The DFT method with present functionals is not suitable to consistently study the whole range of the DNA base interactions. However, it gives good estimates of interaction energies at the reference HF/6-31G** geometries.

1. Introduction Hydrogen bonding of DNA bases is important for stability of the DNA double helix, and it plays a critical role in providing the specificity for information transfer. For the most part, guanine is paired with cytosine; this complex has three hydrogen bonds, and adenine is complexed with thymine through two hydrogen bonds. In addition to these standard Watson-Crick base pairs, many other base pairs contribute to the conformational variability of DNA. Hydrogen bonding of DNA bases was investigated by a number of theoretical studies.1 Satisfactory description of H-bonded complexes of DNA bases is an important problem facing the development of any empirical force field that is to be used to model the DNA molecule and its interactions with drugs, proteins, and other species. The thermodynamic characteristics (association enthalpies and constants) of base pair formation can be obtained from the gas phase experiments.2 However, the amount of available gas phase data for nucleic acid base pairs is very small, and its accuracy is still questionable. Further, gas phase experiments on base pairs do not provide the geometries of interacting * Correspondence address: J. Heyrovsky´ Institute of Physical Chemistry. E-mail: [email protected]; [email protected]. † J. Heyrovsky ´ Institute of Physical Chemistry, Academy of Sciences of the Czech Republic. ‡ Institute of Biophysics, Academy of Sciences of the Czech Republic. § Jackson State University. X Abstract published in AdVance ACS Abstracts, January 1, 1996.

0022-3654/96/20100-1965$12.00/0

monomers and complexes.2 H-bonded interactions of DNA bases can also be studied in nonpolar solvents,3 whereas in polar solvents the stacked DNA base pairs are preferred.4 In contrast, X-ray diffraction methods allow one to study the geometry of H-bonded complexes of DNA bases, but no information concerning the energy of these complexes is obtained. Therefore, DNA base interactions represent a difficult task for experimental studies, and real progress in gaining knowledge about DNA base pairing can be obtained only by combining experimental and theoretical methods. Among the theoretical approaches, the most sophisticated is the ab initio quantumchemical method, the only method which does not use any empirical parameters. The reliability of the ab initio calculations depends on two factors: the quality (size) of the basis set of atomic orbitals and the inclusion of electron correlation effects. All of the less expensive semiempirical methods and empirical potentials require a parametrization using experimental or ab initio data and are not independent of other data. The reliability of such calculations is limited. In this paper, we study the properties (optimal geometries, interaction energies, enthalpies, dipole moments, relative roles of various energy contributions, comparisons with gas phase data, etc.) of H-bonded DNA base pairs using the up-to-date higher level nonempirical ab initio techniques. Let us now briefly discuss the aims and limitations of the ab initio calculations on DNA base pairing. The main aim was to characterize a wide variety of H-bonded DNA base pairs (we © 1996 American Chemical Society

1966 J. Phys. Chem., Vol. 100, No. 5, 1996 studied 30 DNA base pairs; see the Method Section) near the upper performance limits of today’s computational facilities. Geometries of these base pairs were optimized using the Hartree-Fock (HF) approximation with the medium-sized 6-31G** basis set of atomic orbitals (6-31G split-valence basis set with polarization functions on all atoms); this level of calculation is designated HF/6-31G**. The interaction energies were then evaluated using the HF/6-31G**-optimized geometries at a higher level with inclusion of electron correlation effects. The second-order Møller-Plesset (MP2) perturbational method was used with the 6-31G* basis set of atomic orbitals. (This calculation is denoted MP2/6-31G*//HF/6-31G**; the standard notation B//A means that the energy is evaluated at computational level B through the geometry optimized at level A.) The MP2 method recovers a significant portion of the electron correlation energy and is now feasible (in contrast to other more sophisticated correlation methods) for larger systems such as DNA base pairs. Calculations carried out at the MP2 level represent a qualitative improvement of the theoretical description compared to the HF level. This particularly concerns the following points: (i) the HF approximation overestimates the strength of the electrostatic contribution, which is the leading energetic term in H-bonded complexes (HF dipole moments are overestimated by 10-20%), while the MP2 method gives realistic values for the dipole moments and electrostatic interactions; and (ii) the important London dispersion energy originates in the electron correlation and is not included within the HF approximation at all. In other words, in all studies on DNA base pairs made at the HF level (i.e., practically all studies made before 1994) the dispersion energy is either completely ignored or approximated by an empirical potential. It is important to emphasize that the MP2 level calculations with medium-sized polarized sets of atomic orbitals used here represent the lowest cost computational method that consistently includes all energy contributions within the quantum chemical framework. As discussed in detail elsewhere, this is also the first theoretical method that enables the comparison of stacked and H-bonded complexes of DNA bases.1g,i Correlation calculations were until now not feasible for DNA bases and base pairs. Only very recently were a few H-bonded base pairs investigated at a level similar to that used here.1g,h,j,s Compared to these studies, the present analysis considers a larger sample of 30 H-bonded DNA base pairs, which is necessary to understand the role of various contributions influencing the hydrogen bonding. Such extended studies were until now carried out only using much less reliable methods1a-c,i,n,q (at the best ab initio HF level with minimal sets of atomic orbitals1b,i,n). It should be stressed that it is not possible to evaluate the quality of the lower level methods without performing a comparison with qualitatively better calculations. Such a comparison was another reason to make these extensive MP2 calculations. Many conclusions based on the older studies are confirmed by the present calculations, because the characteristics of the H-bonded complexes are fortunately less dependent on the quality of the ab initio method than, for example, the stacking interactions1g or the proton transfer processes.1s However, some other results need to be modified or revisited. Work is in progress in our laboratory to compare the MP2 results with available empirical potentials (AMBER, CHARMM, OPLS, etc.) and semiempirical quantum-chemical methods (AM1, PM3, MNDO/M, etc.) used in studies of the DNA molecule. In addition, we also carried out a set of calculations using the density functional theory (DFT) approach. This quantum-chemical method has been proposed as a rather inexpensive alternative to MP2 ab initio calculations. Because

Sˇ poner et al. the popularity of DFT calculations is growing rapidly, we compare here the ab initio data with the DFT method, to verify the reliability of the DFT method for DNA base interactions. As explained above, the use of the MP2 method with medium-sized polarized basis sets represents an important improvement in the quantum-chemical description of DNA base pairs compared to all older and less expensive procedures. However, further development of hardware and software will be associated with a further increase in the quality of computational methods for DNA base pairs. Such calculations can improve some of the present results (probably most so with respect to the comparison of stacked and H-bonded base pairs).1g These corrections are, however, expected to be much smaller than that upon inclusion of the correlation effects through the MP2 approximation. Let us now try to estimate the expected computer demands associated with further improvements in the description of DNA base interactions. The present HF/6-31G** gradient optimizations with vibrational analysis took about 300-1500 CPU hours on SGI/R4400 systems per single optimized base pair. The SGI/R8000 and Cray Y-MP supercomputers were approximately 3.5 times faster. Single-point MP2 calculations of interaction energies required 4-15 CPU hours on a Cray Y-MP processor per base pair. Some improvement in the quality of the calculations could be achieved by carrying out the geometry optimization at the MP2 level. In this study we report the first such gradient optimization for a DNA base pair: cytosine dimer. This required 2 weeks of CPU time using two SGI/R8000 processors; however, we were not able to optimize larger base pairs at this level. Further qualitative improvement would be achieved mainly by consideration of higher level correlation methods (MP4(SDTQ), CCSD(T)). On the basis of our preliminary results on smaller model complexes (H-bonded and stacked formamidine and formamide dimers, benzene-argon and benzene dimer), we can conclude that a small increase in the level of inclusion of the electron correlation effects (MP3, MP4SDQ, CCSD methods) does not represent a real improvement in the quality of the calculations. (The same holds also for increase in the size of the basis set at the MP2 or HF levels.) It is necessary to include the triple excitations in the calculation, but these calculations (MP4(SDTQ) or even CCSD(T) method) are extremely tedious. We estimate that single-point CCSD(T) calculations on the highly symmetrical (C2h) planar H-bonded cytosine dimer with the rather small 6-31G* basis set would require at least 1 GB of memory, more than 25 GB of scratch disc space, and days of CPU time on SGI/R8000 processors. Use of the recommended but larger Dunning’s correlation-consistent basis sets of atomic orbitals, as well as calculations on larger base pairs without the high C2h symmetry, would increase all computer demands very dramatically. For a review of various quantum-chemical methods used for weak intermolecular interactions see refs 5a-c and references therein. 2. Method 2.1. Calculations. Geometries of all the H-bonded base pairs were optimized at the HF/6-31G** level using the gradient optimization method, starting from the optimized structures obtained previously at the HF/MINI-1 level.1i The calculations were carried out for planar base pairs. No other restriction was imposed, and the thymine methyl group was fully optimized. All optimized structures were checked by analysis of harmonic vibrational frequencies obtained from diagonalization of force constant matrices at the HF/6-31G** level, because the intrinsic nonplanarity of DNA base amino groups6 can lead to nonplanarity of the base pairs.1h,i,s

Structures and Energies of H-Bonded DNA Base Pairs The interaction energies were calculated for the HF/6-31G**optimized geometries using the second-order Møller-Plesset perturbation method.7 A standard split-valence 6-31G basis set augmented by a set of diffuse d-polarization functions with an exponent of 0.25 added to the second-row elements (designated 6-31G* (0.25)) was used for the single-point MP2 predictions. The flat polarization functions improve the description of the dispersion energy for weakly bonded van der Waals complexes.5b,c,8a,b In the case of the H-bonded base pairs, the influence of the modified polarization functions is not large; the 6-31G* (0.25) basis set was applied because it was used in our previous studies on stacked complexes of DNA bases.1g,9 This choice thus allows for a consistent comparison of both types of complexes. We evaluated the following energy contributions. The interaction energy of the complex, EMP2, is obtained as a sum of the HF interaction energy, EHF, and the correlation interaction energy, ECOR. The former term includes the electrostatic interactions, induction interactions, the short-range repulsion, and other smaller terms.5a-c The correlation interaction energy consists of the intersystem correlation energy (mainly dispersion attraction) and the change of intrasystem correlation energy upon complexation.5a-c Stabilization energies of the base pairs, ET, were calculated as the sum of the MP2 interaction energies and the deformation (reorganization) energies of bases, EDEF. Deformation energy of bases upon complexation was calculated as the difference of the energies of planar optimized bases and energies of bases having the same geometry as within the complex.10 Finally, the interaction enthalpies at 0 K, ∆H, were obtained by correcting the stabilization energies, ET, for the zeropoint vibrational energies (ZPE), taken from the HF/6-31G** harmonic vibrational frequency calculations. The HF/6-31G** level ZPEs were multipled by the standard scaling factor of 0.9. The EHF and ECOR energies were corrected for the basis set superposition error (BSSE);11 all orbitals of the “ghost” subsystem were used. In addition, in order to estimate electron correlation effects on the molecular structure of H-bonded base pairs, the smallest CC base pair was optimized at the MP2/6-31G** level. The CC and GC Watson-Crick base pairs were also optimized using the nonlocal Becke3LYP12 DFT functional with the 6-31G** basis set. Finally, the DFT interaction energies of all base pairs were evaluated at the HF/6-31G**-optimized geometries (Becke3LYP/6-31G**//HF/6-31G** level). All DFT interaction energies were corrected for the basis set superposition error. All ab initio and DFT calculations were performed with the Gaussian92/DFT set of programs.13 The starting and optimized geometries of base pairs can be obtained from the authors upon request. 2.2. Choice of the DNA Base Pairs and Their Nomenclature. To cover a wide range of DNA base pairs, the calculations were carried out for the all possible combinations of neutral major tautomers of adenine (A), cytosine (C), guanine (G), and thymine (T) DNA bases, with exclusion of base pairs directly involving the H(N1) pyrimidine and H(N9) purine groups in the hydrogen bonding.1a,b,14 (The only exception is the GC pair designated GCNEW; see below.) The N1(pyrimidine) and N9(purine) interactions are ignored because of their attachment to sugars in polynucleotides; no other selection was made. The same set of H-bonded base pairs was extensively studied by lower level quantum-chemical1b-d,i and empirical potential calculations.1a Most of these base pairs were actually observed in oligonucleotide or monomer crystals, secondary and tertiary RNA structures, or polynucleotide fibers

J. Phys. Chem., Vol. 100, No. 5, 1996 1967 and solutions (see, for example, Table II in ref 1a). There is no generally accepted nomenclature for this set of base pairs; different authors use different designations (cf. refs 1a-d,q,t and 14). In the present paper, we use the nomenclature from ref 1b. The Watson-Crick, reverse Watson-Crick, Hoogsteen, and reverse Hoogsteen TA and GC base pairs were designated using abbreviations WC, RWC, H, and RH, respectively. The remaining pairs were numbered (e.g., GA1-GA4, GG1-GG4) according to ref 1b. This numbering was originally chosen in order of decreasing base pair stability obtained by lower level ab initio calculations. The present calculations give a different order of stability, but we retained the older numbering to prevent further confusion. In addition, this nomenclature was used in other studies,1i,5a,b and some other authors already adopted this numbering scheme in their papers.1t H-bonded structures of all base pairs can be unambiguously found using Figures 1 and 2 and Tables 1-4. In addition, two other Watson-Crick purine‚‚‚pyrimidine base pairs were studied: inosine‚‚‚cytosine (IC) and 2-aminoadenine‚‚‚thymine (2aminoAT). Both pairs are frequently involved in studies aimed at understanding the base pairing at a chemical level, because their shape resembles the classical GCWC and TAWC pairs, while their H-bonding patterns are different. The 2aminoAT base pair has three hydrogen bonds, similar to the GCWC base pair, but it has a different distribution of hydrogen donors and acceptors.1q It should be noted that the number of existing DNA base pairs is much larger than the set included in the present paper. The computer demand of the ab initio method does not allow investigation of all of them, and we hope that the presently investigated sample of 30 base pairs is sufficient for a proper characterization of H-bonded base pairs. 3. Results and Discussion 3.1. Stabilization Energies and Gas Phase Interaction Enthalpies. Figures 1 and 2 show the atomic structure of DNA bases and the base pairing of all optimized base pairs. Table 1 summarizes the main physical characteristics of the base pairs: the HF interaction energies obtained with the 6-31G** and 6-31G* (0.25) basis sets (EHF1, EHF2), the deformation energies of bases upon base pair formation, EDEF, obtained at the HF/ 6-31G** level, and the MP2 correlation interaction energies, ECOR, obtained with the 6-31G* (0.25) basis set. The sum of the EHF2 and ECOR contributions is the interaction energy of the pair, EMP2. In addition, the interaction energies calculated by the DFT method, EDFT, are given; these DFT calculations will be analyzed separately. All energies are evaluated at the HF/ 6-31G**-optimized geometries. Total stabilization energy of base pairs, ET, is the sum of the EMP2 interaction energy and the deformation energy. The studied base pairs, except IC and 2aminoAT, are ordered according to their stabilization energy (ET) in descending order (Table 1, Figure 2). The next column of Table 1 contains the calculated enthalpies at 0 K (∆H), which will be used below for a comparison with the gas phase experimental data. The last two columns present the number of negative eigenvalues of the Hessian matrix (NNE), obtained at the HF/6-31G**-optimized geometry by the harmonic vibrational analysis and the dipole moments of the base pairs calculated at the HF/6-31G** level, respectively. The interaction enthalpies for base pairs having negative eigenvalues of the Hessian matrix are given in parentheses to indicate that these are only estimates based on vibrational analyses made on the transition-state or saddle-point structures; see below. These (∆H) data are overestimated (in absolute value); however, the error should be less than 1 kcal/mol and mostly several tenths of a kcal/mol only.1i

1968 J. Phys. Chem., Vol. 100, No. 5, 1996

Sˇ poner et al.

Figure 1. Structures of DNA bases and their HF/6-31G** dipole moments.

For 13 base pairs, one to three negative eigenvalues (NNE > 0) of the Hessian matrix were obtained from the harmonic vibrational analysis. A structural interpretation of pairs with NNE > 0 is that these planar optimized structures do not represent minima on the respective potential energy surfaces. We have found that all of these negative eigenvalues are associated with either pyramidalization of the DNA base amino groups or the pyramidalization combined with out-of-plane motions (propeller twist, buckle, or their combination) of the bases. None of these negative eigenvalues are associated with the rotation of the methyl group.1j In other words, all these base pairs are intrinsically nonplanar,1h,i and in all cases the amino group pyramidalization6 is involved. The nonplanarity mostly concerns base pairs having an unpaired amino group,1i because hydrogen bonding efficiently eliminates the pyramidalization of the DNA base amino groups. Previously, Gould and Kollman proved that, at the HF/6-31G* level, the TAWC and GCWC base pairs, as well as the TAH base pair, are perfectly planar.1j The same holds true for the CC base pair.1g These base pairs are also planar according to the present HF/6-31G** calculations (Table 1). However, Floria´n and Leszczynski have reported a somewhat propellertwisted optimal geometry for the 7-methyloxoformycin B‚‚‚2,4diaminopyridine base pair1h and for two rare tautomeric forms of the GC pair1s at the HF/6-31G* level. We recently investigated nonplanarity of 28 base pairs using unconstrained gradient optimization with analytical second derivatives at the lower HF/MINI-1 level,1i and optimal nonplanar structures were found for the same set of pairs, for which the present vibrational analysis also indicates nonplanarity. The nonplanarity was mostly connected with an only marginal energy improvement (0.0-2.4 kcal/mol), while the base pairs were very flexible. It can be concluded from all these studies that most of the DNA base pairs, including the GCWC and TAWC pairs, are planar and also that for the remaining pairs the energy improvement due to nonplanarity is rather small. However, the base pairs

Figure 2. HF/6-31G**-optimized geometries of DNA base pairs and their dipole moments obtained at the same level. For the GG base pairs, the subsystem dipole moments are also depicted (dotted arrows).

are very flexible. We shall analyze the nonplanarity and flexibility of DNA base pairs in the near future.15 Two base pairs from the initial set, GG2 and GC2, are not included in Table 1, because the geometries of these base pairs converged toward the more stable H-bonded base pairs GG1 and GCNEW. The GCNEW base pair is very stable and was not considered in the initial set of base pairs because its hydrogen bonding involves the HN1 group of cytosine. The GG2 and GC2 base pairs seem not to represent minima on the HF/6-31G** potential energy surfaces of the GG and GC dimers. These results contradict the outcomes of previous gradient optimizations carried out at the HF/MINI-1 level,1i where local minima were found for both GG2 and GC2 base pairs. However, we do not exclude that the GG2 and GC2 base pairs are local minima. The inclusion of electron correlation during optimization would reduce the strength of the electrostatic interactions, which should lower somewhat the energetic preference for the GG1 and GCNEW base pairs.

Structures and Energies of H-Bonded DNA Base Pairs

J. Phys. Chem., Vol. 100, No. 5, 1996 1969

TABLE 1: Energies (kcal/mol) of H-Bonded DNA Base Pairs, Obtained at the HF/6-31G**-Optimized Geometriesa GCWC GG1 GCNEW CC GG3 GA1 GT1 GT2 AC1 GC1 AC2 GA3 TAH TARH TAWC TARWC AA1 GA4 TC2 TC1 AA2 TT2 TT1 TT3 GA2 GG4 AA3 2aminoAT IC

EHF1

EHF2

EDFT

EDEF

EHF1+EDEF

ECOR

EMP2

ET

∆H

NNE

dipole

-25.5 -25.0 -22.8 -17.3 -16.8 -12.6 -14.0 -13.7 -11.9 -12.7 -11.4 -11.0 -10.9 -10.9 -10.3 -10.2 -8.7 -8.8 -9.2 -9.1 -8.0 -9.0 -9.1 -9.1 -7.5 -7.3 -6.9 -12.6 -19.3

-24.6 -25.1 -22.7 -16.1 -16.0 -12.2 -14.2 -13.8 -10.8 -11.6 -10.4 -10.8 -10.4 -10.3 -9.7 -9.6 -7.8 -7.9 -8.9 -8.7 -7.2 -9.3 -9.3 -9.3 -6.8 -6.5 -6.2 -11.9 -18.5

-26.5 -24.8 -22.7 -18.4 -17.4 -14.5 -14.8 -14.7 -13.7 -14.2 -13.4 -12.7 -12.5 -12.6 -11.9 -11.4 -10.7 -10.8 -10.7 -10.1 -10.3 -9.8 -9.9 -9.6 -9.2 -9.2 -8.8 -14.8 -19.2

+2.1 +2.5 +2.2 +1.4 +0.9 +1.1 +1.3 +1.2 +0.9 +1.0 +0.9 +1.2 +0.7 +0.7 +0.7 +0.7 +0.6 +0.6 +0.9 +0.9 +0.6 +0.6 +0.6 +0.6 +0.7 +0.7 +0.6 +1.0 +1.4

-23.4 -22.5 -20.6 -15.9 -15.9 -11.5 -12.6 -12.5 -11.0 -11.7 -10.5 -9.8 -10.2 -10.2 -9.6 -9.5 -8.1 -8.1 -8.3 -8.2 -7.4 -8.4 -8.4 -8.5 -6.8 -6.6 -6.3 -11.6 -17.8

-1.2 +0.4 +0.6 -2.7 -1.8 -3.0 -0.9 -0.9 -3.5 -2.7 -3.7 -3.0 -2.9 -2.9 -2.7 -2.7 -3.7 -3.4 -2.7 -2.7 -3.8 -1.3 -1.3 -1.3 -3.5 -3.5 -3.7 -3.2 -0.9

-25.8 -24.7 -22.2 -18.8 -17.8 -15.2 -15.1 -14.7 -14.3 -14.3 -14.1 -13.8 -13.3 -13.2 -12.4 -12.4 -11.5 -11.4 -11.6 -11.4 -11.0 -10.6 -10.6 -10.6 -10.3 -10.0 -9.8 -15.1 -19.4

-23.8 -22.2 -19.9 -17.5 -17.0 -14.1 -13.9 -13.5 -13.5 -13.4 -13.2 -13.1 -12.7 -12.6 -11.8 -11.7 -11.0 -10.7 -10.7 -10.7 -10.3 -10.0 -10.0 -9.9 -9.6 -9.3 -9.2 -14.1 -18.0

-21.9 -21.5 (-19.0) -15.5 (-16.6) (-13.3) (-13.3) (-13.0) -11.7 -12.3 -11.4 (-12.3) -11.4 -11.3 -10.5 -10.4 -9.3 (-9.9) (-9.5) (-9.5) -8.8 -9.1 -9.2 -9.1 (-8.9) (-9.4) (-7.8) (-12.2) -16.4

0 0 1 0 1 1 1 1 0 0 0 2 0 0 0 0 0 2 1 1 0 0 0 0 2 3 1 1 0

6.5 0 3.1 0 10.5 5.6 7.7 8.6 4.8 12.7 9.7 8.8 6.4 5.9 2.0 2.5 0 9.2 4.5 5.9 4.9 0 1.3 0 7.3 0 0 4.2 5.0

a HF1 E , HF/6-31G** interaction energy; EHF2, HF/6-31G*(0.25) interaction energy; EDFT, Becke3LYP/6-31G** interaction energy; EDEF, deformation energy; ECOR, MP2/6-31G*(0.25) correlation interaction energy; EMP2 ) EHF2 + ECOR, interaction energy of the pair; ET, total stabilization energy (ET ) EMP2 + EDEF). Interaction enthalpy ∆H at 0 K is a sum of the interaction energy ET and the scaled HF/6-31G** ZPE contribution. NNE is the number of negative eigenvalues in the Hessian matrix obtained for the planar optimized pair from the harmonic vibrational analysis. Dipole moment (D) was also calculated at the HF/6-31G** level.

The deformation energies range from +0.6 to +2.5 kcal/ mol.10 As expected, the largest deformation energies are found for the three most stable base pairs, where the intermolecular interactions are strong. The deformation energies reported here are significantly smaller than those obtained previously at the HF/MINI-1 level.1i Overestimation of the deformation energy of DNA bases upon complexation is one of the artifacts caused by the minimal basis sets.1i The total stabilization energies of the base pairs range from -23.8 to -9.2 kcal/mol. The energetic ordering of the base pairs is similar, but not the same as found at the HF/MINI-1 level.1b,i The GCWC base pair is the most stable base pair, followed by the GG1 base pair. The most striking difference between the present results and the older HF/MINI-1 data is the relative position of the GG4 base pair on the energy scale. This base pair belongs to the 10 most stable base pairs at the HF/MINI-1 level,1b,i but upon inclusion of the polarization functions and correlation effects, it becomes the second least stable. The base pairs consisting of G and C are usually rather stable, which can be explained by the much larger dipole moments of G and C compared to the dipole moments of A and T. The stabilization of the H-bonded DNA base pairs originates from the Hartree-Fock contribution. This result confirms the electrostatic nature of the H-bonded base pairs. However, the correlation contributions are also important and amount to 3040% of the total stabilization energy for the least stable base pairs. Let us recall that the correlation interaction energy consists of two parts (which, unfortunately, cannot be separated): the intermolecular correlation energy corresponds to the attractive dispersion energy, while the change of intramolecular correlation energy covers the reduction of the DNA base dipole moments (or electrostatic interactions) when passing from the

HF level to the MP2 level. This term is repulsive in the case of attractive electrostatic dipole-dipole interaction and becomes attractive in the case of repulsive dipole-dipole interaction. Because dipole-dipole interaction is attractive for H-bonded base pairs, the intramolecular correlation interaction energy is repulsive. For GG1 and GCNEW base pairs, this term is so strongly repulsive that even the total correlation interaction energy becomes repulsive despite the attractive dispersion energy. The correlation interaction energy for H-bonded DNA base pairs is significantly smaller than the stabilization predicted using the empirical London dispersion energy.1b,i,9 The same result was recently reported, for example, for the amino group interaction in methyl amine dimer.16 For stacked complexes of DNA bases, on the other hand, the empirical dispersion energy has been shown to significantly underestimate the correlation contribution to the stabilization.9 The opposite results obtained for stacked and H-bonded DNA base pairs thus underline the limited applicability of simple empirical dispersion potentials: their usage is inadequate when, for example, H-bonded and stacked DNA base pairs are to be compared. As mentioned above, there is a very limited number of gas phase experiments on DNA base pairs. The experimental enthalpies, obtained by field mass spectroscopy,2a are -21, -16, -13, and -9 kcal/mol for the GC, CC, TA, and TT base pairs, respectively. The calculated interaction enthalpies (at 0 K) of these base pairs (Table 1) are -22, -15, -11, and -9 kcal/ mol, so the theoretical prediction agrees well with experimental stability order of the base pairs. This is the best agreement between theory and gas phase experiment achieved so far. (We do not specify the type of H-bond pairing in the case of gas phase experimental data because the experimental geometry is not known. In the experiments discussed, however, the most

Sˇ poner et al.

1970 J. Phys. Chem., Vol. 100, No. 5, 1996 stable pairs were detected, so that also for the comparison of theory and experiment the most stable calculated base pairs were selected.) Very recently, Dey et al.2b studied the association of free nucleic acid bases by IR laser desorption of the molecules into a supersonic expansion. They reported the following order of the “pseudoassociation constants”: GC > GG > CC > TA > AA, TT, AG > AC, TC, GT (formation of the last three base pairs was not observed). This experimental order agrees with our calculated stabilization energies as far as the three most stable base pairs are concerned, but there are differences for the remaining base pairs. Nevertheless, this experimental order cannot be directly compared with the calculated data because the experimental order is due to the free energy (∆G) of formation of the base pairs. Work is in progress in our laboratory to evaluate ∆G characteristics for the formation of H-bonded DNA base pairs in vacuo. In the first study made at the correlated level of theory on H-bonded DNA base pairs, Gould and Kollman1j reported the following interaction enthalpies: -25.4 for the GCWC base pair and -12.8 and -11.9 kcal/mol for the TAH and TAWC base pairs, respectively. The difference between their and our results is mainly due to Gould and Kollman’s non-use of the counterpoise correction of the basis set superposition error for the correlation part of the interaction energies. We believe, as do many others,11b,c that both the EHF and ECOR interaction energies should be corrected using the counterpoise method, as explained in our previous studies.1g,5b,c 3.2. Molecular Structures of the Base Pairs. Figure 1 shows the structures of isolated DNA bases and their dipole moments. We present here the HF/6-31G** dipole moments because this level of theory was also used for optimization of the base pairs. More accurate MP2 geometries of nonplanar and planar isolated DNA bases6c and their correlated dipole moments9 can be found elsewhere. Although differences in HF and MP2 planar optimized DNA base geometries are insignificant, the HF/6-31G** dipole moments are somewhat overestimated compared to the values obtained at the MP2 level. Table 2 summarizes the optimum intermolecular geometries of the base pairs. In the first column, the hydrogen bond lengths (i.e., the distances between the hydrogen donors, X, and acceptors, Y) and the X-H‚‚‚Y angles are given. The next column summarizes the distances to some other groups that seem to influence the structures and stabilites of the base pairs. These interactions will be called “secondary interactions”1q throughout this paper. Figure 2 presents the optimized geometries of all base pairs, together with their HF/6-31G** dipole moments; see also Table 1. It should be mentioned that base pair dipole moments do not correlate with the stabilization energies. The zero dipole moment of several base pairs (GG1, CC, AA1, TT2, TT3, GG4, AA3) is due to the presence of an inversion center in these base pairs. The most stable GCWC base pair possesses three almostlinear hydrogen bonds: the O6‚‚‚(H)N4 bond, 2.92 Å; the N2(H)‚‚‚O2 bond, 3.02 Å; and the N1(H)‚‚‚N3 bond, 3.04 Å. The second most stable GG1 base pair contains two hydrogen bonds, but its stability is comparable to the GCWC base pair. This confirms the known fact that stability of the H-bonded DNA base pairs cannot be solely rationalized on the basis of the number of hydrogen bonds.1b,h,q Table 1 demonstrates that the GG1 base pair exhibits a repulsive correlation interaction energy ECOR and a large deformation energy, which also indicates a strong interaction. This GG1 base pair contains two secondary interactions between the amino groups and the carbonyl groups; the amino group hydrogen-carbonyl oxygen distance is 2.63 Å. Although this distance is too large to

TABLE 2: Selected Intermolecular Distances and Angles of the HF/6-31G**-Optimized Geometries of DNA Base Pairs H-bonded pair GC WC GG1 GCNEW CC GG3 GA1 GT1 GT2 AC1 GC1 AC2 GA3 TAH TARH TAWC TARWC AA1 GA4 TC2 TC1 AA2 TT2 TT1 TT3 GA2 GG4 AA3 2aminoAT IC

primary H-bonds N2(H)‚‚‚O2 N1(H)‚‚‚N3 O6‚‚‚(H)N4 N1(H)‚‚‚O6 O6‚‚‚(H)N1 N1(H)‚‚‚O2 O6‚‚‚(H)N1 N3‚‚‚(H)N4 N4(H)‚‚‚N3 N1(H)‚‚‚N7 N2(H)‚‚‚O6 O6‚‚‚H(N6) N1(H)‚‚‚N1 O6‚‚‚H(N3) N1(H)‚‚‚O4 06‚‚‚H(N3) N1(H)‚‚‚O2 N6(H)‚‚‚N3 N1‚‚‚(H)N4 N2(H)‚‚‚N3 N3‚‚‚(H)N4 N7‚‚‚(H)N4 N6(H)‚‚‚N3 O6‚‚‚(H)N6 N1(H)‚‚‚N7 O4‚‚‚H(N6) N3(H)‚‚‚N7 O2‚‚‚H(N6) N3(H)‚‚‚N7 N6(H)‚‚‚O4 N3(H)‚‚‚N1 O2‚‚‚H(N6) N3(H)‚‚‚N1 N6(H)‚‚‚N1 N1‚‚‚(H)N6 N2(H)‚‚‚N1 N3‚‚‚H(N6) O4‚‚‚H(N4) N3(H)‚‚‚N3 O2‚‚‚H(N4) N3(H)‚‚‚N3 N1‚‚‚H(N6) N6(H)‚‚‚N7 O4‚‚‚H(N3) N3(H)‚‚‚O4 N3(H)‚‚‚O4 O2‚‚‚(H)N3 O2‚‚‚(H)N3 N3(H)‚‚‚O2 N2(H)‚‚‚N7 N3‚‚‚H(N6) N2(H)‚‚‚N3 N3‚‚‚(H)N2 N6(H)‚‚‚N7 N7‚‚‚(H)N6 N6(H)‚‚‚O6 N3‚‚‚(H)N1 N2(H)‚‚‚O4 O6‚‚‚(H)N4 N1(H)‚‚‚N3

3.02/178.1 3.04/176.1 2.92/177.0 2.87/178.1 2.87/178.1 2.82/175.0 2.92/179.0 3.05/173.2 3.05/173.2 2.96/172.5 3.27/166.2 2.95/179.9 3.19/179.3 2.98/171.8 2.89/179.8 2.98/170.8 2.90/179.8 3.08/175.3 3.13/178.0 3.01/179.0 3.22/175.7 3.16/177.9 3.08/166.4 2.94/163.4 3.24/175.6 3.14/170.1 2.95/175.6 3.14/170.2 2.95/176.3 3.09/172.7 2.99/178.8 3.09/172.6 2.99/178.1 3.16/179.4 3.16/179.4 3.10/177.8 3.19/178.0 2.95/175.5 3.22/164.7 2.95/173.3 3.23/163.3 3.17/163.3 3.16/175.1 2.98/167.4 2.98/167.4 2.98/166.2 2.97/167.3 2.98/166.2 2.98/166.2 3.12/171.4 3.24/164.4 3.15/178.8 3.15/178.8 3.18/158.4 3.18/158.4 3.05/177.1 3.08/179.8 3.09/179.3 2.99/173.5 2.94/175.8

selected secondary interactions

H(N2)‚‚‚O6 2.63 O6‚‚‚H(N2) 2.63 H(N2)‚‚‚O2 2.62

O6‚‚‚H(C8) 2.65 H(N2)‚‚‚H(C2) 2.37 H(N2)‚‚‚O4 2.85 H(N2)‚‚‚O2 2.87

O2‚‚‚H(C8) 2.89 O4‚‚‚H(C8) 2.89 O2‚‚‚H(C8) 2.96 O4‚‚‚H(C2) 2.95

O2‚‚‚O2 3.63 O4‚‚‚O2 3.64

(C2)H‚‚‚O2 2.87

a

First column: hydrogen bond lengths/angles (X(H)‚‚‚Y/X-H‚‚‚Y) for the base pairs. Second column: selected secondary interactions.

consider these contacts as true hydrogen bonds, they contribute to the stabilization of the base pair through long-range electrostatic attraction. The GCNEW base pair exhibits a similar pattern of hydrogen bonding, with two N(H)‚‚‚O bonds and a repulsive correlation interaction energy. However, the GCNEW base pair contains only one attractive amino group-carbonyl group secondary interaction. This could explain why this base pair is somewhat

Structures and Energies of H-Bonded DNA Base Pairs less stable than the GG1 base pair. The attractive secondary interaction is responsible for the fact that the adjacent N(H)‚‚‚O bond is shorter than the other hydrogen bond in the base pair. The CC base pair contains two N‚‚‚(H)N hydrogen bonds, which are significantly longer (3.05 Å) than the N(H)‚‚‚O bonds in the GG1 and GCNEW base pairs. N‚‚‚(H)N bonds are usually longer than the O‚‚‚(H)N bonds, because the van der Waals radius of the oxygen atom is somewhat smaller than that of the nitrogen atom, and the oxygen atom is more electronegative than the nitrogen atom. The GG3 base pair contains an attractive secondary interaction between the H(C8) hydrogen of one guanine and the O6 oxygen of the other guanine (rOH ) 2.65 Å). It changes the mutual position of bases compared to the geometry that would be required by the two primary hydrogen bonds only. The N(H)‚‚‚N bond is compressed (2.96 Å), while the N(H)‚‚‚O bond is weakened (3.27 Å) and rather nonlinear. There was an ongoing dispute as to whether the H(C) group could be involved in attractive intermolecular interactions; however, it is now clear from experimental studies and quantum-chemical calculations carried out on small systems that the H(C) group participates in such interactions.17 Our study indicates that this conclusion is also valid for larger complexes such as DNA base pairs. The GA1 base pair possesses a secondary interaction between the amino group hydrogen of guanine and the H(C2) hydrogen of adenine. Because this secondary interaction is repulsive, the adjacent H-bond is weakened (rNN ) 3.19 Å). As revealed previously at the HF/MINI-1 level, the H(N2)‚‚‚H(C2) repulsion can be eliminated by a pyramidalization of the amino group combined with a propeller twist.1i This leads to more favorable interaction of the amino group nitrogen lone pair and the H(C2) hydrogen, thus making it resemble other hydrogen-nitrogen lone pair interactions in DNA.6c,16,18 The GT1 base pair contains a rather weak attractive secondary interaction between the guanine amino group hydrogen and the thymine O4 oxygen, and the adjacent H-bond is somewhat shorter than the other H-bond. The GT2 base pair has the same geometric properties and interaction energy as the GT1 base pair. This means that the reorientation of thymine with respect to guanine when going from the GT1 to the GT2 base pair does not influence the base pairing. The difference between the GT1 and GT2 optimized structures is practically equivalent to rotation of thymine around its N3-C6 axis by 180°. (Such a reorientation of thymine will be called “thymine reversion” throughout this paper.) This interchanges the position of the O4 and O2 thymine carbonyl groups with respect to guanine, and the similarity between the GT1 and GT2 pairs thus demonstrates that the H-bonds and secondary interactions formed by O2 and O4 groups of thymine are identical. AC1, GC1, and AC2 base pairs contain only N‚‚‚(H)N bonds, which are all longer than 3.00 Å. The GC1 base pair exhibits a significantly lengthened N3‚‚‚H(N4) bond (3.22 Å). This base pair does not contain any evident secondary interaction; the shift probably improves the electrostatic interaction. The GA3 base pair contains a repulsive secondary contact between the guanine amino group hydrogen and the adenine H(C8) hydrogen, resulting in weakening of the adjacent N1(H)‚‚‚N7 bond. The four AT base pairs have very similar interaction energies and all seem to contain one weak O‚‚‚H(C) secondary interaction. The reversion of thymine does not influence the energy and geometry of base pairs. The TAH and TARH base pairs are slightly more stable than the TAWC and TARWC base pairs, in agreement with previous calculations.1b,i,j The AA1 and GA4 base pairs do not contain any secondary interactions. Also, results obtained for the two TC base pairs confirm that

J. Phys. Chem., Vol. 100, No. 5, 1996 1971 the reversion of thymine does not affect the structure and stability of the hydrogen bonding. These base pairs contain a repulsive close contact of the carbonyl oxygens (3.63 Å), which is responsible for a prolongation of the N3(H)‚‚‚N3 bonds. No apparent indication of secondary interactions was found for the other six base pairs. The three TT base pairs have similar stabilization energies, and again, reversion of thymine does not affect the base pairing. The GA2, GG4, and expecially the AA3 structures represent weakly bonded base pairs with a significant part of the stabilization originating in the electron correlation contributions. As far as the remaining two base pairs are concerned, the 2aminoAT base pair is rather weak, despite the fact that it contains three hydrogen bonds. The three hydrogen bonds are longer than the hydrogen bonds of the GCWC base pair. 2aminoA has a very small molecular dipole moment (0.9 D), so this DNA base can hardly form strong H-bonded complex. The IC base pair is more stable. It contains one attractive secondary electrostatic interaction (see Table 2), and the inosine dipole moment is rather large, with the same orientation as the guanine dipole moment (cf. Figure 1). It is obvious that the strength of DNA base pairs cannot be estimated solely on the basis of the number of hydrogen bonds, so two other simple estimates were proposed. First, the strength of the base pairs is determined by the polarity of the monomers,1h i.e., by the size and mutual orientation of their dipole moments. This fully explains why the base pairs formed by highly polar bases (G, C) are significantly stronger than base pairs formed by less polar bases. Further, mutual orientation of subsystems is responsible for the fact that stability of the various structures of one base pair can differ considerably; see, for example, the mutual orientation of monomer dipoles in GG1 and GG4 (Figure 2). The difference in the complexation energies of the GCWC and 2aminoAT base pairs can be explained by considering the polarity of the monomers.1h Pranata et al.1q proposed another explanation of the different stability of the GCWC and 2aminoAT base pairs, based on a consideration of the secondary interactions in triply hydrogen-bonded base pairs.1q The sum of these secondary contributions is repulsive in the case of the 2aminoAT base pair and attractive for the GCWC base pair (see also Figure 12 in ref 1q). Both the secondary interaction model and the monomer polarity model are also able to explain the different stabilities of the GT* and G*T tautomeric base pairs (* indicates the rare tautomer), revealed by recent lower level ab initio calculations.1n The two above-mentioned explanations of differences in DNA base pair hydrogen-bonding stability do not contradict each other. H-bonding between DNA bases originates mainly in the electrostatic interaction of electric multipoles of the bases. With strongly polar bases the dipole moment is seemingly the most important. The dipole-dipole description could not reveal the fine structural features of the base pair, and for these purposes it is useful to pass from molecular characteristics to the atomic characteristics (point charges, dipoles, etc.); then, the secondary interactions become immediately apparent. It must be stressed, however, that hydrogen bonding is the result of a delicate balance among various energy contributions; any attempt to find a simplified model must therefore fail in some cases. Only the full ab initio description can unambiguously elucidate the nature of binding in various DNA base pairs. Another difference between the GCWC and 2aminoAT base pairs is the intrinsic nonplanarity of 2aminoAT (one negative eigenvalue of the Hessian matrix for the planar base pair), which might be surprising, because 2-aminoadenine has a smaller amino group pyramidalization than guanine.6 Floria´n and

Sˇ poner et al.

1972 J. Phys. Chem., Vol. 100, No. 5, 1996 Leszczynski explained propeller twisting of the 7-methyloxoformycin B‚‚‚2,4-diaminopyridine base pair, which has hydrogen bonding similar to that of the 2aminoAT base pair, as a consequence of its lower strength compared to the GCWC base pair; however, the secondary interaction model can be used as well. In both the 2aminoAT and 7-methyloxoformycin B‚‚‚2,4diaminopyridine base pairs, there are secondary repulsive interactions of the amino group hydrogen atoms of one monomer with the H(N) group of the other monomer, participating in the middle H-bond. Propeller twist combined with pyramidalization of the amino groups improves this interaction. We have found that the HF/6-31G** unconstrained optimized geometry of the 2aminoAT pair is slightly propeller twisted (5°), with the energy difference between the planar and nonplanar structures less than 0.1 kcal/mol. This is a lower nonplanarity than reported for the other triply bonded 7-methyloxoformycin B‚‚‚2,4-diaminopyridine base pair (12°, 0.1 kcal/mol). However, in view of the previous results on isolated DNA bases, the nonplanarity of the pairs obtained with the HF method could be somewhat underestimated (see also the discussion in ref 6b). Also important is the extent to which the intrinsic geometry imposed by chemistry of the bases places limitations on the optimum orientation of the hydrogen donor and hydrogen acceptor atoms and hence limits the strength of association. Changes in the valence bond angles of bases upon dimerization are moderate (less than 4°, not shown); despite this, most of the H-bonds are linear or almost linear. In addition, Table 2 demonstrates that there is no clear correlation between the H-bond linearity and base pair stability. It seems, therefore, that in general the intrinsic geometry and rigidity of bases do not influence the stability of H-bonded base pairs significantly, probably with the exception of the AA3 pair having significantly nonlinear (158°) H-bonds. However, let us again emphasize that stability of H-bonded base pairs is a result of many contributions that are coupled and compensate mutually and cannot be fully separated (isolated). The concept of two or three linear hydrogen donor-hydrogen acceptor H-bonds is a useful approximation to rationalize the interactions but is not an exhaustive description of the interactions within the base pair. Some H-bonds’ nonlinearity is thus not necessarily associated with less favorable intermolecular interactions caused by the inability of the monomers to establish optimal “linear” bonds. 3.3. Optimization with Inclusion of the Electron Correlation. The present geometries of base pairs resulting from HF/6-31G** gradient optimization are affected by several factors, the most important being (i) size of the basis set; (ii) neglect of correlation energy; and (iii) lack of correction for basis set superposition error during the optimization. From smaller clusters (e.g. formamide dimer19) we know that the 6-31G** basis set yields resonable intermolecular geometries. To estimate the importance of the remaining two factors, we performed the gradient optimization of the CC base pair at the MP2/6-31G** level. The results can be summarized as follows: The MP2/6-31G**//MP2/6-31G** interaction energy amounts to -20.2 kcal/mol, which is only slightly lower than that at the MP2/6-31G**//HF/6-31G** (-19.1 kcal/mol) and MP2/6-31G*(0.25)//HF/6-31G** (-18.8 kcal/mol) levels. The MP2/6-31G** deformation energy of +1.8 kcal/mol is slightly larger than that at the HF/6-31G** level. The MP2/6-31G** gradiently optimized N3‚‚‚(H)N H-bond length is 2.92 Å. This is a rather significant shortening compared to the HF-optimized geometry (3.05 Å). Such a result is not surprising, because the dispersion attraction is not included at the HF level. Let us recall that the MP2/6-31G*(0.25) step-by-step (nongradient, rigid monomers) optimization led to increased bond

TABLE 3: Dependence of the MP2/6-31G** Interaction Energy (EMP2) and Its HF and Correlation Components EHF and ECOR on the N4(H)‚‚‚N3 Distance in the CC Paira N4(H)‚‚‚N3

EHF

ECOR

EMP2

EDFT

2.80 2.83 2.86 2.89 2.92 2.95 2.98 3.01 3.04 3.07 3.10 3.15 3.20 3.30 3.40 3.60 3.80 4.00 4.30 4.60 5.00

-15.771 -16.389 -16.929 -17.373 -17.684 -17.976 -18.162 -18.280 -18.339 -18.344 -18.302 -18.142 -17.881 -17.163 -16.261 -14.260 -12.288 -10.515 -8.320 -6.032 -5.009

-3.348 -3.122 -2.911 -2.710 -2.546 -2.350 -2.186 -2.033 -1.890 -1.755 -1.629 -1.435 -1.263 -0.961 -0.711 -0.324 -0.048 +0.148 +0.326 +0.404 +0.412

-19.119 -19.551 -19.840 -20.088 -20.230 -20.326 -20.345 -20.313 -20.229 -20.099 -19.931 -19.577 -19.144 -18.124 -16.972 -14.584 -12.336 -10.663 -7.994 -6.228 -4.597

-19.600 -19.909 -20.114 -20.294 -20.372 -20.380 -20.331 -20.230 -20.082 -19.895 -20.078 -19.282 -18.937 -17.930 -16.498 -14.052 -11.621 -9.522 -7.202 -5.746 -4.493

a EDFT is the Becke3LYP/6-31G** interaction energy. The energy dependence is obtained by varying the H-bond length starting from the MP2/6-31G** (Becke3LYP/6-31G**)-optimized geometry. All energies are given in kcal/mol.

TABLE 4: Dependence of the MP2/6-31G*(0.25) Interaction Energy (EMP2) and Its HF and Correlation Components on the N1(H)‚‚‚O6 Distance in the GG1 Paira N1(H)‚‚‚O6

EHF

ECOR

EMP2

3.00 3.10 3.30 3.60 4.00 4.30 4.60 5.00

-24.555 -23.671 -21.204 -17.235 -12.762 -10.106 -8.229 -6.328

+0.447 +0.501 +0.629 +0.806 +0.955 +1.000 +0.986 +0.912

-24.108 -23.170 -20.575 -16.429 -11.807 -9.106 -7.243 -5.416

a

All energies are in kcal/mol.

lengths for the CC dimer, 3.03 Å.1g The differences in these two MP2-optimized geometries can be attributed to the following factors: (i) the point-by-point procedure does not include the relaxation of monomers; and (ii) the gradient method is spoiled by the BSSE, shortening the H-bonds. Therefore, the N‚‚‚(H)N bond length of the CC dimer was finally reoptimized using the step-by-step method at the MP2/6-31G** level, starting from the gradient-optimized MP2/6-31G** structure. Because of the center of symmetry, only one parameter, the N3‚‚‚(H)N4 distance, was varied. The optimized bond length, including now both the intramolecular relaxation and the BSSE, amounts to 2.98 Å, and the binding energy is improved by 0.3 kcal/mol. Table 3 shows the dependence of the interaction energy, EMP2, on the N‚‚‚(H)N distance. Also, the data obtained by the DFT method are included for comparison (see below). Note that, at larger base-base separations, the correlation interaction energy is positive (+0.4 kcal/mol). This is because the intersystem correlation energy is roughly proportional to r-6, while the change in intramolecular correlation energy is proportional to r-3. This means that the intersystem correlation energy (dispersion energy), which is always attractive, decreases faster than the repulsive change of the intramolecular correlation energy. For the same reason, the correlation interaction energy for two other base pairs, GG1 and GCNEW, is repulsive even at its minimum (obtained at the HF level, Table 1). Table 4 shows the dependence of the interaction energy on base-base separa-

Structures and Energies of H-Bonded DNA Base Pairs tion for the GG1 base pair. At larger separations, the correlation interaction energy increases to as much as +1.0 kcal/mol. 3.4. Reliability of the DFT Method. A less expensive alternative to the MP2 calculations is the DFT method using nonlocal functionals. The DFT method seems to provide good results for H-bonded complexes, including the vibrational analysis.20 It gives a good description of the properties of isolated DNA bases, their geometry, amino group pyramidalization, dipole moments, and their molecular electrostatic fields.21 However, the DFT method does not reproduce well the results of correlated ab initio calculations for the tautomeric equilibria of DNA bases.21c We recently investigated the reliability of the DFT method for a representative set of molecular clusters, ranging from strong ionic pairs to true weak van der Waals molecules.9,22 The results for H-bonded complexes are reasonable, but far from excellent. If larger basis sets (TZ2P) are used, the DFT method overestimates the binding energies as compared to the correlated ab initio calculations.22b The DFT method failed completely for the van der Waals clusters.22a For stacked complexes of DNA bases, the DFT method reproduces well the changes in interaction energy due to twist and displacement, because the electrostatic part of the interaction energy is correctly reproduced by the DFT method.9 On the other hand, the DFT method strongly underestimates the stabilization energy of stacked base pairs, which is attributed mainly to the missing intermolecular correlation contribution.9 In addition, the DFT method gives a quite incorrect dependence of the stacking energy on the vertical separation of DNA bases.9 Similar failure was found for other systems such as benzene‚‚‚Ar and Ar‚‚‚Ar.22a Here we compare the DFT and MP2 methods for H-bonded DNA base pairs. First, we optimized the CC and GCWC base pairs at the Becke3LYP/6-31G** DFT level. The optimizations give the following H-bond lengths: CC, 2.90 Å; GCWC, 2.78 Å (O6‚‚‚(H)N4), 2.92 Å (N3‚‚‚(H)N1), and 2.93 Å (N2(H)‚‚‚O2). The DFT interaction energies, corrected for the basis set superposition error, are -29.6 and -20.4 kcal/mol for the GCWC and CC base pairs, respectively, while the corresponding EMP2 values from Table 1 are more positive, -25.8 and -18.8 kcal/mol. The DFT deformation energy of bases is +3.3 kcal/ mol for the GCWC and +2.3 kcal/mol for the CC base pair. The DFT method thus overestimates the deformation energy (cf. the EDEF values in Table 1), while the H-bond lengths obtained by the DFT method seem to be too short. The DFT method also overestimates the stabilization energy (the sum of interaction and deformation energies) of the GCWC and CC DNA base pairs, compared to the MP2/6-31G*(0.25)//HF/631G** calculations. The respective DFT values are -26.6 and -18.1 kcal/mol, and the corresponding ab initio data are -23.8 and -17.5 kcal/mol. Our conclusions are supported by the results of Santamaria and Vazquez,1m obtained for the GCWC and TAWC base pairs with a different functional and with a basis set optimized for the DFT calculations. Their interaction energy for the TAWC base pair was -13.9 kcal/mol, and for the GCWC base pair, -27.7 kcal/mol. The deformation energy was +2.0 and +4.8 kcal/mol for the TAWC and GCWC base pairs, respectively. It should be noted, however, that part of the reported difference between the DFT and MP2 data could be attributed to the fact that, while the DFT energies were obtained at the DFT-optimized geometries (DFT//DFT), the MP2 interaction energies could be evaluated only at geometries obtained with the HF method (MP2//HF). The use of consistent MP2optimized geometries (MP2//MP2) would increase the intermolecular stabilization. In fact, for the CC base pair the DFT/

J. Phys. Chem., Vol. 100, No. 5, 1996 1973 /DFT interaction energy (-20.4 kcal/mol) agrees with the MP2/ MP2 interaction energy (-20.2) kcal/mol (cf. section 3.3), while the MP2//HF value is -18.8 kcal/mol. Unfortunately, we could not make the same comparison for the GCWC base pair, which exhibits a much larger difference between the DFT//DFT and MP2//HF interaction energies. With our present computational resources, we were not able to carry out the MP2 gradient optimization for the GCWC pair. We also calculated the DFT interaction energies for the HF/ 6-31G**-optimized geometries of the base pairs (Table 1). These interaction energies closely follow the MP2/6-31G*(0.25)//HF/ 6-31G** data, the differences ranging from -0.9 to +1.4 kcal/ mol. This is, at first glance, a very good result. However, as shown elsewhere, the same DFT procedure strongly underestimates the MP2 binding energies for the stacked DNA base pairs, even if the empirical London dispersion energy is added to the DFT data.9 The (present) DFT method is thus unable to reproduce, even qualitatively, the MP2 ab initio data when comparing the stacking and H-bonding interactions of DNA bases. Despite the apparently excellent DFT interaction energies obtained at the HF/6-31G**-optimized geometries, we cannot recommend this method, as presently used, for intermolecular complexes. 4. Conclusions (i) HF/6-31G** gradient optimization was carried out on 30 DNA base pairs under Cs symmetry. Their interaction energies were then calculated with inclusion of the electron correlation at the MP2 level. (ii) Harmonic vibrational analysis revealed that, for some of the studied base pairs, the planar optimized structure represents only a transition state or saddle point. (iii) The stabilization energies of the base pairs range from -24 to -9 kcal/mol. The calculated gas phase enthalpies agree well with the available experimental data. The stabilization of H-bonded DNA base pairs originates in the HF contribution; that is, it is primarily due to the electrostatic interactions. However, for weakly bonded base pairs, the correlation interaction energy amounts to as much as 30-40% of the stabilization. On the other hand, for some strong base pairs the correlation interaction energy is repulsive. To obtain reliable interaction energies for H-bonded DNA base pairs, the correlation energy must be included at least through the single-point calculation. (iv) The empirical London dispersion energy does not reproduce the correlation interaction energy. (v) Two base pairs, GG2 and GC2, were not obtained as local minima at the HF/6-31G** level. (vi) The stabilization energies and structures of base pairs are not determined solely by the primary hydrogen bonds, but they are strongly influenced by the polarity of the monomers and by a wide variety of secondary long-range electrostatic interactions. These interactions involve the H(C) hydrogen atoms. (vii) We carried out a first optimization for a DNA base pair at a corrleated level (CC base pair, MP2/6-31G** level). Compared to the HF/6-31G** level, the deformation energy increases from 1.4 to 1.8 kcal/mol and the N3‚‚‚(H)N4 H-bond lengths are shortened from 3.05 to 2.92 Å. When the BSSE is corrected by the step-by-step procedure, the equilibrium length of the N3‚‚‚(H)N4 bond of the CC base pair increases up to 2.98 Å. (vii) The gradient optimization carried out using the DFT method leads to overestimation of the deformation energies, and the obtained bond lengths are too short. The DFT interaction energies, calculated at the HF/6-31G**-optimized geometries,

Sˇ poner et al.

1974 J. Phys. Chem., Vol. 100, No. 5, 1996 closely follow the MP2 data. However, because the same calculation procedure strongly underestimates the MP2 binding energies for stacked DNA base pairs, the DFT method, as presently used, is not suitable to study the base-base interactions in general. Acknowledgment. This study was supported in part by ONR Grant N00014-95-1-0049, by NIH Grant 332090, and by Grant 203/94/1490 from the grant agency of the Czech Republic. The Mississippi Center for Supercomputing Research is acknowledged for a generous allotment of computer time. We thank M. J. Stewart and Dr. J. Floria´n for their comments on the manuscript. References and Notes (1) (a) Poltev, V. I.; Shulyupina, N. V. J. Biomol. Struct. Dyn. 1986, 3, 739. (b) Hobza, P.; Sandorfy, C. J. Am. Chem. Soc. 1987, 109, 1302. (c) Claverie, P.; Caron, F.; Boevue, J. C. Int. J. Quantum Chem. 1981, 19, 229. (d) Kudryatskaya, Z. G.; Danilov, V. I. J. Theor. Biol. 1976, 59, 303. (e) Aida, M. J. Comput. Chem. 1988, 9, 362. (f) Hrouda, V.; Floria´n, J.; Hobza, P. J. Phys. Chem. 1993, 97, 1542. (g) Hobza, P.; Sˇponer, J.; Pola´sˇek, M. J. Am. Chem. Soc. 1995, 117, 792. (h) Floria´n, J.; Leszczynski, J. J. Biomol. Struct. Dyn. 1995, 12, 1055. (i) Sˇponer, J.; Hobza, P. Chem. Phys., in press. (j) Gould, I. R.; Kollman, P. A. J. Am. Chem. Soc. 1994, 116, 2493. (k) Jursa, J.; Kypr, J. Gen. Physiol. Biophys. 1991, 10, 373. (l) Dive, D.; Dehareng, D.; Ghuysen, J. M. Theor. Chim. Acta 1993, 85, 409. (m) Santamaria, R.; Vazquez, A. J. Comput. Chem. 1994, 9, 981. (n) Jiang, S.-P.; Raghunathan, G.; Ting, K.-L.; Xuan, J. C.; Jernigan, R. L. J. Biomol. Struct. Dyn. 1994, 12, 367. (o) Colson, A.-O.; Besler, B.; Close, D. M.; Sevilla, M. D. J. Phys. Chem. 1992, 96, 661. (p) Sagarik, K. P.; Rode, B. M. Inorg. Chim. Acta 1983, 78, 177. (q) Pranata, J.; Wierschke, S. C.; Jorgensen, W. L. J. Am. Chem. Soc. 1991, 113, 2810. (r) Anwander, E. H. S.; Probst, M. M.; Rode, B. M. Biopolymers 1990, 29, 757. (s) Floria´n, J.; Leszczynski, J. Submitted to J. Am. Chem. Soc. (t) Trollop, K. I.; Gould, I. R.; Hillier, I. H. Chem. Phys. Lett. 1993, 209, 113. (2) (a) Yanson, I. K.; Teplitsky, A. B.; Sukhodub, L. F. Biopolymers 1979, 18, 1149. (b) Dey, M.; Moritz, F.; Grotemeyer, J.; Schlag, E. W. J. Am. Chem. Soc. 1994, 116, 9211. (c) Desfrancois, C.; Abdoul-Carime, H.; Schulz, C. P.; Schermann, J. P. Science, in press. (3) (a) Tso, P. O. P.; Melvin, I. S.; Olson, A. C. J. Am. Chem. Soc. 1963, 85, 1289. (b) Schweizer, M. P.; Broom, A. D.; Tso, P. O. P.; Hollis, D. P. J. Am. Chem. Soc. 1968, 90, 1042. (c) Solie, T. N.; Schellman, J. A. J. Mol. Biol. 1968, 33, 61. (d) van Holde, K. E.; Rosetti, G. P. Biochemistry 1967, 6, 2189. (e) Farquhar, E. L.; Downing, M.; Gill, S. J. Biochemistry 1968, 7, 1224. (f) Nakano, N. I.; Igarashi, S. J. Biochemistry 1970, 9, 577. (g) Imoto, T. Biochim. Biophys. Acta 1977, 475, 409. (h) Mayevsky, A. A.; Sukhorukov, B. I. Nucleic Acids Res. 1980, 8, 3029. (4) (a) Koygoku, Y.; Lord, R. C.; Rich, A. J. Am. Chem. Soc. 1967, 89, 496. (b) Pitha, J.; Jones, R. N.; Pithova, P. Can. J. Chem. 1966, 44, 1045. (c) Katz, L.; Penman, S. J. Mol. Biol. 1966, 15, 220. (d) Miller, J. H.; Sobell, H. M. J. Mol. Biol. 1967, 24, 345. (e) Imoto, T. Biochim. Biophys. Acta 1977, 475, 409. (5) (a) Hobza, P.; Zahradnı´k, R. Intermolecular Complexes. The Role of Van der Waals Systems in Physical Chemistry and in the Biodisciplines; Elsevier; Amsterdam, 1988. (b) Hobza, P.; Zahradnı´k, R. Chem. ReV. 1988, 88, 871. (c) Hobza, P.; Selzle, H. L.; Schlag, E. W. Chem. ReV. 1994, 94, 1601. (6) (a) Leszczynski, J. Int. J. Quantum Chem. Quantum Biol. Symp. 1992, 19, 43. (b) Sˇponer, J.; Hobza, P. J. Mol. Struct. (THEOCHEM) 1994, 304, 35. (c) Sˇ poner, J.; Hobza, P. J. Phys. Chem. 1994, 98, 3161. (d) Sˇ poner, J.; Hobza, P. J. Am. Chem. Soc. 1994, 116, 709.

(7) Møller, C.; Plesset, M. S. Phys. ReV. 1934, 46, 618. (8) (a) Kroon-Batenburg, L. M. J.; van Duijneveldt, F. B. J. Mol. Struct. 1985, 121, 185. (b) Hobza, P.; Melhorn, A.; C ˇ a´rsky, P.; Zahradnı´k, R. J. Mol. Struct. (THEOCHEM) 1986, 138, 387. (9) Sˇ poner, J.; Leszczynski, J.; Hobza, P. J. Comput. Chem., in press. (10) The deformation energy was calculated assuming a planar structure for the dimer and monomers. However, at the HF/6-31G** level, the nonplanar guanine with pyramidal amino group is 0.35 kcal/mol more stable than the planar one. For C and A, this inversion barrier is negligible. Therefore, the deformation energy, as well as ET and ∆H, could be corrected by +0.7 kcal/mol for all GG base pairs and by +0.35 kcal/mol for all GC, GA, and GT base pairs. However, part of this energy would be regained if the Cs symmetry restriction is not imposed on the base pairs. That is why we decided to use planar monomers to calculate the deformation energy. The ZPE contribution was obtained from fully optimized geometries of monomers. (11) (a) Boys, S. F.; Bernardi, F. Mol. Phys. 1970, 19, 553. (b) van Duijneveldt, F. B.; van Duijneveldt-van de Rijdt, J. G. C. M.; van Lenthe, J. Chem. ReV. 1994, 94, 1873. (c) Chalasinski, G.; Szczesniak, M. M. Chem. ReV. 1994, 94, 1723. (12) Becke, A. D. J. Chem. Phys. 1994, 98, 5648. (13) Frisch, M. J.; Trucks, G. W.; Head-Gordon, M.; Gill, P. M. V.; Wong, M. W.; Foresman, J. B.; Johnson, B. G.; Schlegel, H. B.; Rob, M. A.; Replogle, E. S.; Gomperts, R.; Andres, J. L.; Raghavachari, K.; Binkley, J. S.; Gonzales, C.; Martin, R. L.; Fox, D. J.; Defrees, D. J.; Baker, J.; Stewart, J. J. P.; Pople, J. A. Gaussian 92; Gaussian Inc.: Pittsburgh, PA, 1992. (14) Donohue, J. Proc. Natl. Acad. Sci. U.S.A. 1966, 55, 904. (15) In many cases, the optimization of nonplanar base pairs requires generation of the analytic second derivatives,1i which is very tedious. It should also be noted that our correlated calculations on isolated nonplanar DNA bases6 predict a larger nonplanarity (a significantly larger inversion barrier for the DNA base amino group pyramidalization) than we found at the HF/6-31G** level. Thus, more experience is necessary to select a level of calculations suitable for proper characterization of the nonplanarity of the base pairs. Further, anharmonicity of DNA base pairs should be properly taken into consideration. (16) (b) Sˇ poner, J.; Burcl, R.; Hobza, P. J. Biomol. Struct. Dyn. 1994, 11, 1357. (17) (a) Taylor, R.; Kennard, O. J. Am. Chem. Soc. 1982, 104, 5063. (b) Jeffrey, G. A.; Muluszynska, H. Int. J. Biol. Macromol. 1984, 4, 173. (c) Willberg, K. B.; Waldron, R. F.; Shulte, G.; Saunders, M. J. Am. Chem. Soc. 1991, 113, 971. (d) Steiner, T.; Sanger, W. J. Am. Chem. Soc. 1993, 115, 4540. (e) Leonard, G. A.; McAuley-Hecht, K. E.; Gibson, N. J.; Brown, T.; Watson, W. P.; Hunter, W. N. Biochemistry 1994, 33, 4755. (f) Alkorta, I.; Maluendes, S. J. Phys. Chem. 1995, 99, 6457. (g) Desiraju, G. C. Acc. Chem. Res. 1991, 24, 290. (h) Tsuzuki, S.; Uchimaru, T.; Tanabe, K. J. Chem. Phys. 1993, 97, 7899. (i) Turi, L.; Dannenberg, J. J. J. Chem. Phys. 1993, 97, 7899. (j) Visheswhara, S. Chem. Phys. Lett. 1978, 59, 29. (k) Reetz, M. T.; Hutte, S.; Goddard, R. J. Am. Chem. Soc. 1993, 115, 9339. (18) Sˇ poner, J.; Hobza, P. J. Biomol. Struct. Dyn. 1994, 12, 671. (19) Hrouda, V.; Floria´n, J.; Pola´sˇek, M.; Hobza, P. J. Phys. Chem. 1994, 98, 4742. (20) (a) Floria´n, J.; Johnson, B. G. J. Phys. Chem. 1994, 98, 3681. (b) Floria´n, J.; Johnson, B. G. J. Phys. Chem. 1995, 99, 5899. (c) Kim, K.; Jordan, K. D. J. Phys. Chem. 1994, 98, 10084. (d) Combariza, J. E.; Kestner, N. R. J. Phys. Chem. 1995, 99, 2717. (21) (a) Estrin, D. A.; Paglieri, L.; Corongiu, G. J. Phys. Chem. 1994, 98, 5653. (b) Sˇ poner, J.; Hobza, P. Int. J. Quantum Chem., in press. (c) Gould, I. R.; Burton, N. A.; Hall, R. J. J. Mol. Struct. (THEOCHEM) 1995, 331, 147. (22) (a) Hobza, P.; Sˇ poner, J.; Reschel, T. J. Comput. Chem. 1995, 16, 1315. (b) Nowek, A.; Leszczynski, J. J. Chem. Phys., submitted.

JP952760F