Comparative Assessment of Different RNA Tetranucleotides from the

Sep 28, 2016 - Figure 1) is stacked between bases 3 and 4. ..... The 1-3-stacked conformer of r(CCCC) is characterized by two independent 1-3 and 2-4 ...
0 downloads 0 Views 2MB Size
Article pubs.acs.org/JPCB

Comparative Assessment of Different RNA Tetranucleotides from the DFT-D3 and Force Field Perspective Rafał Szabla,† Marek Havrila,†,‡ Holger Kruse,*,† and Jiří Šponer*,†,‡ †

Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, CZ-61265 Brno, Czech Republic CEITEC − Central European Institute of Technology, Masaryk University, Campus Bohunice, Kamenice 5, CZ-62500 Brno, Czech Republic



S Supporting Information *

ABSTRACT: Classical force field (FF) molecular dynamics (MD) simulations of RNA tetranucleotides have substantial problems in reproducing conformer populations indicated by NMR experiments. To provide more information about the possible sources of errors, we performed quantum mechanical (QM, TPSS-D3/def2-TZVP) and molecular mechanics (MM, AMBER parm99bsc0+χOL3) calculations of different r(CCCC), r(GACC), and r(UUUU) conformers obtained from explicit solvent MD simulations. Solvent effects in the static QM and MM calculations were mimicked using implicit solvent models (COSMO and Poisson−Boltzmann, respectively). The comparison of QM and MM geometries and energies revealed that the two methodologies provide qualitatively consistent results in most of the cases. Even though we found some differences, these were insufficient to indicate any systematic corrections of the RNA FF terms that could improve the performance of classical MD in simulating tetranucleotides. On the basis of these findings, we inferred that the overpopulation of intercalated conformers in the MD simulations of RNA tetramers, which were not observed experimentally, might be predominantly caused by imbalanced water− solvent and water−water interactions. Apart from the large-scale QM calculations performed to assess the performance of the AMBER FF, a representative spectrum of faster QM methods was tested.



INTRODUCTION RNA plays a variety of important roles in living organisms such as regulation and expression of genes1,2 and catalysis.3,4 As more and more functions of RNA are being discovered, knowledge about the structure and dynamics of RNA is necessary for a comprehensive understanding of many biological processes and RNA-based therapeutic technologies.5,6 A common tool in this context is explicit solvent molecular dynamics (MD) simulations,7 in which the potential energy is evaluated by a classical, fixed-charge force field (FF). Due to the empirical and very approximate nature of the FFs, long time-scale MD simulations of RNA are often imperfect and may experience problems in predicting the correct relative stabilities of conformers, leading to imbalanced conformer populations when compared to experimental results. A major error source is the challenging description of the highly charged and flexible RNA backbone.8 The presence of 2′-OH groups and their capability to form various hydrogen-bonding patterns allow RNA molecules to form an astonishing spectrum of noncanonical structures,9,10 which also complicates a balanced FF description. Numerous corrections to the existing RNA FFs have been proposed so far, often, but not exclusively, focusing on the reparametrization of torsion terms.11−16 Nevertheless, the correct description of many noncanonical RNA structures © XXXX American Chemical Society

remains challenging, and the possible error sources are in large part unrecognized. Single-stranded RNA tetranucleotides have recently been identified as targets for FF benchmarking.17−21 Their small size enables a clear interpretation of the nuclear Overhauser effect (NOE) cross-peaks in NMR experiments and faster MD simulation studies than in the case of larger nucleotide chains.17 The latter point is particularly important because only a wellconverged conformational ensemble of the molecule allows for a clear evaluation of the relative populations of different substates.20 Previous works based on both NMR experiments and MD simulations addressed r(CCCC), r(GACC), r(UUUU), r(AAAA), and r(CAAU) tetranucleotides and revealed substantial discrepancies between the conformers observed by means of both approaches.17−22 In particular, all of the widely used RNA FFs predict significant populations of the so-called 3-1-4 intercalated structures in which base 1 (counting bases from the 5′ to the 3′-end, cf. Figure 1) is stacked between bases 3 and 4. The 3-1-4 and 1-3-2 intercalated structures are very stable in simulations of r(CCCC), r(GACC), r(AAAA), Received: July 27, 2016 Revised: September 27, 2016

A

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 1. Conductor-like screening model (COSMO) TPSS-D3/def2-TZVP equilibrium geometries of the studied r(CCCC) conformers. The red nucleotide is at the 3′-end. The numbers next to Amaj indicate the nucleotide numbering.

for highly flexible molecules, which hinders the calculation of free energies and enthalpies. Furthermore, widely applied continuum solvation models lack specific solvent−solute interactions that are known to influence the solute geometry to various degrees.29,30 Nevertheless, QM computations of complete nucleic acid building blocks were proven to be a viable tool for supporting the interpretation of MD simulations and identifying potential pitfalls of the FF description.31−37 The spurious population of intercalated RNA tetramers in the simulations and their absence in experiments were tentatively ascribed to inherent deficiencies of the RNA FF parametrization. It was suggested that the interbase van der Waals interactions are very strong and that the solvent−solute and solvent−solvent interactions are imbalanced.21,38 Condon et al. also pointed out that strong base−phosphate interactions (formed by the amino groups of nucleobases) play a salient role in stabilizing intercalated structures, which were not reported experimentally.21 The work by Bergonzo and Cheatham24 indicated that the problems in describing tetranucleotides by MD simulations may also be caused by the imbalanced description of explicit hydration, that is, the water model or coupling between the water model and the RNA FF. In this work, we analyze the possible origins of the inaccuracies in simulations of RNA tetranucleotides by applying density functional theory (DFT) calculations with Grimme’s D3 dispersion correction39,40 to a series of diverse conformers of r(CCCC), r(GACC), and r(UUUU). We perform QM and molecular mechanics (MM) (AMBER bsc0χOL311,13,41,42) optimizations using implicit solvation models; the COSMO43 model for QM and the Poisson−Boltzmann (PB) for the MM computations. Despite displaying some nonequivalences between the MM and QM potential energy surfaces, we do not identify any major weakness of the RNA FF that would explain the spurious simulation results in a straightforward manner. This contrasts with our earlier calculations on guanine quadruplex DNA stems, where we noticed clear errors and attributed them to the intrinsic performance of the DNA FF.44 For the RNA tetranucleotides, we show that the QM and MM

and r(CAAU). They persist from hundreds of nanoseconds up to a couple of microseconds and are highly populated in qualitatively or nearly quantitatively converged temperature Treplica exchange molecular dynamics (T-REMD) and Hamiltonian-REMD simulations.19−21 Nevertheless, no NOE crosspeaks between nucleobases that correspond to these conformers have been reported in any of the NMR experiments published so far.17,19,21 In fact, the NMR data show that the above-listed tetranucleotides populate A-form helices in solution, whereas MD simulations show a rich spectrum of other structures. Even though the intercalated structures are less populated in r(UUUU) simulations, a large number of interbase NOEs were computationally predicted by MD simulations and not reported in NMR experiments, involving several different RNA FFs.21 In other words, the simulations erroneously predicted a much higher degree of stacking in the r(UUUU) tetramer than observed experimentally.21 The unstacked structure of poly(U) was also confirmed by small-angle X-ray scattering measurements.23 Our inability to correctly describe even such short RNA fragments via MD is obviously frustrating and sparked intense research efforts. Recently, Bergonzo and Cheatham24 have showed that the use of the presumably more accurate OPC explicit water model,25 and modifications to the phosphate oxygen parameters,15 yielded a reasonable agreement between MD simulations of r(GACC) and NMR experiments. However, the same procedure applied to r(CCCC) only partially alleviated the errors observed in the MD simulations, and the computations failed to correctly reproduce the experimental data.24 Modern quantum-mechanical (QM) computations provide a much more accurate description of potential energy surfaces compared to the available RNA FFs.10 Unfortunately, the applicability of accurate QM methods is limited to single-point calculations and geometry optimizations of moderately sized biomolecules up to a few hundred atoms.10,26−28 QM singlepoint energies suffer from the lack of conformational sampling B

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

energy computations on equivalent QM and MM structures as possible. The number of affordable structures is determined by the number of structures that can be calculated by the QM method. QM Computations. Structure optimizations were performed with Turbomole 6.360 and the meta-GGA TPSS functional61 with the D339 dispersion correction using the Becke−Johnson damping variant.40 This level of theory is further denoted as TPSS-D3. The Becke−Johnson damping of D3 is applied for all DFT-D3 calculations. The def2-TZVP all electron basis set was used in most of the DFT calculations, unless specified otherwise.62 The TPSS-D3/def2-TZVP level was also employed in single-point energy calculations on top of the MM-optimized structures (the so-called cross-examinations described in the Results and Discussion section). To estimate the reliability of the TPSS-D3 calculations, we computed singlepoint energies using the double-hybrid functional PWPB95D363,64 and the def2-QZVPP(-g,-f) all electron basis set available in ORCA (V. 3.0.1),65 for all r(CCCC) conformers. Additional single-point calculations with the TPSS-gCP-D3/ def2-SVP (including the geometrical counterpoise correction gCP66), MP2/cc-pVDZ, and HF-3c67 methods as well as with the meta-hybrid functionals PW6B95-D364,68 and M06-2X69 were performed with the ORCA program.65 Turbomole 7.0 was used for the PBEh-3c70 calculations that employed the def2mSVP70 as provided by Turbomole. Single-point calculations using the semiempirical PM6-D3H+ method were carried out using Mopac (version 12)71 and a modified dftd339 program that implements the D3H+72 correction. The DFT computations were expedited by using the resolution of identity (RI-J) approximation73 for the Coulomb part of the two-electron integrals. The chain of sphere (COSX) approximation74 was applied for the Fock exchange part of all hybrid and doublehybrid functionals. The numerical quadrature grids m4 and GRID4 were used for all DFT calculations in Turbomole and ORCA programs, respectively.75 To approximate the solvation effects exerted by bulk water, we applied the COSMO43 implicit solvation model using a ε value of 78.4. Additional calculations and optimizations with the direct COSMO for real solvents (D-COSMO-RS)76−78 implicit solvation model were performed for the Amaj and intercalated conformers of r(CCCC). All of the optimizations were carried out using the open-source optimizer xopt, involving a rational function approach for the step size and approximate normal coordinates. 54,55 TPSS-D3/def2-TZVP (including the COSMO solvation model) was the primary level of theory used for QM optimizations and energy comparisons and is further denoted as QM-COSMO throughout the article.

conformational energies are in qualitative agreement for the intercalated and many other conformers. We obviously still cannot rule out that the unsatisfactory behavior of MD simulations of the RNA tetranucleotides is caused by the accumulation of small errors in the RNA FF that are hidden behind the overall noise in our computations. Nevertheless, our results may suggest that a major error source for the discrepancies between NMR experiment and MD simulation is the solvation treatment and, to a lesser degree, the nucleic acid FF itself. This is also consistent with the sensitivity of the MD results to the explicit solvation model noted above. In addition to using the large-scale QM calculations to assess the performance of the AMBER FF for tetranucleotides, we have also used the highly robust DFT-D3 computations to benchmark a representative spectrum of faster QM methods.



THEORETICAL METHODS MD Simulations of RNA Tetramers. The MD simulations were performed using AMBER12 and AMBER1445 software packages with established protocols.46−49 We used the NAB and xLEaP modules of AMBER12 to build all target RNA tetramers in A-form conformation and to prepare the corresponding topologies and starting coordinates. The parm99 version50,51 of the AMBER Cornell et al.41 FF with the bsc011 and χOL313,42 corrections was used in all simulations. Starting structures were immersed in a truncated octahedral box of TIP3P52 explicit water molecules with a minimal distance between the water box edge and the solute of 11 Å. Solvated systems were neutralized with Na+ cations with atomic radii of 1.868 Å and a 0.0028 kcal/mol well depth. For more details, see the Supporting Information (SI). To obtain the initial starting coordinates for the optimizations, the MD trajectory was averaged over a 50− 100 ps time window. Coordinates in the simulation were written each picosecond, meaning that the averaging included 50−100 snapshots per conformation. The averaging windows were manually checked for geometrical consistency to prevent accidental mixing of two conformations (i.e., there were no backbone flips or anti−syn transitions during the averaging periods). Clustering of the MD simulations was carried out using PTRAJ53 with the average linkage agglomerative algorithm. The variable sieve value ensured that the initial clustering pass contained 200 frames, whereas the critical distance ε value was set to 1.7 Å. An analogous clustering method was utilized by Henriksen and Cheatham.18 MM Optimizations. MM optimization and single-point energy calculations were carried out using the sander module of AMBER and the open-source xopt optimizer involving a rational function approach for the step size and approximate normal coordinates.54,55 The (linear) PB implicit solvation model56 and the generalized Born (GB) implicit solvation model of Hawkins, Cramer, and Truhlar57,58 using the Tsui et al. parameters59 were applied. In the case of the PB method, the grid spacing was set to 0.2 Å. Exact details of the inputs can be found in the SI. The trends of the MM-GB results were mostly identical with the MM-PB data. The corresponding energies and their discussion can be found in Table S1 in the SI. It is apparent that conformational sampling is one of the main limitations of our work. Our sampling capability is determined by the affordable QM sampling, and it would not help to have better sampling at the MM level (i.e., more MM computations), as it would not improve the QM versus MM comparison. One solution is to perform as many potential



RESULTS AND DISCUSSION MD Simulations and Conformer Description. We performed MD simulations of r(CCCC) (4.0 μs), r(GACC) (two independent simulations of 5.0 and 2.5 μs), and r(UUUU) (4.7 μs) using the standard A-form as the starting conformation. The main aim of our simulations was to prepare suitable starting structures for subsequent energy computations, representing the known conformers that were detected in the earlier studies.17−21 Thus, on the basis of our simulations, we prepared time-averaged MD structures of commonly observed conformers of r(CCCC) and r(GACC), which were also discussed in previous works.18−21 Base−phosphate interactions79 are a stabilizing factor in the r(CCCC) and r(GACC) conformers. To compare the results with an RNA tetramer that C

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 2. COSMO TPSS-D3/def2-TZVP equilibrium geometries of the studied r(GACC) conformers. The red nucleotide is at the 3′-end.

terminal base guanine. The 1-3-stacked conformer of r(CCCC) is characterized by two independent 1-3 and 2-4 stacks. In the case of r(UUUU), we additionally considered one unstacked conformer (cf. unstacked in Figure 4) that was the second most populated substate in the MD simulations and would be consistent with the experimental suggestion of an unstacked poly(U) structure.21,23 We stress that the populations of different tetranucleotide conformers extracted from our MD simulations should be treated with caution. None of our MD simulations approached convergence and these results provide a rather qualitative picture. Converged MD simulations of r(CCCC) and r(GACC) tetramers were characterized before,20,24 whereas previous simulations reported for the r(UUUU) tetramer did not yield a converged ensemble.21 It is worth noting that the conformer populations discussed in Condon’s work21 were obtained with a clustering approach solely based on base stacking criteria. In contrast, we utilized a root-mean-square deviation-based clustering scheme to estimate populations and prepare the averaged starting structures for further optimizations. Despite these differences, our observations for the r(UUUU) tetramer are in good agreement with the conclusions drawn by Condon and co-workers.21 The MD simulations performed in this work are sufficient to obtain representative starting structures for further QM-COSMO and MM-PB optimizations of all important tetranucleotide conformers observed previously,20,21,24 which was the aim of our simulations. Structural Analysis of QM- and MM-Optimized Geometries of r(CCCC) and r(GACC) Tetramers. We have optimized the selected tetranucleotide conformers using the Amber FF with the PB continuum solvent model (denoted as MM-PB) and employing the QM-COSMO approach (see Figures 1 and 2). Subsequently, we performed a crossexamination of the methods, for which we use the notation QM/QM, PB/QM, QM/PB, and PB/PB (where QM/PB means QM-COSMO single-point energies computed on the MM-PB-optimized geometries, etc.). We also performed additional MM optimizations and cross-examinations employing the GB implicit solvation model, which were generally consistent with the MM-PB results and can be found in the SI

should be unable to form base−phosphate interactions, we have chosen five representative conformers of the r(UUUU) tetramer MD simulation that resemble some of the investigated r(CCCC) structures. As reported in previous works,18−21 the 31-4 intercalated structures of r(CCCC) and r(GACC) are very stable in the MD simulations and are populated over relatively long simulation periods. More precisely, in the case of our r(CCCC) tetramer simulation, the 3-1-4 intercalation was formed after 2250 ns and persisted till the end of the simulation (4000 ns) with a 10 ns break at 3550 ns. In the case of the r(GACC) tetramer, the 3-1-4 intercalated structure was formed at 4250 and 2250 ns in the first and second simulations, respectively, and remained stable until the simulations were terminated. The population of the 3-1-4 intercalated structure was significantly lower in the r(UUUU) simulation and only present in a time frame shorter than 100 ns. This suggests that the base−phosphate interactions (absent in r(UUUU)) play an important role in the stabilization of the intercalated structures during the FF MD simulations. The A-form structures reported experimentally for both r(CCCC) and r(GACC) were mostly observed in the initial parts of the simulations, as they were used as the starting conformations in the MD simulations. The optimized A-major conformers (referred to as Amaj from here on) are used as reference structures for energy comparisons in further sections of this article. Apart from the Amaj, Amin (similar to Amaj, but with the 3′-terminal residue in an inverted position), and 3-1-4 intercalated conformers, we also consider several other relevant structures that are populated in the simulations (see Figures 1 and 2). Among them are the 1-4-2-stack-A and -B structures of r(CCCC) characterized by a 1-4-2 type intercalation, additionally to the third cytosine being positioned outside of the stacked bases and lacking any direct base−base interactions. The letters A and B denote the two different variants of this specific intercalation. The 1-4-2-stack-B conformer is additionally stabilized by a base−phosphate interaction formed by the amino group of the intercalating cytosine. The inverted conformers of r(CCCC) and r(GACC) show stacking between the first and fourth bases in the sequence, that is, a 2-3-4-1 stacking pattern. The syn and anti annotations for r(GACC) conformers (Figure 2) refer to the configuration of the 5′D

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

In contrast, the QM-COSMO optimization resulted in a change of the sugar pucker in the 3′-terminal ribose from C3′-endo to C2′-endo. This entailed the formation of an additional hydrogen bond between the 2-′OH group and the carbonyl group of the second cytosine in the sequence, along with a much more pronounced stacking of the second and fourth cytosines and the above-mentioned change in the β backbone angle of the second cytidine. It was suggested previously that base−phosphate and base stacking interactions are too strong in the available RNA FFs and that this might explain the significant population of intercalated and inverted conformers in MD simulations of tetranucleotides.21,23,84−86 Particularly pronounced base stacking arrangements are present in the inverted and intercalated structures discussed in this work. Base−phosphate interactions are formed by the amino groups of the 3′-terminal guanine bases in the inverted-syn and intercalated-syn conformers of r(GACC) and by the amino groups of cytosine bases in all the remaining cases. These strong hydrogen bonds connecting the nucleobases with the backbone can be found in almost any of the investigated tetranucleotide conformers with the exception of the 1-4-2-stack-A structure of r(CCCC). However, comparisons of MM-PB- and QM-COSMO-optimized geometries and their cross-evaluated energies (see the discussion below and Figure S2 in the SI) do not indicate any significant deviations that would unambiguously show the overestimation of base−phosphate and base stacking interactions suggested before.21,23,84−86 Isolated benchmarking of base−phosphate interactions also indicated a reliable MM performance in previous studies, although a failure of the empirical MM potentials for specific cases cannot be excluded.79 It is obviously still possible that a systematic imbalance within the MM description of either stacking or base−phosphate interactions does exist but is too subtle to be unambiguously detected by our calculations. The “energy resolution” of our calculations is limited by our ability to study only a few structures. Nevertheless, it looks more likely that the apparent overstabilization of these interactions may arise due to insufficient balance of solute−solute, solute−solvent, and solvent−solvent interactions, that is, that it can be related to the explicit hydration in the MD simulations. Significant Stability of Intercalated Structures on the Potential Energy Surface Observed in FF Computations Is Confirmed by the QM-COSMO Approach. Before introducing the conformational energy results, we would like to make a cautionary comment. Obviously, the most appropriate analysis to depict FF weaknesses is to compare QM/QM and PB/PB relative energies, that is, QM energies on QM-optimized structures and MM energies on equivalent MMoptimized structures. However, it is not so easy for fragments of nucleic acids, as the QM and MM optimizations often result in structures that are not sufficiently equivalent (vide supra), mainly due to a different set of intramolecular hydrogen bonds. This creates a noise in the energy comparisons, in particular as it can happen also in the reference A-RNA Amaj structure. The alternative option is to make both QM and MM energy computations consistently on either MM-optimized or QMoptimized structures. However, such calculations are then affected by the fact that one of the levels of the energy evaluation is done on a nonstationary point. These problems create a genuine noise in the computations which adds to the uncertainty stemming from the limited sampling and the use of a continuum solvent model. Nevertheless, some of our energy

to this article (see Table S1). In the literature, there is often no clear preference for either the GB or PB model for free-energy calculations.80−82 Although PB is the more physical model, the empirical fitting to experimental results of GB might be advantageous by implicitly correcting other, more general, deficiencies. According to our data in the SI, the MM-PB calculations seem to be a bit more reliable with respect to (wrt) the QM calculations, as we found a few outliers in the MM-GB calculations. Still, the general trend of MM-PB and MM-GB data remains fairly similar. Although both QM-COSMO and MM-PB optimizations generally preserve the most important structural features of the investigated RNA tetramers, we observe some differences between the QM-COSMO- and MM-PB-optimized geometries. In most of the conformers, the backbone conformation is retained during the optimization procedure performed with both methods. There are a few exceptions, which yield different values for the β backbone angle of the 3′-end nucleotide. This is observed in the case of the Amin and 1-4-2-stack-B structures of r(CCCC) and the intercalated-syn structure of r(GACC). In all three instances, MM-PB prefers values of the β angle close to 180°, whereas the QM-COSMO optimization converges to values around 240°. A similar discrepancy between the β angles of QM- and MM-optimized structures was observed in the case of DNA guanine quadruplex stem topologies and is ascribed to a higher flexibility of the β angles in QM computations.33 Note that the β dihedral potential of the AMBER nucleic acid FF has been recently reparametrized.83 The other structural features, such as hydrogen-bonding and stacking patterns, remain mostly unchanged during the optimization procedures (see Figure S2 in the SI). However, it is apparent that in several tetranucleotide conformers, the QM-COSMO optimization yields a sugar−phosphate interaction between the terminal 5-′OH hydroxyl group and one of the formally negatively charged backbone oxygen atoms, which is not formed during the MM optimizations. This is found for the Amin conformer of r(CCCC) and the Amaj, Amin, 1-2-4stacked, and inverted-syn conformers of r(GACC). Furthermore, in the case of the Amaj, 1-3-stacked, and inverted-A conformers of r(CCCC) and the Amin conformer of r(GACC), base−phosphate interactions formed during the QM optimizations are not present in the MM-PB-optimized structures. Such structural nonequivalences between the QM and MM structures can obviously affect the energy ordering of different conformers assessed by QM and MM. As we discussed in our preceding studies, these differences arise because the MM solute structure description is inherently less flexible than the more physical QM description. It does not mean that either approach is more accurate but that QM and MM sometimes respond differently to the continuum solvent.32−34 Note that the FF is parametrized to model solvent electrostatics a priori rather than gas-phase ones, whereas the DFT gas-phase Hamiltonian is modified by a self-consistent reaction field procedure. Our general experience suggests that QM-COSMO optimizations tend to yield more gas-phase-like structures compared to the FF for many nucleic acid systems. Missing explicit solute−solvent interactions seem to play a large role in the QM implicit solvation approach. We found some more pronounced structural differences between the QM-COSMO- and MM-PB-optimized geometries only in the case of the 1-4-2-stack-A conformer of r(CCCC) (see Figure S2 in the SI). The MM-PB optimization preserved the conformation obtained from the MD averaging procedure. E

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 3. Relative energies in kcal/mol of different r(CCCC) and r(GACC) conformers compared to the energy of the Amaj substate. Wide, light gray, light blue, and light maroon bars correspond to the MM-PB-optimized geometries, whereas narrow, solid black, solid blue, and solid maroon bars correspond to the QM-COSMO-optimized structures. Hence, the QM/PB dataset shows single-point QM-COSMO energies computed on MM-PB-otpimized geometries, etc.

results and trends discussed below are sufficiently clear and are most likely not affected by these uncertainties. Some other trends, however, may be less certain, and we try to differentiate these two cases (real trends vs random noise energy fluctuations) in the text. In agreement with the explicit solvent MD results, all PB/ QM and most PB/PB energies of the intercalated structures are lower when compared to the Amaj form (see Figure 3a,b). The intercalated-syn conformer of r(GACC) is an exception and is higher in energy than Amaj for QM/QM, QM/PB and PB/PB computations (see Figure 3b). The intercalated conformer of r(CCCC) has the lowest MM-PB energy compared to all remaining structures (lower in energy by over 10 kcal/mol than the Amaj substate). This conformer is characterized by a 3-1-4 intercalation and a base−phosphate interaction formed by the intercalating cytosine. The 1-4-2-stack-A and -B structures of r(CCCC) (with 1-4-2 intercalation) have higher energies than the Amaj substate, in agreement with the MM-PB optimization results. In the case of r(GACC), the MM-PB-optimized intercalated-anti conformer is lower in energy than the Amaj structure by ≈3.0 kcal/mol, in contrast to the intercalated-syn conformer of r(GACC), which lies much higher in energy. Interestingly, all the MM-optimized inverted conformers (2-34-1 stacking pattern) of both r(CCCC) and r(GACC) are also relatively low in energy compared to the remaining conformers. QM-COSMO computations reveal a qualitative agreement with the MM-PB energies in the vast majority of the investigated structures. The QM/QM results suggest that the intercalated substate of r(CCCC) should be even lower in energy than indicated by the FF. The same applies to the intercalated-anti structure of r(GACC) and all the inverted conformers of the r(CCCC) and r(GACC) tetramers. This observation is particularly important because neither intercalated nor inverted structures were found by the NMR experiments, as evidenced by the absence of the corresponding

interbase cross-peaks in the NOESY spectra. In general, both r(CCCC) and r(GACC) tetramers exhibit very similar trends in the relative energies of specific types of conformers (compare Figure 3a,b). The finding that in our calculations, the experimentally unobserved structures have typically lower energies than the Aform conformation is not so surprising per se. The calculations derive single-structure continuum solvent potential energies and not explicit solvent ensemble free energies (see also below and for example ref 73) and cannot be directly compared with the experimental stabilities.87 The really significant result is that the QM calculations do not indicate any systematic improvement (shift) of the conformer energies towards the experimental observations compared to the MM data. Note that in a preceding study of DNA quadruplex conformer energies, the difference of QM and MM potential energies could be straightforwardly added as a correction to the relative stabilities (free energies) derived from MM-based simulations.33 The procedure resulted in a decisive shift of the computed free energies, yielding a better agreement with the experiments. 33 This is not the case in the present tetranucleotide calculations. To additionally test the quality (and reproducibility) of the averaged starting structures used for geometry optimizations, we manually chose 6 snapshots from the MD simulations, each representing the Amaj and intercalated substates of the r(CCCC) tetramer (12 snapshots altogether) (see Table S4 in the SI). MM-PB optimizations of these snapshots resulted in only marginal structural and energetic (below 2.5 kcal/mol) deviations from the previously optimized geometries. Also QMCOSMO single point energies computed on the top of the MM-PB-optimized snapshots led to similar energetic trends with energy deviations below 3.0 kcal/mol. Figure S1 in the SI presents further PB single-point calculations along parts of the trajectory containing the Amaj and intercalated conformers, F

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

approximate nature and are in general unable to reproduce explicit solvent−solute interactions.89 Such explicit solvation effects might have a large contribution to the relative free energies of the different conformers, particularly because they differ in the solvent accessible area and the amount of potential hydrogen bonds that can be created with the surrounding solvent molecules. It is potentially possible to obtain a QM-based thermodynamic picture of different conformers in the single structure approximation, using rigid rotor and harmonic oscillator approximations. Such calculations would be, however, very demanding and lie outside the scope of this project. They would still not resolve the issue of full sampling, as their singlestructure picture is only suitable for small or rigid systems, and not for flexible biomolecules where MD-like sampling is essential. Going beyond the single-structure picture is currently not feasible with DFT methods, as the major hurdle of conformational sampling is difficult to overcome, especially for the highly flexible Amaj conformation. Regardless of the above-discussed shortcomings, the QMCOSMO geometries and energies provide valuable information about the FF behavior. In particular, they indicate that the low relative energy of the intercalated and inverted structures observed in the MM-PB results is the correct physical trend for the scrutinized model, that is, tetranucleotides in implicit solvent, assuming that the physics provided by QM-COSMO (i.e., TPSS-D3) is correct. In other words, considering the trend over all the data, we found no systematic shift of the QM potential energies compared to the MM potential energies that would explain the populations of spurious conformers in the MD simulation. This is the most important result. Multiple benchmark studies,64,90 applications, and also blind tests for experimental complexation energies91 support the assumption that TPSS-D3 is capable of correctly describing the delicate balance of intermolecular forces in a tetranucleotide system, that is, its various H-bonds, stacks, and backbone conformations. We assume to have at least a semiquantitatively correct description of the energetics on the potential energy surface. Furthermore, as the relative energy differences between the intercalated (or inverted) and Amaj conformers are large, even an error of 20% in the relative energies would not change the qualitative picture. The simple FF explicit water models commonly employed in MD simulations pay for efficiency with incorrect electrostatics, often wrong water dipole moment, and hindered solvent-shell structure dynamics (rigid water molecules).92 Additional errors from the employed fixed point-charge approximation in the nucleic acid FF, that is, no polarization and integer charge ion parameters, further add to the problem. The significant impact of the explicit solvation model on the relative populations of Amaj and intercalated conformers was recently confirmed for the r(GACC) tetramer.24 In particular, Bergonzo and Cheatham24 showed that the more accurate OPC water model25 correctly reproduces the relative populations of different r(GACC) conformers obtained from NMR experiments. However, the lack of quantitative consistency with experiments in the case of r(CCCC) suggests that the error sources are not yet fully understood,24 or not fully corrected by the OPC model. The OPC model is still a simple pair-additive potential, and remains far from a truly physical water model. It, nevertheless, brings a shift in the desired direction. It is obvious that one of the main uncertainties of our comparison between the QM and MM descriptions of the RNA

showing a clear energy separation of these substates, both in solution and in the gas phase. Therefore, we infer that the optimized geometries considered in this study should be sufficiently representative of the main conformers of the considered RNA tetranucleotides, and the corresponding energies should in general represent authentic trends. In other words, our computational protocol appears to be quite robust. The 1-4-2-stack-A structure reveals perhaps the largest discrepancies in the relative QM-COSMO and MM-PB results. Both PB/QM and PB/PB energies indicate that the 1-4-2stack-A structures are lower in energy than the Amaj conformer by approximately 4−5 kcal/mol, whereas the opposite trend is observed for the QM/PB and QM/QM estimates (see Figure 3a). Thus, these quite significant deviations between the QMCOSMO and MM-PB energies are observed for the geometries optimized with both methods. The extent of these deviations cannot be simply ascribed to different solvation model approximations. Another, similar case is the inverted-A structure of r(CCCC), but the deviations are of smaller magnitude. Assuming that the QM-COSMO approach provides the “correct” (i.e., reference) potential energy surface of the tetranucleotide, we suggest that the FF suffers from larger errors for these particular conformers. Nevertheless, it is very difficult to assign this observation to just one specific issue in the parametrization, for example, a specific interaction or structural aspect. As described above, a number of structural changes occur during the QM-COSMO optimization of the 14-2-stack-A conformer and the difference of 10 kcal/mol (because of sign change) between the QM-COSMO and MMPB energies might be the outcome of error accumulation of several FF terms, additionally to uncertainties from the different implicit solvation treatments. Nevertheless, we suggest that separate detailed discussions of single specific structures are already below the accuracy limits of our computations, in contrast to trends (or lack of trends) seen over all structures and all performed energy computations. In addition, it is obvious that, due to the richness of RNA structures, it is not guaranteed that improvement of one specific structure would not cause worsening of FF performance in simulations of other systems.88 Insufficient Description of Solvent−Solute Interactions Might Explain the Overstabilization of Intercalated Structures in MD Simulations. At first sight, the QM/ QM and QM/PB results (on both QM-COSMO and MM-PB structures) seem to be contradictory with the experimental findings, as the computations often indicate stabilization of the unobserved conformers compared to the observed canonical Aform. There are several possible explanations of this contradiction. First of all, we stress (as already noted above) that electronic energy differences (ΔE) should not be directly compared to experimental results, which reflect Gibbs freeenergy differences (ΔG). Although the general energetic trends are often consistent for ΔE and ΔG values, some deviations between these different physical quantities are not uncommon. Our QM computations lack any kind of conformational sampling and we do not estimate the molecular partition functions, both of which could change the presented picture, considering the fact that we look at several differently shaped conformers. In fact, the calculation of a solution-phase ΔG would be very challenging considering the system size, negative charge, and significant flexibility of the studied RNA molecules. It is important to note that implicit solvation models have an G

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 4. COSMO TPSS-D3/def2-TZVP equilibrium geometries of the studied r(UUUU) conformers. The red nucleotide is at the 3′-end.

molecules is the use of continuum solvent models. At first sight, this limitation could be lifted by using gas-phase conformational energies. However, basic physical considerations show that this approach cannot be used. The tetranucleotides are 3-fold negatively charged and their electronic structure is physically unstable in the gas phase. One can safely assume that at most 1 excess electron would be bound to the molecule. The resulting DFT gas-phase energies would be unreliable and unphysical. In consequence, QM computations of such negatively charged systems must be performed including continuum solvation models, such as COSMO. Actually, even when disregarding the electronic structure distortions, the sheer effect of the three negative charges on the long-range electrostatic forces would make any conformational searches in the gas phase biochemically irrelevant. Alternative tricks such as the introduction of counterions or saturating the phosphate groups would modify the system and lead to other side-effects, which preclude any meaningful gas-phase calculations. As we pointed out elsewhere, the largest nucleic acid systems that can be justifiably calculated in the gas phase are dinucleotides with one negatively charged phosphate group.37 To obtain some additional insight, we performed TPSS-D3 geometry optimizations using the recently proposed DCOSMO-RS implicit solvation model.76−78 D-COSMO-RS (direct COSMO for real solvents) estimates explicit solvent− solute interactions such as hydrogen bonds through locally interacting cavity surface segments between solute and solvent. It provides a much more physical description of solvation effects than the regular COSMO implicit solvent model, by partially mimicking explicit solvents.76−78 The calculation was performed on the key Amaj and intercalated conformers of the r(CCCC) tetranucleotide. Interestingly, the corresponding ΔE between the Amaj and intercalated conformers dropped to 5.2 kcal/mol from the value of 14.0 kcal/mol obtained in the regular calculations based on the COSMO model. A very similar energy shift was estimated from single-point energy calculations with D-COSMO-RS, performed for the QMCOSMO-optimized geometries (ΔE = 7.2 kcal/mol). These calculations indicate that adding terms that approximate explicit solvent effects may result in visible shifts of relative conformer energies (note that for a comparison with the MM-PB model, the standard COSMO model is more equivalent). From our preliminary experience, optimizations of larger nucleic acid building blocks with D-COSMO-RS are clearly more complicated due to lower numerical stability and hindered convergence when compared to COSMO. Our D-COSMO-RS result is another indirect indication that the explicit solvation effects, and consequently, the quality of explicit water models in MD simulations, may be critical for a balanced description of different conformers of the RNA tetranucleotides.

Optimizations of r(UUUU) Conformers Serve as a Good Comparison for the r(CCCC) and r(GACC) Results. The r(UUUU) tetranucleotide should not form any base− phosphate interactions that could stabilize the intercalated and inverted conformers. Our MD simulations of r(UUUU) indicate that the 3-1-4 intercalations are indeed among the least populated conformers (see the SI). This observation is in agreement with the previous results of Condon et al.,21 which also revealed low population of intercalated structures and dominant contribution from random-coil (unstacked) molecular arrangements. Therefore, r(UUUU) serves as a valuable comparison for the r(CCCC) and r(GACC) tetramers. Here, we considered five relevant conformers of r(UUUU) (see Figure 4), some of which also correspond to the most prominent r(CCCC) conformers discussed above. The largest structural differences between the minimum energy geometries obtained with the QM-COSMO and MMPB approaches are found for the Amaj conformer. The MM-PB optimization preserves all the key features of the time-averaged Amaj MD structure of r(UUUU) taken from our MD simulation. In contrast, the QM-COSMO optimization results in the redistribution of several intramolecular hydrogen bonds in the intended reference structure. In particular, the 2′-OH groups of residues 2 and 3 are rotated in the QM-COSMO optimization, yielding hydrogen bonds with the O4′ oxygen atoms of neighboring sugar rings. In the case of the MM-PB optimized and the MD averaged structures, these 2′-OH groups are involved in hydrogen bonds with O5′ oxygen atoms of the sugar−phosphate backbone. A different hydrogen-bonding pattern in the QM-COSMO optimized geometry also influences other structural parameters, for example, the degree of stacking. These structural discrepancies are reflected in the relative energies of the considered r(UUUU) conformers. Because the energies of all conformers are compared to the energy of Amaj, all of the QM/PB energies of the remaining r(UUUU) conformers are systematically shifted by approximately −8 kcal/mol (see Figure 5) compared to the PB/PB data. In other words, the MM-PB-optimized structure of the Amaj conformer is significantly higher in energy in the QM/PB evaluation wrt the remaining structures. According to both QM-COSMO and MM-PB results, the intercalated-A structure (3-1-4 intercalation) is the lowest energy conformer of the r(UUUU) tetranucleotide (see Figure 5). Thus, even when no base−phosphate interactions are involved, the intercalated states can still be much lower in potential energy when compared to other conformers. The unstacked conformer is the only structure considered here that is consistent with the NOESY spectra recorded for the r(UUUU) tetramer,21 that is, the optimizations do not predict any interbase NOEs below 5.0 Å and the bases are not involved in any stacking interactions. H

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

long-range dispersion description is crucial for systems like RNA tetramers, the M06-2X results are in very good agreement with the DFT-D3 reference values (see Figure 6a). Similar behavior was reported for the Sarcin−Ricin internal loop RNA motif.32 Our conformer energies seem to be less reliant on a correct long-range dispersion description for DFT. In contrast, unwarily used second-order Møller Plesset perturbation theory (MP2) with small double-ζ basis sets often displays quite significant problems due to a rather small one-particle (atomic orbital) basis set.37,93 In our case, it overestimates the energy difference between the 1-3-stacked and Amaj conformers by approximately 10 kcal/mol, when compared to the PWPB95D3 reference calculations. Thus, it is apparent that reducing costs by applying a very small basis set in MP2 calculations provides unreliable estimates of energy differences. To search for a low-cost quantum chemical method that could be successfully applied to larger RNA fragments and oligonucleotides, we compared the PM6-D3H+, HF-3c, PBEh3c, TPSS-gCP-D3/def2-SVP, and PW6B95-gCP-D3/def2-SVP approaches with the reference PWPB95-D3 results (see Figure 6b). PM6-D3H+ is a semiempirical method including the D3 (zero-damping) dispersion correction and a simple H-bond correction to account for the shortcomings of PM6 in describing hydrogen bonds. HF-3c employs Hartree−Fock with an almost minimal basis set and three correction terms that enables efficient calculations of large systems, and it is usually computationally faster than the respective pure DFT calculations with a reasonable basis set. The recently developed PBEh-3c method is a low-cost hybrid DFT composite scheme similar to HF-3c but which utilizes a double-ζ polarized basis set and was shown to be a promising candidate for large-scale QM computations.70 The two latter methods include the TPSS and PW6B95 functionals with a modest double-ζ basis set (def2-SVP) and the geometrical counterpoise correction for basis set superposition error (gCP).66 The PBEh-3c and DFTgCP-D3 calculations qualitatively reproduce the relative energy differences between different r(CCCC) conformers computed with larger basis sets, and in some cases, even quantitative consistency was reached. Therefore, TPSS-D3 and PW6B95D3 computations involving a small basis set and the gCP correction as well as the PBEh-3c approach could successfully serve for performing QM computations on larger RNA fragments, or for a preliminary and efficient assessment of moderate-size systems. In most of the cases, PM6-D3H+ and HF-3c also yield qualitatively consistent results with the higher level DFT-D3 estimates. However, in several cases, the PM6D3H+ and HF-3c results substantially differ from the reference values and overstabilize or overdestabilize some of the r(CCCC) conformers wrt the others (e.g., inverted-A, 1-4-2stack-B, etc.). Thus, special care needs to be taken while applying these two methods to nucleic acid building blocks. The statistical analysis of the employed methods based on minimum, maximum, and mean absolute deviations from the reference PWPB95 value provides additional insight into the applicability of these approaches to nucleic acid building blocks (see Table 1). In particular, HF-3c and PM6-D3H+ calculations resulted in the largest maximum absolute deviations (exceeding 10 kcal/mol) from all the investigated low-cost methods. Relatively low maximum and mean absolute deviations were found for the PW6B95-gCP-D3 and PBEh-3c approaches, and the obtained energies are of comparable accuracy to the TPSSD3/def2-TZVP approach utilized for geometry optimizations in this work. The mean and maximum absolute deviations are

Figure 5. Relative energies in kcal/mol of different r(UUUU) conformers compared to the energy of the Amaj substate. Wide, light gray, and light green bars correspond to the MM-PB-optimized geometries, whereas narrow, solid black, and solid green bars correspond to the QM-COSMO-optimized structures. Hence, the QM/PB dataset shows single-point QM-COSMO energies computed on MM-PB optimized geometries, etc.

Comparison of Different Methodologies: DFT, PM6D3H, HF-3c, and PBEh-3c. Besides trying to find a potential weakness of the simulation FF, we have used our optimized tetranucleotide structures to benchmark several different QM methods that are suitable for calculations of larger nucleic acid systems. This task is easier, as we can safely assume that calculating energy by one QM method using the geometry obtained using another QM method is more consistent than interchanging QM and MM approaches for structure/energy computations. Furthermore, the calculations are done within the same solvation model, that is, COSMO. In Figure 6, we present a comparison of single-point energy calculations performed on top of the QM-COSMO optimized geometries of the r(CCCC) and r(GACC) tetramers. One goal was to validate the quality of the QM-COSMO computations carried out at the TPSS-D3/def2-TZVP level of theory. For this purpose, we performed single-point energy calculations using the double-hybrid PWPB95-D3 functional and the def2QZVPP(-g,-f) basis set. Additionally, we employed the metahybrid PW6B95-D3, which was proven to be very reliable in estimating numerous properties in extended benchmark calculations.64 All three methods are semiquantitatively consistent with each other and the relative energy differences are, in most cases, smaller than 2 kcal/mol (Figure 6a). Therefore, as described in previous works, we conclude that TPSS-D3 is an accurate and robust method for computations on RNA oligonucleotides. We additionally examined two QM approaches widely applied to various molecular systems. The hybrid functional M06-2X (with the def2-TZVP basis set) is known for its reliability in providing accurate thermochemistry and energetic barriers, but it is incapable of accounting for London dispersion interactions in the long-range regime due to the wrong asymptotic decay of its potential.90 Even though a correct I

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 6. Comparison of conformer energies wrt the Amaj conformer obtained with different QM methods for the r(CCCC) and r(GACC) tetramers. (a) DFT calculations with a large basis set and MP2; (b) low-cost quantum chemical approaches. In both cases, the results are compared to reference values calculated using the PWPB95-D3/def2-QZVPP(-g,-f) method. The abbreviations QZ, TZ, and DZ denote the basis sets def2QZVP, def2-TZVP, and def2-SVP, respectively, except in the case of MP2 where DZ stands for cc-pVDZ.

PW6B95-D3 functional along with a polarized triple-ζ basis set yields generally insignificant deviations from our reference QM values. Thus, this level of theory might serve as an alternative reference method for larger nucleic acid fragments, for which more accurate QM methods (e.g., double-hybrid functionals) are not applicable. Even though M06-2X provides conformational energies of equivalent accuracy, we stress that it might not be a safe choice for larger systems due to its inability in describing long-range London dispersion interactions, especially when one is interested in interaction energies where less error compensation might occur.

Table 1. Minimum, Maximum, and Mean Absolute Deviations (kcal/mol) from the Reference over All 15 r(GACC) and r(CCCC) Conformersa method

min. abs. deviation

max. abs. deviation

mean abs. deviation

PW6B95-D3/def2-TZVP M06-2X/def2-TZVP TPSS-D3/def2-TZVP TPSS-gCP-D3/def2-SVP PW6B95-gCP-D3/def2-SVP PBEh-3c HF-3c PM6-D3H+ MP2/cc-pVDZ

0.0 0.3 0.1 0.2 0.3 0.1 0.2 1.2 0.9

1.9 1.7 4.1 6.3 4.2 4.3 11.6 13.5 9.7

1.0 1.0 2.1 2.6 1.6 1.7 4.6 5.4 3.2



CONCLUSIONS We have performed a series of QM-COSMO and MM-PB optimizations to evaluate the relative energies of the most populated substates in MD simulations of r(CCCC), r(GACC), and r(UUUU) tetranucleotides. The work had two goals. Firstly, we wanted to find out whether there are evident intrinsic RNA FF errors that would explain the poor performance of tetranucleotide simulations wrt the benchmark NMR data. Secondly, we used the data to test the performance of a series of low-cost QM methods.

a

Reference method: PWPB95-D3/def2-QZVPP(-g,-f); deviations wrt the Amaj substate.

somewhat larger for the TPSS-gCP-D3/def2-SVP method, suggesting that more care needs to be taken while employing TPSS-gCP-D3/def2-SVP as the primary QM method for systems of similar molecular composition (i.e., additional benchmark calculations might be necessary). The meta-hybrid J

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

imate explicit water−solute interactions. These results also suggest that solvation effects have a significant influence on the relative conformer energies of the r(CCCC) tetranucleotide. In particular, the ΔE between the Amaj and intercalated conformers is reduced from 14.0 to 5.2 kcal/mol when DCOSMO-RS is used, which further supports our hypothesis. Benchmark calculations for the low-cost quantum chemical methods reveal a rather satisfactory performance of several approaches. In particular, the PBEh-3c and DFT-gCP-D3 methods exhibit very good performance in comparison to the reference double-hybrid functional calculations with a quadruple-ζ basis set. These methods can potentially be used for QM calculations of larger RNA building blocks. In contrast, the semiempirical PM6-D3H+ method and the HF-3c approach might lead to large deviations in relative conformer energies.

In general, both QM-COSMO and MM-PB approaches provide qualitatively consistent energies and geometries for most of the investigated conformers. We have found several exceptions for which the FF does not predict some specific structural features and relative conformer energies indicated by the QM methodology. It appears that these differences may be caused by error accumulation from various sources. However, as discussed in detail, a quantitative comparison of the MM and QM descriptions is hampered by a non-equivalence between the corresponding QM- and MM-optimized structures, despite starting from identical configurations. Namely, the QMCOSMO approach shows a considerably larger propensity to form intramolecular H-bonds than MM-PB. Other limitations are related to the usage of a continuum solvent model and the differences between the continuum solvent models applied in the MM and QM calculations. All these factors cause substantial noise in the energy data. This could be one of the reasons why the results do not provide any clear indication of intrinsic FF terms that could be easily improved. Consistently with the FF, the QM-COSMO approach predicts most of the intercalated and inverted conformers as energetically preferred to the A-forms on the potential energy surfaces of the r(GACC) and r(CCCC) tetramers. QMCOSMO results also found the 3-1-4 intercalations to be energetically favorable in the case of the r(UUUU) tetramer, even though r(UUUU) does not form any amino-to-phosphate hydrogen bonds. Therefore, we suggest that the stability of intercalated substates is the correct physical trend in the parametrization of the AMBER RNA FF within the approximation of the continuum solvent. We see no indication that the differences between the QM and MM potential energy surfaces could explain the differences between the MD simulation free-energy ensembles and the solution experiments. The disagreement of the MD simulations with the NMR experiments is most likely caused by other factors and not by an intrinsic (intramolecular) bias of the RNA FF to form these structures. Even though there were previous indications that the AMBER FF overestimates the base−phosphate and stacking interactions,21,23,84−86 the comparison of QM-COSMO- and MM-PB-derived geometries and energies does not imply the same conclusions for the studied RNA tetramers. We obviously cannot rule out that we failed to disclose such a systematic trend due to the overall noise of the energy computations. However, our results rather support the suggestion that the overpopulation of the intercalated and inverted structures in MD simulations is predominantly caused by imbalanced water−solute and water−water interactions, as also proposed by Condon et al.21 This observation is in good agreement with the recent findings of Bergonzo and Cheathem,20 which indicate that the more accurate OPC water model 25 significantly improves the performance of MD simulations of tetranucleotides. Obviously, the implicit solvation models (COSMO, PB) used in our static QM and MM computations are incapable of treating effects from specific water−solute interactions. Therefore, our calculations can be used to evaluate only the intrinsic parametrization of the RNA FF, and we cannot tell anything about errors stemming from the specific MM explicit water models. It is obviously possible that a correct balance of the FF for RNA tetranucleotides is not achievable within the pair-additive approximation. We also performed tentative TPSS-D3 calculations using the more accurate DCOSMO-RS implicit solvation model, which includes approx-



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.6b07551. Geometries of r(CCCC), r(GACC), and r(UUU) (ZIP) Numerical values for all graphs, GB results, PB energy plots for the Amaj and intercalated conformers along the trajectory, and coordinates of all investigated structures (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected] (H.K.). *E-mail: [email protected] (J.Š.). Phone: +420 541 517 133. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS Financial support from Grant P305/12/G034 from the Czech Science Foundation is gratefully acknowledged. J.S. acknowledges support by Praemium Academiae.



REFERENCES

(1) Lee, R. C.; Feinbaum, R. L.; Ambros, V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993, 75, 843−854. (2) Ruvkun, G. Glimpses of a Tiny RNA World. Science 2001, 294, 797−799. (3) Kruger, K.; Grabowski, P. J.; Zaug, A. J.; Sands, J.; Gottschling, D. E.; Cech, T. R. Self-splicing RNA: Autoexcision and autocyclization of the ribosomal RNA intervening sequence of tetrahymena. Cell 1982, 31, 147−157. (4) Guerrier-Takada, C.; Gardiner, K.; Marsh, T.; Pace, N.; Altman, S. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell 1983, 35, 849−857. (5) Kole, R.; Krainer, A. R.; Altman, S. RNA therapeutics: beyond RNA interference and antisense oligonucleotides. Nat. Rev. Drug Discovery 2012, 11, 125−140. (6) Good, P. D.; Krikos, A. J.; Li, S. X.; Bertrand, E.; Lee, N. S.; Giver, L.; Ellington, A.; Zaia, J. A.; Rossi, J. J.; Engelke, D. R. Expression of small, therapeutic RNAs in human cell nuclei. Gene Ther. 1997, 4, 45−54. (7) Šponer, J.; Banás,̌ P.; Jurečka, P.; Zgarbová, M.; Kührová, P.; Havrila, M.; Krepl, M.; Stadlbauer, P.; Otyepka, M. Molecular Dynamics Simulations of Nucleic Acids. From Tetranucleotides to the Ribosome. J. Phys. Chem. Lett. 2014, 5, 1771−1782. K

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B (8) Šponer, J.; Mládek, A.; Šponer, J. E.; Svozil, D.; Zgarbová, M.; Banás,̌ P.; Jurečka, P.; Otyepka, M. The DNA and RNA sugarphosphate backbone emerges as the key player. An overview of quantum-chemical, structural biology and simulation studies. Phys. Chem. Chem. Phys. 2012, 14, 15257−15277. (9) Leontis, N. B.; Stombaugh, J.; Westhof, E. The non-WatsonCrick base pairs and their associated isostericity matrices. Nucleic Acids Res. 2002, 30, 3497−3531. (10) Šponer, J.; Šponer, J. E.; Petrov, A. I.; Leontis, N. B. Quantum Chemical Studies of Nucleic Acids: Can We Construct a Bridge to the RNA Structural Biology and Bioinformatics Communities? J. Phys. Chem. B 2010, 114, 15723−15741. (11) Pérez, A.; Marchán, I.; Svozil, D.; Sponer, J.; Cheatham, T. E., III; Laughton, C. A.; Orozco, M. Refinement of the AMBER Force Field for Nucleic Acids: Improving the Description of α/γ Conformers. Biophys. J. 2007, 92, 3817−3829. (12) Yildirim, I.; Stern, H. A.; Kennedy, S. D.; Tubbs, J. D.; Turner, D. H. Reparameterization of RNA χ Torsion Parameters for the AMBER Force Field and Comparison to NMR Spectra for Cytidine and Uridine. J. Chem. Theory Comput. 2010, 6, 1520−1531. (13) Zgarbová, M.; Otyepka, M.; Šponer, J.; Mládek, A.; Banás,̌ P.; Cheatham, T. E.; Jurečka, P. Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles. J. Chem. Theory Comput. 2011, 7, 2886−2902. (14) Yildirim, I.; Kennedy, S. D.; Stern, H. A.; Hart, J. M.; Kierzek, R.; Turner, D. H. Revision of AMBER Torsional Parameters for RNA Improves Free Energy Predictions for Tetramer Duplexes with GC and iGiC Base Pairs. J. Chem. Theory Comput. 2012, 8, 172−181. (15) Steinbrecher, T.; Latzer, J.; Case, D. A. Revised AMBER Parameters for Bioorganic Phosphates. J. Chem. Theory Comput. 2012, 8, 4405−4412. (16) Denning, E. J.; Priyakumar, U. D.; Nilsson, L.; Mackerell, A. D. Impact of 2′-hydroxyl sampling on the conformational properties of RNA: Update of the CHARMM all-atom additive force field for RNA. J. Comput. Chem. 2011, 32, 1929−1943. (17) Yildirim, I.; Stern, H. A.; Tubbs, J. D.; Kennedy, S. D.; Turner, D. H. Benchmarking AMBER Force Fields for RNA: Comparisons to NMR Spectra for Single-Stranded r(GACC) Are Improved by Revised χ Torsions. J. Phys. Chem. B 2011, 115, 9261−9270. (18) Henriksen, N. M.; Roe, D. R.; Cheatham, T. E. Reliable Oligonucleotide Conformational Ensemble Generation in Explicit Solvent for Force Field Assessment Using Reservoir Replica Exchange Molecular Dynamics Simulations. J. Phys. Chem. B 2013, 117, 4014− 4027. (19) Tubbs, J. D.; Condon, D. E.; Kennedy, S. D.; Hauser, M.; Bevilacqua, P. C.; Turner, D. H. The Nuclear Magnetic Resonance of CCCC RNA Reveals a Right-Handed Helix, and Revised Parameters for AMBER Force Field Torsions Improve Structural Predictions from Molecular Dynamics. Biochemistry 2013, 52, 996−1010. (20) Bergonzo, C.; Henriksen, N. M.; Roe, D. R.; Swails, J. M.; Roitberg, A. E.; Cheatham, T. E. Multidimensional Replica Exchange Molecular Dynamics Yields a Converged Ensemble of an RNA Tetranucleotide. J. Chem. Theory Comput. 2014, 10, 492−499. (21) Condon, D. E.; Kennedy, S. D.; Mort, B. C.; Kierzek, R.; Yildirim, I.; Turner, D. H. Stacking in RNA: NMR of Four Tetramers Benchmark Molecular Dynamics. J. Chem. Theory Comput. 2015, 11, 2729−2742. (22) Schrodt, M. V.; Andrews, C. T.; Elcock, A. H. Large-Scale Analysis of 48 DNA and 48 RNA Tetranucleotides Studied by 1 μs Explicit-Solvent Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 5906−5917. (23) Chen, H.; Meisburger, S. P.; Pabit, S. A.; Sutton, J. L.; Webb, W. W.; Pollack, L. Ionic strength-dependent persistence lengths of singlestranded RNA and DNA. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 799− 804. (24) Bergonzo, C.; Cheatham, T. E., III. Improved Force Field Parameters Lead to a Better Description of RNA Structure. J. Chem. Theory Comput. 2015, 11, 3969−3972.

(25) Izadi, S.; Anandakrishnan, R.; Onufriev, A. V. Building Water Models: A Different Approach. J. Phys. Chem. Lett. 2014, 5, 3863− 3871. (26) Hu, L.; Söderhjelm, P.; Ryde, U. Accurate reaction energies in proteins obtained by combining QM/MM and large QM calculations. J. Chem. Theory Comput. 2013, 9, 640−649. (27) Holroyd, L. F.; van Mourik, T. Tyrosine-glycine revisited: Resolving the discrepancy between theory and experiment. Chem. Phys. Lett. 2015, 621, 124−128. (28) Toroz, D.; van Mourik, T. Structure of the gas-phase glycine tripeptide. Phys. Chem. Chem. Phys. 2010, 12, 3463−3473. (29) Lenz, S. A. P.; Kellie, J. L.; Wetmore, S. D. Glycosidic bond cleavage in deoxynucleotides: Effects of solvent and the DNA phosphate backbone in the computational model. J. Phys. Chem. B 2012, 116, 14275−14284. (30) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum mechanical continuum solvation models. Chem. Rev. 2005, 105, 2999−3093. (31) Smith, D. A.; Holroyd, L. F.; van Mourik, T.; Jones, A. C. A DFT study of 2-aminopurine-containing dinucleotides: prediction of stacked conformations with B-DNA structure. Phys. Chem. Chem. Phys. 2016, 18, 14691−14700. (32) Kruse, H.; Havrila, M.; Šponer, J. QM Computations on Complete Nucleic Acids Building Blocks: Analysis of the Sarcin-Ricin RNA Motif Using DFT-D3, HF-3c, PM6-D3H, and MM Approaches. J. Chem. Theory Comput. 2014, 10, 2615−2629. (33) Šponer, J.; Mládek, A.; Špačková, N.; Cang, X.; Cheatham, T. E.; Grimme, S. Relative Stability of Different DNA Guanine Quadruplex Stem Topologies Derived Using Large-Scale Quantum-Chemical Computations. J. Am. Chem. Soc. 2013, 135, 9785−9796. (34) Kruse, H.; Šponer, J. Towards biochemically relevant QM computations on nucleic acids: controlled electronic structure geometry optimization of nucleic acid structural motifs using penalty restraint functions. Phys. Chem. Chem. Phys. 2015, 17, 1399−1410. (35) Barone, G.; Fonseca Guerra, C.; Bickelhaupt, F. M. B-DNA Structure and Stability as Function of Nucleic Acid Composition: Dispersion-Corrected DFT Study of Dinucleoside Monophosphate Single and Double Strands. ChemistryOpen 2013, 2, 186−193. (36) Churchill, C. D. M.; Wetmore, S. D. Developing a computational model that accurately reproduces the structural features of a dinucleoside monophosphate unit within B-DNA. Phys. Chem. Chem. Phys. 2011, 13, 16373−16383. (37) Kruse, H.; Mladek, A.; Gkionis, K.; Hansen, A.; Grimme, S.; Sponer, J. Quantum Chemical Benchmark Study on 46 RNA Backbone Families Using a Dinucleotide Unit. J. Chem. Theory Comput. 2015, 11, 4972−4991. (38) Chen, A. A.; García, A. E. High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations. Proc. Natl. Acad. Sci. U.S.A. 2013, 110, 16820−16825. (39) Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010, 132, No. 154104. (40) Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 2011, 32, 1456−1465. (41) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197. (42) Banás,̌ P.; Hollas, D.; Zgarbová, M.; Jurečka, P.; Orozco, M.; Cheatham, T. E.; Šponer, J.; Otyepka, M. Performance of Molecular Mechanics Force Fields for RNA Simulations: Stability of UUCG and GNRA Hairpins. J. Chem. Theory Comput. 2010, 6, 3836−3849. (43) Klamt, A.; Schüürmann, G. COSMO: a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc., Perkin Trans. 2 1993, 799−805. L

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B (44) Gkionis, K.; Kruse, H.; Šponer, J. Derivation of Reliable Geometries in QM Calculations of DNA Structures: Explicit Solvent QM/MM and Restrained Implicit Solvent QM Optimizations of GQuadruplexes. J. Chem. Theory Comput. 2016, 12, 2000−2016. (45) Case, D.; Berryman, J.; Betz, R.; Cerutti, D.; Cheatham, T., III; Darden, T.; Duke, R.; Giese, T.; Gohlke, H.; Goetz, A.; et al. AMBER 14; University of California, 2015. (46) Darden, T.; York, D.; Pedersen, L. Particle mesh Ewald: An N· log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089−10092. (47) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. C. Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (48) Berendsen, H. J. C.; Postma, J. P. M.; Gunsteren, W. F. v.; DiNola, A.; Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984, 81, 3684−3690. (49) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577−8593. (50) Cheatham, T. E.; Cieplak, P.; Kollman, P. A. A Modified Version of the Cornell et al. Force Field with Improved Sugar Pucker Phases and Helical Repeat. J. Biomol. Struct. Dyn. 1999, 16, 845−862. (51) Wang, J.; Cieplak, P.; Kollman, P. A. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J. Comput. Chem. 2000, 21, 1049−1074. (52) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935. (53) Roe, D. R.; Cheatham, T. E., III. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9, 3084−3095. (54) Kruse, H. Xopt, local development version; Institute of Biophysics: Brno, 2016. https://github.com/hokru/xopt. (55) Eckert, F.; Pulay, P.; Werner, H.-J. Ab initio geometry optimization for large molecules. J. Comput. Chem. 1997, 18, 1473− 1483. (56) Luo, R.; David, L.; Gilson, M. K. Accelerated PoissonBoltzmann calculations for static and dynamic systems. J. Comput. Chem. 2002, 23, 1244−1253. (57) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Pairwise solute descreening of solute charges from a dielectric medium. Chem. Phys. Lett. 1995, 246, 122−129. (58) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized Models of Aqueous Free Energies of Solvation Based on Pairwise Descreening of Solute Atomic Charges from a Dielectric Medium. J. Phys. Chem. 1996, 100, 19824−19839. (59) Tsui, V.; Case, D. A. Theory and applications of the generalized born solvation model in macromolecular simulations. Biopolymers 2000, 56, 275−291. (60) Ahlrichs, R.; Bär, M.; Häser, M.; Horn, H.; Kölmel, C. Electronic structure calculations on workstation computers: The program system turbomole. Chem. Phys. Lett. 1989, 162, 165−169. (61) Tao, J.; Perdew, J. P.; Staroverov, V. N.; Scuseria, G. E. Climbing the Density Functional Ladder: Nonempirical Meta-Generalized Gradient Approximation Designed for Molecules and Solids. Phys. Rev. Lett. 2003, 91, No. 146401. (62) Weigend, F. Accurate Coulomb-fitting basis sets for H to Rn. Phys. Chem. Chem. Phys. 2006, 8, 1057−1065. (63) Goerigk, L.; Grimme, S. Efficient and Accurate Double-HybridMeta-GGA Density Functionals-Evaluation with the Extended GMTKN30 Database for General Main Group Thermochemistry, Kinetics, and Noncovalent Interactions. J. Chem. Theory Comput. 2011, 7, 291−309. (64) Goerigk, L.; Grimme, S. A thorough benchmark of density functional methods for general main group thermochemistry, kinetics, and noncovalent interactions. Phys. Chem. Chem. Phys. 2011, 13, 6670−6688.

(65) Neese, F. The ORCA program system. WIREs Comput. Mol. Sci. 2012, 2, 73−78. (66) Kruse, H.; Grimme, S. A geometrical correction for the interand intra-molecular basis set superposition error in Hartree-Fock and density functional theory calculations for large systems. J. Chem. Phys. 2012, 136, No. 154101. (67) Sure, R.; Grimme, S. Corrected small basis set Hartree-Fock method for large systems. J. Comput. Chem. 2013, 34, 1672−1685. (68) Zhao, Y.; Truhlar, D. G. Design of Density Functionals That Are Broadly Accurate for Thermochemistry, Thermochemical Kinetics, and Nonbonded Interactions. J. Phys. Chem. A 2005, 109, 5656−5667. (69) Zhao, Y.; Truhlar, D. G. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06 functionals and 12 other functionals. Theor. Chem. Acc. 2008, 119, 525. (70) Grimme, S.; Brandenburg, J. G.; Bannwarth, C.; Hansen, A. Consistent structures and interactions by density functional theory with small atomic orbital basis sets. J. Chem. Phys. 2015, 143, No. 054107. (71) Stewart, J. J. P. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. J. Mol. Model. 2007, 13, 1173−1213. (72) Kromann, J. C.; Christensen, A. S.; Steinmann, C.; Korth, M.; Jensen, J. H. A third-generation dispersion and third-generation hydrogen bonding corrected PM6 method: PM6-D3H+. PeerJ 2014, 2, No. e449. (73) Weigend, F. A fully direct RI-HF algorithm: Implementation, optimised auxiliary basis sets, demonstration of accuracy and efficiency. Phys. Chem. Chem. Phys. 2002, 4, 4285−4291. (74) Neese, F.; Wennmohs, F.; Hansen, A.; Becker, U. Efficient, approximate and parallel Hartree-Fock and hybrid DFT calculations. A ‘chain-of-spheres’ algorithm for the Hartree-Fock exchange. Chem. Phys. 2009, 356, 98−109. (75) Eichkorn, K.; Weigend, F.; Treutler, O.; Ahlrichs, R. Auxiliary basis sets for main row atoms and transition metals and their use to approximate Coulomb potentials. Theor. Chem. Acc. 1997, 97, 119− 124. (76) Renz, M.; Kess, M.; Diedenhofen, M.; Klamt, A.; Kaupp, M. Reliable Quantum Chemical Prediction of the Localized/Delocalized Character of Organic Mixed-Valence Radical Anions. From Continuum Solvent Models to Direct-COSMO-RS. J. Chem. Theory Comput. 2012, 8, 4189−4203. (77) Sinnecker, S.; Rajendran, A.; Klamt, A.; Diedenhofen, M.; Neese, F. Calculation of Solvent Shifts on Electronic g-Tensors with the Conductor-Like Screening Model (COSMO) and Its SelfConsistent Generalization to Real Solvents (Direct COSMO-RS). J. Phys. Chem. A 2006, 110, 2235−2245. (78) Klamt, A.; Diedenhofen, M. Calculation of Solvation Free Energies with DCOSMO-RS. J. Phys. Chem. A 2015, 119, 5439−5445. (79) Zgarbová, M.; Jurečka, P.; Banás,̌ P.; Otyepka, M.; Šponer, J. E.; Leontis, N. B.; Zirbel, C. L.; Šponer, J. Noncanonical Hydrogen Bonding in Nucleic Acids. Benchmark Evaluation of Key BasePhosphate Interactions in Folded RNA Molecules Using QuantumChemical Calculations and Molecular Dynamics Simulations. J. Phys. Chem. A 2011, 115, 11277−11292. (80) Genheden, S.; Ryde, U. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin. Drug Discovery 2015, 10, 449−461. (81) Hou, T.; Wang, J.; Li, Y.; Wang, W. Assessing the Performance of the MM/PBSA and MM/GBSA Methods. 1. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics Simulations. J. Chem. Inf. Model. 2011, 51, 69−82. (82) Genheden, S.; Ryde, U. Comparison of end-point continuumsolvation methods for the calculation of protein-ligand binding free energies. Proteins 2012, 80, 1326−1342. (83) Zgarbová, M.; Šponer, J.; Otyepka, M.; Cheatham, T. E.; Galindo-Murillo, R.; Jurečka, P. Refinement of the Sugar-Phosphate Backbone Torsion Beta for AMBER Force Fields Improves the M

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B Description of Z- and B-DNA. J. Chem. Theory Comput. 2015, 11, 5723−5736. (84) Brown, R. F.; Andrews, C. T.; Elcock, A. H. Stacking Free Energies of All DNA and RNA Nucleoside Pairs and DinucleosideMonophosphates Computed Using Recently Revised AMBER Parameters and Compared with Experiment. J. Chem. Theory Comput. 2015, 11, 2315−2328. (85) Banás,̌ P.; Mládek, A.; Otyepka, M.; Zgarbová, M.; Jurečka, P.; Svozil, D.; Lankaš, F.; Šponer, J. Can We Accurately Describe the Structure of Adenine Tracts in B-DNA? Reference Quantum-Chemical Computations Reveal Overstabilization of Stacking by Molecular Mechanics. J. Chem. Theory Comput. 2012, 8, 2448−2460. (86) Maffeo, C.; Luan, B.; Aksimentiev, A. End-to-end attraction of duplex DNA. Nucleic Acids Res. 2012, 40, 3812−3821. (87) Šponer, J.; Šponer, J. E.; Mládek, A.; Banás,̌ P.; Jurečka, P.; Otyepka, M. How to understand quantum chemical computations on DNA and RNA systems? A practical guide for non-specialists. Methods 2013, 64, 3−11. (88) Havrila, M.; Zgarbová, M.; Jurečka, P.; Banás,̌ P.; Krepl, M.; Otyepka, M.; Šponer, J. Microsecond-Scale MD Simulations of HIV-1 DIS Kissing-Loop Complexes Predict Bulged-In Conformation of the Bulged Bases and Reveal Interesting Differences between Available Variants of the AMBER RNA Force Fields. J. Phys. Chem. B 2015, 119, 15176−15190. (89) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum mechanical continuum solvation models. Chem. Rev. 2005, 105, 2999−3093. (90) Goerigk, L.; Kruse, H.; Grimme, S. Benchmarking Density Functional Methods against the S66 and S66x8 Datasets for NonCovalent Interactions. ChemPhysChem 2011, 12, 3421−3433. (91) Sure, R.; Antony, J.; Grimme, S. Blind Prediction of Binding Affinities for Charged Supramolecular Host-Guest Systems: Achievements and Shortcomings of DFT-D3. J. Phys. Chem. B 2014, 118, 3431−3440. (92) Guillot, B. A reappraisal of what we have learnt during three decades of computer simulations on water. J. Mol. Liq. 2002, 101, 219−260. (93) Goerigk, L.; Karton, A.; Martin, J. M. L.; Radom, L. Accurate quantum chemical energies for tetrapeptide conformations: why MP2 data with an insufficient basis set should be handled with caution. Phys. Chem. Chem. Phys. 2013, 15, 7028.

N

DOI: 10.1021/acs.jpcb.6b07551 J. Phys. Chem. B XXXX, XXX, XXX−XXX