3786
J. Phys. Chem. B 2008, 112, 3786-3796
Structure, Stability, and Dynamics of Canonical and Noncanonical Base Pairs: Quantum Chemical Studies Ashim Roy, Swati Panigrahi, Malyasri Bhattacharyya, and Dhananjay Bhattacharyya*,† Biophysics DiVision and Center for Applied Mathematics and Computational Science, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata 700064, India ReceiVed: August 29, 2007; In Final Form: December 10, 2007
The importance of non-Watson-Crick base pairs in the three-dimensional structure of RNA is now well established. The structure and stability of these noncanonical base pairs are, however, poorly understood. We have attempted to understand structural features of 33 frequently occurring base pairs using density functional theory. These are of three types, namely (i) those stabilized by two or more polar hydrogen bonds between the bases, (ii) those having one polar and another C-H‚‚‚O/N type interactions, and (iii) those having one H-bond between the bases and another involving one of the sugars linked to the bases. We found that the base pairs having two polar H-bonds are very stable as compared to those having one C-H‚ ‚‚O/N interaction. Our quantitatively analysis of structures of these optimized base pairs indicates that they possess a different amount of nonplanarity with large propeller or buckle values as also observed in the crystal structures. We further found that geometry optimization does not modify the hydrogen-bonding pattern, as values of shear and open angle of the base pairs remain conserved. The structures of initial crystal geometry and final optimized geometry of some base pairs having only one polar H-bond and a C-H‚ ‚‚O/N interaction, however, are significantly different, indicating the weak nature of the nonpolar interaction. The base pair flexibility, as measured from normal-mode analysis, in terms of the intrinsic standard deviations of the base pair structural parameters are in conformity with those calculated from RNA crystal structures. We also noticed that deformation of a base pair along the stretch direction is impossible for all of the base pairs, and movements of the base pairs along shear and open are also quite restricted. The base pair opening mode through alteration of propeller or buckle is considerably less restricted for most of the base pairs.
Introduction The structure and dynamics of macromolecules like RNA depend on several interactive forces with hydrogen bonding interactions between bases being one of the most important. Other relevant forces are arising from base-base stacking, base-backbone, or backbone-backbone interactions, etc. Thorough structural insight into the complicated regime of secondary and tertiary interaction occurring in RNA would facilitate a better understanding of its versatile biological functions. It has been long known that the regular stretches of A-form RNA are often defined by contiguous canonical Watson-Crick base pairs, formed through hydrogen bonds between complementary bases adenine-uracil, guanine-cyctocine, and sometimes guanine-uracil (wobble base pair), which are also the major structural motifs of various functional RNAs such as ribosome, ribozyme, tRNA, etc.1 In addition to these, a large fraction of RNA base-base interactions involve highly variable noncanonical (non-WC) base pairs. Each nucleobase possesses three edges for hydrogen bonding: (i) the Watson-Crick edge, (ii) the Hoogsteen or “C-H” edge, and (iii) the sugar edge. The sugar edge occasionally involves the 2′-OH group of ribose also for H-bonding. Two nucleotides in RNA can interact with each other through any of these three edges in either cis or trans * To whom correspondence should be addressed. E-mail:
[email protected]. Phone: +91-33-2337-0379, ext. 3506. Fax: +91-33-2337-4637. † Center for Applied Mathematics and Computational Science.
orientation with respect to the hydrogen bonds of the sugar rings. We defined a shorthand nomenclature to indicate the base pairing edge as well as orientation and have explained that in Table 1.2 This leads to a total of 12 different base edges families with 168 possible base pairing patterns.3 Also, water mediated and protonated base pairs have also been identified recently,4 and these perhaps give rise to extra stability to the folded threedimensional structures. Our recently developed software BPFIND2 detects all canonical and noncanonical base pairs occurring in nucleic acid threedimensional structures only when at least two direct hydrogen bonds are possible between two bases through any edge. In addition it can also detect base pairs when there is one hydrogen bond between the bases and the second one is between one base and the 2′-OH group of the sugar connected to the second base. The direct hydrogen bonds can be of polar (N-H...O/N) or nonpolar (C-H‚‚‚O/N) types. Among the noncanonical base pairs, the most frequent ones are trans A:G Hoogsteen/sugar edge (H:S T) and trans A:U Hoogsteen/WC (H:W T), whose frequencies of occurrences are comparable to that of the wobble G:U base pair (W:W C). These high frequencies of occurrences of noncanonical base pairs certainly indicate that these have an important role in the stability of structural RNA. They stabilize tertiary contacts between remote segments of the RNA macromolecules and form specific RNA motifs. They are frequently involved in the formation of internal RNA loops and specific segments containing several contiguous non-WC base pairs.
10.1021/jp076921e CCC: $40.75 © 2008 American Chemical Society Published on Web 03/05/2008
Quantum Chemical Studies of Base Pairs
J. Phys. Chem. B, Vol. 112, No. 12, 2008 3787
TABLE 1: Base Pairs Studied along with Their Source and Frequency of Occurrence in RNA Structure Database, Sorted According to Frequency no.
base pair
Leontis/Westhof nomenclature4
frequency
PDB ID and residue names
environment
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
G:C W:W C A:U W:W C G:U W:W C A:G H:S T A:G s:s T A:U H:W T A:A H:H T G:A W:W C G:A S:W T A:A W:W T A:C w:s T A:U W:W T A:A H:W T A:U H:W C A:G: w:s C A:A h:s T A:C w:s C G:G S:S T A:U w:s T G:U w:s C A:C w:w C A:U s:w C G:C w:s C G:G H:W T A:A w:w C U:U h:w T C:C h:s T A:U w:s C A:A s:s T C:U h:s T A:C W:W T C:U W:W T C:C w:h C
cis Watson-Crick/Watson-Crick (canonical) cis Watson-Crick/Watson-Crick (canonical) cis Watson-Crick/Watson-Crick (wobble) trans Hoogsteen/sugar edge trans Sugar Edge/sugar edge trans Hoogsteen/Watson-Crick trans Hoogsteen/Hoogsteen cis Watson-Crick/Watson-Crick trans sugar edge/Watson-Crick trans Watson-Crick/Watson-Crick trans Watson-Crick/sugar edge trans Watson-Crick/Watson-Crick trans Hoogsteen/Watson-Crick cis Hoogsteen/Watson-Crick cis Watson-Crick/sugar edge trans Hoogsteen/sugar edge cis Watson-Crick/sugar edge trans Sugar Edge/sugar edge trans Watson-Crick/sugar edge cis Watson-Crick/sugar edge cis Watson-Crick/Watson-Crick cis sugar edge/Watson-Crick cis Watson-Crick/sugar edge trans Hoogsteen/Watson-Crick cis Watson-Crick/Watson-Crick trans CHO Edge/Watson-Crick trans Hoogsteen/sugar edge cis Watson-Crick/sugar edge trans Sugar Edge/sugar edge trans Hoogsteen/sugar edge trans Watson-Crick/Watson-Crick trans Watson-Crick/Watson-Crick cis Watson-Crick/CHO-edge
21593 6862 2749 2303 1496 1164 444 411 335 253 214 210 205 202 180 170 134 133 107 89 81 76 66 65 63 49 56 52 52 44 34 19 18
1DFU 97 (M)-79 (N) 1ASY 607 (S)- 666 (S) 1ASY 610 (S)-625 (S) 1MZP 21 (B)-44 (B) 1FFK 243 (O)-274 (O) 1ASY 614 (S)-608 (S) 1ASY 609 (R)- 623 (R) 1QVG 2596 (O)-2582 (O) 1FFK 166 (O)-924 (O) 1QVG 2566 (O)-2699 (O) 1HNW 696(A)-797(A) 1ASZ 615 (R)-648 (R) 1FFK 460 (O)-455 (O) 1FFZ 2470 (A)-2277 (A) 1SM1 991 (O)-2020 (O) 1GTR 13(B)-22(B) 1HNX 1280(A)-1149(A) 1FFK 315 (O)-336 (O) 4TNA 8-21 1S72 196(O)-425(O) 1M5K 20 (B)-63 (B) 1M1K 329 (A)-346 (O) 1FFK 1302(O)-1353(O) 1QVG 868 (O)-775 (O) 1FFK 635 (O)-620 (O) 1W2B 2781 (O)- 2791 (O) 1JBR 8(C)-22(F) 1F1T 11(A)-26(A) 1HNW 50(A)-360(A) 1S72 2690(O)-2704(O) 1QVG 1742 (O)-2037 (O) 1JJ2 1394 (O)-1432 (O) 1Q82 1176 (A)-1196 (A)
double helical double helical double helical double helical; triplet triplet isolated triplet double helical triplet parallel helix quadruple isolated triplet. triplet triplet triplet triplet isolated triplet quadruple double helical double helical double helix triplet double helical parallel helix double helix isolated partial stacking double helix double helical isolated isolated
These noncanonical base pairs often occur within secondary structural blocks and play an important role in the evolution and folding of RNA architectures and in building up the complex three-dimensional structures of RNAs.5 Conformation of a non-Watson-Crick base pair is also important for their biological function, and a detailed structural characterization was necessary to reveal conformational specificities of several classes of noncanonical pairs, thereby highlighting their role in maintaining the complicated folded structure of RNA. Earlier attempts to characterize the shape of a noncanonical base pair was restricted to analysis of C1′‚‚‚ C1′ distances, termed “isostericity”3 and was found useful in modeling RNA structures from sequence conservation. A recent study used three-rotational and three-translational parameters to describe the relative orientation of two bases in a pair. It was observed that most of the base pairs in cis orientation have a large negative propeller twist, which is nearly zero for the trans base pairs.6 The base pairs involving sugar edges were generally found to be nonplanar, presumably to avoid steric clash involving bulky sugar moiety. Structures of quantum chemically optimized base pairs, however, was never analyzed quantitatively although such a study could highlight inherent properties of these noncanonical base pairs. Furthermore, the flexibility of these base pairs, which is extremely important to understand their role in structure and function, was never addressed. As indicated earlier our BPFIND software detects three type of base pairing namely (i) those stabilized by two or more direct polar hydrogen bonding (N-H‚‚‚O or N-H‚‚‚N) between two base atoms, (ii) those held by one polar and a nonpolar hydrogen bond involving C-H‚‚‚O or C-H‚‚‚N type of interaction between base atoms, and (iii) those held together by one direct hydrogen bond between base atoms and one involving sugar O2′ atom connected to one of the bases. Though the second
type of base pairs involves an intrinsically weak nonpolar force and very often assumes nonplanar geometry, many of them still contribute substantially to the RNA structural stability as they are seen to occur quite frequently in RNA crystal structures.2 The base pairs involving the hydrogen bond to the sugar residues have additional flexibility due to the single covalent glycosidic bond and can be nonplanar even with perfectly linear hydrogen bonds. A similar detection criterion is also followed by other method, which however considers base pairs mediated by a single H-bond also.1,7 Since a biomolecular structure is ultimately governed by free energy changes, a thorough understanding of the structure-energy relationship in RNA non-WC base pairs is essential. Although there are quite a few reports on quantum chemical energy calculation of the canonical and some non-WC base pairs,8-10 there is no effort to systematically characterize H-bonding stabilities of all frequently observed base pairs in RNA. Furthermore, it is difficult to characterize these C-H‚‚ ‚O/N interactions by molecular mechanics type force fields due to inappreciable charge distribution of the C-H groups of the nucleotide bases. Moreover, understanding the quantitative structure-energetic relationship of any base pair is also never attempted. We find here that these H-bonds have significantly different strengths of interaction and suggest that they should be considered differently. Moreover, the geometry-optimized structure of a base pair generally corresponds to a 0 K situation while a true biophysical understanding may come from its dynamic variability. In this work we have taken 33 base pairs (as shown in Figure 1) that occurred a considerable number of times in RNA crystal structures. We have carried out geometry optimization of all the base pairs by using density functional theory (DFT) with a hybrid B3LYP functional, as adopted by others8,9 and have
3788 J. Phys. Chem. B, Vol. 112, No. 12, 2008
Roy et al.
Figure 1. Structures of the 33 base pairs optimized by B3LYP/6-31G**.
quantitatively characterized alteration of structures during optimization. We have determined hydrogen-bonding strengths corresponding to each type of H-bond, such as N-H‚‚‚O, N-H‚ ‚‚N, O-H‚‚‚N, etc, using a statistical procedure. Some of these base pairs have multiple polar H-bonds, some have one polar and one nonpolar C-H‚‚‚O/N type of interaction between the bases, while some have a H-bond to the sugar O2′ or H2′. A complete analysis of the third type of base pairs with H-bonds involving sugar hydroxyl group demands the study of the flexibility of sugar puckering and glycosidic torsional freedom; hence, their structure and dynamics are totally different. We have therefore restricted ourselves to the first two varieties only
for most of the analyses. We have analyzed structure and dynamics in 23 base pairs, where a direct H-bond exists between the bases through different approaches such as changes in calculated infrared (IR) frequency and intensity. We have also attempted to understand base pair dynamics for these 23 base pairs by calculating intrinsic flexibility of each of the base pairs from the normal-mode frequencies and calculated expected standard deviations of the intra-base-pair parameters, viz., propeller, buckle, open-angle, stagger, shear, and stretch. These standard deviation values are in excellent agreement with those obtained from crystal data.6 The analysis of base pair dynamics further indicates that the movement of most base pairs is highly
Quantum Chemical Studies of Base Pairs favorable along their propeller or buckle directions. Such movements maintain hydrogen-bonding strength by generally modifying pyramidalization of the H-bond forming amino groups. The other movements along open, shear, and stretch are more hindered as they are associated with distortion of the H-bonds between the bases. The results also indicate that the base pairs stabilized by two or more polar H-bonds are stable with equivalent strength whereas those having C-H‚‚‚O/N are rather weak. Material and Data Sheets We have selected 33 base pairs of usual Watson-Crick type as well as noncanonical type (Table 1) from crystal structures of RNA as indicated on our website http://www.saha.ac.in/biop/ bioinformatics.html. These base pairs were detected by BPFIND software with considerable frequency in 145 functional RNA crystal structures. We have chosen only those base pairs whose frequency of occurrence is non-negligible. We have also not considered any base pair that are stabilized by additional protonmediated H-bonding. Base pair stabilized by two N-H‚‚‚N/O type of hydrogen bonding are denoted by “W” (Watson-Crick), “H” (Hoogsteen), or “S” (sugar), and those stabilized by one C-H‚‚‚N/O types hydrogen bonding are denoted by lower case letters “w”, “h”, or “s”. These characters, i.e., W, H, etc. are used to mention the edge of a base through which it forms H-bonds.2 The last letter in base pair nomenclature indicates cis (C) or trans (T) orientation of the glycosidic bonds with respect to the virtual H-bonds. Computational We used MOLDEN software11 for model building hydrogen atomic positions and graphical visualization of the RNA base pairs and GAMESS-US12 for ab initio calculations using the hybrid B3LYP functional13 with the 6-31G (2d, 2p) basis set. Recent studies showed that this quantum chemical approach is sufficient to illustrate H-bonded planar base pairs.9 We have replaced the sugar phosphate parts of each nucleotide of 23 base pairs by methyl groups to reduce our computational cost. We could have replaced the sugar phosphate moieties by hydrogen atoms to further reduce the computational cost; however, this gives rise to additional H-bond donors (N-H) at the sugar edge of a base, which may lead to nonphysiological base pairing on optimization as reported earlier.10 For the third variety of base pairs we have considered sugars attached to one of the bases, terminated by OH groups at the 5′ and 3′ positions. Geometry optimization of the molecule in the ground state was carried out using DFT theory. Normal-mode analyses of the optimized structures were performed by numerical Hessian method. Basis set superposition errors (BSSE) were calculated by the MOROKUMA method14 using a Hartree-Fock procedure (HF) with 6-31G (2d,2p) basis set for the fully relaxed geometry optimized structures of the base pairs. The interaction energy ∆E of a dimer A‚‚‚B is defined as the electronic energy difference between the dimer and isolated optimized monomers (EA and EB) where monomer energies are obtained assuming the geometries of the optimized dimer and using the basis set of the dimer. Mathematically this is given opt opt opt as ∆E ) Eopt XY - EX - EY where EXY is the energy of the opt opt optimized base pair and EX and EY are the energy of the bases X and Y in the optimized base pair geometry. We also calculated the deformation energy, which is a repulsive contribution due to changes of monomer geometries upon complex formation and is defined as the energy required in deforming the isolated and optimized bases to the geometry they assume
J. Phys. Chem. B, Vol. 112, No. 12, 2008 3789 TABLE 2: Interaction Energy Components (in kcal/mol) of the Base Pairs no.
base pair
∆Ecor
∆E
Edef
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 30 31 32 33
G:C W:W C A:U W:W C G:U W:W C A:G H:S T A:G s:s T A:U H:W T A:A H:H T G:A W:W C G:A S:W T A:A W:W T A:C w:s T A:U W:W T A:A H:W T A:U H:W C A:G w:s C A:A h:s T A:C w:s C G:G S:S T A:U w:s T G:U w:s C A:C w:w C A:U s:w C G:C w:s C G:G H:W T A:A w:w C U:U h:w T C:C h:s T A:U w:s C C:U h:s T A:C W:W T C:U W:W T C:C w:h C
-26.51 -14.43 -14.68 -9.50 -5.71 -13.31 -9.76 -15.00 -10.61 -12.10 -8.76 -12.71 -11.28 -14.21 -6.02 -7.41 -15.48 -8.12 -13.13 -9.69 -6.28 -9.43 -8.13 -17.58 -6.07 -9.33 -5.22 -16.33 -7.40 -14.32 -10.44 -8.28
-33.80 -18.29 -19.03 -13.29 -7.79 -17.79 -12.69 -20.05 -14.22 -14.88 -14.32 -17.25 -14.09 -18.66 -7.78 -28.27 -21.34 -12.06 -43.56 -12.48 -8.44 -12.37 -15.99 -21.38 -7.71 -11.58 -11.72 -19.35 -9.49 -18.19 -14.79 -11.69
3.61 1.02 1.73 1.51 0.29 1.53 0.97 2.13 1.33 0.66 2.60 1.39 0.82 1.47 0.18 17.96 2.70 1.79 28.00 0.55 0.37 0.10 5.28 1.45 -0.11 -0.12 4.55 0.03 0.16 1.61 1.22 0.92
in the base pair. The deformation energy is hence calculated as o opt o o o Edef ) (Eopt X - EX) + (EY - EY) where EX and EY are the energy of the optimized isolated bases. Taking into consideration the basis set superposition error (BSSE) and deformation energy, the total interaction energy of a base pair is given by opt opt ∆Ecor ) Eopt XY - (EX + EY ) + BSSE + Edef
(1)
Putting the expression for deformation energy in this equation the total interaction energy of a base pair stands out as o o ∆Ecor ) Eopt XY - EX - EY + BSSE
(2)
Intra-base-pair parameters, viz., buckle, open-angle, propeller, stagger, shear, and stretch, were calculated using the NUPARM software6 version 2.0. Results and Discussion Energy of Interaction. Optimized structures of all 33 selected Watson-Crick as well as noncanonical base pairs are shown in Figure 1. It is apparent that all of the base pairs exhibit the expected H-bonding scheme of the corresponding base pair type. Interaction energy of the analyzed base pairs in RNA is found to vary between -5 to -27 kcal/mol (Table 2), indicating a wide range of stabilization provided by these frequently occurring base pairs. It is observed that base pair stabilized by C-H mediated H-bond are less favorable (∆Ecorr > -10 kcal/ mol) whereas all the base pairs with two polar H-bonds, except few (G:G S:S T; A:G H:S T; A:A H:H T) are highly favorable (∆Ecorr < -10 kcal/mol). Compared to all other base pairs, the GC base pair is stabilized maximally, and this also explain their significantly higher frequency of occurrence in any RNA structure. Canonical A:U and G:U wobble base pairs have
3790 J. Phys. Chem. B, Vol. 112, No. 12, 2008 interaction energies around -15 kcal/mol, which is comparable to the interaction energy of many noncanonical base pairs. Notably, the interaction energies of GC WC and AT WC base pairs were reported by Sˇ poner et al.8 as -27.5 and -15.0 kcal/ mol respectively whereas these were calculated as -25.2 kcal/ mol for the GC WC base pair and -12.2 kcal/mol for the AT WC base pair by Mo.15 From our results it is shown that the interaction energy of the G:C W:W C base pair is -26.51 kcal/ mol, whereas for the A:U W:W C base pair it is -14.43 kcal/ mol which are in good agreement with the previous reports. It should be noted that both these groups calculated the interaction energy of base pairs by considering the bases only and did not consider sugar part or replaced sugar part by a methyl group. On the other hand we have replaced sugar part of a nucleotide by a methyl group (to reduce artificial polarity of the bases) and thus we have taken into consideration of two possible relative orientations of base pair with respect to sugar moiety. It is further noted that interaction energy is not always related to the frequency of occurrence although G:C W:W C has highest frequency with best stabilization energy. The 18th base pair G:G H:W T is very stable (∆E ) -17.58 kcal/mol) but a rarely observed base pair. On the other hand the A:G s:s T base pair, even with small interaction energy (∆E ) -5.71 kcal/mol), is observed very frequently. This higher frequency might be due to the possibility of formation of a third H-bond involving the 2-amino group of guanine with the sugar moiety of adenine. Deformation energy of these base pairs has positive values except in few cases, which have negligibly small negative value. Surprisingly, the base pairs having very small deformation energy component are all having one nonpolar interaction. This, possibly indicate that only one strong H-bond does not significantly alter structures of the optimized bases, which multiple H-bonds may do. Deformation energies of the bases with sugar are generally higher, indicating more variability of the sugar part. In general most of the base pairs having multiple polar H-bonds have high interaction energy ranging from -10 kcal/ mol to -27 kcal/mol. Few base pairs with two polar H-bonds namely A:G H:S T (fourth base pair), A:A H:H T (seventh base pair) and G:G S:S T (18th base pair) are found to have weaker interaction energy as exception. The modes of interactions for these base pairs are found to be only a pair of N-H‚‚‚N type hydrogen bonds. The base pairs stabilized by one polar and one nonpolar interaction are all found to have interaction energies ranging between -5.71 kcal/mol (for A:G s:s T) to -9.43 kcal/ mol (for A:U s:w C). Optimized structures of a few base pairs involving sugar have single H-bond and as a result even smaller interaction energy. Intra-Base-Pair Parameters. The structures of the base pairs for geometry optimization were taken from crystal structures of folded RNA macromolecules and hence they were possibly rather strained due to different local and nonlocal interactions, as shown in Table 1. The structural intra-base-pair parameters of these base pairs in crystal geometry as well as energyminimized geometry, as calculated by NUPARM 2.0, are shown in Table 3. The three rotational intra-base-pair parameters buckle, open-angle, and propeller and the three translational parameters namely stagger, shear, and stretch describe the planarity and proximity of association between two bases. Among these six-parameters buckle, propeller, and stagger describe the overall nonplanarity of a base pair compared to ideal coplanar geometry while open-angle, shear, and stretch are related to the hydrogen-bonding pattern and proximity. Definitions and nomenclature of these parameters are explained
Roy et al. TABLE 3: Intra-Base-Pair Parameters of the Base Pairs in Crystals and Optimized Geometriesa no.
base pair
buckle (°)
-1.04 -0.3 2 A:U W:W C 7.52 -0.98 3 G:U W:W C -1.85 3.94 4 A:G H:S T -8.52 6.42 5 A:G s:s T -13.64 1.52 6 A:U H:W T 14.00 -3.20 7 A:A H:H T 3.34 -0.06 8 G:A W:W C 3.69 6.14 9 G:A S:W T 16.46 6.74 10 A:A W:W T 15.42 4.40 11 C:A s:w T 45.45 20.24 12 A:U W:W T -7.28 -0.67 13 A:A H:W T 1.02 -1.40 14 A:U H:W C -3.17 -0.24 15 A:G w:s C -8.41 -4.35 16; A:A h:s T -6.06 -6.17 17 A:C w:s C 1.16 41.62 18 G:G S:S T -19.42 -12.95 19 A:U w:s T 5.77 -1.87 20 G:U w:s C 23.36 -7.15 21 A:C w:w C -19.67 -4.42 22 A:U s:w C 31.27 2.34 23 G:C w:s C 12.18 -8.60 24 G:G H:W T 24.14 10.26 25 A:A w:w C 10.15 8.96 26 U:U h:w T 4.87 -2.05 27 C:C h:s T -10.01 -11.33 28 A:U w:s C 2.90 -45.45 29 A:A s:s T -66.22 -62.28 30 C:U h:s T 1.21 -11.39 31 A:C W:W T 6.06 -1.10 32 C:U W:W T -9.70 -0.83 33 C:C w:h C -13.94 -5.13 1 G:C W:W C
open angle (°) 0.45 -3.43 0.27 0.03 -2.38 4.11 -7.45 -20.31 30.40 23.45 1.95 7.49 -3.81 - 0.06 -0.72 -2.93 -12.22 -18.41 -1.67 0.00 27.17 37.07 -0.91 -7. 96 6.22 0.95 -2.51 2.23 43.94 34.47 -3.64 -3.60 -32.88 -37.10 -0.87 0.08 -33.38 -42.52 -27.57 -32.74 0.02 13.55 -6.37 -6.63 -32.23 -36.80 -7.80 -9.91 2.24 14.47 14.60 9.07 -9.87 -12.77 -27.86 -30.68 14.01 41.18 -3.39 -13.50 -1.63 -0.78 1.35 1.95 -17.69 -13.06
propeller stagger (°) (Å) -2.54 2.21 -9.33 0.35 -2.52 -1.30 -12.13 -40.88 -8.76 -33.6 4.88 1.91 -6.87 -32.08 -9.66 -12.75 -33.27 -17.40 7.20 -12.06 -20.56 -5.81 4.10 -0.57 21.72 17.58 -18.60 -2.63 -42.52 -16.38 2.18 2.13 -20.09 -10.15 27.07 30.85 -15.74 -16.09 3.18 -4.68 -7.73 -4.84 -4.82 2.83 -15.72 4.47 -2.81 3.48 -15.67 -0.22 -1.88 2.52 -7.22 -12.97 -4.84 -20.69 0.67 -43.59 -1.07 -8.16 -29.72 -1.93 -17.55 -15.81 -6.66 -31.91
-0.35 0.12 0.33 0.00 0.05 -0.15 -0.16 0.05 0.05 -0.90 0.27 0.00 0.46 0.01 0.09 -0.21 - 0.33 -0.07 -0.37 0.00 -0.76 1.29 -0.15 0.03 1.04 0.00 0.34 -0.01 0.55 0.04 -0.14 0.13 1.02 -1.21 0.38 0.00 0.31 -1.19 -0.84 0.03 -0.67 -0.01 -0.15 0.03 -0.24 0.03 -0.40 0.03 -0.76 0.09 -0.11 0.01 0.46 -0.16 0.47 -1.29 1.88 -0.09 0.32 -1.05 -0.01 0.03 0.16 0.29 0.12 -0.32
shear (Å)
stretch (Å)
0.29 0.17 -0.13 0.09 -2.31 -2.42 2.14 1.95 1.57 1.38 0.04 0.15 2.66 2.67 0.61 -0.01 1.71 1.82 2.87 2.27 -0.18 0.49 -0.1 5 -0.06 2.55 2.50 -0.23 -0.13 1.94 2.23 2.28 2.28 -0.61 -0.86 1.72 1.17 -0.51 0.28 2.44 4.15 3.00 2.60 0.03 0.16 3.55 4.35 0.17 0.17 2.58 2.50 2.29 2.70 4.53 5.19 -0.26 0.56 0.98 -2.29 5.00 5.15 2.61 2.33 -0.35 -0.25 -1.92 -1.86
2.85 2.92 2.79 2.86 2.72 2.84 3.29 3.24 3.12 3.20 2.71 2.79 2.86 2.88 2.78 2.94 3.28 3.32 2.88 2.98 3.50 3.66 2.88 2.81 2.77 2.94 2.82 2.79 2.78 3.07 2.85 2.85 4.51 4.21 3.53 3.53 4.09 4.04 3.71 3.26 2.39 2.70 3.01 2.98 3.26 3.18 2.75 2.85 2.71 2.75 2.94 3.04 2.57 2.89 4.16 4.20 2.50 3.02 2.94 2.83 2.97 2.94 3.01 2.95 3.30 3.26
a First line for each entry gives parameter values in the crystal geometry, whereas the second line gives those in the optimized geometry.
wonderfully in ref 16. Thus the extent of deformation compared to an ideal planar geometry of a base pair can be easily visualized from the parameter values. The first three canonical (WWC) base pairs, namely A:U, G:C, and G:U, adopt a near planar geometry in energy-
Quantum Chemical Studies of Base Pairs minimized structures. Previous studies of Watson-Crick A:T or G:C base pairs by Hartee-Fock method also reported planar structures of the base pairs17,18 whereas a nonplanar structure19 was detected by MP2/6-31G (d) or MP/-6-31G (d,p) methods. However, quantitative analyses of the base pairs nonplanarity were not reported earlier. Our analysis suggests various degrees of nonplanarity of the base pairs by DFT optimization, which also incorporates electron correlation effects. The planar orientation of the canonical base pairs may be due to neglect of the sugar moiety and stacking between neighboring base pairs in the crystal environment (Table 1). Most of the noncanonical base pairs also tend to adopt near planar orientation with perfect hydrogen bond geometry (Table 3). The base pairs interacting through sugar show sufficient nonplanarity in most cases. Stagger values of most of the base pair become zero or close to zero on optimization. This clearly indicates that most of the base pairs have a tendency to assume planer geometry on optimization as the influence of stacked base pairs disappeared during their optimizations. The value of open-angle, shear, and stretch, which describe H-bonding pattern, mostly retain their initial conformation. These values are also close to the corresponding average values obtained from analysis of 145 crystal structures of functional RNA.6 This possibly justifies the choice of the initial structures from a big pool of similar conformations. This further indicates that energy minimization did not perturb the hydrogen bond geometry of the initial structures, which was reported to occur earlier for some base pairs with single or poor H-bonds.9,10 Base pairs with H-bonds involving the primary amino group (N2 of guanine, N4 of cytocine, and N6 of adenine) tend to adopt a large propeller due to pyramidalization of the amino groups, while those having interactions through secondary amino group (N3 of uracil and N1 of guanine) are more planar with a small value of propeller twist. Our previous study20 of different amino groups containing biomolecular systems by different quantum chemical methods, namely MP2/6-31 (2d,2p), HF/631G (2d,2p), HF-6-311G (2d,2p), or HF/6-311+G(2d, 2p), also suggested larger pyramidalization of the primary amino groups as compared to the secondary amino group. It is seen that all the base pairs stabilized by two primary amino groups, such as N6 of adenine, N4 of cytosine, or N2 of guanine have large propeller twist. Examples of such base pairs are A:G H:S T (4th), A:A H:H T (7th), G:A S:W T (9th), A:A W:W T (10th), A:A H:W T (13th), and G:G S:S T (18th). On the other hand a H-bond involving a secondary amino group tends to reduce the propeller twist in G:C W:W C (1st), A:U W:W C (2nd), G:U W:W C (3rd), A:U H:W T (6th), A:U W:W T (11th), A:U H:W C (14th), and G:G H:W T (24th). It may also be noted that it was reported earlier that these pyramidalizations of primary amino groups are their inherent tendency and the possibility of a weak intramolecular hydrogen bond also cannot make them planer,21 which is also observed in most of the base pairs having a N-H‚‚‚N type of interaction. The canonical base pairs, on the other hand, have strong H-bonding between primary amino groups and oxygen as the acceptor. Such strong H-bonds may be sufficient to reduce pyramidalization of the primary amino groups. A number of base pairs, however, undergo considerable movement during optimization. We tried to understand the reasons behind these changes and found that these base pairs were either (a) stabilized by additional sugar mediated hydrogen bonding, (b) stacking interaction with neighboring base pairs of the folded RNA macromolecules as in A:A W:W T (10th), A:C w:w C (21st), A:U s:w C (22nd), A:A w:w C (25th), and
J. Phys. Chem. B, Vol. 112, No. 12, 2008 3791 A:C W:W T (31st) base pairs, or (c) by other stabilizing interactions, such as base triple formation as in A:G H:S T (4th), A:A H:H T (7th), G:A S:W T (9th), A:A H:W T (13th), A:U H:W C (14th), A:G w:s C (15th), A:G s:s T ( 5th), and G:G H:W T (24th) base pairs, which were considered as base pairs only in our study. A few base pairs with one H-bond through the sugar hydroxyl group and another C-H‚‚‚O interaction, such as C:C h:s T and C:U h:s T (27th and 30th, respectively) optimize to structures with single H-bond, as the initial C-H‚ ‚‚O interaction is too weak to sustain during optimization. In addition, there are crowding effects arising due to C1′ or H1′ atoms. The A:A s:s T (29th) base pair underwent considerable movement to the w:s T orientation upon optimization as the initial structure had only two C-H‚‚‚O interactions while the later is stabilized by a significantly stronger N-H‚‚‚O H-bond. For this reason, we have not analyzed its interaction energies. Such a variation was also noted earlier.22 The base pairs with sugar edge, even without explicit sugars, remain nonplanar mainly due to crowding effect involving methyl groups at the C1′ position, which is also the expected scenario in normal RNA where the bulky ribose sugar might have a larger number of contacts. The last base pair (C:C w:h C) adopts a large propeller possibly due to weak C-H‚‚‚O interactions as well as pyramidal amino groups. The C:U W:W T (22nd) base pair retains a reasonably large propeller possibly to avoid electrostatic repulsion between the two O2 groups of the two pyrimidines. The A:U H:W C and A:U H:W T base pairs tried to adopt a planer geometry with reduced propeller twist possibly to get additional stability by a C-H‚‚‚O type interaction between the O2 atom of uracil and the H8 atom of adenine. From Table 1 it is found that there are five base pairs in our set, which were isolated base pairs in crystal structure also. They were expected to retain similar structural parameters upon optimization and these are found to be true for trans A:U WC/ WC (W:W T), trans G:G SE/SE (S:S T), and trans C:U WC/ WC (W:W T) base pairs. As indicated above the A:U H:W T base pair became flat after optimization, due to involvement of a secondary amino group in H-bonding and also to improve a C-H‚‚‚O hydrogen bond distance. On the other hand, the C:C w:h C base pair adopts a nonplanar geometry upon optimization, presumably due to its extra flexibility. Energetics of Different Types of H-Bonds. Widely different types of hydrogen bonds are reported in chemistry while only few of them are possible in biomolecules, namely N-H‚‚‚O, N-H‚‚‚N, S-H‚‚‚O, S-H‚‚‚N, etc. Recent analysis of crystal structures of organic or bimolecular fragments indicated that perhaps C-H‚‚‚O or C-H‚‚‚N forms another class of hydrogen bond.23 Following these observations some other groups characterized these as blue shifting hydrogen bonds having substantially smaller interaction energies.24 However, such H-bonds may still be important for understanding ligand receptor interactions and hence in structure based drug design also. We have therefore attempted to estimate the strengths of the different types of H-bonds from our interaction energy values. The selected base pairs have a number of H-bonds of wide variety, namely 20 N-H‚‚‚O type, 29 N-H‚‚‚N type, 4 O-H‚ ‚‚N type, 10 C-H‚‚‚O type, and 5 C-H‚‚‚N type, and all of the base pair interaction energies are at least the sum of two such energy components. i.e.
Eiint ) nNHO ENHO + nNHN ENHN + nOHN EOHN + nCHO ECHO + i i i i ECHN + δi (3) nCHN i where nNHO , nNHN , nCHO , and nCHN are number of hydrogen i i i i
3792 J. Phys. Chem. B, Vol. 112, No. 12, 2008
Roy et al.
TABLE 4: Energy Components (in kcal/mol) Due to Formation of Different Types of H-Bonds type of hydrogen bond NH‚‚‚O NH‚‚‚N OH‚‚‚N CH‚‚‚O CH‚‚‚N
energy components
energy components fitting eq 6a
-7.12 -5.72 -7.24 -1.67 -0.57
-7.04 -5.64 -7.16 -1.64 -0.49
a The constant term describing nonspecific interaction, as obtained from fit, is -0.16 kca/mol.
bonds of the corresponding type in a base pair and ENHO, ENHN, etc. are the average interaction energies in forming a H-bond of that type. In most cases nXi , where X is N-H‚‚‚O, N-H‚‚‚N, etc., are one or zero while some base pairs have two N-H‚‚‚O or N-H‚‚‚N types of hydrogen bonds. As for example, in case of G:C W:W C base pair, nNHO is 2, NNHN is 1, while nOHN, nCHO, and nCHN are zero. In order to determine the value of the energy components, viz., ENHO, ENHN, etc, we have followed a least-squares fit like procedure, whereby minimizing Σδi2 (eq 3) by optimizing it with respect to ENHO, ENHN, etc. Thus we differentiate eq 4 by ENHO, ENHN, EOHN, ECHO, and ECHN and equate those to zero. This gives us a set of five linear simultaneous equations as shown in eq 5.
ENHO - nNHN ENHN - nOHN EOHN ∑i δi2 ) ∑i (Eiint - nNHO i i i ECHO - nCHN ECHN)2 (4) nCHO i i )2 + ENHN ∑ nNHO nNHN + ∑i (nNHO i i i i nOHN + ECHO ∑ nNHO nCHO + EOHN ∑ nNHO i i i i i i nCHN ) ∑ nNHO Eiint (5a) ECHN ∑ nNHO i i i i i ENHO ∑ nNHO nNHN + ENHN ∑ (nNHN )2 + i i i i i nOHN + ECHO ∑ nNHN nCHO + EOHN ∑ nNHN i i i i i i nCHN ) ∑ nNHN Eiint (5b) ECHN ∑ nNHN i i i i i ENHO ∑ nNHO nOHN + ENHN ∑ nNHN nOHN + i i i i i i )2 + ECHO ∑ nCHO nOHN + EOHN ∑ (nOHN i i i i i nOHN ) ∑ nOHN Eiint (5c) ECHN ∑ nCHN i i i i i ENHO ∑ nNHO nCHO + ENHN ∑ nNHN nCHO + i i i i i i nOHN + ECHO ∑ (nCHO )2 + EOHN ∑ nCHO i i i i i CHN CHO i ECHN ∑ nCHO n ) n Eint (5d) ∑ i i i i i nCHN + ENHN ∑ nNHN nCHN + ENHO ∑ nNHO i i i i i i nOHN + ECHO ∑ nCHO nCHN + EOHN ∑ nCHN i i i i i i ECHN ∑ (nCHN )2 ) ∑ nCHN Eiint (5e) i i i i
ENHO
We have solved the above simultaneous equations (eq 5) by GNU-octave (version 2.1.33) using Eiint values from Table 2
Figure 2. Correlation between interaction energy values calculated by combined DFT/HF method and those obtained from solving eq 5 are shown here.
and obtained the values of energy components as listed in Table 5. These energy component values can also be used to recalculate the total interaction energy values and these are highly correlated to the original data with correlation coefficient ) 0.82 (as shown in Figure 2). It is found from Table 4 that the different types of H-bonds lead to different amount of interaction energies and classification of H-bonds just into two types, viz. polar and nonpolar, is not sufficient to explain the varieties. It is found that N-H‚‚‚O and N-H‚‚‚N interactions are significantly distinct while O-H‚‚‚N interactions are strongest among these. Earlier reports24,25 on C-H‚‚‚O type interactions using the MP2/aug-cc-pTZV method are also in agreement with our results as their reported limiting value of interaction is about -2 kcal/mol as compared to our average value of -0.88 kcal/mol. Furthermore, it is expected that a base pair stabilized by two N-H‚‚‚O bonds would be significantly more stable (by about -4 kcal/mol) than a base pairs stabilized by two N-H‚‚‚N H-bonds. We have also attempted to fit an equation
Eiint ) nNHO ENHO + nNHN ENHN + nOHN EOHN + nCHO ECHO + i i i i ECHN + K + δi (6) nCHN i using the same method. The fitted data for ENHO, ENHN, EOHN, ECHO, and ECHN along with K are given in Table 4. The value of K is very small (-0.16 kcal/mol) and the new set of interaction energy components gives almost identical correlation coefficient. The small value of K suggests that all of the base pairs might have a small base independent attraction. Thus, it appears that the bases do not have any default nonspecific attractive interaction. Analysis of the Vibrational Frequency of Base Pairs. Infrared (IR) spectroscopy is one of the important tools for the detection of hydrogen bonds. Hydrogen bond formation is most unambiguously determined by analysis of alteration of IR frequency and intensity upon complexation and such changes have been found to be in very good agreement with the theoretically derived spectra. We have calculated the theoretical IR spectra of the base pairs as well as for four methylated bases by numerical Hessian method. Infrared frequencies for the amino N-H bond stretching are in excellent agreement to those measured by different experiments. The calculated values are, however, systematically larger than the experimental values.25,26
Quantum Chemical Studies of Base Pairs
J. Phys. Chem. B, Vol. 112, No. 12, 2008 3793
TABLE 5: Calculation of Dynamics of the Base Pairs along the Five Intra-Base-Pair Parameter Directionsa no.
base pair
buckle
1 2 3 4 5 6 7 8 9 10 12 13 14 15 18 21 22 24 25 26 30 31 33
G:C W:W C A:U W:W C G:U W:W C A:G H:S T A:G s:s T A:U H:W T A:A H:H T G:A W:W C G:A S:W T A:A W:W T A:U W:W T A:A H:W T A:U H:W C A:G w:s C G:G S:S T A:C w:w C AU s:w C G:G H:W T A:A w:w C U:U h:w T A:C W:W T C:U W:W T C:C w:h C
43.19; 82.5 29.84; 38.0 32.04; 43.7 20.49; 28.9 12.47; 12.7 25.0; 24.4 20.06; 24.4 16.88; 21.4 14.59;16.8 19.33; 28.6 19.71; 27.3 22.86; 35.1 24.63; 23.6 18.78; 27.0 26.98; 30.2 40.6; 71.6 13.94; 15.1 20.07; 29.9 45.75; 83.3 53.32; 127.2 37.90; 45.7 47.71; 99.3
open angle
60.09; 22.3 38.22; 119.5 24.81; 56.2 48.24; 120.0 44.34; 139.9 37.72; 133.0 59.6; 306.5 64.76; 335.7 48.04; 181.6 65.49; 221.5 30.87; 94.4 36.28; 131.1 110.2; 672.4 40.9; 157.2 34.97; 111.2 93.39; 488.1 68.3; 288.3 119.9; 592.4 57.61; 177.5
propeller
stagger
31.00; 26.2 38.74; 30.6 31.55; 32.5 38.22; 40.2
73.36; 3.18 73.38; 3.0 73.57; 3.0 64.26; 2.5 44.3; 0.97 67.39; 3.0 75.3; 2.8 59.74; 1.9 37.72; 0.78 42.07; 1.1 69.77; 2.9 48.04; 1.2 74.4; 3.2 41.5; 1.0 36.28; 0.7 39.01; 0.8 73.17; 3.0 52.96; 1.8 44.08; 1.1 67.86; 2.9 53.32; 1.6 68.67; 2.8 65.53; 2.1
33.72; 23.2 29.45; 23.1 25.27; 17.6 26.37; 11.8 28.16; 15.8 35.65; 22.7 30.13; 22.9 36.61; 27.3 23.22; 16.0 24.2; 19.7 27.51; 29.6 31.71; 24.7 33.67; 45.6 15.89; 10.9 30.83; 15.3 34.56; 24.6 22.2; 9.5 39.41; 41.4
Shear 100.8; 6.0
93.57; 5.5 60.79; 2.0 74.96; 3.0 101.49; 5.9 97.32; 5.6 107.3; 7.5 100.8; 6.6 61.43; 2.1 102.5; 5.5 65.92; 2.38 70.2; 3.3 82.00; 3.7 75.89; 3.5 95.81; 6.1
a The sixth parameter, stretch, remains static in all the base pairs. The two parameters in each cell represent the (a) normal-mode frequency (cm-1) and (b) force constant to the corresponding vibration (k) (kcal/mol for rotational parameters or kcal/mol/Å2 for translational parameters), respectively.
In case of base pairs stabilized by NH‚‚‚O/N type (Supporting Information) hydrogen-bonding red shift of the N-H stretching is accompanied by sharp increase of the I/Io ratio by at least a factor of 10 where Io is the intensity of a normal mode vibration in the monomer and I is the intensity of the same vibration in the complex. Often intensity of one of the N-H bond vibration increases 50-60 times as compared to the same vibration of the isolated base. The A:G H:S: T base pair, which is the most frequently observed noncanonical base pair, does not show any strong H-bond with high intensity change. This is also reflected in the rather weak interaction energy of this base pair with only two N-H‚‚‚N hydrogen bonds and large propeller twist of its optimized structure. These H-bond distances fall within the normal H-bond distance range and D-H‚‚‚A angle differ slightly from linearity (Supporting Information). The nonpolar H-bonds in base pairs stabilized by C-H‚‚‚O/N (Supporting Information) mediated H-bonds however, show a blue shift of C-H stretching, and the I/Io ratios generally decrease on hydrogen bonding. Although, C-H stretching of U:U h:w T base pair undergo a red shift of 45 cm-1 and I/Io ratio increase by a factor of 385.7 but it is not significant as the intensity of the C-H stretching in the isolated bases was too small and we consider it a blue shifting H-bond. These H-bonds are rather elongated and D-H-A angles also differ significantly from linearity, as expected. Base Pair Opening Flexibility. Base pair opening involves breakage of hydrogen bonds holding the bases in a pair and movement of at least one base out of a helical stack. It is an important part of many biochemical transformations like replication, transcription, and enzyme catalyzed DNA modifications and in the case of selective methylation. Since both canonical and noncanonical base pairs are held within the double helix by hydrogen bonding as well as base-stacking interactions, base pair opening requires a much higher activation energy and it is an energetically unfavorable process. To date many groups have investigated the feasibility of different base pair openings in term of energy from both theoretical and experimental studies.27-30 In this work we have calculated the opening and H-bonding dynamics assuming that the vibration of a base pair
is completely harmonic. We have estimated the intra-base-pair parameters buckle, open-angle, stagger, shear, propeller, and stretch at the two extreme points of the normal mode vibration and tried to realize the feasibility of different base pair openings from the differences in the parameters at these two limits. These analyses were done for the first two types of base pairs, 23 in number. As the third category involves H-bonding with the sugar moieties, their dynamics are largely dominated by torsional flexibility and sugar pucker variations, which could be nonharmonic as well. Moreover, their complete characterization requires comparison of sugars of C2′-endo geometry or deoxyribose sugars, which are beyond the scope of the present study. We know that most molecules including the base pairs have 3N-6 degrees of freedom and one expects to determine vibrational frequencies about each of these degrees of freedoms by normal-mode analysis. Changes in frequencies about some X-H bonds are indicative of H-bonding, which have already been discussed. When two bases are loosely bound to each other by H-bonds, there would be some vibrations and rotations about these loose bonds also. In order to detect frequencies of these vibrations we have generated coordinates of all atoms of a base pair, which undergoes modification due to such vibration. The amplitude of vibration (A) in the Cartesian coordinate system corresponding to each frequency are used to obtain the altered geometry of the base pair Ximax ) Xio{1 + A sin(90)} and Ximin ) Xio{1 + A sin(-90)} (7)
where Xio are position vectors for the ith atom of a base pair in the optimized geometry. We calculated all of the intra-basepair parameters from these pairs of structures (Ximin and Ximax) corresponding to each frequency and estimated differences between the parameters. Thus, in principle, we obtain 3N - 6 sets of differences in base pair parameters. All of the differences are negligible for high-frequency motions (above 3000 cm-1) where only C-H or N-H stretching vibrations take place. The differences in base pair parameters are often significant corresponding to frequencies in the 1000-2000 cm-1 region as major C-C and C-N bond vibrations occur in these regions, which
3794 J. Phys. Chem. B, Vol. 112, No. 12, 2008
Roy et al.
also can affect calculation of shear, open-angle, etc. As for example a change in bond angle C2 (Py)-N1-C1′ causes large differences in base pair parameters shear or open-angle; hence, we do not consider those as intra-base-pair motion. Large differences in parameters in the low-frequency region generally correspond to inter base motion. Sometimes these are coupled to two or more base motions also, such as coupled open-stagger movement in the ninth base pair, G:A S:W T, A:A H:W T (12th) or G:G S:S T (15th) base pairs. Table 5 gives a list of such changes in base pair parameters and their corresponding frequencies, only when the changes are significantly large and they are not coupled to any intramolecular vibration such as bond length stretching, angle bending, etc. We have selected only those motions for which a rotational parameter, viz., propeller, buckle, or open angle, alters by 4° or more or for which a translational parameter, such as shear, stagger, or stretch, changes by 0.3 Å or more. Thus, we can say now that a G:C W:W C base pair undergoes a motion along the buckle with a frequency of 43.2 cm-1, along the propeller with a frequency of 31.0 cm-1, and along the stagger with a frequency of 73.4 cm-1. We have further calculated force constants from the frequency values as these are related through the equation ω ) (k/µ)1/2 ) 2πcυ j where υ j are the frequencies in cm-1, c is the velocity of light, k is the force constant, and µ is the reduced mass of the base pair at the respective frequency. We have used the reduced mass µ calculated by GAMESS for calculating the force constant corresponding to all of the translational motion, namely stagger and shear. We have used reduced moments of inertia for calculating force constants corresponding to rotational motions about propeller, buckle, and open angles. In order to calculate a reduced moment of inertia of a base pair about a rotational axis, for example Z h m along which propeller motion takes place,6 we have oriented a base pair such that its mean axes Xm, Ym, and Zm are along the laboratory reference frame X, Y, and Z. We have then calculated moments of inertia of both the bases about Zm axis, for example, as N1
I1zz )
(mixi2 + miyi2) ∑ i)1
N2
and I2zz )
(mixi2 + miyi2) ∑ i)1
(8)
where N1 and N2 are the number of atoms in the first base and second base respectively. The reduced moment of inertia I is calculated as I ) I1I2/(I1 + I2). Finally, force constants corresponding to the rotational degree of freedom are calculated as
k ) 4π2c2υ j 2I
(9)
These force constants values would be useful for molecular dynamics simulations of RNA treating each base as a mass, in a lattice model type simulation. We can also use these force constants (k) to predict the probability distribution F(R) for different values of a parameter through
F(R)/F(Ro) ) e-1/2k(R-Ro2/kBT
(10)
where R0 is the equilibrium value of a parameter R, kB is Boltzman’s constant, and T is the absolute temperature. The half-width at half-maximum from this probability distribution corresponds to standard deviation (σ), which were also calculated earlier from X-ray crystallographic data for some of the base pairs.6 Hence we have measured the intrinsic standard deviations (σcalc) as
σcalc ) x2 ln 2 kBT/k
(11)
Figure 3. Correlation between calculated and measured standard deviations for (a) Buckle and Propeller and (b) Stagger for different base pairs. Filled circles represent values of Buckle and those of Propeller are represented by filled squares in panel a.
The calculated values of standard deviations of different base pair parameters for the 23 base pairs (Table 5) are in excellent agreement to those reported earlier for some of the base pairs from analysis of 145 functional RNA crystal structures as shown in Figure 3. The σcalc values for open are quite small compared to those for propeller or buckle. Same trend is also seen for base pairs in crystal structures. We did not notice any movement about the stretch directions for any base pair and hence we can speculate their values as zero. Very small values of σexpt for stretch of all the base pairs were noticed earlier from analysis of RNA crystal structures.6 It appears from Table 5 that buckle, propeller, and stagger motions are most feasible for all types of base pairs, and these motions occur mainly at the lower frequency region. This may indicate that by changing buckle, propeller and stagger, the energy of a base pair does not alter significantly while it is always difficult to pull two bases from each other along the stretch direction, irrespective of polarity of hydrogen bonds and type of base pair. Furthermore, previous studies16 using energy calculation for base pairs having different H-bond lengths indicated a rather steep and nonharmonic energy landscape. This also indicates that movement of a base pair in stretch direction is a costly affair. The measured values of standard deviations of open-angles from RNA crystal structures are generally small and our calculated σ’s for open-angles also follow the same trend. Another hindered motion is that about open angle, as a change in open angle always causes alteration of the H-bond geometry. The calculated values of standard deviations σcalc for
Quantum Chemical Studies of Base Pairs buckle; stagger and propeller are in general with close agreement with the corresponding experimental values. As for example σcalc’s for the buckle for G:C W:W C, A:U W:W C, G:U W:W C, and A:A H:W T are small in both sets of data. On the other hand σcalc as well as σexpt for the buckle of A:G s:s T, A:G W:S T are large. These agreements are reflected in large correlation coefficients (R ) 0.76 for buckle, 0.52 for propeller and 0.72 for stagger) for these parameters (Figure 3). In general our σcalc’s are always smaller than σexpt by a factor of 2. This regular difference may be due to several effects: (i) many of the base pairs in the crystal are positioned in a stacked geometry with attached sugar, (ii) most of the base pairs in RNA macromolecular environment have different nonlocal interactions, such as base triple formation, perturbing their structures and giving rise to more variability, (iii) it was shown earlier that the theoretically calculated IR frequencies are generally systematically larger than corresponding frequencies from IR experiments,26 giving rise to smaller σcalc’s, and (iv) the crystal data analysis was done earlier by considering all available structures better than 3.5 Å resolution and even considering homology related structures. This might have introduced some artificial extra structural variability to the base pairs. It is expected that a fresh analysis of structural parameters from a nonredundant set of RNA crystal structures would possibly give a better agreement of the intrinsic σcalc values with observed σexpt values for most of the base pairs. Considering the intrinsic standard deviations and forceconstants, we are tempted to speculate that during base pair opening, i.e., melting of DNA or RNA, the base pairs may first undergo a transformation to increase propeller, buckle, or stagger and subsequently increase open angle or shear and lastly the hydrogen bonds break to increase separation between the two involved bases. Conclusion Most of the earlier analysis of noncanonical base pairs considered those stabilized by polar or nonpolar H-bonds with equal importance. We find from most of the analysis techniques that polar H-bonded base pairs are significantly more stable than those having nonpolar, such as C-H‚‚‚N or C-H‚‚‚O, interactions. The interaction energy data indicate that in general all base pairs having two polar H-bonds are highly stable and their role in prediction of RNA secondary structure needs to be revisited. We also noticed that energy contributions due to three types of polar H-bonds, namely N-H‚‚‚O, O-H‚‚‚N, and N-H‚ ‚‚N, are notably different while energy contribution due to nonpolar interactions are insignificant as found by both the types of quantum chemical methods adopted. There are possibilities of several other types of H-bonds, such as O-H‚‚‚O, which were not found in the frequently occurring RNA base pairs. A thorough analysis of all possible H-bond interactions would therefore require a consorted effort from several laboratories considering different biochemical systems. Due to the rather flat energy landscape of the N-methyl rotations, we often obtained an imaginary frequency about these rotational motions in our Hessian calculations. Hence we could not attempt to calculate the free energy of formation of these base pairs. Possibly by considering the sugar moieties such motions would be more restricted and hence ∆G calculations would be feasible, which could be addressed later. We have for the first time analyzed base pair opening dynamics of different canonical as well as noncanonical base pairs using normal-mode analysis, followed by calculation of base pair parameters. Our calculated intrinsic flexibility of the
J. Phys. Chem. B, Vol. 112, No. 12, 2008 3795 base pairs is in excellent agreement with the experimental data for which large numbers of base pairs were found in RNA crystal structures. The calculated flexibilities, for which sufficient data is not available in the X-ray crystal structure database, thus can be useful for modeling RNA threedimensional structures. There are, of course, some more base pairs where H-bonds take place between a base and a sugar, or where water-mediated H-bonds play an important role. Base pairs having a H-bond involving O2′ require more elaborate study considering dynamics about glycosidic torsion and sugar pucker. These on going works will be presented elsewhere. We have seen that some base pair’s structure and interaction energies are affected by the environment to some extent but occasionally isolated base pairs (in crystal environment) also deform considerably upon geometry optimization. This possibly indicates that crystal environments have relatively less importance in the formation and structure of a base pair. These analyses of the remaining base pairs are beyond the scope of the present study as it involves various torsional and puckering motions. Acknowledgment. Part of this work was financially supported by Department of Biotechnology, Government of India Grant No. BT/PR5528/BRB/10/436/2004. We are thankful to Sudipta Samanta and Jaydeb Chakraborti of S. N. Bose Centre for Basic Science, Kolkata, and Abhijit Mitra of International Institute of Information Technology, Hyderabad, for discussion. Supporting Information Available: Additional tables. This material is available free of charge via the Internet at http:// pubs.acs.org. References and Notes (1) Yang, H.; Jossinet, F.; Leontis, N.; Chen, L.; Westbrook, J.; Berman, H.; Westhof, H. Nucleic Acids Res. 2003, 31, 3450-3460. (2) Das, J.; Mukherjee, S.; Mitra, A.; Bhattacharyya, D. J. Biomol. Struct. Dyn. 2006, 24, 149-161. (3) Leontis, N. B.; Westhof, E. Q. ReV. Biophys. 1998, 31, 399-455. (4) Leontis, N. B.; Stombaugh, J.; Westhof, E. Nucleic Acids Res. 2002, 30, 3497-3531. (5) Hendix, D. K.; Brenner, S. E.; Holbrook, S. R. Q. ReV. Biophys. 2005, 38, 221-243. (6) Mukherjee, S.; Bansal, M.; Bhattacharyya, D. J. Comput.-Aided Mol. Des. 2006, 20, 629-645. (7) Xiang-Jun, L.; Wilma, K. Olson, Nucleic Acids Res. 2003, 31, 5108-5121. (8) Sˇ poner, J.; Jurecka, P.; Hobza, P. J. Am. Chem. Soc. 2004, 126, 10142-10151. (9) (a) Sˇ poner, J. E.; Sˇ pacˇkova, N.; Kulha´nek, P.; Leszczynski, J.; Sˇ poner, J. J. Phys. Chem. A 2005, 109, 2292-2301. (b) Sˇ poner, J. E.; Sˇ packova, N.; Leszczynski, J.; Sˇ poner, J. J. Phys. Chem. B 2005, 109, 11399-11410. (c) Sponer, J. E.; Reblova, K.; Mokdad, A.; Sychrovsky, V.; Leszczynski, J.; Sponer, J. J. Phys. Chem. B 2007, 111, 9153-9164. (10) Bhattacharyya, D.; Koripella, S. C.; Mitra, A.; Rajendran, V. B.; Sinha, B. J. Biosci. 2007 32, 809-825. (11) Schaffenaar, G.; Noordik, J. H. J. Comput-Aided Mol. Des. 2000, 14, 123-134. (12) Schmidt, M. W.; Baldride, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. J.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J. Comput. Chem. 1993, 14, 1347-1363. (13) (a) Becke, A. D. J. Chem. Phys. 1993, 98, 5648-5652. (b) Lee, C.; Yang, W.; Parr, R. G. Phys. ReV. B 1988, 37, 785-789. (14) Kitaura, K.; Morokuma, K. Int. J. Quantum. Chem. 1976, 10, 325340. (15) Mo, Y. J. Mol. Model. 2006, 12, 665-672. (16) Dickerson, R. E.; et al. Nucleic Acids Res. 1989, 17, 1797-1803. (17) (a) Sˇ poner, J.; Florian, J.; Hobza, P.; Leszczynski, J. J. Biomol. Struct. Dyn. 1996, 13, 827- 833. (b) Gould, I. R.; Kollman, P. A. J. Am. Chem. Soc. 1994, 116, 2493-2499. (c) Basu, S.; Majumder, R.; Das, G. K.; Bhattacharyya, D. Indian J. Biochem. Biophys. 2005, 42, 378-385. (18) Brameld, K.; Dasgupta, S.; Goddard, W. A., III. J. Phys. Chem. B 1997, 101, 4851-4859.
3796 J. Phys. Chem. B, Vol. 112, No. 12, 2008 (19) (a) Ogawa, T.; Kurita, N.; Kurita, H.; Sekino, H.; Kitao, O.; Tanaka, S. Chem. Phys. Lett. 2003, 374, 271-278. (b) Danilov, V. I.; Anisimov, V. M. J. Biomol. Struct. Dyn. 2005, 22, 471-482. (20) Mukherjee, S.; Majumder, S.; Bhattacharyya, D. J. Phys. Chem. B 2005, 109, 10484-10492. (21) Sen, K.; Basu, S.; Bhattacharyya, D. Int. J. Quant. Chem. 2006, 106, 913-927. (22) Sponer, J. E.; Leszezynski, J.; Sychrovsky, V.; Sponer, J. J. Phys. Chem. 2005, 109, 18680- 18689. (23) Desiraju, G.; Steiner, T. The Weak Hydrogen Bond in Structural Chemistry and Biology; Oxford University Press: New York, 1999. (24) Vargas, R.; Garza, J. D. A. Dixon.; Hay, B. P. J. Am. Chem. Soc. 2000, 122, 4750-4755. (25) (a) Stepanian, S. G.; Sheina, G. G.; Radchenko, E. D.; Blagoi, Y. P. J. Mol. Struct. 1985, 131, 33-346. (b) Sheina, G. G.; Stepanian, S. G.; Radchenko, E. D.; Blagoi, Y. P. J. Mol. Struct. 1987, 158, 275-292. (c) Radchenko, E. D.; Plokhotnichenko, A. M.; Sheina, G. G.; Blagoi, Y. P. Biofizika 1983, 28, 559. (26) Santamaria, R.; Charro, E.; Zacarı´as, A.; Castro, M. J. Comput. Chem. 1999, 20, 511-530. (27) (a) Seibert, E.; Ross, J. B. A.; Osman, R. J. Mol. Biol. 2003, 330, 687-703. (b) Varnai, P.; Lavery, R. J. Am. Chem. Soc. 2002,
Roy et al. 124, 7272-7273. (c) Fuxreiter, M.; Luo, N.; Jedlovszky, P.; Simon, I.; Osman, R. J. Mol. Biol. 2002, 323, 823-834. (d) Giudice, E.; Lavery, R. J. Am. Chem. Soc. 2003, 125, 4998-4999. (e) Giudice, E.; Va´rnai, P.; Lavery, R. Nucleic Acids Res. 2003, 31, 1434-1443. (f) Huang, N.; Banavali, N. K.; Mackerell, A. O., Jr. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 68-73. (28) (a) Ramstein, J.; Lavery, R. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 7231-7235. (b) Ramstein, J.; Lavery, R. J. Biomol. Struct. Dyn. 1990, 7, 915-933. (c) Bernet, J.; Zakrzewska, K.; Lavery, R. J. Mol. Struct. (THEOCHEM) 1997, 398-399, 473-482. (d) Chen, Y. Z.; Mohan, V.; Griffey, R. H. Chem. Phys. Lett. 1998, 287, 570-574. (e) Chen, Y. Z.; Mohan, V.; Griffey, R. H. J. Biomol. Struct. Dyn. 1998, 15, 765-777. (29) (a) Briki, F.; Ramstein, J.; Lavery, R.; Genest, D. J. Am. Chem. Soc. 1991, 113, 2490-2493. (b) Briki, F.; Genest, D. J. Biomol. Struct. Dyn. 1993, 11, 43-56. (30) (a) Folta-Stogniew, E.; Russu, I. M. Biochemistry 1994, 33, 1101611024. (b) Leroy, J. L.; Charretier, E.; Kochoyan, M.; Gueron, M. Biochemistry 1988, 27, 8894-8898. (c) Gueron, M.; Leroy, J. L. Methods Enzymol. 1995, 261, 383-413. (d) Wa¨rmla¨nder, S.; Sˇ poner, J. E.; Sˇ poner, J.; Leijon, M. J. Biol. Chem. 2002, 277, 28491-28497.