
B: Biophysics; Physical Chemistry of Biological Systems and Biomolecules

Accurate Base Pair Energies of Artificially Expanded Genetic Information Systems (AEGIS): Clues for Their Mutagenic Characteristics
Bhagyashree Behera, Priyabrata Das, and Nihar Ranjan Jena
J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.9b04653 • Publication Date (Web): 10 Jul 2019


The Journal of Physical Chemistry


Accurate Base Pair Energies of Artificially Expanded Genetic Information Systems (AEGIS): Clues for Their Mutagenic Characteristics

B. Behera¹, P. Das¹, N. R. Jena¹*

¹Discipline of Natural Sciences, Indian Institute of Information Technology, Design and Manufacturing, Jabalpur-482005, India
*Corresponding Author's Email ID: [email protected]

Abstract: Recently, several artificial nucleobases, such as B, S, J, V, X, K, P, and Z, were proposed to help expand the genetic information system and to aid the diagnosis of diseases. Among these bases, P and Z were shown to form stable DNA and to participate in replication. However, the stabilities of P:Z and other artificial base pairs are not fully understood, and the ability of these unnatural nucleobases to mispair with one another and with natural bases is also unknown. Here, the dispersion-corrected density functional theory method ωB97X-D and the complete basis set method CBS-QB3 are used to obtain accurate structural and energetic data for base pair interactions involving these unnatural nucleobases. The roles of protonation and deprotonation of certain artificial bases in inducing mutations are also studied. Each artificial purine is found to have a complementary artificial pyrimidine, and the base pair interactions between them are similar to those of the natural Watson-Crick base pairs. Hence, these base pairs are expected to function naturally and not to impart mutagenicity. Among them, the J:V complex is found to be the most stable and most promising artificial base pair. Remarkably, non-complementary artificial nucleobases are found to form stable mispairs, which may generate mutagenic products in DNA. Similarly, misinsertions of natural bases opposite artificial bases are also found to be mutagenic. The mechanisms of these mutations are explained in detail. These results are in agreement with earlier biochemical studies. This study is thus expected to aid synthetic biology in designing more robust artificial nucleotides.

ACS Paragon Plus Environment


1. Introduction

The genetic information system of living cells is built from four natural nucleotides: G, C, A, and T. The purines (pu) possess two hydrogen bond donors (D) and one hydrogen bond acceptor (A) (puDDA), and the pyrimidines (py) possess two hydrogen bond acceptors and one hydrogen bond donor (pyAAD).1 In the past few years, several attempts have been made to synthesize unnatural nucleotides, referred to as the artificially expanded genetic information system (AEGIS), by rearranging the hydrogen bond donor and acceptor groups in pu and py.2-18 The main aim of these studies was to create artificial nucleotides that can take part in normal cellular processes, such as replication and translation, and form stable duplex DNA. Some of these artificial nucleotides have proved useful for diagnosing diseases.18 For example, 2’-deoxyisoguanosine (B) and 2’-deoxyisocytidine (S) were incorporated into FDA-approved diagnostic tools that diagnose respiratory diseases caused by viruses, measure the load of HIV and hepatitis viruses in blood, and detect mutations responsible for cystic fibrosis.18 Several other nucleotides, such as J, V, K, X, P, and Z (Figure 1), were also synthesized by the Benner group.2-18 However, these first generation AEGIS nucleotides could not be replicated by high fidelity polymerases. This led to the synthesis of second generation AEGIS nucleotides,2-18 mainly by modifying the Hoogsteen faces of the first generation nucleotides other than J and P (Figure 1). Although several hydrophobic nucleotides were designed to participate in replication reactions, these nucleotides do not form hydrogen-bonded base pairs and hence cannot produce stable duplex DNA.22-26

Among the second generation AEGIS nucleotides, imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (P) and 2-amino-3-nitropyridin-6-one (Z) were found to stabilize both the B- and A-forms of DNA.14 These nucleobases were also found to be recognized by DNA polymerases.15 In addition, short DNA sequences built from the six-letter GACTZP alphabet were found to bind breast cancer cells,16 liver cancer cells,17 and the protective antigen from Bacillus anthracis.18 However, although Z is the preferred base pairing partner of P, DNA polymerases were also found to insert C opposite P and G opposite Z under low and high pH conditions, respectively.27 These misinsertions may lead to mutations in DNA. As base pair mutations can be carcinogenic, it is important to understand the stabilities of the different base pairs that P and Z form with natural and artificial nucleobases in DNA.

Figure 1: Structures of first and second generation AEGIS nucleotides.15-18 Here R denotes the site of the glycosidic bond.

To understand the mutagenic characteristics of other artificial bases, Benner and co-workers investigated the in vitro and in vivo replication of imidazo[1,2-a]-1,3,5-triazine-2(8H)-4(3H)-dione (X) and 2,4-diaminopyrimidine (K) by E. coli DNA polymerase I.28 Error-free primer extension on a template strand containing K was found to proceed only when K was paired with X.28 However, no such restriction was found for primer extension on a template strand containing X.28 This indicates that DNA polymerase I may tolerate base pairs formed between X and the natural DNA bases. Remarkably, when consecutive Ks (-KK-) or Xs (-XX-) were placed in the template strand, little full-length primer extension was observed, presumably because of severe mismatches in DNA.28 Interestingly, under similar conditions, full-length primer extension was observed for Z, P, S, and B.18 These studies suggest that the presence of second generation AEGIS nucleotides in DNA may lead to mutations in cells. It was suggested that each second generation AEGIS nucleotide has a complementary nucleotide with which it can pair in duplex DNA (B:S, J:V, X:K, and P:Z). The base pairing patterns of these complexes would be similar to those of the Watson-Crick base pairs, and hence these nucleotides may not be mutagenic. However, although a DNA melting study was conducted to determine the stability of P:Z in a short DNA sequence,27 the stabilities of the other AEGIS base pairs in DNA are not yet known experimentally. Moreover, no data are available to show whether AEGIS nucleobases can pair with their non-complementary counterparts in DNA. To examine base pair interactions between second generation AEGIS nucleotides and natural DNA bases, the interactions of each of B, S, J, V, X, K, P, and Z with all the natural DNA bases (G, C, A, and T) are studied here. To understand the role of certain protonated unnatural base pairs in mutagenesis,27 the kinetics and thermodynamics of their formation are also studied.
Because N-glycosidic bond rotation has been proposed to be one of the main causes of mutations induced by oxidatively damaged guanine lesions in DNA,28-35 base pair interactions between AEGIS and natural nucleobases are examined here with the pu in both the anti- and syn-conformations (anti-pu:anti-py and syn-pu:anti-py). As electrostatic repulsion between the O2 (base) and O4’ (sugar) atoms can destabilize the syn-conformation of a py in DNA, base pair interactions involving syn-py (i.e., anti-pu:syn-py and syn-pu:syn-py) are not studied here. The binding energies of base pairs formed between AEGIS nucleobases and their non-complementary partners are also determined to assess the stabilities of these base pairs.

2. Computational Methodology

The isolated second generation AEGIS nucleobases B, S, J, V, X, K, P, and Z were optimized using the ωB97X-D dispersion-corrected density functional theory (DFT) method36-38 and the 6-31+G* basis set in the aqueous medium. The aqueous medium was modeled with the integral equation formalism of the polarizable continuum model (IEFPCM)39-41 within self-consistent reaction field (SCRF) theory. Subsequently, base pair interactions of anti-pu and syn-pu with anti-py were computed for both the unnatural and natural nucleobases. In an anti-pu:anti-py complex, the pu and py interact through their Watson-Crick faces, whereas in a syn-pu:anti-py base pair the Hoogsteen face of the pu interacts with the Watson-Crick face of the py (Scheme 1). As the N-glycosidic bond was not included in the models, base pairs in the anti- and syn-conformations were generated from Watson-Crick and Hoogsteen face interactions, respectively.29-33
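The pairing scheme described above can be summarized in a short Python sketch. It is a hypothetical encoding (the names AEGIS_COMPLEMENT, partner_pairs, and complexes are illustrative, not taken from the paper): for each AEGIS purine it lists the partners that appear in Tables 1-3 (its complementary AEGIS pyrimidine, the natural pyrimidines C and T, and the natural purines G and A opposite the AEGIS pyrimidine) and crosses each pair with the two conformations studied. The P:Z entry assumes the same pattern, and alternative geometries of the same partners (inverted or multiple Wobble forms) are not represented.

```python
from itertools import product

# Hypothetical encoding of the pairing scheme; names are illustrative.
# Bases are modeled without the sugar, so the conformation label only records
# which face of the purine is used for pairing.
AEGIS_COMPLEMENT = {"B": "S", "J": "V", "X": "K", "P": "Z"}  # purine -> pyrimidine
NATURAL_PURINES = ("G", "A")
NATURAL_PYRIMIDINES = ("C", "T")
CONFORMATIONS = {
    "anti-pu:anti-py": "Watson-Crick face of the purine",
    "syn-pu:anti-py": "Hoogsteen face of the purine",
}


def partner_pairs(aegis_purine):
    """Purine:pyrimidine partners examined for one AEGIS purine/pyrimidine set."""
    aegis_pyrimidine = AEGIS_COMPLEMENT[aegis_purine]
    pairs = [(aegis_purine, aegis_pyrimidine)]                       # e.g. B:S
    pairs += [(aegis_purine, py) for py in NATURAL_PYRIMIDINES]      # e.g. B:C, B:T
    pairs += [(pu, aegis_pyrimidine) for pu in NATURAL_PURINES]      # e.g. G:S, A:S
    return pairs


def complexes(aegis_purine):
    """Every partner pair crossed with the two conformations considered."""
    return [
        f"{conformation} {pu}:{py}"
        for (pu, py), conformation in product(partner_pairs(aegis_purine), CONFORMATIONS)
    ]


if __name__ == "__main__":
    for label in complexes("B"):
        print(label)
```

Each AEGIS set therefore yields five partner pairs and ten conformational complexes before alternative geometries are added.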


Scheme 1: Structures of (a) G:C and (b) A:T base pairs in the anti- and syn-conformations. The atomic numbering schemes of pu (G and A) and py (C and T) are also shown.

To assess whether protonated X:K and P:Z base pairs can form, proton transfer reactions from X to K and from Z to P were studied by locating the corresponding transition states at the ωB97X-D/6-31+G* level of theory with the IEFPCM method. Vibrational analyses were carried out to ensure that all the optimized base pair complexes had only real vibrational frequencies and that each transition state had a single imaginary frequency corresponding to the proton transfer. To obtain more reliable energies, single-point energy calculations were performed at the ωB97X-D/AUG-cc-pVDZ and MP2/AUG-cc-pVDZ levels of theory. As the B3LYP method has been used extensively to study different reactions,42-45 the B3LYP/AUG-cc-pVDZ level of theory was also used to compute single-point barrier energies for the proton transfer reactions. Zero-point energy corrections obtained at the ωB97X-D/6-31+G* level of theory were applied to all single-point energies. To obtain even more accurate base pair and barrier energies, the complete basis set (CBS-QB3) method46 was also used for all base pair complexes involving the anti-conformation. Because a pu normally pairs with a py in the anti-conformation in duplex DNA, and because CBS-QB3 calculations are computationally very expensive, these calculations were not performed for all complexes involving the syn-conformation. However, in certain cases where the stability of a base pair involving the syn-conformation is comparable to that of the corresponding anti-conformation, the CBS-QB3 method was used to compute the binding energy of the syn-pu:anti-py base pair. For simplicity, anti-pu:anti-py base pairs are hereafter termed pu:py complexes, and syn-pu:anti-py base pairs are termed syn-pu:py complexes.
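The bookkeeping in this protocol (stationary points checked by counting imaginary frequencies, and higher-level single-point energies combined with ωB97X-D/6-31+G* zero-point energies) can be sketched as follows. The helper names are hypothetical, energies are assumed to be in hartree, and imaginary modes are assumed to be reported as negative frequencies, as is common in quantum-chemistry output; the sketch does not parse any particular program's files.

```python
"""Sketch of the stationary-point check and ZPE bookkeeping (hypothetical helpers).

Assumptions: energies in hartree; vibrational frequencies in cm^-1, with
imaginary modes reported as negative numbers.
"""


def n_imaginary(frequencies_cm1):
    """Count imaginary modes (negative entries under the stated convention)."""
    return sum(1 for freq in frequencies_cm1 if freq < 0.0)


def classify_stationary_point(frequencies_cm1):
    """A minimum has no imaginary mode; a first-order transition state has one."""
    n_imag = n_imaginary(frequencies_cm1)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "transition state"
    return f"saddle point of order {n_imag}"


def zpe_corrected_energy(e_single_point, zpe_wb97xd_631pgs):
    """Higher-level single-point energy plus the zero-point energy taken from
    the wB97X-D/6-31+G* frequency calculation on the same geometry."""
    return e_single_point + zpe_wb97xd_631pgs


if __name__ == "__main__":
    print(classify_stationary_point([-1250.3, 42.1, 88.7]))  # one imaginary mode
    print(classify_stationary_point([35.6, 61.2, 104.9]))    # all modes real
```

In this scheme the same low-level zero-point energy is reused for every single-point level, so differences between methods come only from the electronic energies.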

Equation (1) was used to calculate the zero-point energy-corrected binding energy (BE) of an A:B base pair complex:

$$\mathrm{BE} = E_{A:B} - \left(E_A + E_B\right) \tag{1}$$

where $E_{A:B}$ is the zero-point energy-corrected total energy of the A:B complex, and $E_A$ and $E_B$ are the zero-point energy-corrected total energies of the isolated bases. The G09 program47 was used for all computations, and all structures were visualized with the GaussView 5.0 program.48 The binding energy data show that the MP2/AUG-cc-pVDZ results (Tables S1-S5, Supporting Information) differ significantly from the CBS-QB3 results, whereas the ωB97X-D/AUG-cc-pVDZ results agree reasonably well with the CBS-QB3 results. In an earlier study, ωB97X-D/AUG-cc-pVDZ results were also shown to agree qualitatively with CCSD(T) results.20 The CBS-QB3 method comprises several steps: (1) geometry optimization and frequency calculation at the B3LYP/CBSB7 level of theory, (2) single-point energy calculations at the CCSD(T)/6-31+G* and MP4SDQ/CBSB4 levels of theory, (3) extrapolation of the total energy to the infinite basis set limit using pair natural-orbital energies obtained at the MP2/CBSB3 level of theory, including an additive correction to the CCSD(T) level, and (4) a correction for spin contamination in open-shell systems.49 Its results are therefore more rigorous and accurate than those obtained at the ωB97X-D/AUG-cc-pVDZ level of theory, and the CBS-QB3 results will mainly be discussed here.

3. Results and Discussion

3.1 Base Pair Interactions Involving B and S

Some of the important base pairs involving B and S and their relative binding energies are depicted in Figure 2. For comparison, the G:C, A:T, and G:T base pairs are also shown (Figure 2a-c). These structures were obtained with pu (B, G, and A) and py (S, C, and T) in the anti-conformation. Structures of all base pair complexes (anti-pu:anti-py and syn-pu:anti-py) are depicted in Figures S1 and S2 (Supporting Information), and the binding energies of all optimized base pair complexes are presented in Table 1. A comparison of these stabilities shows that the syn-pu:py complexes are appreciably less stable than the pu:py complexes (except the syn-A:T and syn-A:S base pairs), and hence their occurrence in DNA would be less likely (Table 1). Remarkably, the Hoogsteen-type binding between syn-A and T is slightly more stable than the Watson-Crick-type binding between A and T (Table 1). This is because the syn-A:T complex is stabilized by (1) two strong hydrogen bonds, (2) a hydrogen bonding interaction between H8 of syn-A and O2 of T, and (3) an attractive dipole-dipole interaction (Scheme 1b).50 However, the syn-A:T pair has been observed in Z-DNA51 and antiparallel DNA52 but not in B-DNA.


Incorporation of S opposite B forms the B:S complex (Figure 2d), which is stabilized by three hydrogen bonds and is structurally similar to the G:C complex (Figure 2a, Table 1). Moreover, the B:S complex is about 0.88 kcal/mol more stable than the G:C complex and about 7.14 kcal/mol more stable than the A:T complex (Figure 2b, Table 1). This suggests that B:S pairing would stabilize duplex DNA without perturbing its structure.
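Equation (1) and the relative stabilities quoted above reduce to simple differences of binding energies. The sketch below, with hypothetical function names, applies Equation (1) and reproduces the 0.88 and 7.14 kcal/mol gaps, as well as the 1.06 kcal/mol B:T versus G:T gap discussed later, from the anti-pu:anti-py CBS-QB3 values in Table 1.

```python
# Equation (1): BE = E_AB - (E_A + E_B); every term is zero-point corrected.
def binding_energy(e_complex, e_base_a, e_base_b):
    """Binding energy of an A:B pair; more negative means more strongly bound."""
    return e_complex - (e_base_a + e_base_b)


# CBS-QB3 binding energies (kcal/mol) of anti-pu:anti-py pairs from Table 1.
CBS_QB3 = {"G:C": -16.45, "A:T": -10.19, "G:T": -11.74, "B:S": -17.33, "B:T": -12.80}


def stability_gain(pair, reference, table=CBS_QB3):
    """How much more strongly `pair` is bound than `reference` (kcal/mol)."""
    return round(table[reference] - table[pair], 2)


if __name__ == "__main__":
    print(stability_gain("B:S", "G:C"))  # 0.88
    print(stability_gain("B:S", "A:T"))  # 7.14
    print(stability_gain("B:T", "G:T"))  # 1.06
```

Because all terms in Equation (1) carry the same zero-point treatment, such differences can be compared directly across pairs computed at the same level of theory.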

Figure 2: Optimized structures of (a) G:C, (b) A:T, (c) G:T, and (d-i) different complexes involving B and S in aqueous medium. These structures were obtained with pu (G, A, and B) and py (C, T, and S) in the anti-conformation. The relative binding energies (kcal/mol) of these complexes, calculated with respect to the B:S complex, and hydrogen bond distances (Å) (dotted lines) are also shown.

Interestingly, insertion of C opposite B generates two different complexes (Figure S1c). In the first, C adopts an inverted (Cinv) Watson-Crick orientation (also termed reverse Watson-Crick) and forms three hydrogen bonds with B (Figure 2e); in the other, it pairs with B through two hydrogen bonds (Figure S1c). Owing to its three hydrogen bonds, the B:Cinv complex is slightly more stable than the B:S complex (Figure 2d,e; Table 1), but it may induce backbone distortions in DNA. Although inverted complexes have been found in RNA53,54 and triple helices,55 they are very rare in B-DNA and may form in oxidatively damaged DNA.33

Table 1: Zero-point energy-corrected binding energies (kcal/mol) of different base pairs involving B and S in the aqueous medium. Binding energies of G:C, A:T, and G:T are also provided for comparison.

               anti-pu:anti-py                            syn-pu:anti-py
Base pair   ωB97X-D/   ωB97X-D/      CBS-QB3     ωB97X-D/   ωB97X-D/
            6-31+G*    AUG-cc-pVDZ               6-31+G*    AUG-cc-pVDZ
G:C         -14.91     -15.80        -16.45      -5.00      -5.38
A:T         -9.30      -10.16        -10.19      -9.44      -10.23 (-10.52)a
G:T         -9.99      -10.90        -11.74      -6.03      -6.51
B:S         -16.07     -17.15        -17.33      -7.30      -7.71
B:C         -8.19      -8.38         -8.53       -6.90      -7.72
B:Cinv      -15.95     -17.00        -17.38
B:T         -11.15     -12.12        -12.80      -5.06      -5.52
G:S         -11.09     -11.58        -11.95      -3.57      -3.93
G:Sinv      -15.55     -16.66        -16.62
A:S         -8.52      -9.28         -8.86       -8.58      -9.24 (-9.19)a
aValues in parentheses are CBS-QB3 binding energies.

Insertion of T opposite B generates the B:T complex (Figure 2f), which, like the G:T complex (Figure 2c), is stabilized by two hydrogen bonds. In the G:T complex, T binds G by shifting toward the major groove, and the resulting complex is a first level Wobble-type complex (Figure 2c).56 A second level Wobble-type complex, in which T shifts toward the minor groove, is also possible; however, the second level Wobble-type G:T complex is less stable and is unlikely to form in DNA. Interestingly, the B:T complex (Figure 2f) is structurally similar to the second level Wobble-type complex and is about 1.06 kcal/mol more stable than G:T. Although DNA polymerases are evolutionarily familiar with first level Wobble-type complexes (G:T) involving natural bases, mismatched base pairs containing AEGIS nucleobases are completely unknown to them.56,57 Hence, DNA polymerases would have difficulty processing mismatches involving B and S in DNA, which could stall the polymerase reaction or generate mutagenic products.56,57 Insertion of G opposite S produces second level Wobble-type (Figure 2g) and inverted Watson-Crick-type (Figure 2h) base pairs. In the inverted G:Sinv base pair, S forms three hydrogen bonds with G, producing a complex that is only slightly less stable than the B:S complex (Figure 2d, Table 1). Owing to this high stability, the inverted complex may form in DNA and induce backbone distortions. Similarly, insertion of A opposite S produces a first level Wobble-type complex (Figure 2i), which is structurally and energetically similar to the G:T complex (Table 1). Remarkably, the syn-A:S Hoogsteen complex (Figure S2d) is stabilized by two strong hydrogen bonds and is slightly (by ~0.33 kcal/mol) more stable than the A:S complex (Table 1). This indicates that A may adopt both the anti- and syn-conformations opposite S, and both binding patterns may lead to mutations. On the basis of these results, it may be proposed that misinsertions of natural bases opposite B and S may lead to mutations. However, if sufficient amounts of B and S are available, S would be inserted predominantly opposite B, producing unperturbed duplex DNA.

3.2 Base Pair Interactions Involving J and V

The structures of some of the important base pair complexes involving J and V and their relative binding energies are depicted in Figure 3. Structures of all base pairs are depicted in Figures S3 and S4 (Supporting Information), and their binding energies are presented in Table 2. As this Table shows, all the anti-pu:anti-py base pairs (pu = J, G, and A; py = V, C, and T) are more stable than the corresponding syn-pu:anti-py base pairs. Hence the latter complexes would not be formed in DNA.


Figure 3: Optimized structures of important base pair complexes involving J and V obtained in the aqueous medium. These complexes were obtained with all pu (J, G, and A) and py (V, C, and T) in the anti-conformation. The relative binding energies (kcal/mol) of different complexes, calculated with respect to the J:V complex, and hydrogen bond distances (Å) (dotted lines) are also shown.

Figure 3 shows that the binding of J with V is stabilized by three hydrogen bonds, like the G:C complex, and that the resulting J:V complex has a non-standard Watson-Crick alignment (Figure 3a). Moreover, it is about 2.44 kcal/mol more stable than the G:C complex and about 1.56 kcal/mol more stable than the B:S complex (Tables 1 and 2), which implies that the J:V complex would be highly stable in DNA. Interestingly, insertion of C opposite J produces two types of Wobble mismatch complexes (Figure S3c). In one of these, C adopts an inverted conformation (Figure 3b) and forms two strong hydrogen bonds with J, giving a base pair complex structurally analogous to the G:T base pair. In the other, C shifts toward the major groove and forms two hydrogen bonds with the NH2 group of J (Figure S3c). This type of complex may distort the DNA structure and hence is unlikely to form in DNA. Similarly, incorporation of T opposite J (Figure 3c) produces a second level Wobble-type complex. However, both the J:C and J:T complexes are significantly less stable than the J:V complex (Table 2).


Table 2: Zero-point energy-corrected binding energies (kcal/mol) of different base pairs involving J and V in the aqueous medium.

anti-pu:anti-py
Base pair   ωB97X-D/6-31+G*   ωB97X-D/AUG-cc-pVDZ   CBS-QB3
J:V         -17.48            -18.61                -18.89
J:C         -5.52             -5.91                 -6.45
J:Cinv      -10.05            -10.73                -10.82
J:T         -8.66             -9.21                 -9.78
G:V         -8.50             -8.44                 -10.14
G:Vinv      -12.36            -13.47                -14.39
A:V         -9.31             -10.10                -10.37

syn-pu:anti-py
ωB97X-D/6-31+G*:       -7.02   -7.78   -6.44   -10.79   -9.70
ωB97X-D/AUG-cc-pVDZ:   -7.07   -7.96   -6.80   -11.68   -10.44

Insertion of G opposite V may generate inverted Wobble-type (G:Vinv, Figure 3d) and Hoogsteen-type (G:V, Figure S4a) base pairs. The former is about 4.25 kcal/mol more stable than the latter and about 4.50 kcal/mol less stable than the J:V complex. Similarly, insertion of A opposite V generates the A:V complex (Figure 3e), which is about 8.50 kcal/mol less stable than the J:V complex (Table 2). On the basis of these results, it may be proposed that insertion of natural bases opposite J and V may induce mutations. However, as the J:V complex is significantly more stable than these mismatches, V is most likely to be inserted opposite J, giving rise to unperturbed duplex DNA.

3.3 Base Pair Interactions Involving X and K

Structures of the most stable base pairs involving X and K and their relative binding energies are depicted in Figure 4. The structures of all base pairs are illustrated in Figures S5 and S6 (Supporting Information), and their binding energies are presented in Table 3. This Table shows that all the base pairs involving the anti-conformation are more stable (except the A:K1 pair) than those involving the syn-conformation, and hence the latter complexes would not be formed in DNA. Figure 4 shows that K can pair with X through three strong hydrogen bonds (Figure 4a). The geometric alignment of the X:K pair is similar to that of the G:C pair, in agreement with an X-ray crystallographic study of GATCXK DNA.58 However, the X:K pair is about 3.92 kcal/mol less stable than the G:C complex and about 6.36 kcal/mol less stable than the J:V complex (Tables 1-3). Notably, the three hydrogen bonded X:K base pair is only about 2.34 kcal/mol more stable than the two hydrogen bonded A:T base pair (Tables 1 and 3). By contrast, on the basis of computed gas phase free energies, the X:K pair was suggested to be less stable than the A:T pair.58 It was argued that in the X:K pair the N1 proton of X had transferred to N3 of K, producing an X(-H+):K(+H+) pair.58 However, we could not find any spontaneous proton transfer in the X:K pair that would give rise to the X(-H+):K(+H+) pair.58 Further, because electron densities of H atoms cannot be obtained from the crystal structure, it is difficult to determine whether the X:K pair was actually crystallized in the ionic form.58
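Collecting the CBS-QB3 binding energies of the complementary pairs from Tables 1-3 gives a simple stability ranking that is consistent with the comparisons above. The sketch below (hypothetical names; limited to the pairs tabulated in Tables 1-3) sorts the pairs from most to least strongly bound and reports each pair's gap to the most stable one.

```python
# CBS-QB3 binding energies (kcal/mol) of complementary anti-pu:anti-py pairs,
# taken from Tables 1-3; more negative means more strongly bound.
CBS_QB3 = {"G:C": -16.45, "A:T": -10.19, "B:S": -17.33, "J:V": -18.89, "X:K": -12.53}


def rank_pairs(energies):
    """Return (pair, BE, gap to the most stable pair) tuples, most stable first."""
    ordered = sorted(energies.items(), key=lambda item: item[1])
    best = ordered[0][1]
    return [(pair, be, round(be - best, 2)) for pair, be in ordered]


if __name__ == "__main__":
    for pair, be, gap in rank_pairs(CBS_QB3):
        print(f"{pair:4s} {be:7.2f} kcal/mol  (+{gap:.2f} vs. most stable)")
```

With these values the order is J:V, B:S, G:C, X:K, A:T, and the gaps of 1.56, 2.44, and 6.36 kcal/mol for B:S, G:C, and X:K match the differences quoted in the text.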

Figure 4: Optimized structures of some important base pair complexes involving X and K obtained in the aqueous medium. These complexes were obtained with all pu (X, X(-H+), G, and A) and py (K, K(+H+), C, and T) in the anti-conformation. The relative binding energies (kcal/mol), calculated with respect to the X:K complex, and hydrogen bond distances (Å) (dotted lines) are also shown.


Table 3: Zero-point energy-corrected binding energies (kcal/mol) of different base pairs involving X and K in the aqueous medium.

Base pair         anti-pu:anti-py                           syn-pu:anti-py
                  ωB97X-D     ωB97X-D        CBS-QB3        ωB97X-D     ωB97X-D
                  6-31+G*     AUG-cc-pVDZ                   6-31+G*     AUG-cc-pVDZ
X:K               -10.88      -11.72         -12.53         -4.05       -4.14
X:C               -8.05       -8.51          -9.08          -5.11       -5.56
X:T1              -7.98       -8.68          -9.57          -5.00       -5.40
X:T2              -9.72       -10.54         -11.03         –           –
G:K1              -9.90       -10.07         -10.49         -2.62       -2.61
G:K2              -7.31       -7.42          -7.96          –           –
A:K1              -7.09       -7.60          -7.79          -7.21       -7.49 (-7.80)a
A:K2              –           –              –              -3.68       -3.72
X(-H+):K(+H+)     -27.24      -28.65         -29.63         -12.98      -13.55
X(-H+):C          -5.28       -5.67          -6.40          -6.10       -6.46 (-7.12)a
X(-H+):T          -8.04       -8.53          -9.65          -6.92       -7.55
G:K(+H+)          -10.12      -10.64         -11.27         -14.37      -15.19 (-15.36)a
A:K(+H+)          -11.17      -11.91         -11.71         -9.57       -9.94
a Values in parentheses were obtained with the CBS-QB3 method.

To examine whether an X(-H+):K(+H+) base pair could form, the proton transfer reaction from N1 of X to N3 of K was studied. To compare the stability of the X(-H+):K(+H+) pair with that of the X:K pair, the former was also optimized in the aqueous medium. Subsequently, the base pair interactions of X(-H+) with C and T and of K(+H+) with G and A were computed. The structures of these complexes are presented in Figures S7 and S8 (Supporting Information) and their binding energies in Table 3. As Table 3 shows, the X(-H+):K(+H+) pair (Figure 4b) is about 17.10 kcal/mol more stable than the X:K pair (Figure 4a) and about 13.18 kcal/mol more stable than the G:C base pair (Tables 1 and 3). This suggests that the X(-H+):K(+H+) pair, which, like the X:K and G:C pairs, is stabilized by three hydrogen bonds, would be highly stable in DNA if it were formed. Interestingly, interactions of X(-H+) with C and T (Figure S7, Supporting Information) produced less stable base pairs than the corresponding neutral base pairs. In contrast, interactions of K(+H+) with G and A (Figure S8, Supporting Information) produced more stable
base pairs than their neutral counterparts (Table 3). Remarkably, the syn-G:K(+H+) Hoogsteen pair (Figure S8) is about 12.75 kcal/mol more stable than the syn-G:K pair and about 4 kcal/mol more stable than the anti G:K(+H+) pair (Table 3). However, the barrier for proton transfer from N1 of X to N3 of K is ~5 kcal/mol with the CBS-QB3 method and ~12 kcal/mol with the MP2 method, and the reaction is endothermic (Table 4). These results indicate that the formation of X(-H+):K(+H+) would be unfeasible in DNA. It should be mentioned that, on the basis of a similar barrier energy, proton transfer from N1 of G to N3 of C was suggested to be a rare event in DNA.59 However, if the proton transfer were catalysed by water molecules60 or by environmental factors such as the sequence context61,62 or polymerases, and if the reaction were thereby rendered exothermic, formation of the X(-H+):K(+H+) pair could become feasible in DNA. We therefore propose that a neutral X:K pair would be formed in DNA and that, owing to its structural resemblance to G:C and its energetic resemblance to A:T, DNA polymerase I could have extended the primer containing K in the presence of dXTP.28

Table 4: Barrier energies (kcal/mol) and endothermicities (kcal/mol) of the proton transfer reactions from X to K and from Z to P, obtained at different levels of theory in the aqueous medium.

Method                 X to K                      Z to P
                       Barrier    Endothermicity   Barrier    Endothermicity
ωB97X-D/6-31+G*        5.42       4.23             7.91       7.74
ωB97X-D/AUG-cc-pVDZ    4.05       3.53             6.70       7.24
B3LYP/AUG-cc-pVDZ      4.18       3.25             6.88       6.83
MP2/AUG-cc-pVDZ        11.81      7.61             13.24      9.13
CBS-QB3                4.58       4.31             6.64       7.33
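
One rough way to read the endothermicities in Table 4 is through a two-state Boltzmann equilibrium between a neutral pair and its proton-transferred (ionic) form. The sketch below is not part of the original analysis. It assumes that the zero-point-corrected reaction energy can stand in for a reaction free energy at 310.15 K and ignores entropy, tunnelling and the DNA environment, so the numbers are indicative only. The function name `ionic_fraction` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_BODY = 310.15       # 37 degrees Celsius, in kelvin

# CBS-QB3 endothermicities (kcal/mol) transcribed from Table 4.
ENDOTHERMICITY = {"X to K": 4.31, "Z to P": 7.33}


def ionic_fraction(delta_e: float, temp: float = T_BODY) -> float:
    """Equilibrium fraction of the proton-transferred pair in a two-state
    model, treating delta_e (kcal/mol) as a free energy (an assumption)."""
    k_eq = math.exp(-delta_e / (R_KCAL * temp))  # [ionic]/[neutral]
    return k_eq / (1.0 + k_eq)


for reaction, delta_e in ENDOTHERMICITY.items():
    print(f"{reaction}: ionic fraction ~ {ionic_fraction(delta_e):.1e}")
```

Under these assumptions the ionic form would make up less than 0.1% of the X:K population and less than 0.001% of the P:Z population, in line with the conclusion that the neutral pairs dominate.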

It is further found that the insertion of C opposite X can produce the X:C complex (Figure 4c), which is stabilized by two hydrogen bonds. In this base pair, O2 of X and O2 of C are close enough to repel each other. Interestingly, the insertion of T opposite X produced both
first-level (X:T1, Figure 4d) and second-level (X:T2, Figure 4e) Wobble-type complexes, the latter being about 1.46 kcal/mol more stable than the former. Notably, in spite of the higher stability of X:T2, it was X:T1 that was extended by DNA polymerase I.56,57 It was argued that, because DNA polymerases have evolved to recognise first-level Wobble-type mismatches, only X:T1 was elongated in the polymerase reaction.56,57 This implies that structural similarity between artificial and natural base pairs is an important factor in the recognition of an artificial nucleotide by DNA polymerases. Interestingly, the insertion of G opposite K produces two complexes (Figure S6). In one of them, G binds K through three relatively weak hydrogen bonds, and the resulting complex (G:K1) is Watson-Crick-like (Figure 4f); it is more stable than the other complex (G:K2) (Figure S6a). However, the G:K1 base pair is appreciably less stable than the X:K and X:T2 complexes (Table 3, Figure 4). Remarkably, the binding of anti-A and syn-A with K produces almost equally stable base pairs (A:K1 and syn-A:K1, Table 3). The A:K1 (Figure 4g) and syn-A:K1 (Figure S6d) pairs are structurally analogous to the first-level Wobble-type and Hoogsteen-type base pairs, respectively. This shows that both A:K1 and syn-A:K1 may form in DNA, which may ultimately induce mutations, as proposed in an earlier biochemical study.28 Further, as the G:K1 pair is about 2.70 kcal/mol more stable than the A:K1 pair, G may well also be inserted opposite K in DNA. Similarly, as the X:C pair is more stable than the A:K1 pair, insertion of C opposite X would also be possible in DNA.
These results are in accordance with the earlier biochemical study, in which some of the clones containing the K:X pair were also observed to be converted to a C:G pair owing to the insertion of G opposite K and of C opposite X.28 The conversions of XX and XK sequences to TG and AA, respectively, also corroborate the insertions of G and A opposite K.28
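
The energy gaps quoted in the two preceding paragraphs follow directly from the CBS-QB3 column of Table 3. The helper below (a hypothetical name, `relative_stability`) simply recomputes them from hand-transcribed values; a positive result means that the first pair binds more strongly than the second.

```python
# CBS-QB3 binding energies (kcal/mol) transcribed from Table 3;
# all entries are anti-pu:anti-py complexes.
CBS_QB3 = {
    "X:K": -12.53,
    "X:C": -9.08,
    "X:T1": -9.57,
    "X:T2": -11.03,
    "G:K1": -10.49,
    "A:K1": -7.79,
    "X(-H+):K(+H+)": -29.63,
}


def relative_stability(pair_a: str, pair_b: str) -> float:
    """Return how much more stable pair_a is than pair_b (kcal/mol).

    Binding energies are negative, so a positive value means that
    pair_a binds more strongly than pair_b."""
    return round(CBS_QB3[pair_b] - CBS_QB3[pair_a], 2)


comparisons = [("X(-H+):K(+H+)", "X:K"), ("X:T2", "X:T1"),
               ("G:K1", "A:K1"), ("X:C", "A:K1")]
for a, b in comparisons:
    print(f"{a} vs {b}: {relative_stability(a, b):+.2f} kcal/mol")
```

Rounding to two decimals matches the precision of the tabulated energies and removes floating-point noise from the subtraction.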

On the basis of these results, it can be proposed that K is the preferred base pair partner of X. However, misincorporation of T opposite X is possible in DNA. Similarly, misinsertions of A and G opposite K are also possible. These misinsertions would ultimately lead to the formation of mutagenic base pairs in DNA, as shown schematically in Figure 5a. Remarkably, although mutagenic insertions of T and A opposite primers containing a single X and a single K, respectively, were tolerated by polymerase I (Figure 5a), extension of primers containing two consecutive X (-XX-) or K (-KK-) nucleotides was stalled by polymerase I.28 We propose that this may have occurred because of dual insertions of T opposite X and of A opposite K. As these dual mispairs (-TT- or -AA-) would give rise to heavily distorted and mutagenic DNA, the polymerization reaction could have stopped (Figure 5a).
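
The partner preferences summarized above can be read off Table 3 by ranking, for each template base, the most stable CBS-QB3 complex formed with each candidate partner (X:T2 for T, G:K1 for G, A:K1 for A). In the minimal sketch below the dictionary layout and the function name `rank_partners` are illustrative choices, not taken from the paper; the gap column is the energy penalty relative to the best partner.

```python
from typing import Dict, List, Tuple

# Most stable anti-pu:anti-py CBS-QB3 binding energy (kcal/mol) for each
# candidate partner of a template base, transcribed from Table 3.
PARTNERS: Dict[str, Dict[str, float]] = {
    "X": {"K": -12.53, "T": -11.03, "C": -9.08},
    "K": {"X": -12.53, "G": -10.49, "A": -7.79},
}


def rank_partners(template: str) -> List[Tuple[str, float, float]]:
    """Return (partner, binding energy, gap to best partner) tuples, ordered
    from the most to the least stable pair; a larger gap means weaker binding."""
    ranked = sorted(PARTNERS[template].items(), key=lambda item: item[1])
    best = ranked[0][1]
    return [(base, energy, round(energy - best, 2)) for base, energy in ranked]


for template in PARTNERS:
    for base, energy, gap in rank_partners(template):
        print(f"{base} opposite {template}: {energy:.2f} kcal/mol (gap {gap:+.2f})")
```

The ranking reproduces the qualitative picture in the text: K is the best partner of X with T next, and X is the best partner of K followed by G and then A. It does not by itself predict insertion probabilities, which also depend on the polymerase.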

Figure 5: Proposed mechanisms of formations of mutagenic and non-mutagenic base pairs in the duplex DNA by polymerases in the (a) absence and (b) presence of repair proteins.

Remarkably, in the presence of a repair protein (DH5α MutS), most of the clones containing a single K:X pair were converted to a T:A pair.28 On the basis of binding energies and binding
patterns, it can be proposed that the insertion of A opposite K and of T opposite X may help the repair protein to ultimately replace K:X by a T:A pair. This may happen by either of two mechanisms, each involving two steps (Figure 5b). (1) In the first mechanism, polymerase I may first replace X in the template and insert A opposite K; in the second step, K is replaced by T, ultimately producing a T:A base pair in DNA (Figure 5b). (2) In the second mechanism, K is replaced in the first step, followed by insertion of T opposite X; in the second step, X is replaced and A is incorporated opposite T, again producing a T:A pair in DNA (Figure 5b).

3.4. Base pair interactions involving P and Z

The optimized structures of some of the important base pairs involving P and Z are depicted in Figure 6, and all optimized structures of these base pairs are illustrated in Figures S9 and S10 (Supporting Information). The binding energies of all base pairs involving P and Z are presented in Table 5. This Table shows that base pairs in the anti-pu:anti-py arrangement are generally more stable than the corresponding syn-pu:anti-py pairs. Among all base pairs involving P and Z, the P:Z complex (Figure 6a) is the most stable; like the G:C pair (Figure 2a), it is stabilized by three hydrogen bonds. Moreover, the P:Z pair is about 0.40 kcal/mol more stable than the G:C pair (Tables 1 and 5). In an earlier DNA melting study, the Gibbs free energy difference (ΔΔG37) between the P:Z and G:C base pair complexes in the GCCAGTTAA sequence (G and C replaced by P and Z, respectively) was found to be 0.85 kcal/mol.27 As this value was obtained from duplex melting experiments on a specific sequence, it cannot be compared directly with the CBS-QB3 binding energy. Nevertheless, as the difference is small, it can be presumed that the G:C and P:Z pairs would have similar effects in duplex DNA.
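
To give a feel for how small these differences are, a free-energy difference ΔΔG corresponds to a ratio of equilibrium constants K_a/K_b = exp(ΔΔG/RT). The sketch below applies this standard relation at 37 °C to the experimental 0.85 kcal/mol value and, under the explicit assumption that a binding-energy difference may be treated as a free-energy difference, to the computed 0.40 kcal/mol gap. The function name `equilibrium_ratio` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def equilibrium_ratio(ddg_kcal: float, temp_k: float = 310.15) -> float:
    """Ratio of equilibrium constants implied by a free-energy difference
    ddg_kcal (kcal/mol) at temperature temp_k (K)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


# 0.85 kcal/mol: experimental ddG37 between P:Z and G:C (ref 27).
# 0.40 kcal/mol: CBS-QB3 binding-energy gap, treated as a free energy (assumption).
for label, ddg in [("experimental ddG37", 0.85), ("CBS-QB3 gap", 0.40)]:
    print(f"{label}: {ddg:.2f} kcal/mol -> factor {equilibrium_ratio(ddg):.2f}")
```

The two values translate into factors of roughly 4 and 2, respectively, both well within an order of magnitude.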

Table 5: Zero-point energy-corrected binding energies (kcal/mol) of different base pairs involving P and Z in the aqueous medium.

Base pair         anti-pu:anti-py                           syn-pu:anti-py
                  ωB97X-D     ωB97X-D        CBS-QB3        ωB97X-D     ωB97X-D
                  6-31+G*     AUG-cc-pVDZ                   6-31+G*     AUG-cc-pVDZ
P:Z               -15.47      -16.45         -16.85         -9.74       -10.26
P:C               -8.82       -9.43          -9.61          -5.52       -5.80
P:T               -8.08       -8.58          -9.29          -5.94       -6.35
G:Z               -12.39      -13.51         -14.31         -12.24      -13.00
A:Z1              -7.19       -7.30          -7.80          -8.30       -8.61
A:Z2              –           –              –              -6.98       -7.09
P(+H+):Z(-H+)     -21.50      -23.01         -23.20         -5.65       -6.11
P(+H+):C          -17.97      -19.11         -19.32         -5.10       -5.52
P(+H+):T          -10.11      -11.10         -11.62         -4.74       -5.10
G:Z(-H+)          -16.92      -18.08         -18.27         -2.40       -2.52
A:Z(-H+)          -4.14       -4.49          -4.29          -5.32       -5.74

Interestingly, the insertion of C opposite P (Figure 6b) is found to be slightly more favorable than that of T (Table 5), and the P:C pair is structurally analogous to a second-level Wobble-type base pair. The lower stability of the P:T complex (Figure 6c) relative to the P:C complex (Figure 6b) arises from electrostatic repulsion between O6 of P and O4 of T. In contrast, the incorporation of G (Figure 6d) opposite Z is much more favorable than that of A (Figure 6e) (Table 5). Although both base pairs are stabilized by two hydrogen bonds, in the G:Z pair the O6 of G makes an additional favorable interaction with the NH2 group of Z (Figure 6d), and the pair retains the first-level Wobble-type alignment. The stabilities of all base pairs involving P and Z follow the order P:Z > G:Z > P:C > P:T > A:Z1 (Table 5). These results indicate that insertion of G opposite Z and of C opposite P is possible in DNA, which would ultimately induce mutations. In the earlier biochemical study,28 formation of G:Z and P:C pairs was speculated to be the main cause of Z- and P-induced mutations, respectively. As the binding pattern of P:C resembles a second-level Wobble-type pair, it can be argued that DNA polymerases probably possess an inherent ability to recognise both first-level and second-level Wobble-type mismatches. The rejection of
X:T2 (a second-level Wobble-type mismatch) by DNA polymerase I may have occurred because the polymerase was expecting a G:T-like structural pattern when incorporating T opposite X.
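
The stability order quoted above can be reproduced by sorting the anti-pu:anti-py CBS-QB3 column of Table 5. A minimal sketch, with the values transcribed by hand and an illustrative function name:

```python
# Anti-pu:anti-py CBS-QB3 binding energies (kcal/mol) from Table 5
# for the neutral base pairs involving P and Z.
TABLE5_CBS_QB3 = {
    "P:Z": -16.85,
    "P:C": -9.61,
    "P:T": -9.29,
    "G:Z": -14.31,
    "A:Z1": -7.80,
}


def stability_order(energies: dict) -> list:
    """Sort pairs from most to least stable (most negative energy first)."""
    return sorted(energies, key=energies.get)


print(" > ".join(stability_order(TABLE5_CBS_QB3)))
# prints: P:Z > G:Z > P:C > P:T > A:Z1
```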

Figure 6: Optimized structures of some important base pair complexes involving P (a,b,c), Z (d,e), P(+H+) (f,g) and Z(-H+) (h) obtained in the aqueous medium. These complexes are obtained by considering the anti-conformations of all purines (P, P(+H+), G, and A) and pyrimidines (Z, Z(-H+), C, and T). The relative binding energies (kcal/mol) of the different complexes with respect to the P:Z complex and the hydrogen bond distances (Å) (dotted lines) are also shown.

It was earlier presumed that the formation of the G:Z and P:C mutagenic products may occur due to deprotonation of Z and protonation of P, respectively. However, owing to the high pKa of Z (~7.8 in the isolated nucleoside), its deprotonation in duplex DNA may not be possible. To evaluate whether deprotonation of Z is kinetically and thermodynamically feasible in duplex DNA, the barrier energy for proton transfer from N3 of Z to N1 of P in the P:Z complex (Table 4) and the stability of P(+H+):Z(-H+) (Figure 6f, Table 5) were estimated. Subsequently, the structures (Figures S11 and S12, Supporting Information) and
stabilities (Table 5) of the mispairs formed by insertion of C and T opposite P(+H+) and of G and A opposite Z(-H+) were calculated. Surprisingly, the P(+H+):Z(-H+) pair (Figure 6f) is about 6.35 kcal/mol more stable than the P:Z complex and is stabilized by three strong hydrogen bonds. Similarly, the P(+H+):C (Figure 6g) and G:Z(-H+) (Figure 6h) complexes are stabilized by three hydrogen bonds and are appreciably more stable than the corresponding neutral complexes (Figure 6, Table 5). However, the barrier energies for proton transfer from N3 of Z to N1 of P computed with the CBS-QB3 and MP2 methods are ~7 and ~13 kcal/mol, respectively, and the reaction is endothermic (Table 4). This suggests that the P(+H+):Z(-H+) pair would not form in DNA. Hence, in spite of the high thermodynamic stabilities of these ionic base pairs, their occurrence in duplex DNA would not be feasible.

3.5. Structures and stabilities of AEGIS complementary and non-complementary base pairs

The stabilities of all base pairs between complementary AEGIS nucleobases, each stabilized by three hydrogen bonds, follow the order J:V > B:S > P:Z > X:K (Tables 1-3 and 5). This suggests that the J:V complex would be highly stable in duplex DNA. Although several studies involving X:K and P:Z have been undertaken in the past, relatively few have examined the J:V complex. This study, for the first time, indicates that the J:V complex may be especially useful for constructing stable DNA and hence in various applications, including the diagnosis of diseases and the realization of functional artificial DNA. These results also indicate that an artificial DNA built with these complementary AEGIS nucleobases may not block replication, as schematically illustrated in Figure 7a.

It is further revealed that the non-complementary AEGIS bases (Figures S13 and S14, Supporting Information) bind each other preferentially in the anti-conformation, as every syn-pu:anti-py complex is less stable than the corresponding anti-pu:anti-py complex (Table 6). The stabilities of these Wobble-type complexes follow the order B:V > J:Z > X:V > P:V > B:K > X:Z > J:S > X:S > J:K > B:Z > P:K > P:S (Table 6). This implies that J and V not only form a very stable base pair but can also mispair with other AEGIS bases to produce stable mispairs in DNA. Hence, if DNA were built from all the second-generation AEGIS nucleobases, or from a mixture of AEGIS and natural nucleobases, it might induce mutations and stall replication, as illustrated in Figure 7b.

Table 6: Zero-point energy-corrected binding energies (kcal/mol) of different base pairs involving AEGIS bases in the aqueous medium.

Base pair         anti-pu:anti-py                           syn-pu:anti-py
                  ωB97X-D     ωB97X-D        CBS-QB3        ωB97X-D     ωB97X-D
                  6-31+G*     AUG-cc-pVDZ                   6-31+G*     AUG-cc-pVDZ
B:V               -13.84      -14.95         -15.99         -6.27       -6.38
B:K               -11.06      -11.30         -11.99         -6.22       -6.49
B:Z               -6.41       -6.65          -8.12          -5.96       -6.52
J:S               -10.02      -10.73         -10.73         -8.02       -8.12
J:K               -8.11       -8.56          -8.98          -5.04       -4.67
J:Z               -12.45      -13.14         -13.43         -4.98       -5.31
X:V               -11.46      -12.42         -13.27         -1.61       -1.89
X:Z               -9.49       -10.31         -10.98         -7.56       -8.11
X:S               -9.12       -9.70          -10.27         -2.12       -2.32
P:K               -7.10       -7.48          -7.74          -4.68       -4.80
P:V               -11.22      -11.82         -12.19         -8.32       -8.56
P:S               -4.34       -4.60          -5.27          -2.50       -2.62
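
Both statements made about Table 6, the preference for the anti-conformation and the stability order, can be checked mechanically. The sketch below uses values transcribed by hand from Table 6 (variable names are illustrative). It compares anti and syn complexes at the ωB97X-D/AUG-cc-pVDZ level, the highest level reported for both conformations, and sorts the CBS-QB3 anti values.

```python
# Table 6 binding energies (kcal/mol), transcribed by hand.
# ANTI_AUG / SYN_AUG: wB97X-D/AUG-cc-pVDZ; ANTI_CBS: CBS-QB3 (anti only).
PAIRS = ["B:V", "B:K", "B:Z", "J:S", "J:K", "J:Z",
         "X:V", "X:Z", "X:S", "P:K", "P:V", "P:S"]
ANTI_AUG = [-14.95, -11.30, -6.65, -10.73, -8.56, -13.14,
            -12.42, -10.31, -9.70, -7.48, -11.82, -4.60]
SYN_AUG = [-6.38, -6.49, -6.52, -8.12, -4.67, -5.31,
           -1.89, -8.11, -2.32, -4.80, -8.56, -2.62]
ANTI_CBS = [-15.99, -11.99, -8.12, -10.73, -8.98, -13.43,
            -13.27, -10.98, -10.27, -7.74, -12.19, -5.27]

# Positive preference: the anti complex binds more strongly than the syn one.
anti_preference = {
    pair: round(syn - anti, 2)
    for pair, anti, syn in zip(PAIRS, ANTI_AUG, SYN_AUG)
}
weakest = min(anti_preference, key=anti_preference.get)
print(f"anti preferred for all pairs: {all(v > 0 for v in anti_preference.values())}")
print(f"smallest anti preference: {weakest} ({anti_preference[weakest]:.2f} kcal/mol)")

# Stability order of the anti complexes at the CBS-QB3 level
# (sorting (energy, pair) tuples puts the most negative energy first).
order = [pair for _, pair in sorted(zip(ANTI_CBS, PAIRS))]
print(" > ".join(order))
```

The B:Z pair has the smallest anti/syn margin (0.13 kcal/mol), so for that pair the conformational preference is marginal at this level of theory.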

Figure 7: Schematic illustrations of (a) DNA polymerase-induced replication of DNA containing natural and AEGIS nucleobases inserted opposite their complementary counterparts and (b) mutations due to mispairing between natural and AEGIS nucleobases and between non-complementary AEGIS nucleobases.

Conclusion

On the basis of structures and base pair energies, it can be proposed that the insertion of natural nucleobases opposite the second-generation AEGIS nucleobases may lead to mutations. These mutations may occur mainly due to the formation of (1) inverted base pairs (B:Cinv, G:Sinv) and (2) Wobble-type base pairs (X:T, P:C, G:Z). It is further revealed that protonation of K and P, as well as deprotonation of X and Z, would be unfeasible in DNA. Further, each second-generation AEGIS nucleobase is found to have a Watson-Crick geometric partner in DNA, and base pairing between these partners would produce stable, nonmutagenic products. Among these base pairs, the J:V complex is the most stable. Base pairing between non-complementary AEGIS nucleobases produces stable mismatch complexes, which are mainly analogous to Wobble-type base pairs and hence would lead to mutations. Although consideration of full-length DNA with varying sequences and of DNA polymerases could provide more insight into the mutagenic
potential of AEGIS nucleobases, the present study can be considered a first step toward unravelling the structural and energetic details of base pair interactions between AEGIS nucleobases and between AEGIS and natural DNA bases. It is thus expected that this study will be helpful in designing more robust synthetic nucleobases for various applications, including the diagnosis of diseases and artificial life.

Acknowledgement: NRJ is thankful to the Science and Engineering Research Board (SERB) of the Department of Science and Technology (DST, New Delhi) for financial support.

Supporting information: Zero-point energy-corrected binding energies of all the base pair complexes obtained at the MP2/AUG-cc-pVDZ level of theory; optimized structures of different base pairs involving anti-B and syn-B; optimized .xyz coordinates of all complementary and protonated base pairs; optimized structures of different base pairs involving anti-G, anti-A, syn-G, and syn-A bound to S; optimized structures of different base pairs involving anti-J and syn-J; optimized structures of different base pairs involving anti-G, anti-A, syn-G, and syn-A bound to V; optimized structures of different base pairs involving anti-X and syn-X; optimized structures of different base pairs involving anti-G, anti-A, syn-G, and syn-A bound to K; optimized structures of different base pairs involving anti-P and syn-P; optimized structures of different base pairs involving anti-G, anti-A, syn-G, and syn-A bound to Z; optimized structures of different base pairs involving anti-P(+H+) and syn-P(+H+); optimized structures of different base pairs involving anti-G, anti-A, syn-G, and syn-A bound to Z(-H+); optimized structures of different base pairs involving the anti-conformations of AEGIS purines and anti-conformations of AEGIS pyrimidines; optimized structures of different base pairs involving the syn-conformations of AEGIS purines and anti-conformations of AEGIS pyrimidines.

References

1. Watson, J. D.; Crick, F. H. C. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 1953, 171, 737-738.
2. Walsh, J. M.; Beuning, P. J. Synthetic nucleotides as probes of DNA polymerase specificity. J. Nucleic Acids 2012, 530963.
3. Benner, S. A.; Karalkar, N.; Hoshika, S.; Laos, R.; Shaw, R. W.; Matsuura, M.; Fajardo, D.; Moissatche, P. Alternative Watson-Crick synthetic genetic systems. Cold Spring Harbor Perspect. Biol. 2016, 8, 1023770.
4. Benner, S. A.; Yang, Z. Y.; Chen, F. Synthetic biology, tinkering biology, and artificial biology. What are we learning? Comptes Rendus Chimie 2011, 14, 372-387.
5. Rappaport, H. P. The 6-thioguanine/5-methyl-2-pyrimidinone base pair. Nucleic Acids Res. 1988, 16, 7253-7267.
6. Piccirilli, J. A.; Krauch, T.; Moroney, S. E.; Benner, S. A. Enzymatic incorporation of a new base pair into DNA and RNA extends the genetic alphabet. Nature 1990, 343, 33-37.
7. Geyer, C. R.; Battersby, T. R.; Benner, S. A. Nucleobase pairing in expanded Watson-Crick-like genetic information systems. Structure 2003, 11, 1485-1498.
8. Hirao, I.; Harada, Y.; Kimoto, M.; Mitsui, T.; Fujiwara, T.; Yokoyama, S. A two-unnatural-base-pair system toward the expansion of the genetic code. J. Am. Chem. Soc. 2004, 126, 13298-13305.
9. Kimoto, M.; Yamashige, R.; Matsunaga, K.; Yokoyama, S.; Hirao, I. Generation of high-affinity DNA aptamers using an expanded genetic alphabet. Nat. Biotechnol. 2013, 31, 453-457.
10. Hernandez, A. R.; Shao, Y.; Hoshika, S.; Yang, Z.; Shelke, S. A.; Herrou, J.; Kim, H. J.; Kim, M. J.; Piccirilli, J. A.; Benner, S. A. A crystal structure of a functional RNA molecule containing an artificial nucleobase pair. Angew. Chem. Int. Ed. 2015, 54, 9853-9856.
11. Merritt, K. K.; Bradley, K. M.; Hutter, D.; Matsuura, M. F.; Rowold, D. G.; Benner, S. A. Autonomous assembly of synthetic oligonucleotides built from an expanded DNA alphabet. Total synthesis of a gene encoding kanamycin resistance. Beilstein J. Org. Chem. 2014, 10, 2348-2360.
12. Yang, Z.; Hutter, D.; Sheng, P.; Sismour, A. M.; Benner, S. A. Artificially expanded genetic information system: a new base pair with an alternative hydrogen bonding pattern. Nucleic Acids Res. 2006, 34, 6095-6101.
13. Karalkar, N. B.; Leal, N. A.; Kim, M.-S.; Bradley, K. M.; Benner, S. A. Synthesis and enzymology of 2′-deoxy-7-deazaisoguanosine triphosphate and its complement: a second generation pair in an artificially expanded genetic information system. ACS Synth. Biol. 2016, 5, 672-678.
14. Georgiadis, M. M.; Singh, I.; Kellett, W. F.; Hoshika, S.; Benner, S. A. Structural basis for a six nucleotide genetic alphabet. J. Am. Chem. Soc. 2015, 137, 6947-6955.
15. Yang, Z.; Chen, F.; Chamberlin, S. G.; Benner, S. A. Expanded genetic alphabets in the polymerase chain reaction. Angew. Chem. Int. Ed. 2010, 49, 177-180.
16. Sefah, K.; Yang, Z.; Bradley, K. M.; Hoshika, S.; Jimenez, E.; Zhu, G.; Shanker, S.; Yu, F.; Tan, W.; Benner, S. A. In vitro selection with artificial expanded genetic information systems. Proc. Natl. Acad. Sci. U. S. A. 2014, 111, 1449-1454.
17. Zhang, L.; Yang, Z.; Sefah, K.; Bradley, K. M.; Hoshika, S.; Kim, M.-J.; Kim, H.-J.; Zhu, G.; Jimenez, E.; Cansiz, S.; Teng, I.-T.; Champanhac, C.; McLendon, C.; Liu, C.; Zhang, W.; Gerloff, D. L.; Huang, Z.; Tan, W.-H.; Benner, S. A. Evolution of Functional-Nucleotide DNA. J. Am. Chem. Soc. 2015, 137, 6734-6737.
18. Zhang, L.; Yang, Z.; Le Trinh, T.; Teng, I. T.; Wang, S.; Bradley, K. M.; Hoshika, S.; Wu, Q.; Cansiz, S.; Rowold, D. J.; McLendon, C.; Kim, M. S.; Wu, Y.; Liu, Y.; Hou, W.; Stewart, K.; Wan, S.; Liu, C.; Benner, S. A.; Tan, W. Aptamers against cells overexpressing glypican 3 from expanding genetic systems combined with cell engineering and laboratory evolution. Angew. Chem. Int. Ed. 2016, 55, 12372-12375.
19. Shirato, W.; Chiba, J.; Inouye, M. A firmly hybridisable, DNA-like architecture with DAD/ADA- and ADD/DAA-type unnatural base pairs as an extracellular genetic candidate. Chem. Commun. 2015, 51, 7043-7046.
20. Jena, N. R.; Das, P.; Behera, B.; Mishra, P. C. Analogues of P and Z as efficient artificially expanded genetic information system. J. Phys. Chem. B 2018, 122, 8134-8145.
21. Chawla, M.; Credendino, R.; Chermak, E.; Oliva, R.; Cavallo, L. Theoretical characterization of the H-bonding and stacking potential of two nonstandard nucleobases expanding the genetic alphabet. J. Phys. Chem. B 2016, 120, 2216-2224.
22. Malyshev, D. A.; Pfaff, D. A.; Ippoliti, S. I.; Hwang, G. T.; Dwyer, T. J.; Romesberg, F. E. Solution structure, mechanism of replication, and optimization of an unnatural base pair. Chemistry 2010, 16, 12650-12659.
23. Malyshev, D. A.; Dhami, K.; Quach, H. T.; Lavergne, T.; Ordoukhanian, P.; Torkamani, A.; Romesberg, F. E. Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 12005-12010.
24. Malyshev, D. A.; Seo, Y. J.; Ordoukhanian, P.; Romesberg, F. E. PCR with an expanded genetic alphabet. J. Am. Chem. Soc. 2009, 131, 14620-14621.
25. Malyshev, D. A.; Dhami, K.; Lavergne, T.; Chen, T.; Dai, N.; Foster, J. M.; Correa, I. R., Jr.; Romesberg, F. E. A semi-synthetic organism with an expanded genetic alphabet. Nature 2014, 509, 385-388.
26. Okamoto, I.; Miyatake, Y.; Kimoto, M.; Hirao, I. High fidelity, efficiency and functionalization of Ds-Px unnatural base pairs in PCR amplification for a genetic alphabet expansion system. ACS Synth. Biol. 2016, 5, 1220-1230.
27. Wang, X.; Hoshika, S.; Peterson, R. J.; Kim, M.-J.; Benner, S. A.; Kahn, J. D. The biophysics of artificially expanded genetic information systems: thermodynamics of DNA duplexes containing matches and mismatches involving 2-amino-3-nitropyridin-6-one (Z) and imidazo[1,2-a]-1,3,5-triazin-4(8H)one (P). ACS Synth. Biol. 2017, 6, 782-792.
28. Winiger, C. B.; Shaw, R. W.; Kim, M.-J.; Moses, J. D.; Matsuura, M. F.; Benner, S. A. Expanded genetic alphabets: managing nucleotides that lack tautomeric, protonated, or deprotonated versions complementary to natural nucleotides. ACS Synth. Biol. 2017, 6, 194-200.
29. Jena, N. R.; Mishra, P. C. Is FapyG mutagenic?: evidence from the DFT study. ChemPhysChem 2013, 14, 3263-3270.
30. Jena, N. R.; Mark, A. E.; Mishra, P. C. Does tautomerization of FapyG influence its mutagenicity? ChemPhysChem 2014, 15, 1779-1784.
31. Jena, N. R.; Gaur, V.; Mishra, P. C. The R- and S-stereoisomeric effects on the guanidinohydantoin-induced mutations in DNA. Phys. Chem. Chem. Phys. 2015, 17, 18111-18120.
32. Jena, N. R.; Bansal, M.; Mishra, P. C. Conformational stabilities of iminoallantoin and its base pairs in DNA: implications for mutagenicity. Phys. Chem. Chem. Phys. 2016, 18, 12774-12783.
33. Jena, N. R.; Mishra, P. C. Normal and reverse base pairing of Iz and Oz lesions in DNA: structural implications for mutagenicity. RSC Adv. 2016, 6, 64019-64027.
34. Cheng, X.; Kelso, C.; Hornak, V.; de los Santos, C.; Grollman, A. P.; Simmerling, C. Dynamic behavior of DNA base pairs containing 8-oxoguanine. J. Am. Chem. Soc. 2005, 127, 13906-13918.
35. Gehrke, T. H.; Lischke, U.; Gasteiger, K. L.; Schneider, S.; Arnold, S.; Muller, H. C.; Stephenson, D. S.; Zipse, H.; Carell, T. Unexpected non-Hoogsteen-based mutagenicity mechanism of FaPy-DNA lesions. Nat. Chem. Biol. 2013, 9, 455-461.
36. Lee, C.; Yang, W.; Parr, R. G. Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Phys. Rev. B 1988, 37, 785-789.
37. Miehlich, B.; Savin, A.; Stoll, H.; Preuss, H. Results obtained with the correlation energy density functional of Becke and Lee, Yang and Parr. Chem. Phys. Lett. 1989, 157, 200-206.
38. Chai, J.-D.; Head-Gordon, M. Long-range corrected hybrid density functionals with damped atom-atom dispersion corrections. Phys. Chem. Chem. Phys. 2008, 10, 6615-6620.
39. Chai, J.-D.; Head-Gordon, M. Systematic optimization of long-range corrected hybrid density functionals. J. Chem. Phys. 2008, 128, 084106.
40. Tomasi, J.; Mennucci, B.; Cammi, R. Quantum mechanical continuum solvation models. Chem. Rev. 2005, 105, 2999-3093.
41. Scalmani, G.; Frisch, M. J. Continuous surface charge polarizable continuum models of solvation. I. General formalism. J. Chem. Phys. 2010, 132, 114110.
42. Jena, N. R.; Mishra, P. C. Mechanisms of formation of 8-oxoguanine due to reactions of one and two OH radicals and the H2O2 molecule with guanine: a quantum computational study. J. Phys. Chem. B 2005, 109, 14205-14218.
43. Jena, N. R.; Mishra, P. C. Formations of 8-nitroguanine and 8-oxoguanine due to reactions of peroxynitrite with guanine. J. Comput. Chem. 2007, 28, 1321-1335.
44. Jena, N. R.; Kushwaha, P. S.; Mishra, P. C. Reaction of hypochlorous acid with imidazole: formations of 2-chloro- and 2-oxoimidazoles. J. Comput. Chem. 2008, 29, 98-107.
45. Kumar, A.; Pottiboyina, V.; Sevilla, M. D. Hydroxyl radical (OH) reaction with guanine in an aqueous environment: a DFT study. J. Phys. Chem. B 2011, 115, 15129-15137.
46. Wood, G. P. F.; Radom, L.; Petersson, G. A.; Barnes, E. C.; Frisch, M. J.; Montgomery, J. A., Jr. A restricted-open-shell complete-basis-set model chemistry. J. Chem. Phys. 2006, 125, 094106.
47. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A. et al. Gaussian 09, Revision A.1; Gaussian, Inc.: Wallingford, CT, 2009.
48. Dennington, R.; Keith, T.; Millam, J. GaussView, Version 5; Semichem Inc.: Shawnee Mission, KS, 2009.
49. Sirjean, B.; Fournet, R.; Glaude, P.-A.; Ruiz-Lopez, F. R. Extension of the composite CBS-QB3 method to singlet diradical calculations. Chem. Phys. Lett. 2007, 435, 152-156.
50. Quinn, J. R.; Zimmerman, S. C.; Del Bene, J. E.; Shavitt, I. Does the A·T or G·C base-pair possess enhanced stability? Quantifying the effects of CH…O interactions and secondary interactions on base-pair stability using a phenomenological analysis and ab initio calculations. J. Am. Chem. Soc. 2007, 129, 934-941.
51. Wang, A. H.-J.; Quigley, G. J.; Kolpak, F. J.; Crawford, J. L.; van der Marel, G.; Rich, A. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature 1979, 282, 680-686.
52. Abrescia, N. G. A.; Thompson, A.; Huynh-Dinh, T. S. Crystal structure of an antiparallel DNA fragment with Hoogsteen base pairing. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 2806-2811.
53. Klein, D. J.; Schmeing, T. M.; Moore, P. B.; Steitz, T. A. The kink turn: a new RNA secondary structure motif. EMBO J. 2001, 20, 4214-4221.
54. Sharma, P.; Mitra, A.; Sharma, S.; Singh, H.; Bhattacharyya, D. Quantum chemical studies of structures and binding in noncanonical RNA base pairs: the trans Watson-Crick:Watson-Crick family. J. Biomol. Struct. Dyn. 2008, 25, 709-732.
55. Cheng, Y.-K.; Pettitt, B. M. Hoogsteen versus reverse Hoogsteen base pairing: DNA triple helixes. J. Am. Chem. Soc. 1992, 114, 4465-4474.
56. Rossetti, G.; Dans, P. D.; Gomez-Pinto, I.; Ivani, I.; Gonzalez, C.; Orozco, M. The structural impact of DNA mismatches. Nucleic Acids Res. 2015, 43, 4309-4321.
57. Winiger, C. B.; Kim, M.-J.; Hoshika, S.; Shaw, R. W.; Moses, J. D.; Matsuura, M. F.; Gerloff, D. L.; Benner, S. A. Polymerase interactions with Wobble mismatches in synthetic genetic systems and their evolutionary implications. Biochemistry 2016, 55, 3847-3850.
58. Singh, I.; Kim, M.-J.; Molt, R. W.; Hoshika, S.; Benner, S. A.; Georgiadis, M. M. Structure and biophysics for a six letter DNA alphabet that includes imidazo[1,2-a]-1,3,5-triazine-2(8H)-4(3H)-dione (X) and 2,4-diaminopyrimidine (K). ACS Synth. Biol. 2017, 6, 2118-2129.
59. Florian, J.; Leszczynski, J. Spontaneous DNA mutations induced by proton transfer in the guanine·cytosine base pairs: an energetic perspective. J. Am. Chem. Soc. 1996, 118, 3010-3017.
60. Kumar, A.; Sevilla, M. D. Influence of hydration on proton transfer in the guanine-cytosine radical cation (G•+-C) base pair: a density functional theory study. J. Phys. Chem. B 2009, 113, 11359-11361.
61. Chen, H. Y.; Kao, C. L.; Hsu, S. C. Proton transfer in guanine-cytosine radical anion embedded in B-form DNA. J. Am. Chem. Soc. 2009, 131, 15930-15938.
62. Chen, H. Y.; Yeh, S. W.; Hsu, S. C. N.; Kao, C. L.; Dong, T. Y. Effect of nucleobase sequence on the proton-transfer reaction and stability of the guanine-cytosine base pair radical anion. Phys. Chem. Chem. Phys. 2011, 13, 2674-2681.

TOC Graphic:

Replication of unnatural nucleotides can occur when they are paired opposite their complementary counterparts in DNA; otherwise, they may induce mutations.
