Biophysics of Artificially Expanded Genetic Information Systems


Article

The Biophysics of Artificially Expanded Genetic Information Systems. Thermodynamics of DNA Duplexes Containing Matches and Mismatches Involving 2-Amino-3-nitropyridin-6-one (Z) and Imidazo[1,2-a]-1,3,5-triazin-4(8H)one (P)
Xiaoyu Wang, Shuichi Hoshika, Raymond J. Peterson, Myong-Jung Kim, Steven A. Benner, and Jason D. Kahn
ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.6b00224 • Publication Date (Web): 17 Jan 2017
Downloaded from http://pubs.acs.org on January 19, 2017


ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Synthetic Biology

The Biophysics of Artificially Expanded Genetic Information Systems. Thermodynamics of DNA Duplexes Containing Matches and Mismatches Involving 2-Amino-3-nitropyridin-6-one (Z) and Imidazo[1,2-a]-1,3,5-triazin-4(8H)one (P)

Xiaoyu Wang¹, Shuichi Hoshika², Raymond J. Peterson³, Myong-Jung Kim², Steven A. Benner²,⁴*, and Jason D. Kahn¹*

¹ Department of Chemistry and Biochemistry, Univ. Maryland, College Park MD 20742
² Foundation for Applied Molecular Evolution, 13709 Progress Boulevard, No. 7, Alachua FL 32615
³ Celadon Laboratories, 6525 Belcrest Road, Hyattsville, MD 20782
⁴ Firebird Biomolecular Sciences LLC, 13709 Progress Boulevard, No. 17, Alachua FL 32615

Email address for the corresponding author: [email protected] Email addresses for the other authors: [email protected] or [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]

ACS Paragon Plus Environment


Abstract

Synthetic nucleobases presenting non-Watson-Crick arrangements of hydrogen bond donor and acceptor groups can form additional nucleotide pairs that stabilize duplex DNA independent of the standard A:T and G:C pairs. The pair between 2-amino-3-nitropyridin-6-one 2’-deoxyriboside (presenting a {donor-donor-acceptor} hydrogen bonding pattern on the Watson-Crick face of the small component, trivially designated Z) and imidazo[1,2-a]-1,3,5-triazin-4(8H)one 2’-deoxyriboside (presenting an {acceptor-acceptor-donor} hydrogen bonding pattern on the large component, trivially designated P) is one of these extra pairs, and one for which a substantial amount of molecular biology has been developed. Here, we report UV absorbance melting measurements and determine the energetics of binding of DNA strands containing Z and P to form short duplexes containing Z:P pairs as well as various mismatches involving Z and P. All measurements were done at 1 M NaCl in buffer (10 mM Na cacodylate, 0.5 mM EDTA, pH 7.0). Thermodynamic parameters (∆H°, ∆S°, and ∆G°37) for oligonucleotide hybridization were extracted. Consistent with the Watson-Crick model, which considers both geometric and hydrogen bonding complementarity, the Z:P pair was found to contribute more to duplex stability than any mismatch involving either non-standard nucleotide. Further, the Z:P pair is more stable than a C:G pair. The Z:G pair was found to be the most stable mismatch, forming either a deprotonated mismatched pair or a wobble base pair analogous to the stable T:G mismatch. The C:P pair is less stable, perhaps analogous to the wobble pair observed for C:O6-methyl-G, in which the pyrimidine is displaced into the minor groove. The Z:A and T:P mismatches are much less stable. Parameters for predicting the thermodynamics of oligonucleotides containing Z and P bases are provided. This is the first time such parameters have been provided for a synthetic genetic system.


According to a model published by Watson and Crick 64 years ago,1 nucleobase pairing in DNA double helices achieves its specificity by following two rules of complementarity. Size complementarity requires that large purines pair with small pyrimidines. Hydrogen bonding complementarity requires that each hydrogen bond donor group on a nucleobase be matched with a hydrogen bond acceptor group on its complement. Natural nucleobases exploit these rules incompletely in two ways (Figure 1).2 First, adenine has only two hydrogen bonding groups, as it lacks an amino group in the minor groove that could form a third hydrogen bond to the minor groove C=O unit of its thymine complement. Second, natural nucleobases exploit only some of the possible hydrogen bonding patterns. For example, the two natural pyrimidines T and C exploit only the {acceptor-donor-acceptor} and {donor-acceptor-acceptor} hydrogen bonding patterns (pyADA and pyDAA). The alternative patterns of pyrimidine donor and acceptor groups ({donor-acceptor-donor}, {donor-donor-acceptor}, {acceptor-donor-donor}, and {acceptor-acceptor-donor}, or pyDAD, pyDDA, pyADD, and pyAAD) are not used by any standard pyrimidine nucleobase, nor are the complementary patterns puADA, puAAD, puDAA, and puDDA presented by any standard purine.

[Figure 1 image: “First Generation AEGIS” (left) and “Second Generation AEGIS” (right) panels showing the standard nucleobases C, G, A, and T, 2-aminoadenine (amA), and the AEGIS nucleobases designated by one-letter codes (B, S, K, X, V, J, Z, P), each labeled with its donor/acceptor hydrogen bonding pattern (e.g., pyDDA for Z and puAAD for P). Annotations note chemical liabilities (“rapid loss of base”, “undesired tautomer”, “epimerizes”, “too acidic”, “too easily oxidizes”, “missing hydrogen bond group”) and second-generation fixes (“no epimerization, no oxidation”, “X not acidic”, “7-deaza better tautomer ratio”, “no loss of base”).]

Figure 1. Two complementarity rules guide XNA base pairing: (a) size (large purines or analogs pair with small pyrimidines or analogs) and (b) hydrogen bonding (hydrogen bond donors, D, pair with hydrogen bond acceptors, A). Rearranging the D and A groups on the bases gives an Artificially Expanded Genetic Information System (AEGIS). The chemical issues in the “first generation” AEGIS (left) are indicated in magenta; these motivated the Benner laboratory to create, over the past few years, a second-generation AEGIS (right). Electron density presented to the minor groove by the bases is believed to be a specificity determinant for polymerases. Note the one-letter code used to specify each non-standard hydrogen bonding pattern, regardless of the heterocycle that implements it; this is analogous to the “G”, “C”, “A”, and “T” used to designate the standard nucleotides.
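The hydrogen bonding rule in the caption can be illustrated with a minimal Python sketch. The string encoding is an assumption made for illustration: a size prefix ("py" or "pu") followed by the three donor (D) or acceptor (A) groups, read in the same fixed order across the Watson-Crick face for both partners. Under that encoding, the complement swaps the size class and exchanges every D for an A; the helper names are hypothetical.

```python
# Hypothetical encoding (an assumption for illustration): "py" or "pu" size prefix
# followed by three hydrogen-bonding groups, D = donor, A = acceptor, read in the
# same fixed order across the Watson-Crick face for both partners.
SWAP_GROUP = {"D": "A", "A": "D"}
SWAP_SIZE = {"py": "pu", "pu": "py"}


def complement(pattern: str) -> str:
    """Return the pattern that satisfies both size and hydrogen-bonding complementarity."""
    size, groups = pattern[:2], pattern[2:]
    if size not in SWAP_SIZE or len(groups) != 3 or any(g not in SWAP_GROUP for g in groups):
        raise ValueError(f"not a recognized pattern: {pattern!r}")
    # Small pairs with large, and every donor faces an acceptor (and vice versa).
    return SWAP_SIZE[size] + "".join(SWAP_GROUP[g] for g in groups)


# Pyrimidine patterns named in the text, with the one-letter codes the text assigns
# (None where this section does not name the implementing nucleobase).
PYRIMIDINE_PATTERNS = {
    "pyADA": "T",
    "pyDAA": "C",
    "pyDAD": None,
    "pyDDA": "Z",
    "pyADD": None,
    "pyAAD": "S",
}

if __name__ == "__main__":
    for py, letter in PYRIMIDINE_PATTERNS.items():
        print(f"{py} ({letter or '?'}) pairs with {complement(py)}")
```

Applied to pyDDA (Z), the sketch returns puAAD, the pattern presented by P; applied to pyDAA (C), it returns puADD, the pattern presented by G.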


Of course, synthetic biologists can fix these “defects” and learn much in the process.3 For example, 2-aminoadenine (amA in Figure 1, right, also known as DAP, 2,6-diaminopurine) is an analog of adenine that completes the complementarity of the canonical Watson-Crick pair. The pair between 2,6-diaminopurine and thymine conforms to the general observation that pairs joined by three hydrogen bonds contribute more to duplex stability than those joined by two,4 although the stability increment of the DAP:T pair relative to the A:T pair depends on the local sequence context.5,6 It has been a quarter century since synthetic biologists first attempted to fix Nature’s other “oversight” by adding nucleotides that implement the full diversity of hydrogen bonding patterns allowed by the Watson-Crick pairing geometry.7 It has proven possible to synthesize nucleotide analogs that implement all of the additional hydrogen bonding patterns, creating an “artificially expanded genetic information system” (AEGIS).8 AEGIS has had practical utility. For example, “first generation” implementations of the AEGIS puDDA and pyAAD hydrogen bonding patterns, in the form of 2’-deoxyisoguanosine (B in Figure 1, left) and 2’-deoxyisocytidine (S in Figure 1, left), were incorporated into FDA-approved diagnostic tools that measure HIV and hepatitis viral loads in patient blood,9,10 panels for the diagnosis of respiratory disease viruses,11,12,13 and genetic counseling tools that detect mutations responsible for cystic fibrosis.14 However, first generation AEGIS components performed poorly when challenged to be replicated with high fidelity by polymerases, to support in vitro evolution, and to be used by living cells.
For example, nucleosides that implemented the pyDDA and pyADD hydrogen bonding patterns on pyrazine heterocycles suffered epimerization via transient double bond formation at the glycosyl bond, with concomitant breakage of the deoxyriboside ring O-C1’ bond.15,16 The isoguanosine implementation of the puDDA hydrogen bonding pattern (B) had a tautomeric form that allowed isoG to pair with thymidine.17 The isocytidine implementation of the pyAAD pattern (S) suffered facile glycosyl bond cleavage.18 The xanthosine implementation of the puADA pattern (X) was too acidic.19 Accordingly, over the past decade, a set of “second generation” AEGIS components was created to mitigate these problems (Figure 1, right). One of the most successful of these second-generation pairs uses 2-amino-3-nitropyridin-6-one (trivially designated Z) and imidazo[1,2-a]-1,3,5-triazin-4(8H)one (trivially P) as nucleobase analogs. These present, respectively, the complementary pyDDA and puAAD hydrogen bonding patterns. The biological compatibility of the Z:P pair has been speculated to arise from the electron density that both Z and P present to the minor groove, density that may be a recognition feature for polymerases.20 Polymerases have been developed that allow Z:P pairs to participate in PCR.21 GACTZP DNA can also be sequenced22 and manipulated with restriction endonucleases.23 Six-letter GACTZP DNA is now used for in vitro evolution, as shown by the recent selection of AEGIS aptamers that bind to breast cancer cells,24 specifically to liver cancer cells,25 to proteins engineered to be expressed on cell surfaces,26 and, most recently, to the protective antigen from Bacillus anthracis, the causative agent of anthrax.27 Z and P now appear in highly multiplexed tools that detect nearly two dozen mosquito-borne viruses.28 Crystal structures of duplexes containing Z:P pairs have revealed double helices in both the A and B forms.29 Finally, many AEGIS components, including their phosphoramidites and triphosphates, are available commercially (www.firebirdbio.com).

These applications create a need for quantitative thermodynamic data describing the contribution of Z:P pairs to DNA duplex stability. Such data are especially important to support design software that predicts the stability of these duplexes, similar to the software that has long been available to predict the stability of duplexes formed by complementary DNA strands built from standard nucleotides.30,31,32,33 The thermodynamics of modified nucleic acid pairs are also relevant to the design and efficacy of siRNA34 and aptamers,35 as well as of modified-backbone species. For example, absorbance melting curves on a variety of sequences have been used to conveniently characterize the context-dependent thermodynamics of the backbone analog Locked Nucleic Acid (LNA)36,37 and of the nucleobase variant pseudouridine.38 AEGIS nucleobases are widely used to stabilize probe-target interactions or secondary structures in designed DNA structures. However, they are potentially even more useful for addressing the “negative design” problem of destabilizing unwanted pairing.39 This paper provides thermodynamic data for the Z:P nucleobase pair and all Z:purine and P:pyrimidine mismatches, proposes structures for the various mismatches, and suggests design rules. It is the first detailed published study of any AEGIS pair.

MATERIALS AND METHODS

Design of the oligonucleotides studied

Table 1 lists all of the oligonucleotide sequences used and the rationales for their design.
Initially, a short self-complementary 8-mer context, GGACGTCC, was used. Variants of this reference sequence were synthesized with Z and P nucleotides at positions suitable for pairing with each other, as listed in Table 1. This preliminary work confirmed that the Z:P pair confers stability, but to isolate the thermodynamic effects of single Z:P substitutions, we turned to non-self-complementary oligonucleotides, in which modified pairs and arbitrary mismatches are simpler to study in varying contexts. To this end, the reference top strand sequence 5'-GCCAGTTAA and its complement were used. Variants were synthesized with Z:P pairs in place of the original C:G pairs at positions 2, 3, or 5, or at positions 2 and 3 together, requiring 8 Z- or P-containing oligonucleotides and providing 4 examples of Z:P pairs. Hybridizing each Z- or P-containing oligonucleotide with the appropriate reference single strand gave the Z:G and C:P mismatches. An additional set of 8 oligonucleotides was designed, each containing either A at the position of P or T at the position of Z. The entire set of oligonucleotides provides all 9 possible purine (or imidazotriazine analog):pyrimidine (or pyridinone analog) combinations, namely the 3 matched pairs C:G, Z:P, and T:A and the 6 mismatches T:G, T:P, C:A, C:P, Z:A, and Z:G, in the 4 different positional contexts described above.
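The combinatorial design just described (three pyrimidine-like bases times three purine-like bases, at positions 2, 3, or 5 of the reference strand, or at 2 and 3 together) can be enumerated with a short sketch. The function names are hypothetical, and the substitution rule is inferred from Table 1 rather than stated by the authors: a C in the reference top strand is replaced by the pyrimidine-like base and a G by the purine-like base, with bottom strands written 3' to 5' as in the table.

```python
REFERENCE_TOP = "GCCAGTTAA"  # 5' to 3' reference top strand from Table 1
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "Z": "P", "P": "Z"}
PYRIMIDINES = ("C", "Z", "T")   # small components
PURINES = ("G", "P", "A")       # large components
POSITIONS = ((2,), (3,), (5,), (2, 3))


def bottom_strand(top: str) -> str:
    """Watson-Crick complement written 3' to 5', aligned under the top strand."""
    return "".join(PAIRING[b] for b in top)


def make_duplex(positions, pyr: str, pur: str):
    """Hypothetical helper: (name, top 5'-3', bottom 3'-5') with pyr:pur at 1-based positions."""
    top = list(REFERENCE_TOP)
    bottom = list(bottom_strand(REFERENCE_TOP))
    for p in positions:
        i = p - 1
        if top[i] == "C":      # reference C:G, pyrimidine-like base goes on the top strand
            top[i], bottom[i] = pyr, pur
        elif top[i] == "G":    # reference G:C, purine-like base goes on the top strand
            top[i], bottom[i] = pur, pyr
        else:
            raise ValueError(f"position {p} is not a C:G pair in the reference")
    name = ",".join(map(str, positions)) + pyr + pur
    return name, "".join(top), "".join(bottom)


if __name__ == "__main__":
    duplexes = [make_duplex(pos, y, r) for pos in POSITIONS
                for y in PYRIMIDINES for r in PURINES]
    print(len(duplexes))  # 4 positional contexts x 9 combinations, references included
    for name, top, bottom in duplexes[:3]:
        print(name, "5'-" + top, "3'-" + bottom)
```

The naming scheme reproduces the convention in the Table 1 footnote; for example, the Z:A mismatch at position 3 comes out as 3ZA with strands 5'-GCZAGTTAA and 3'-CGATCAATT.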

Table 1. Oligonucleotide duplexes studied, including C:G- and Z:P-containing perfect duplexes, position variants, and mismatches.

| Double Strand ID | Single Strand IDs | dsDNA Sequence | Rationale |
|---|---|---|---|
| SC1 | SC1 | 5'-GGACGTCC / 3'-CCTGCAGG | Standard ACGT self-complementary reference sequence |
| SC2 | SC2 | 5'-PGACGTCZ / 3'-ZCTGCAGP | Replacement of C:G by Z:P at the termini |
| SC3 | SC3 | 5'-GGAZPTCC / 3'-CCTPZAGG | Consecutive 5'-ZP-3' pairs |
| SC4 | SC4 | 5'-GPACGTZC / 3'-CZTGCAPG | Two non-consecutive Z:P pairs |
| SC5 | SC5 | 5'-GGAPZTCC / 3'-CCTZPAGG | Consecutive 5'-PZ-3' pairs |
| CG | T_REF, B_REF | 5'-GCCAGTTAA / 3'-CGGTCAATT | Standard ACGT non-self-complementary reference sequence |
| 5ZP | T_5P, B_5Z | 5'-GCCAPTTAA / 3'-CGGTZAATT | Single central Z:P pair with Z between A and T. Reference sequence for 5YR mismatches* |
| 3ZP | T_3Z, B_3P | 5'-GCZAGTTAA / 3'-CGPTCAATT | Single internal Z:P pair with Z between C and A. Reference sequence for 3YR mismatches* |
| 2ZP | T_2Z, B_2P | 5'-GZCAGTTAA / 3'-CPGTCAATT | Single penultimate Z:P pair with Z between G and C. Reference sequence for 2YR mismatches* |
| 2,3ZP | T_2,3Z, B_2,3P | 5'-GZZAGTTAA / 3'-CPPTCAATT | Two consecutive Z:P pairs. Reference sequence for 2,3YR mismatches* |

* YR mismatches = Z:G, C:P, T:G, T:P, Z:A, or C:A mismatches. For example, 3ZA is 5'-GCZAGTTAA/3'-CGATCAATT.

Oligonucleotide Synthesis

Standard phosphoramidites (Bz-dA, Ac-dC, dmf-dG, and dT) and controlled pore glass (CPG) carrying standard residues were purchased from Glen Research (Sterling, VA). AEGIS phosphoramidites (dZ and dP) and CPG carrying dZ residues were purchased from Firebird Biomolecular Sciences LLC (Alachua, FL). All oligonucleotides containing dZ and dP were synthesized on an ABI 394 DNA Synthesizer using standard phosphoramidite chemistry.40 The CPG carrying the product oligonucleotides was treated with 1 M DBU in anhydrous acetonitrile (2.0 mL) at room temperature for 24 hours to remove the NPE group from the dZ nucleobase. The CPG samples were then filtered, dried, and treated with concentrated NH4OH (55 ºC, 16 h). After evaporation of the NH4OH, oligonucleotides containing dZ and dP were purified by ion-exchange HPLC and then desalted using Sep-Pak® Plus C18 cartridges (Waters). Oligonucleotides containing only natural nucleobases were purchased from IDT and were simply desalted.

UV Absorbance Melting Curves and Data Analysis

DNA stored at -20 °C as a 100 µM stock solution was thawed and diluted into melting buffer (1 M NaCl, 10 mM Na cacodylate, 0.5 mM Na2EDTA, pH 7.0) in a 1.5 mL microcentrifuge tube. Samples containing two fully complementary or mismatched oligonucleotides at total strand concentrations (CT) of either 6 µM or 15 µM were prepared, as were corresponding reference samples containing each of the two single-stranded oligomers alone at 3 µM or 7.5 µM. Pairs of oligonucleotides were mixed in nominally equimolar amounts; the ssDNA reference samples control for pipetting error and allow the actual, as opposed to the nominal, concentration of each strand to be calculated. Samples were transferred to 1 mL (1 cm path length) self-masking quartz cuvettes for UV absorbance measurements. The cuvette caps were wrapped with Teflon tape before sealing to prevent evaporation. UV melting profiles were collected on a Cary® 100 Bio UV-visible spectrophotometer equipped with a 12-cell sample changer and a Peltier heating/cooling system. The sample chamber was purged with N2 whenever the temperature was below 10 °C to prevent condensation on the cuvette surfaces. Samples were annealed with an initial fast heating ramp from 25 °C to 85 °C at 10 °C/min. Data were then acquired every 1 °C during a slow cooling ramp from 85 °C to 0 °C at 1 °C/min. These short oligonucleotides equilibrated rapidly and showed no signs of chemical degradation or precipitation: repeated temperature cycles, acquisition during heating rather than cooling, and acquisition at a slower rate of temperature change all gave the same results as the standard method. The data were well fit by the van’t Hoff equation, with no systematic deviations in the fit residuals. Varying the DNA concentration over a five-fold range gave the same ∆H° and ∆S° values within error.
These observations are consistent with two-state melting; in previous work,36 we showed that non-two-state behavior often manifests as “pre-melting” at temperatures just below the Tm. DNA concentrations were obtained from the absorbance at 20 °C, but we checked with single-strand control melts that, for these sequences, the high-temperature absorbance was always within 5% of the absorbance at 85 °C. Exact strand concentrations were calculated from the control ssDNA melts run in parallel with each dsDNA melt. The extinction coefficients of the single-stranded oligomers containing only A, T, G, and C were obtained from IDT’s website, which uses the nearest-neighbor calculation method (http://www.idtdna.com/pages/decoded/decoded-articles/pipet-tips/decoded/2013/01/18/oligoquantification-getting-it-right): 91500 L·mol⁻¹·cm⁻¹ for 5’-GCCAGTTAA, 83600 L·mol⁻¹·cm⁻¹ for 5’-TTAACTGGC, and 74900 L·mol⁻¹·cm⁻¹ for 5’-GGACGTCC. Because nearest-neighbor parameters for P and Z inside DNA oligonucleotides are currently unknown, these three values were used as best-guess extinction coefficients for the corresponding P- and Z-containing oligomers. Analysis programs written in Matlab were used to fit the melting curves to the van’t Hoff equation. The underlying equations are similar to those used in our previous work,36 but fitting in Matlab is much more robust than the Excel-based methods used previously. The approach is similar to that of the Meltwin program;41 we wrote our own version to allow cross-platform analysis and in-house modification. Any excess single-strand concentration (determined as above) is considered explicitly in the fitting. Starting estimates of the initial and final absorbances and of the dsDNA and ssDNA baselines are iteratively refined along with ∆H° and ∆S°, and potential multi-state melting is flagged by pre-melting behavior at T < Tm. Many of the mismatch-containing duplexes melt below room temperature, making it difficult to establish a dsDNA baseline for them. For these duplexes, surprisingly, results were improved significantly by truncating the high-temperature data at 65 °C, so that curvature in the ssDNA portion of the absorbance vs. temperature plot does not affect the results. An example of the program’s output is provided in the Supplementary Materials. Values of ∆G°37, ∆H°, and ∆S° for each sequence were estimated as averages over multiple runs, weighted by the uncertainties derived from the 95% confidence intervals of each fit. The final uncertainties in the parameters were estimated from the multiple melt runs using two approaches. First, the propagated uncertainty σA is given by the equation below:

$$\sigma_A = \frac{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}{n}$$

where σi is the uncertainty from the fit for each run and n is the number of trials. Second, the observed standard error of the mean σB from multiple runs is calculated according to the equation below:

$$\sigma_B = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n-1)}}$$

where x̄ is the weighted average of the individual best-fit values xi, calculated as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2}$$

The χ² value is calculated for each parameter from the equation below, and the Microsoft Excel function CHIDIST is used to return the right-tailed probability of observing that value of χ², with n − 1 degrees of freedom.

$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma_i^2}$$
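The averaging and error analysis above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' Matlab/Excel workflow: per-run uncertainties are taken directly as σi (the paper derives them from 95% confidence intervals), and the right-tailed χ² probability that Excel's CHIDIST returns is computed with the closed-form survival function for integer degrees of freedom so that only the standard library is needed. The function names and example inputs are hypothetical.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Right-tailed chi-square probability P(X >= x) for integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, k // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_{j=1}^{(k-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (k - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def summarize_runs(values, sigmas):
    """Combine per-run best-fit values x_i and fit uncertainties sigma_i as in the text."""
    n = len(values)
    if n < 2 or n != len(sigmas):
        raise ValueError("need at least two runs with matching uncertainties")
    weights = [1.0 / s**2 for s in sigmas]
    x_bar = sum(w * x for w, x in zip(weights, values)) / sum(weights)           # weighted mean
    sigma_a = math.sqrt(sum(s**2 for s in sigmas)) / n                           # propagated uncertainty
    sigma_b = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n * (n - 1)))   # observed SEM
    chi2 = sum((x - x_bar) ** 2 / s**2 for x, s in zip(values, sigmas))
    return {
        "mean": x_bar,
        "sigma_A": sigma_a,
        "sigma_B": sigma_b,
        "chi2": chi2,
        "p": chi2_sf(chi2, n - 1),                  # what CHIDIST(chi2, n - 1) reports
        "reported_sigma": max(sigma_a, sigma_b),    # the larger of the two, as in the text
    }


if __name__ == "__main__":
    # Hypothetical Delta H values (kcal/mol) and fit uncertainties from three melts.
    print(summarize_runs([-56.0, -57.1, -55.6], [2.4, 2.2, 2.5]))
```

Where SciPy is available, `scipy.stats.chi2.sf(chi2, n - 1)` gives the same right-tailed probability and can replace the hand-written `chi2_sf`.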

In most cases, the χ² probability was larger than 0.5, and the observed standard error of the mean (the estimated population standard deviation) was smaller than the propagated uncertainty (the estimated sample standard deviation). In a few cases, especially for mismatched samples with low Tm, the observed standard error of the mean was significantly larger than the propagated uncertainty, suggesting that the fitting procedure underestimated the uncertainty in the melting curve fit parameters. The χ² probabilities in these cases were smaller than 0.15, and often smaller than 0.05. We attribute this primarily to nonlinear single-stranded baselines, which produce unreasonably small estimated uncertainties accompanied by poor goodness of fit. Trials with nonlinear single-stranded baselines were reanalyzed by truncating the high-temperature data, which increases the individual estimated uncertainties to a reasonable range. We report the uncertainty in each final ∆H°, ∆S°, and ∆G° parameter as the larger of the standard error of the mean and the propagated uncertainty. Reported Tm values are adjusted to equimolar strands at 1 µM total strand concentration for ease of comparison. All thermodynamic data are provided in Table 2.
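The adjustment of Tm to a common concentration can be sketched with the standard two-state expression for equimolar, non-self-complementary strands, Tm = ∆H°/(∆S° + R ln(CT/4)). The text does not spell out the authors' exact implementation, so treat this as an assumption; it does reproduce the benchmark reference parameters quoted in the Table 3 footnote (∆H° = -60.00 kcal/mol, ∆S° = -163.25 cal/mol·K, Tm = 37.00 °C at CT = 1 µM). The function name is hypothetical.

```python
import math

R = 1.98720  # gas constant, cal/(mol*K)


def tm_non_self_complementary(dH_kcal: float, dS_cal: float, ct_molar: float) -> float:
    """Two-state Tm (deg C) for equimolar non-self-complementary strands at total concentration ct_molar.

    At Tm half of each strand is in duplex, so K = [AB]/([A][B]) = 4/CT and
    dH - Tm*dS = -R*Tm*ln(4/CT), i.e. Tm = dH / (dS + R*ln(CT/4)).
    """
    tm_kelvin = dH_kcal * 1000.0 / (dS_cal + R * math.log(ct_molar / 4.0))
    return tm_kelvin - 273.15


if __name__ == "__main__":
    # Reference parameters quoted in footnote a of Table 3.
    for ct in (1e-6, 6e-6):
        print(f"CT = {ct * 1e6:.0f} uM: Tm = {tm_non_self_complementary(-60.0, -163.25, ct):.2f} C")
```

For these parameters, raising CT from 1 µM to 6 µM raises the computed Tm by about 6 °C, in line with the 4-7 °C offset noted under Table 2.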

RESULTS

The parent self-complementary reference sequence built only from standard nucleotides (SC1) was found to have the following thermodynamic parameters in standard DNA melting buffer: ∆G°37 = -8.78 kcal/mol, ∆H° = -57.9 kcal/mol, and ∆S° = -158.3 cal/mol·K. Its melting temperature (Tm) was measured to be 44.7 °C at 6 µM strand concentration. The values predicted for this reference duplex with the application provided on the Integrated DNA Technologies web page were ∆G°37 = -8.98 kcal/mol, ∆H° = -59.6 kcal/mol, and ∆S° = -163.2 cal/mol·K, with a predicted Tm of 45.4 °C at 6 µM strand concentration. This reasonable agreement suggested that standard DNA thermodynamic predictions are appropriate for natural DNA baselines under our experimental conditions. Melting experiments showed that the inclusion of two Z:P pairs significantly stabilized the duplex, with ∆Tm values of 5-8 °C calculated at 1 µM. These results encouraged us to design ten non-self-complementary oligonucleotides that provide duplexes containing a single Z:P pair in three different contexts/positions as well as one example of a ZZ:PP dinucleotide, along with a reference duplex containing a C:G pair at each position that is a Z:P pair in the test oligonucleotides (Table 1). The C:G pair was chosen as the thermodynamic reference state for comparison with Z:P because both the standard and the AEGIS pair are joined by three hydrogen bonds. This reference duplex melted with the following thermodynamic parameters: ∆G°37 = -8.40 kcal/mol, ∆H° = -58.5 kcal/mol, ∆S° = -163.0 cal/mol·K, and Tm = 31.9 °C (at 1 µM), again in reasonable agreement with IDT predictions.
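As a quick consistency check on the SC1 numbers above, ∆G°37 = ∆H° − (310.15 K)∆S°, and for a self-complementary duplex the two-state Tm at total strand concentration CT is ∆H°/(∆S° + R ln CT). The sketch below applies these standard relations (an assumption about the form used, not a quotation of the authors' code) to the measured and IDT-predicted SC1 parameters; small differences from the reported values are expected because the quoted parameters are rounded. Function names are hypothetical.

```python
import math

R = 1.98720     # gas constant, cal/(mol*K)
T37 = 310.15    # 37 deg C in kelvin


def dG37(dH_kcal: float, dS_cal: float) -> float:
    """Free energy at 37 deg C in kcal/mol: dG = dH - T*dS (dS converted from cal to kcal)."""
    return dH_kcal - T37 * dS_cal / 1000.0


def tm_self_complementary(dH_kcal: float, dS_cal: float, ct_molar: float) -> float:
    """Two-state Tm (deg C) of a self-complementary duplex; K = 1/CT at Tm."""
    return dH_kcal * 1000.0 / (dS_cal + R * math.log(ct_molar)) - 273.15


if __name__ == "__main__":
    for label, dH, dS in (("SC1 measured", -57.9, -158.3), ("SC1 IDT prediction", -59.6, -163.2)):
        print(f"{label}: dG37 = {dG37(dH, dS):.2f} kcal/mol, "
              f"Tm(6 uM) = {tm_self_complementary(dH, dS, 6e-6):.1f} C")
```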


Quantitative UV absorbance melting curves were obtained for the Z- and P-containing oligonucleotides described in Table 1. Examples of Z:P pairs in several contexts as well as all possible pyrimidine:purine mismatches were studied. All of these short sequences were observed to melt reversibly, with no indication of pre-melting or multi-state behavior. Repeated runs on the samples, at acquisition rates up to 2 °C/min, and acquiring data either as the temperature increased or decreased, all gave very similar results. This confirms rapid equilibration and an absence of chemical degradation upon repeated heating and cooling. Thermodynamic parameters were extracted by fitting the data to the van’t Hoff equation as described in Materials and Methods, assuming two-state melting, and the results are summarized in Table 2 and illustrated in Figure 2. Several of the mismatched oligonucleotides paired too weakly to provide a sufficient dsDNA baseline to allow calculation of ∆H° and ∆S° with confidence. As errors in ∆H° and ∆S° tend to cancel, the ∆G°37 values were obtained with greater confidence. Table 2. ∆H°, ∆S°, ∆G°37, and Tm values obtained for formation of the indicated oligonucleotide pairs Sequence

∆H° (kcal/mol)

∆S° (cal/mole K)

∆G°37 (kcal/mol)

Tm (°C), 1 µMa

5ZP

-56.2 ± 2.3

-151.6 ± 7.1

-9.1 ± 0.1

35.8 ± 0.6

3ZP

-58.4 ± 2.4

-158.2 ± 7.5

-9.4 ± 0.1      37.0 ± 0.4
2ZP         -58.5 ± 2.9     -158.9 ± 9.1     -9.2 ± 0.1      36.3 ± 0.5
5ZG         -47.9 ± 2.4     -129.2 ± 7.8     -7.8 ± 0.1      27.6 ± 0.4
3ZG         -49.6 ± 3.4     -137.0 ± 10.9    -7.07 ± 0.02    23.5 ± 1.0
2ZG         -52.8 ± 3.6     -146.0 ± 11.9    -7.6 ± 0.1      26.2 ± 0.2
5CP         -58.6 ± 2.6     -171.0 ± 8.7     -5.5 ± 0.1      18.0 ± 0.3
3CP         -52.6 ± 2.4     -151.4 ± 8.2     -5.6 ± 0.1      16.4 ± 0.7
2CP         -49.2 ± 2.8     -139.8 ± 9.3     -5.8 ± 0.1      16.1 ± 0.6
5TG         -49.0 ± 4.1     -140.3 ± 13.7    -5.4 ± 0.1      14.2 ± 1.0
3TG         -45.6 ± 4.5     -129.7 ± 15.1    -5.4 ± 0.1      12.0 ± 1.2
2TG         -47.5 ± 3.7     -134.1 ± 12.2    -5.9 ± 0.1      15.9 ± 1.0
5TP         -61.3 ± 3.8     -183.2 ± 12.7    -4.5 ± 0.2      14.2 ± 0.7
3TPᵇ        NA              NA               -4.6 ± 0.1      3.3 ± 1.0
2TP         -45.4 ± 3.4     -128.1 ± 11.2    -5.7 ± 0.1      13.6 ± 1.2
5ZAᵇ        NA              NA               -4.7 ± 0.2      3.0 ± 2.0
3ZAᶜ        NA              NA               NA              < 1.0
2ZA         -41.3 ± 3.4     -115.7 ± 11.3    -5.4 ± 0.1      9.9 ± 1.6
5CAᶜ        NA              NA               NA              < 1.0
3CAᶜ        NA              NA               NA              < 1.0
2CA         NA              NA               -5.2 ± 0.1      4.3 ± 1.4
2,3ZP       -58.2 ± 3.8     -156.8 ± 11.4    -9.6 ± 0.3      38.4 ± 1.4
2,3ZG       -46.3 ± 2.8     -128.4 ± 9.3     -6.5 ± 0.1      19.0 ± 0.5
2,3CPᶜ      NA              NA               NA              < 1.0
2,3TGᶜ      NA              NA               NA              < 1.0
2,3TPᶜ      NA              NA               NA              < 1.0
2,3ZAᶜ      NA              NA               NA              < 1.0
CG          -58.5 ± 2.7     -163.0 ± 8.5     -8.40 ± 0.03    31.9 ± 0.6
CG calcᵈ    -63.1           -174.8           -8.89           34.7
SC1 calcᵈ   -59.6           -163.2           -8.98           39.5
SC1         -57.9 ± 4.1     -158 ± 13        -8.78 ± 0.14    38.4
SC2         -69.3 ± 3.1     -191 ± 10        -10.2 ± 0.16    44.8
SC3         -60.4 ± 4.0     -162 ± 12        -10.1 ± 0.22    45.3
SC4         -67.3 ± 3.9     -183 ± 12        -10.5 ± 0.22    46.6
SC5         -58.1 ± 5.8     -156 ± 18        -9.65 ± 0.27    43.2

All uncertainties were calculated as described in the text. Typically, each melt was repeated three times on separate days. For SC melts, n = 1, and the reported errors are the uncertainties from the fit.
ᵃ The experimental Tm values are typically 4-7 °C higher than the tabulated values because experiments were conducted at CT = 6 µM or 15 µM. For example, the Tm of the CG reference sequence at 6 µM is 37.7 °C.
ᵇ Experimental Tm is between 5 and 15 °C; ∆H° and ∆S° are unreliable.
ᶜ Experimental Tm is lower than 5 °C; all thermodynamic values are unreliable.
ᵈ Values are from http://biophysics.idtdna.com.
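The concentration dependence described in footnote a can be illustrated with the standard two-state expression for a non-self-complementary duplex, Tm = ∆H°/(∆S° + R ln(CT/4)). This is a minimal sketch, not the authors' fitting code: it assumes R = 1.987 cal/(mol K), temperature-independent ∆H° and ∆S°, and that ∆S° is back-calculated from ∆H° and ∆G°37 at 310.15 K. The function name `tm_celsius` is hypothetical.

```python
import math

R = 1.987       # gas constant, cal/(mol K)
T37 = 310.15    # 37 °C in kelvin

def tm_celsius(dH_kcal, dG37_kcal, ct_molar):
    """Two-state Tm (°C) for a non-self-complementary duplex.

    Assumption of this sketch: dS is back-calculated from dH and dG37,
    and dH and dS are treated as temperature independent.
    """
    dH = dH_kcal * 1000.0                      # cal/mol
    dS = (dH - dG37_kcal * 1000.0) / T37       # cal/(mol K)
    return dH / (dS + R * math.log(ct_molar / 4.0)) - 273.15

# CG reference duplex: experimental dH and dG37 from the table above
for ct in (1e-6, 6e-6):
    print(f"CT = {ct * 1e6:.0f} uM: Tm = {tm_celsius(-58.5, -8.40, ct):.1f} C")
```

With the CG reference values, this gives about 31.9 °C at 1 µM and 37.7 °C at 6 µM, consistent with the tabulated Tm and the 6 µM value quoted in footnote a.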

Table 3. Average observed ∆∆H°, ∆∆S°, and ∆∆G°37 values and calculated benchmark ∆Tm values for the indicated base pair or mismatch

                        Exptl.                                                  Calc.
Pair or mismatch        C:G    Z:P     Z:G     C:P    T:P    Z:A    T:G         C:G
∆∆H° (kcal/mole)        0      0.83    8.4     5.1    5.2    17.    11          -4.6
∆∆S° (cal/mole K)       0      5.4     24.     7.6    6.0    46     27          -12
∆∆G°37 (kcal/mole)      0      -0.85   0.89    2.7    3.5    3.3    2.8         -0.49
Benchmark ∆Tm (°C)ᵃ     0      +4.5    -5.5    -15    -18    -20    -17         +4.6

All values are relative to the values determined experimentally in this work for the common reference sequence GCCAGTTAA, with C replacing Z and G replacing P.
ᵃ The benchmark ∆Tm is calculated using the ∆∆X° values and arbitrary but reasonable reference melting thermodynamic parameters: ∆H° = -60.00 kcal/mole, ∆S° = -163.25 cal/mole K, ∆G°37 = -9.368 kcal/mole, and Tm = 37.00 °C at CT = 1 µM.
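The benchmark ∆Tm row can be approximated by adding ∆∆H° and ∆∆S° to the reference parameters in footnote a and recomputing Tm at CT = 1 µM. This sketch assumes that procedure, R = 1.987 cal/(mol K), and the two-state non-self-complementary expression; because it uses the rounded Table 3 increments, its results can differ slightly from the tabulated ∆Tm values. The names `tm_celsius` and `benchmark_delta_tm` are illustrative.

```python
import math

R = 1.987          # gas constant, cal/(mol K)
CT = 1e-6          # total strand concentration for the benchmark, M
REF_DH = -60.00    # reference dH, kcal/mol (Table 3 footnote)
REF_DS = -163.25   # reference dS, cal/(mol K) (Table 3 footnote)

def tm_celsius(dH_kcal, dS_cal, ct=CT):
    """Two-state Tm (°C) for a non-self-complementary duplex."""
    return dH_kcal * 1000.0 / (dS_cal + R * math.log(ct / 4.0)) - 273.15

def benchmark_delta_tm(ddH_kcal, ddS_cal):
    """Tm shift after adding (ddH, ddS) to the reference duplex parameters."""
    return tm_celsius(REF_DH + ddH_kcal, REF_DS + ddS_cal) - tm_celsius(REF_DH, REF_DS)

# Rounded Table 3 increments: (ddH in kcal/mol, ddS in cal/(mol K))
TABLE3 = {"Z:P": (0.83, 5.4), "C:P": (5.1, 7.6), "T:P": (5.2, 6.0)}

for pair, (ddH, ddS) in TABLE3.items():
    print(f"{pair}: benchmark dTm = {benchmark_delta_tm(ddH, ddS):+.1f} C")
```

For Z:P, C:P, and T:P the sketch gives about +4.5, -14.8, and -17.8 °C, close to the tabulated +4.5, -15, and -18 °C; the reference itself evaluates to 37.00 °C.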


Thus, for experimental melting temperatures (measured at CT = 6 or 15 µM) between 5 and 15 °C, we provide ∆G°37 as well as Tm corrected to CT = 1 µM. The corrected Tm values are only a few degrees lower than the experimental Tm. Further, because the apparent (though inaccurate) ∆H° and ∆S° values are constrained to give the correct experimental Tm, the corrected Tm should be reasonably accurate. For experimental Tm values below 5 °C, we indicate only that the Tm at 1 µM is < 1.0 °C. Longer oligonucleotides would be needed to determine ∆H° and ∆S° for the less stable mismatches, but for primer and probe design it is more useful to know which mismatches are the most stable than which are the least stable. To estimate thermodynamic parameters for new sequences containing isolated Z:P pairs or Z- or P-containing mismatches, add the appropriate averaged ∆∆H°, ∆∆S°, and ∆∆G°37 values from Table 3 to the ∆H°, ∆S°, and ∆G°37 values predicted for the unmodified oligonucleotide, with C replacing Z and G replacing P. The values in Table 3 were derived from the singly substituted, non-self-complementary oligonucleotides. These all share the reference sequence GCCAGTTAA, which is a potential source of error: for this sequence, the experimental ∆H°, ∆S°, and ∆G°37 values are 8%, 7%, and 6% smaller in magnitude, respectively, than the values predicted by nearest-neighbor rules. The observed values for the T:G mismatch also do not agree with predictions based on the nearest-neighbor rules of SantaLucia and coworkers32 for the T:G pair, which give ∆∆H° = 15 kcal/mole, ∆∆S° = 37 cal/mole K, and ∆∆G°37 = 3.5 kcal/mole for the average of the 2TG, 3TG, and 5TG sequences. To the extent that the GCCAGTTAA reference sequence is atypical and the mismatches should more properly be compared to the predicted values, the agreement with the established T:G parameters would improve, all the ∆∆X° values in Table 3 would increase accordingly, and the benchmark ∆Tm values would decrease.
Future work with more sequences will resolve these issues and provide context-dependent ∆∆X° values. Finally, the self-complementary Z:P sequences, which all have two Z:P pairs, show that the tradeoff between enthalpic and entropic stabilization is context-dependent, but the ∆∆G°37 and ∆Tm values are in the range expected from the parameters given. As more data become available, predictive parameters that consider sequence context and multiple Z- or P-containing pairs can be generated.
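The additive recipe described above can be sketched as follows. The starting values must come from a nearest-neighbor prediction for the sequence with C in place of Z and G in place of P; here, purely for illustration, the "CG calc" prediction for the reference duplex is used with one Z:P substitution. The dictionary `TABLE3_INCREMENTS` and the function `apply_increments` are hypothetical names, and summing increments over several sites is an assumption, since the text notes that the parameters were derived from a single reference context.

```python
import math

R = 1.987  # gas constant, cal/(mol K)

# Averaged Table 3 increments: (ddH kcal/mol, ddS cal/(mol K), ddG37 kcal/mol)
TABLE3_INCREMENTS = {
    "Z:P": (0.83, 5.4, -0.85),
    "Z:G": (8.4, 24.0, 0.89),
    "C:P": (5.1, 7.6, 2.7),
    "T:P": (5.2, 6.0, 3.5),
    "Z:A": (17.0, 46.0, 3.3),
}

def apply_increments(dH, dS, dG37, sites):
    """Add Table 3 increments for each isolated Z- or P-containing site.

    dH, dS, dG37: predicted values for the C/G-substituted sequence.
    sites: pair labels such as ["Z:P"]; additivity across sites is assumed.
    """
    for site in sites:
        ddH, ddS, ddG = TABLE3_INCREMENTS[site]
        dH, dS, dG37 = dH + ddH, dS + ddS, dG37 + ddG
    return dH, dS, dG37

def tm_celsius(dH_kcal, dS_cal, ct):
    """Two-state Tm (°C), non-self-complementary duplex."""
    return dH_kcal * 1000.0 / (dS_cal + R * math.log(ct / 4.0)) - 273.15

# Illustration: "CG calc" prediction plus a single Z:P pair
dH, dS, dG37 = apply_increments(-63.1, -174.8, -8.89, ["Z:P"])
print(f"dH = {dH:.2f}, dS = {dS:.1f}, dG37 = {dG37:.2f}, "
      f"Tm(1 uM) = {tm_celsius(dH, dS, 1e-6):.1f} C")
```

For this illustration the sketch returns ∆H° ≈ -62.27 kcal/mole, ∆S° ≈ -169.4 cal/mole K, and ∆G°37 ≈ -9.74 kcal/mole.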


Figure 2. Plots showing the ∆G°37, ∆H°, ∆S°, and Tm values for oligonucleotides with the Z:P pairs and mismatches shown. Note that the Z:P nucleobase pair provides the most negative (most stable) ∆G°37 value. Entropy-enthalpy compensation is apparent, as well as the increased relative stability for mismatches near the DNA end (position 2). The 2,3 series shows that the effects of two pairs are qualitatively additive, most notably in the substantial destabilization of double-mismatch oligonucleotides.

DISCUSSION

These thermodynamic analyses show that, overall, the order of stability of matches and mismatches at pH 7 is Z:P > C:G > Z:G > C:P > T:G > T:P > A:Z > A:C. Thus, Z:P nucleobase pairs provide the most negative (most stable) ∆G°37 value. Indeed, Z:P pairs evidently contribute modestly more to duplex stability (∆∆G°37 = -0.85 kcal/mole) than the corresponding C:G pair, which is also joined by three hydrogen bonds. Further, Z:P pairs contribute substantially more to duplex stability than all the mismatches examined (all have ∆∆G°37 ≥ 0.9 kcal/mole). Therefore, the Z:P nucleobase pair appears to be a robust addition to the C:G and T:A pairs as part of an expanded genetic alphabet. The stability increment appears to be entropic in nature, i.e., ∆∆S° > 0 for hybridization relative to C:G-containing DNA, meaning either that the single-stranded oligonucleotides containing Z or P are more ordered than the corresponding C/G strands, or that duplexes containing Z:P pairs are more disordered. In this vein, it is interesting that crystal structures demonstrate that Z:P sequences can readily adopt both A and B helical forms.29 The substitutions in the major groove, or subtle changes in minor groove geometry, could preferentially disrupt ordered water in the duplexes, decreasing the entropy gained when that water is released upon thermal melting.42 Although previous work on a cationic 3-aminopropyl substitution in the major groove has shown increased water uptake upon duplex formation,43 hydration effects are likely to be very sensitive to the detailed geometry. We also note that the pKa values of Z and P are more closely matched than those of C and G. This leads to the intriguing possibility that a "low barrier hydrogen bond" joins the two (Figure 3).
This can be viewed as a structure where the acid-base reaction has partly occurred, and can be modeled using resonance structures for Z:P pairs with the central H of the Watson-Crick face forming both sigma bonds and hydrogen bonds to the heteroatoms on each side. One might expect that the stable low barrier hydrogen bond would contribute to enthalpic stabilization, but this is balanced by the increased cost of charge dehydration.44 Interestingly, a crystal structure of the P heterocycle shows analogous hydrogen bonding patterns (Roberto Laos, Christos Lampropolous, Steven Benner, unpublished).
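The entropic origin of the Z:P stabilization can be made explicit by splitting ∆∆G°37 = ∆∆H° - T∆∆S° into its two terms at T = 310.15 K, using the Table 3 values for Z:P. This is a small illustrative sketch; the helper name `decompose` is hypothetical.

```python
T37 = 310.15  # 37 °C in kelvin

def decompose(ddH_kcal, ddS_cal, T=T37):
    """Return (ddH, -T*ddS, ddG) in kcal/mol, using ddG = ddH - T*ddS."""
    entropic_term = -T * ddS_cal / 1000.0   # convert cal to kcal
    return ddH_kcal, entropic_term, ddH_kcal + entropic_term

# Z:P relative to C:G, Table 3 values
ddH, minus_T_ddS, ddG37 = decompose(0.83, 5.4)
print(f"ddH = {ddH:+.2f}  -T*ddS = {minus_T_ddS:+.2f}  ddG37 = {ddG37:+.2f} kcal/mol")
```

The unfavorable enthalpy term (+0.83 kcal/mole) is outweighed by the entropy term (about -1.67 kcal/mole), giving about -0.84 kcal/mole, within rounding of the tabulated -0.85 kcal/mole.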


Interestingly, the "worst" mismatch (meaning the mismatch that contributes most to duplex stability) appears to be between Z and G (∆∆G°37 = 0.89 kcal/mole relative to C:G). This is also the "worst" mismatch observed in polymerase studies.45

[Figure 3 structures. (i) Z:P match in canonical protonation state; Z:P match following proton transfer ("low barrier hydrogen bond"). (ii) G:T wobble; G:Z wobble; G:Z Hoogsteen; G:deprotonated Z mismatch. (iii) P:C wobble; B:T wobble; protonated P:C mismatch; T:P mismatch; A:Z mismatch.]

Figure 3. (i) Because the pKa values of Z and the conjugate acid of P are not as widely separated as those of standard pairs, the central hydrogen bond might be viewed as a "low barrier hydrogen bond". Shown are the proton transfer and the accompanying resonance structures. (ii) The G:Z and G:T wobbles have analogous geometries, displacing their purine component towards the minor groove. Alternative structures for the G:Z mismatch include a Hoogsteen structure and a geometrically standard Watson-Crick pair following deprotonation of Z. (iii) A P:C mismatch might be a wobble analogous to the B:T mismatch, which is known from crystallographic studies.40 Alternatively, a P:C mismatch can adopt Watson-Crick geometry if one of its components is protonated, although the pKa of protonated P (~5) is rather low. The T:P and A:Z mismatches are the weakest in this set. Lone pair repulsion between the top pair of oxygens apparently dominates the thermodynamics of T:P, and the A:Z mismatch has no redeeming nucleobase:nucleobase interactions.


Several models might account for the strength of the G:Z mismatch. First, a wobble structure can be drawn for the G:Z mismatch resembling the stable G:T wobble mismatch, with the purine or purine analog displaced towards the minor groove (Figure 3). Second, the G:Z mismatch might be a Hoogsteen pair between Z and syn-G (Figure 3). Third, the G:Z mismatch could be stabilized by deprotonation of the nucleobase Z, which has a pKa of ~7.8 when measured as an isolated nucleoside.46 This pKa is undoubtedly higher in the single strand, where Z is in the context of the polyanionic DNA backbone, but since deprotonated Z is a perfect Watson-Crick match for G, deprotonation could be thermodynamically linked to base pair formation at pH near neutrality (Figure 3). This model is appealing in that it rationalizes why the G:Z mismatch is more stable than G:T, and it is consistent with an increase in G:Z mismatching by polymerases at higher pH. However, there is no reason to be confident that the structures formed under the constraints of a polymerase active site are the same as those formed in an unbound duplex.47 Careful measurement of the pH dependence of melting, or crystallographic or NMR studies, would help distinguish among these possible structures. The "worst" mismatch involving P is the P:C mismatch, although the P:C mispair (∆∆G°37 = 2.8 kcal/mole) is much less stable than the G:Z mispair. P:C mismatching is also seen in polymerase experiments, especially at low pH.38 Accordingly, it might be attributed to protonation of the nucleobase P, or possibly C. In either case, the protonated P:C mismatch satisfies both the size and the hydrogen bonding complementarity rules (Figure 3). However, the P:C mispair might also be a wobble, analogous to the wobble pairs seen for C:O6-methylG48 and the isoguanosine:T mismatch,49 in which the pyrimidine is displaced into the minor groove.
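The protonation arguments can be put in rough numbers with the Henderson-Hasselbalch relation, using the isolated-nucleoside values quoted in the text (pKa ~7.8 for Z and ~5 for protonated P). This is only an illustration: because the pKa of Z is expected to be higher in a polyanionic strand, these nucleoside-based fractions likely overestimate Z deprotonation in DNA. The helper name `fraction_deprotonated` is hypothetical.

```python
def fraction_deprotonated(pKa, pH):
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

PKA_Z = 7.8      # Z N-H, isolated nucleoside (value quoted in the text)
PKA_P_H = 5.0    # conjugate acid of P (protonated P), approximate value from the text

for pH in (7.0, 8.5):
    f_z = fraction_deprotonated(PKA_Z, pH)
    f_p = 1.0 - fraction_deprotonated(PKA_P_H, pH)   # fraction of P carrying the extra proton
    print(f"pH {pH}: Z deprotonated {f_z:.0%}, P protonated {f_p:.1%}")
```

On these assumptions, about 14% of free Z would be deprotonated at pH 7 and roughly 83% at pH 8.5, whereas only about 1% of P would be protonated at pH 7, in line with the remark that the pKa of protonated P is rather low.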
Again, the pH dependence of melting and structural studies would help distinguish among these possibilities. Figure 3 also shows proposed structures for the much less stable P:T mispair, with opposed lone pairs on the oxygen atoms, and the A:Z mispair, with only one hydrogen bond. The P:T mismatch might be "improved" by protonation of one of the carbonyls in the major groove, but this is unlikely near neutral pH. These observations on the local structures of mismatches notwithstanding, all of the mismatches appear to have a large context and/or position dependence. Further, mismatches at position 2 (next to the DNA end) are less destabilizing than those at internal positions, a pattern often seen in standard DNA duplexes and for RNA mismatches.50 Fraying of the DNA ends could partially disrupt both the mismatched and the perfectly matched penultimate base pairs, so that the free energy difference between perfect match and mismatch is smaller than at internal positions. We have also noted elsewhere that introducing a negative charge into a nucleobase "stack" is generally destabilizing, more so than introducing a positive charge.51 It is an interesting hypothesis, to be tested in later work, that a nucleobase pair with a negative charge is


more sensitive to local context, as that charge may be managed quite differently by the different nucleobase pairs above and below it in the stack. Finally, the anomalous enthalpy and entropy contributions from P-containing mismatches at position 5 suggest that this sequence context may stabilize an alternative structure. One ultimate goal of biophysics as applied to this kind of synthetic biology is to provide the data that computer programs require to predict duplex stability, polynucleotide secondary structure, and eventually higher-order structure in synthetic genetic systems. An obstacle, of course, is that "mismatch space" and "context space" become combinatorially large when nucleotides are added to the genetic alphabet, especially as all possible mismatches are considered in all possible contexts. Even without mismatches that do not involve charged bases, programs that attempt this for the four standard nucleotides often require experimental constraints to make usefully accurate predictions. They require at least dinucleotide parameters. A set of N nucleotide "letters" creates ½(N² + N) perfect-match dinucleotides (ACGT gives 10 unique dinucleotides, not 16, because of symmetry); there are 21 ACGTZP dinucleotides. There are ½N²(N² + 1) perfect-match tetranucleotides, N²/2 mismatch pairs without considering context, and N²(N - 1) dinucleotides containing one mismatch. For N = 6, the total number of parameters potentially of interest is in the thousands. Several methods exist for estimating the thermodynamic parameters associated with this large number of matches and mismatches. These include plotting ln CT vs. 1/Tm, van't Hoff analysis of complete melting curves (the method used here), and direct estimates of thermodynamics by differential scanning calorimetry.
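Of the methods listed, the plot of 1/Tm against ln CT is the simplest to sketch. For a two-state, non-self-complementary duplex with temperature-independent ∆H° and ∆S°, 1/Tm = (R/∆H°) ln(CT/4) + ∆S°/∆H°, so a straight-line fit gives ∆H° from the slope and ∆S° from the intercept (for self-complementary duplexes, ln CT replaces ln(CT/4)). The example below generates noise-free synthetic Tm values from the benchmark parameters in the Table 3 footnote and recovers them; real data would carry measurement scatter, and the function names are hypothetical.

```python
import math

R = 1.987  # gas constant, cal/(mol K)

def tm_kelvin(dH, dS, ct):
    """Two-state Tm (K), non-self-complementary; dH in cal/mol, dS in cal/(mol K)."""
    return dH / (dS + R * math.log(ct / 4.0))

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def thermo_from_tm_vs_ct(cts, tms_kelvin):
    """Fit 1/Tm against ln(CT/4): slope = R/dH, intercept = dS/dH."""
    xs = [math.log(ct / 4.0) for ct in cts]
    ys = [1.0 / tm for tm in tms_kelvin]
    slope, intercept = linear_fit(xs, ys)
    dH = R / slope
    return dH, intercept * dH

# Synthetic melting temperatures from the benchmark dH = -60.00 kcal/mol, dS = -163.25 cal/(mol K)
cts = [1e-6, 2e-6, 5e-6, 10e-6, 20e-6]
tms = [tm_kelvin(-60000.0, -163.25, ct) for ct in cts]
dH_fit, dS_fit = thermo_from_tm_vs_ct(cts, tms)
print(f"dH = {dH_fit / 1000:.2f} kcal/mol, dS = {dS_fit:.2f} cal/(mol K)")
```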
Each method has its advantages and disadvantages with respect to throughput, accuracy, the amount of material required, the temperature dependence of thermodynamic parameters, and the handling of possible breakdowns in the "two-state" approximation. Further, the choice of method should reflect the intended use, in particular the concentrations and temperatures at which the thermodynamic data will guide design. Turner and co-workers have shown that the different methods agree well,52 absent a breakdown in the two-state approximation. Such a breakdown can be especially severe when the separated strands form single-stranded structure, and careful investigators consider this when constructing systems from which these thermodynamic parameters might be extracted.53 The expanded alphabet greatly enlarges "sequence space", which drove us to estimate thermodynamic parameters with a method chosen for its high throughput. However, the increased information density made possible by the extra nucleotide letters also has an advantage: it makes single-stranded structure and self-association much easier to avoid than with a four-nucleotide alphabet.
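The counting formulas given for an N-letter alphabet are easy to check. The sketch below evaluates them for N = 4 and N = 6 and confirms the dinucleotide and mismatch-pair counts by brute force, treating a 5'-XY-3' step and the same step read on the complementary strand as one parameter. The complement map (A:T, C:G, Z:P) follows the text; the function names are illustrative.

```python
from itertools import product

# Complements for the six-letter alphabet discussed in the text
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "Z": "P", "P": "Z"}

# Closed-form counts as stated in the text (N is even for a paired alphabet)
def perfect_match_dinucleotides(n):
    return (n * n + n) // 2

def perfect_match_tetranucleotides(n):
    return n * n * (n * n + 1) // 2

def mismatch_pairs(n):
    return n * n // 2

def single_mismatch_dinucleotides(n):
    return n * n * (n - 1)

# Brute-force checks
def enumerate_dinucleotides(alphabet):
    """Count 5'-XY-3' steps, merging each step with its reading on the opposite strand."""
    steps = set()
    for x, y in product(alphabet, repeat=2):
        opposite = COMPLEMENT[y] + COMPLEMENT[x]
        steps.add(min(x + y, opposite))
    return len(steps)

def enumerate_mismatch_pairs(alphabet):
    """Count unordered non-complementary pairings, homo-pairs included."""
    return len({frozenset((x, y)) for x, y in product(alphabet, repeat=2)
                if COMPLEMENT[x] != y})

for alphabet in ("ACGT", "ACGTZP"):
    n = len(alphabet)
    print(alphabet,
          perfect_match_dinucleotides(n), enumerate_dinucleotides(alphabet),
          perfect_match_tetranucleotides(n),
          mismatch_pairs(n), enumerate_mismatch_pairs(alphabet),
          single_mismatch_dinucleotides(n))
```

For N = 4 the formulas give 10, 136, 8, and 48; for N = 6 they give 21, 666, 18, and 180, and the enumerations reproduce the dinucleotide and mismatch-pair counts.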


AEGIS nucleobases are now becoming widely used to stabilize probe-target interactions or secondary structures in designed DNA structures. However, they are potentially even more useful in addressing the "negative design" problem, preventing unwanted pairing and minimizing the conformational ambiguity of functional nucleic acid aptamers and aptazymes.54 As AEGIS pairing is used in architectures that synthesize ultra-large DNA constructs,55,56 support nanostructures,57 and move into living cells,58 it may be desirable to revisit these studies using additional methods for extracting thermodynamic parameters. Based on the field's experience with ACGT, further studies with other methods are not expected to alter the fundamental conclusions presented here, although they might provide results with improved accuracy and precision.


Acknowledgements. The project depicted was sponsored by the Department of Defense, the United States Army (W911NF-12-C-0059), the Defense Threat Reduction Agency (HDTRA1-13-1-0004), and DARPA. Work at Celadon Laboratories and the University of Maryland was supported by an SBIR award from the U.S. Army (W911NF-12-C-0060). The content of the information does not necessarily reflect the position or the policy of the federal government, and no official endorsement should be inferred. The production of some of the materials was supported by NASA under award NNX15AF46G. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Aeronautics and Space Administration. Further materials in this publication were made possible through the support of a grant from Templeton World Charity Foundation, Inc. (0092/AB57). The opinions expressed in this publication are those of the author(s) and do not necessarily reflect the views of Templeton World Charity Foundation, Inc.


REFERENCES

1

Watson, J. D.; Crick, F. H. C. Molecular structure of nucleic acids. A structure for deoxyribose nucleic acid. Nature 1953, 171, 737-738.

2

Rich, A. On the problems of evolution and biochemical information transfer. In Horizons in Biochemistry; Kasha, M., Pullman, B., Eds.; Academic Press: New York, 1962; pp 103-126.

3

Benner, S. A. Synthesis as a route to knowledge. Biological Theory 2013, 8, 357-367.

4

Geyer, C. R.; Battersby, T. R.; Benner, S. A. Nucleobase pairing in expanded Watson-Crick like genetic information systems. The nucleobases. Structure 2003, 11, 1485-1498.

5

Cheong, C.; Tinoco, I., Jr.; Chollet, A. Thermodynamic studies of base pairing involving 2,6-diaminopurine. Nucleic Acids Res. 1988, 16, 5115-5122.

6

Timofeev, E.; Mirzabekov, A. Binding specificity and stability of duplexes formed by modified oligonucleotides with a 4096-hexanucleotide microarray. Nucleic Acids Res. 2001, 29, 2626-2634.

7

Switzer, C. Y.; Moroney, S. E.; Benner, S. A. Enzymatic incorporation of a new base pair into DNA and RNA. J. Am. Chem. Soc. 1989, 111, 8322-8323.

8

Benner, S. A. Understanding nucleic acids using synthetic chemistry. Acc. Chem. Res. 2004, 37, 784-797.

9

Elbeik, T.; Markowitz, N.; Nassos, P.; Kumar, U.; Beringer, S.; Haller, B.; Ng, V. Simultaneous runs of the Bayer VERSANT HIV-1 version 3.0 and HCV bDNA version 3.0 quantitative assays on the system 340 platform provide reliable quantitation and improved work flow. J. Clin. Microbiol. 2004, 42, 3120-3127.

10

Elbeik, T.; Surtihadi, J.; Destree, M.; Gorlin, J.; Holodniy, M.; Jortani, S. A.; Kuramoto, K.; Ng, V.; Valdes, R.; Valsamakis, A.; Terrault, N. A. Multicenter evaluation of the performance characteristics of the Bayer VERSANT HCV RNA 3.0 assay (bDNA). J. Clin. Microbiol. 2004, 42, 563-569.

11

Nolte, F. S.; Marshall, D. J.; Rasberry, C.; Schievelbein, S.; Banks, G. G.; Storch, G. A.; Arens, M. Q.; Butler, R. S.; Prudent, J. R. MultiCode-PLx system for multiplexed detection of seventeen respiratory viruses. J. Clin. Microbiol. 2007, 45, 2779-2786.

12

Lee, W.-M.; Grindle, K.; Pappas, T.; Marshall, D. J.; Moser, M. J.; Beaty, E. L.; Shult, P. A.; Prudent, J. R.; Gern, J. E. High-throughput, sensitive, and accurate multiplex PCR-microsphere flow cytometry system for large-scale comprehensive detection of respiratory viruses. J. Clin. Microbiol. 2007, 45, 2626-2634.

13

Glushakova, L. G.; Sharma, N.; Hoshika, S.; Bradley, A. C.; Bradley, K. M.; Yang, Z.; Benner, S. A. Detecting respiratory viral RNA using expanded genetic alphabets and self-avoiding DNA. Anal. Biochem. 2015, 489, 62-72.


14

Johnson, S. C.; Marshall, D. J.; Harms, G.; Miller, C. M.; Sherrill, C. B.; Beaty, E. L.; Lederer, S. A.; Roesch, E. B.; Madsen, G.; Hoffman, G. L.; Laessig, R. H.; Kopish, G. J.; Baker, M. W.; Benner, S. A.; Farrell, P. M.; Prudent, J. R. Multiplexed genetic analysis using an expanded genetic alphabet. Clin. Chem. 2004, 50, 2019-2027.

15

von Krosigk, U.; Benner, S. A. pH-independent triple helix formation by an oligonucleotide containing a pyrazine donor-donor-acceptor base. J. Am. Chem. Soc. 1995, 117, 5361-5362.

16

Voegel, J. J.; Benner, S. A. Non-standard hydrogen bonding in duplex oligonucleotides. The base pair between an acceptor-donor-donor pyrimidine analog and a donor-acceptor-acceptor purine analog. J. Am. Chem. Soc. 1994, 116, 6929-6930.

17

Martinot, T. A.; Benner, S. A. Artificial genetic systems. Exploiting the "aromaticity" formalism to improve the tautomeric ratio for isoguanosine derivatives. J. Org. Chem. 2004, 69, 3972-3975.

18

Kim, H. J.; Leal, N. A.; Benner, S. A. 2'-Deoxy-1-methylpseudocytidine. A stable analog of 2'-deoxy-5-methylisocytidine. Bioorg. Med. Chem. 2009, 17, 3728-3732.

19

Rao, P.; Benner, S. A. Fluorescent charge-neutral analogue of xanthosine. Synthesis of a 2'-deoxyribonucleoside bearing a 5-aza-7-deazaxanthine base. J. Org. Chem. 2001, 66, 5012-5015.

20

Joyce, C. M.; Steitz, T. A. Function and structure relationships in DNA polymerases. Annu. Rev. Biochem. 1994, 63, 777-822.

21

Yang, Z.; Chen, F.; Chamberlin, S. G.; Benner, S. A. Expanded genetic alphabets in the polymerase chain reaction. Angew. Chem. Int. Ed. 2010, 49, 177-180.

22

Yang, Z.; Chen, F.; Chamberlin, S. G.; Alvarado, J. B.; Benner, S. A. Amplification, mutation, and sequencing of a six-letter synthetic genetic system. J. Am. Chem. Soc. 2011, 133, 15105-15112.

23

Chen, F.; Yang, Z. Y.; Yang, M. C.; Alvarado, J. B.; Wang, G. G.; Benner, S. A. Recognition of an expanded genetic alphabet by type-II restriction endonucleases and their application to analyze polymerase fidelity. Nucleic Acids Res. 2011, 39, 3949-3961.

24

Sefah, K.; Yang, Z.; Bradley, K. M.; Hoshika, S.; Jimenez, E.; Zhu, G.; Shanker, S.; Yu, F.; Tan, W.; Benner, S. A. In vitro selection with artificial expanded genetic information systems (AEGIS). Proc. Natl. Acad. Sci. USA 2014, 111, 1449-1456.

25

Zhang, L.; Yang, Z.; Sefah, K.; Bradley, K. M.; Hoshika, S.; Kim, M.-J.; Kim, H.-J.; Zhu, G.; Jimenez, E.; Cansiz, S.; Teng, I.-T.; Champanhac, C.; McLendon, C.; Liu, C.; Zhang, W.; Gerloff, D. L.; Huang, Z.; Tan, W.-H.; Benner, S. A. Evolution of functional six-nucleotide DNA. J. Am. Chem. Soc. 2015, 137, 6734-6737.

26

Zhang, L.; Yang, Z.; Trinh, I. L.; Teng, I.-T.; Wang, S.; Bradley, K. M.; Hoshika, S.; Wu, Q.; Cansiz, S.; Rowold, D. J.; McLendon, C.; Wu, Y.; Cui, C.; Liu, Y.; Liu, C.; Benner, S. A.; Tan, W. Generating DNA aptamers with expanded nucleotides against Glypican-3-overexpressing tumor cells. Angew. Chem. 2016, 55, 12372-12375.

27

Biondi, E.; Lane, J. D.; Dasgupta, S.; Piccirilli, J. A.; Hoshika, S.; Bradley, K. M.; Krantz, B. A.; Benner, S. A. Aptamers targeting anthrax protective antigen from laboratory in vitro selection with an expanded genetic alphabet. Nucleic Acids Res. 2016, 44, 9565-9577.

28

Glushakova, L. G.; Bradley, A.; Bradley, K. M.; Alto, B. W.; Hoshika, S.; Hutter, D.; Sharma, N.; Yang, Benner, S. A. High-throughput multiplexed xMAP Luminex array panel for detection of twenty two medically important mosquito-borne arboviruses based on innovations in synthetic biology. J. Virol. Meth. 2015, 214, 60-74.

29

Georgiadis, M. M.; Singh, I.; Kellett, W. F.; Hoshika, S.; Benner, S. A.; Richards, N. G. J. Structural Basis for a Six Nucleotide Genetic Alphabet. J. Am. Chem. Soc. 2015, 137, 6947-6955.

30

Schutz, E.; von Ahsen, N. Spreadsheet software for thermodynamic melting point prediction of oligonucleotide hybridization with and without mismatches. Biotechniques 1999, 27, 1218-1224.

31

Allawi, H. T.; SantaLucia, J. Nearest-neighbor thermodynamics of internal A·C mismatches in DNA. Sequence dependence and pH effects. Biochemistry 1998, 37, 9435-9444.

32

Allawi, H. T.; SantaLucia, J. Thermodynamics of internal C:T mismatches in DNA. Nucleic Acids Res. 1998, 26, 2694-2701.

33

SantaLucia, J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 1998, 95, 1460-1465.

34

Bramsen, J. B.; Laursen, M. B.; Nielsen, A. F.; Hansen, T. B.; Bus, C.; Langkjær, N.; ...; Odadzic, D. A large-scale chemical modification screen identifies design rules to generate siRNAs with high activity, high stability and low toxicity. Nucleic Acids Res. 2009, 37, 2867-2881.

35

Wolk, S. K.; Shoemaker, R. K.; Mayfield, W. S.; Mestdagh, A. L.; Janjic, N. Influence of 5-N-carboxamide modifications on the thermodynamic stability of oligonucleotides. Nucleic Acids Res. 2015, 43, 9107-9122.

36

McTigue, P. M.; Peterson, R. J.; Kahn, J. D. Sequence-dependent thermodynamic parameters for locked nucleic acid (LNA)-DNA duplex formation. Biochemistry 2004, 43, 5388-5405.

37

Owczarzy, R.; You, Y.; Groth, C. L.; Tataurov, A. V. Stability and mismatch discrimination of locked nucleic acid-DNA duplexes. Biochemistry 2011, 50, 9352-9367.

38

Kierzek, E.; Malgowska, M.; Lisowiec, J.; Turner, D. H.; Gdaniec, Z.; Kierzek, R. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 2014, 42, 3492-3501.

39

Dirks, R. M.; Lin, M.; Winfree, E.; Pierce, N. A. Paradigms for computational nucleic acid design. Nucleic Acids Res. 2004, 32, 1392–1403.


40

Yang, Z.; Hutter, D.; Sheng, P.; Sismour, A. M.; Benner, S. A. Artificially expanded genetic information system: A new base pair with an alternative hydrogen bonding pattern. Nucleic Acids Res. 2006, 34, 6095.

41

McDowell, J. A.; Turner, D. H. Investigation of the structural basis for thermodynamic stabilities of tandem GU mismatches: Solution structure of (rGAG GU CUC) 2 by two-dimensional NMR and simulated annealing. Biochemistry 1996, 35, 14077-14089.

42

Spink, C. H.; Chaires, J. B. Effects of hydration, ion release, and excluded volume on the melting of triplex and duplex DNA. Biochemistry 1999, 38, 496-508.

43

Soto, A. M.; Kankia, B. I.; Dande, P.; Gold, B.; Marky, L. A. Thermodynamic and hydration effects for the incorporation of a cationic 3‐aminopropyl chain into DNA. Nucleic Acids Res. 2002, 30, 3171-3180.

44

Robinson, A. C.; Castañeda, C. A.; Schlessman, J. L. Structural and thermodynamic consequences of burial of an artificial ion pair in the hydrophobic interior of a protein. Proc. Natl. Acad. Sci. USA 2014, 111, 11685-11690.

45

Yang, Z.; Chen, F.; Alvarado, J. B.; Benner, S. A. Amplification, mutation, and sequencing of a six-letter synthetic genetic system. J. Am. Chem. Soc. 2011, 133, 15105-15112.

46

Hutter, D.; Benner, S. A. Expanding the genetic alphabet. Non-epimerizing nucleoside with the pyDDA hydrogen bonding pattern. J. Org. Chem. 2003, 68, 9839-9842.

47

Betz, K.; Malyshev, D. A.; Lavergne, T.; Welte, W.; Diederichs, K.; Dwyer, T. J.; Ordoukhanian, P.; Romesberg, F. E.; Marx, A. KlenTaq polymerase replicates unnatural base pairs by inducing a Watson-Crick geometry. Nat. Chem. Biol. 2012, 8, 612.

48

Warren, J. J.; Forsberg, L. J.; Beese, L. S. The structural basis for the mutagenicity of O6-methylguanine lesions. Proc. Natl. Acad. Sci. USA 2006, 103, 19701-19706.

49

Robinson, H.; Gao, Y. G.; Bauer, C.; Roberts, C.; Switzer, C.; Wang, A. H. J. 2'-Deoxyisoguanosine adopts more than one tautomer to form base pairs with thymidine observed by high-resolution crystal structure analysis. Biochemistry 1998, 37, 10897-10905.

50

Kierzek, R.; Burkard, M. E.; Turner, D. H. Thermodynamics of Single Mismatches in RNA Duplexes. Biochemistry 1999, 14214-14223.

51

Geyer, C. R.; Battersby, T. R.; Benner, S. A. Nucleobase pairing in expanded Watson-Crick like genetic information systems. The nucleobases. Structure 2003, 11, 1485-1498.

52

(a) Xia, T.; SantaLucia, J., Jr.; Burkard, M. E.; Kierzek, R.; Schroeder, S. J.; Jiao, X.; ...; Turner, D. H. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 1998, 37, 14719-14735. (b) Petersheim, M.; Turner, D. H. Base-stacking and base-pairing contributions to helix stability: thermodynamics of double-helix formation with CCGG, CCGGp, CCGGAp, ACCGGp, CCGGUp, and ACCGGUp. Biochemistry 1983, 22, 256-263.

53

Wu, P.; Nakano, S.; Sugimoto, N. Temperature dependence of thermodynamic properties for DNA/DNA and RNA/DNA duplex formation. Eur. J. Biochem. 2002, 269, 2821–2830.

54

Dirks, R. M.; Lin, M.; Winfree, E.; Pierce, N. A. Paradigms for computational nucleic acid design. Nucleic Acids Res. 2004, 32, 1392-1403.

55

Merritt, K. B.; Bradley, K. M.; Hutter, D.; Matsuura, D. J.; Rowold, D. J.; Benner, S. A. Whole genes from autonomous self-assembly of synthetic oligonucleotides incorporating artificial nucleotides. Beilstein J. Org. Chem. 2014, 10, 2348-2360.

56

Bradley, K. M.; Benner, S. A. OligArch. A software package for the use of non-standard nucleotides to architect oligonucleotide fragments that autonomously assemble to give long DNA molecules. Beilstein J. Org. Chem. 2014, 10, 1826-1833.

57

Benner, S. A.; Yang, Z.; Chen, F. Synthetic biology, tinkering biology, and artificial biology. What are we learning? Comptes Rendus 2010, 14, 372-387.

58

Matsuura, M. F.; Shaw, R. W.; Moses, J. D.; Kim, H. J.; Kim, M. J.; Kim, M. S.; Hoshika, S.; Karalkar, N.; Benner, S. A. Assays to detect the formation of triphosphates of unnatural nucleotides. Application to Escherichia coli nucleoside diphosphate kinase. ACS Synth. Biol. 2016, 5, 234-240.

Figure 1. Two complementarity rules guide xNA base pairing: (a) size (large purines or their analogs pair with small pyrimidines or their analogs) and (b) hydrogen bonding (hydrogen bond donors, D, pair with hydrogen bond acceptors, A). Rearranging the D and A groups on the bases gives an Artificially Expanded Genetic Information System (AEGIS). The chemical issues in the "first-generation AEGIS" (left) are indicated in magenta; these motivated the Benner laboratory to create, over the past few years, a second-generation AEGIS (right). Electron density presented to the minor groove by the bases is believed to be a specificity determinant for polymerases.
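The two rules in Figure 1 can be made concrete with a small check. This is a minimal sketch, not the authors' method: each base is encoded by a size class ("pu" for large purine analogs, "py" for small pyrimidine analogs) and a hydrogen-bonding pattern read from the major groove to the minor groove. The patterns used here (for example Z as pyDDA and P as puAAD) are assumptions that follow the AEGIS nomenclature commonly used by the Benner laboratory; they are not stated in this caption. The names `BASES` and `pair_check` are hypothetical. The check only describes Watson-Crick geometry, so it cannot represent the wobble, Hoogsteen, or protonated and deprotonated alternatives discussed for the mismatches in Figure 3.

```python
# Assumed encoding (illustrative, not taken from the paper): size class plus a
# hydrogen-bonding pattern read major groove -> minor groove.
# "D" = donor, "A" = acceptor, "-" = no hydrogen-bonding group at that position.
BASES = {
    "A": ("pu", "DA-"),
    "G": ("pu", "ADD"),
    "C": ("py", "DAA"),
    "T": ("py", "ADA"),
    "P": ("pu", "AAD"),
    "Z": ("py", "DDA"),
}


def pair_check(x: str, y: str) -> tuple[bool, int]:
    """Return (complementary?, number of D:A hydrogen bonds) for bases x and y.

    Size rule: the pair must join one large ("pu") and one small ("py") base.
    Hydrogen-bonding rule: no position may place D against D or A against A.
    """
    size_x, hb_x = BASES[x]
    size_y, hb_y = BASES[y]
    if {size_x, size_y} != {"pu", "py"}:
        return False, 0  # purine:purine or pyrimidine:pyrimidine violates the size rule
    bonds = 0
    for a, b in zip(hb_x, hb_y):
        if a == "-" or b == "-":
            continue  # a group is missing on one side: neither a bond nor a clash
        if a == b:
            return False, 0  # donor:donor or acceptor:acceptor clash
        bonds += 1  # one donor faces one acceptor
    return True, bonds


for x, y in [("G", "C"), ("A", "T"), ("Z", "P"), ("G", "Z"), ("C", "P"), ("A", "Z"), ("T", "P")]:
    print(f"{x}:{y}", pair_check(x, y))
```

Under these assumed patterns, Z:P passes both rules with three hydrogen bonds, like G:C. G:Z, C:P, A:Z, and T:P each fail because at least one position opposes two donors or two acceptors.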

Figure 3. (i) Because the pKa values of Z and of the conjugate acid of P are not as widely separated as those of standard pairs, the central hydrogen bond might be viewed as a "low-barrier hydrogen bond". Shown are the proton transfer and the accompanying resonance structures. (ii) The G:Z and G:T wobbles have analogous geometry, displacing their purine component towards the minor groove. Alternative structures for the G:Z mismatch include a Hoogsteen structure and a geometrically standard Watson-Crick pair following deprotonation of Z. (iii) The P:C mismatch might have a wobble structure analogous to the B:T mismatch, which is known from crystallographic studies. Alternatively, the P:C mismatch can adopt Watson-Crick geometry if one of its components is protonated, although the pKa of protonated P (~5) is rather low. The T:P and A:Z mismatches are the weakest in this set. Apparently, lone-pair repulsion between the top pair of oxygens dominates the thermodynamics of T:P, and the A:Z mismatch has no redeeming nucleobase:nucleobase interactions.
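The remark in Figure 3 that a pKa of about 5 for protonated P is "rather low" can be quantified with the Henderson-Hasselbalch relation. The fraction of a base in its protonated form is f = 1/(1 + 10^(pH - pKa)). This minimal sketch assumes the caption's approximate pKa of 5 and applies it to the free base. The pH values are illustrative because the buffer pH is not given in this caption, and the pKa inside a duplex may differ from that of the free base. The function name `fraction_protonated` is hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of a base present in its protonated (conjugate-acid) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_P_H = 5.0  # approximate pKa of protonated P, as quoted in the Figure 3 caption

# Illustrative pH values only; the experimental buffer pH is an assumption here.
for ph in (5.0, 6.0, 7.0, 8.0):
    print(f"pH {ph:.1f}: fraction of P protonated = {fraction_protonated(PKA_P_H, ph):.4f}")
```

At pH 7, f = 1/(1 + 10^2) = 1/101, so only about 1% of free P would carry the extra proton needed for a Watson-Crick-like P:C geometry. This is consistent with the caption's view that the protonated route is a minor contributor.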

Graphical Abstract