Interrogating Key Positions of Size-Reduced TALE ... - ACS Publications

Nov 4, 2016 - Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a. Programmable Sensor of 5‑Carboxylcytosine. Sara Maurer, Mario Gies...

Letter

Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine Sara Maurer, Mario Giess, Oliver Koch, and Daniel Summerer ACS Chem. Biol., Just Accepted Manuscript • DOI: 10.1021/acschembio.6b00627 • Publication Date (Web): 04 Nov 2016 Downloaded from http://pubs.acs.org on November 7, 2016


ACS Chemical Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Chemical Biology

TOC Figure (Tiff) 118x123mm (300 x 300 DPI)

ACS Paragon Plus Environment


Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine Sara Maurer, Mario Giess, Oliver Koch and Daniel Summerer* Department of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Str. 6, 44227 Dortmund (Germany)

ABSTRACT: Transcription-activator-like effector (TALE) proteins consist of concatenated repeats that recognize consecutive canonical nucleobases of DNA via the major groove in a programmable fashion. Since this groove displays unique chemical information for the four human epigenetic cytosine nucleobases, TALE repeats with epigenetic selectivity can be engineered, with potential to establish receptors for the programmable decoding of all human nucleobases. TALE repeats recognize nucleobases via key amino acids in a structurally conserved loop whose backbone is positioned very close to the cytosine 5-carbon. This complicates the engineering of selectivities for large 5-substituents. To interrogate a more promising structural space, we engineered size-reduced repeat loops, performed saturation mutagenesis of key positions and screened a total of 200 repeat-nucleobase interactions for new selectivities. This provided insights into the structural requirements of TALE repeats for affinity and selectivity, revealed repeats with improved or relaxed selectivity, and resulted in the first selective sensor of 5-carboxylcytosine.

5-Methylcytosine (5mC, Fig. 1a) is an epigenetic nucleobase that exists in the human genome and plays important roles in gene expression regulation, development and disease.1 Ten-eleven translocation (TET) enzymes can oxidize 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (Fig. 1a),2-6 which are intermediates of an active demethylation pathway.7 Moreover, the 5-substituents of the oxidized 5mC derivatives are chemically unique8-10 and may thus act as independent epigenetic marks.
Indeed, profiling studies have revealed nucleobase-specific spatial genomic distributions and varying levels between cell types.7 Other studies have revealed differential interactions of oxidized 5mC derivatives with methyl-CpG-binding domains (MBD),11 eukaryotic RNA polymerase,12 and nucleosomes in vitro,13 and unique interaction profiles with chromatin proteins have been found in fishing studies.14, 15 A deeper understanding of the cellular functions of each of these epigenetic cytosine nucleobases demands strategies for their selective chemical transformation or molecular recognition that can be exploited for targeting and analysis. A particularly successful strategy relies on selective nucleobase deamination with bisulfite coupled to DNA sequencing, and its combination with redox16, 17 and/or tagging18 reactions to achieve additional selectivity. Tagging chemistries can also be used for affinity enrichment,19, 20 and chemistries for alternative analyses have been described.21, 22 Finally, protein receptors such as antibodies and enzymes with selectivity for individual cytosine nucleobases are available (several of these strategies can be used individually or in alternative combinations; for excellent overviews with more room for references, see refs 23, 24).

Though most of the aforementioned approaches are widely used key drivers in epigenetic research, their underlying transformation or recognition strategies are typically selective only for specific epigenetic nucleobases and must be combined with canonical nucleobase analysis to reveal the sequence position of the epigenetic nucleobases in DNA. Approaches are thus under development that aim at a direct recognition of the sequence of both canonical and epigenetic nucleobases in DNA, offering potential for a simple analysis of natural, untreated DNA. One attractive strategy relies on processive DNA-reading molecules such as nanopores25, 26 or DNA polymerases27 for single-molecule sequencing with direct sensing of epigenetic nucleobases. A complementary strategy is the use of engineered transcription-activator-like effectors (TALEs)28 as receptors for the programmable recognition of both canonical and epigenetic nucleobases.29-31 TALEs are DNA-binding proteins that recognize one strand of a DNA duplex via the major groove32, 33 that displays unique chemical information not only for each canonical base pair, but also for each epigenetic cytosine nucleobase.8-10 TALEs consist of multiple concatenated repeats, each of which selectively recognizes one nucleobase through one of two variable amino acids (repeat variable di-residue, RVD) positioned in a loop motif (Fig. 1b, c).
This recognition follows a simple code, with the RVDs NI, NN, NG and HD (amino acid positions 12 and 13 within the TALE repeat) preferentially binding A, G, T, and C, respectively (Fig. 1d).34, 35 The major groove-binding mode of TALEs offers the potential to engineer a toolbox of repeats with selectivity for epigenetic nucleobases that can be employed for the design of programmable DNA binders with potential for diverse epigenetic technologies.36, 37 We have recently identified TALE repeats with selectivity for C, 5mC and 5hmC by testing natural repeats and individual repeat mutants.29-31 These enabled the design of TALEs for the detection of 5mC and 5hmC at single, user-defined genomic positions by sequence isolation.29 New and improved selectivities for human cytosine nucleobases require additional engineering efforts. However, crystal structures show that the RVD-bearing repeat loop of TALEs features a structurally conserved backbone that is positioned very close to the 5-position of cytosines (3.4-3.7 Å between Cα and amide C=O of amino acid 13 and the 5-methyl group of T or 5mC, Fig. 1b-c). This suggests insufficient space for the accommodation of larger 5-substituents, and a low potential of natural repeats for the engineering of altered loop geometries and new side chain interactions to the nucleobase.
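The canonical RVD code stated above (NI, NN, NG and HD preferentially binding A, G, T and C) can be written as a simple lookup table. The following is a minimal sketch for illustration only; the function names and the example target are hypothetical and are not taken from this study.

```python
# Canonical RVD code as stated in the text: NI->A, NN->G, NG->T, HD->C.
RVD_TO_BASE = {"NI": "A", "NN": "G", "NG": "T", "HD": "C"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}


def rvds_for_target(target: str) -> list[str]:
    """Return one canonical RVD per nucleotide of a DNA target (5'->3')."""
    return [BASE_TO_RVD[base] for base in target.upper()]


def preferred_target(rvds: list[str]) -> str:
    """Return the DNA sequence preferentially bound by a list of RVDs."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)


if __name__ == "__main__":
    # Hypothetical 6-nt target chosen for illustration (not Hey2_b).
    example_target = "GATTCA"
    rvds = rvds_for_target(example_target)
    print(rvds)
    print(preferred_target(rvds))
```

The table covers only the four canonical nucleobases; the engineered size-reduced repeats discussed in this work would need additional entries with their own measured selectivity profiles.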

Figure 1. Human cytosine nucleobases as well as overall TALE constitution and mechanisms of nucleobase recognition. a.) Chemical structures of C, 5mC, 5hmC, 5fC, and 5caC. b.) Superimposed crystal structures of TALE repeat loops with RVDs HD binding to C (black) and NG to T (grey, both pdb entry 3V6T, ref 32). c.) As b.), but with RVD HD binding to C (black) and NG to 5mC (red, pdb entry 4GJP, ref 38). Hydrogen bonds are shown as dotted lines. Distances between two backbone positions and the 5-methyl group of T/5mC are shown as dotted yellow lines and indicated. d.) Cartoon showing features of employed TALEs. Amino acid sequence of one representative TALE repeat is shown with RVD amino acids 12 and 13 marked. Natural TALE repeat selectivities are shown on the right. e.) Interaction of RVD HD with C (pdb entry 3V6T, ref 32). Positions targeted for mutagenesis and deletion in this study are bold. f.) Interaction of RVD NG with 5mC (pdb entry 4GJP).38 g.) Crystal structures of 5hmC, 5fC, and 5caC in DNA duplexes (pdb entries 4I9V, 4QC7, and 4PWM).8-10 For 5hmC, two observed conformations are shown, with an arrow indicating conformational flexibility.

We previously tested individual amino acids with altered or removed hydrogen bonding capabilities at repeat position 12 in size-reduced loop deletion mutants, but identified only repeats with lowered affinity and reduced selectivity.29 Nevertheless, size-reduced TALE repeats should generally have a high potential for the design of selective interactions with oxidized 5mC derivatives, since crystal structures of DNA duplexes containing single 5hmC, 5fC or 5caC revealed regular duplex geometries, suggesting the ability of regular complex formation with TALEs.8-10 Moreover, the 5-substituents differ in their steric demand, flexibility, and in their capabilities to accept or donate hydrogen bonds (Fig. 1e-g and explanations below). To comprehensively interrogate a new structural space with increased room for 5-substituents and a higher potential for new nucleobase interaction modes, we here combined saturation mutagenesis of loop position 12 with one or two deletions at adjacent loop positions 13 and 14 (Fig. 1e-f, positions shown in bold).29 We established full selectivity profiles of the resulting 40 mutants for the five human cytosine nucleobases by testing a total of 200 repeat-nucleobase interactions.

For engineering and in vitro testing of the size-reduced TALE repeats, we constructed TALE libraries by NNK codon mutagenesis on single repeat plasmids and subsequent hierarchical Golden Gate assembly39 using vector pGFP-ENTRY (SI Fig. 1). This enables high-level expression in E. coli and purification of soluble TALE mutants with an N-terminal GFP domain, a shortened AvrBs3-type TALE N-terminus (+136 amino acids starting from canonical repeat 1) and a C-terminal His6 tag (SI Fig. 2-3; TALE genes were based on a Xanthomonas axonopodis TALE scaffold30). Using this approach, we designed and expressed TALEs targeting the 17 nt sequence Hey2_b (Fig. 2a). The TALEs bore a single mutant repeat opposite nucleotide position 6 of the target oligonucleotide duplexes, which contained C, 5mC, 5hmC, 5fC, or 5caC in later in vitro assays (Fig. 2a). To study the selectivity of TALE repeats, we employed an assay based on the ability of TALEs to control DNA replication (Fig. 2b)31 in a synthetic oligonucleotide primer-template complex. This assay inversely correlates TALE affinity with the accessibility of the complex for the DNA polymerase, and thus with the amount of formed primer extension product, which can be resolved by PAGE (Fig. 2c; for experimental details, see the SI).

We first tested a library with mutations at position 12 and two deletions at positions 13 and 14. In natural TALE repeats, amino acid 12 (H or N) does not directly interact with the nucleobase, but its side chain conformation is highly conserved, and it is engaged in a hydrogen bond with the backbone carbonyl of A8 (Fig. 1b). In RVD NG opposite 5mC, an alternative hydrogen bond to the amide carbonyl of S11 is found, albeit with a highly similar backbone conformation (Fig. 1c, f). Amino acid 13 is involved in C or 5mC recognition, either by a hydrogen bond to the N4 amino group of C in RVD HD (Fig. 1e), or by accommodating the 5-methyl group of 5mC (or T) by the missing side chain in RVD NG (Fig. 1f). Amino acid 14 is a conserved glycine positioned further away from the cytosine nucleobase and rather oriented towards the 5´-phosphate. The most striking finding of this screen was an overall low affinity of all TALEs in the library (Fig. 2d). Repeats N** and H**29 (* = amino acid deletion) exhibited the weakest binding and no selectivity at all, whereas most other repeats showed a weak selectivity with 5mC being the nucleobase bound with highest affinity (Fig. 2d).
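The library design described earlier (NNK codon mutagenesis of position 12, combined with one or two deletions, tested against five cytosine nucleobases) can be checked with a short enumeration. This is a sketch for illustration: it uses the standard genetic code and the IUPAC meanings N = A/C/G/T and K = G/T; the variable names are hypothetical, and the screened repeats are modelled simply as "amino acid + deletion marker".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: any base at positions 1 and 2, G or T at position 3.
nnk_codons = ["".join(codon) for codon in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]

# One deletion (*) or two deletions (**) next to the mutated position 12.
DELETION_VARIANTS = ["*", "**"]
NUCLEOBASES = ["C", "5mC", "5hmC", "5fC", "5caC"]

repeats = [aa + dels for dels in DELETION_VARIANTS for aa in amino_acids]
interactions = [(repeat, base) for repeat in repeats for base in NUCLEOBASES]

if __name__ == "__main__":
    print(len(nnk_codons), "NNK codons;", len(amino_acids), "amino acids;",
          "stop codons:", stop_codons)
    print(len(repeats), "repeat mutants;", len(interactions), "interactions")
```

The counts match the numbers given in the text: 20 amino acids in each of two deletion backgrounds give 40 mutants, and five nucleobases per mutant give 200 repeat-nucleobase interactions.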


Figure 2. Profiling size-reduced TALE repeats for selective recognition of human cytosine nucleobases. a.) Target sequences used in this study. Oligonucleotide templates for the DNA polymerase accessibility assay shown in Fig. 2b contained a single variable nucleotide and were annealed to a complementary primer (direction of primer extension indicated with an arrow). b.) Principle of the DNA polymerase accessibility assay shown for C and 5mC. Binding sites of TALE and DNA polymerase in the primer-template complex overlap, leading to competition. Binding and non-binding of TALE to DNA is shown with a black and grey arrow, respectively. c.) PAGE analysis of primer extension reactions as shown in Fig. 2b containing 8.325 nM primer-template complex in presence or absence of 416 nM TALE_Hey2_b and 25 mU Klenow fragment of E. coli DNA Polymerase I (5´-3´-exo-; KF(exo-)) as indicated. Primer and extension product are marked with a black and grey arrow, respectively. d.) Saturation mutagenesis analysis of amino acid position 12 in size-reduced TALE repeats with deletions (*) at positions 13 and 14 with respect to recognition of human cytosine nucleobases. Primer extensions were carried out in duplicates as in Fig. 2c with 83.3 nM TALE. Product formation of a reaction without TALE was set to 100 %, and % primer extension in presence of TALE with the respective repeat mutant opposite the respective cytosine nucleobase is shown as a measure of affinity.
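The normalization described in the Figure 2 caption (product formation without TALE set to 100 %) can be sketched as follows. This is an illustrative assumption about how gel band intensities might be processed, not the authors' documented analysis; the function names and the intensity values are hypothetical.

```python
def extension_fraction(primer_band: float, product_band: float) -> float:
    """Fraction of primer converted to extension product in one gel lane."""
    total = primer_band + product_band
    if total <= 0:
        raise ValueError("lane has no signal")
    return product_band / total


def percent_primer_extension(with_tale: float, without_tale: float) -> float:
    """Normalise product formation to the no-TALE control (set to 100 %)."""
    if without_tale <= 0:
        raise ValueError("control lane shows no extension")
    return 100.0 * with_tale / without_tale


def mean_of_duplicates(first: float, second: float) -> float:
    """Average two replicate measurements (the assay was run in duplicate)."""
    return (first + second) / 2.0


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units, for illustration only.
    control = extension_fraction(primer_band=200.0, product_band=800.0)
    lane_a = extension_fraction(primer_band=600.0, product_band=400.0)
    lane_b = extension_fraction(primer_band=640.0, product_band=360.0)
    value = mean_of_duplicates(
        percent_primer_extension(lane_a, control),
        percent_primer_extension(lane_b, control),
    )
    # A lower percentage means less polymerase access, i.e. tighter TALE binding.
    print(round(value, 1))
```

Because TALE affinity correlates inversely with polymerase accessibility, a low normalized value indicates strong binding of the repeat to the nucleobase opposite it.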

5fC was the second strongest bound nucleobase in the cases of K**, F**, A**, G**, P** and C**, whereas C, 5hmC and 5caC were only weakly bound or not bound by most repeats. Interestingly, the highest affinity was observed with the expectedly smallest repeat G**, with an otherwise related selectivity profile. These data indicate a dominating effect of the two deletion mutations on TALE affinity, and a moderate influence of amino acid 12 on the selectivity.

We next tested a library with mutations at position 12 and only a single deletion at position 13. This library contains repeat N* that we previously found to bind C, 5mC, and 5fC, but not 5hmC and 5caC (Fig. 3). Replacement of N12 with H or Q (that both can form hydrogen bonds) led to repeats with reduced affinity for all five nucleobases and a profile in which 5mC and 5fC were bound best.29 This profile is related to N* and also reminiscent of the unrelated G**, suggesting general similarities of 5mC and 5fC in terms of repeat interactions. Indeed, crystal structures of all five cytosine nucleobases in DNA8-10 suggest that their steric demand may not simply correlate with the size of the 5-substituent (being similar for 5hmC and 5fC, but not for 5fC and 5mC), but may also be dictated by their flexibility: the formyl and carboxyl groups of 5fC and 5caC are fixed in one plane with the nucleobase by hydrogen bonding to the cytosine 4-amino group, whereas the hydroxymethyl group of 5hmC is found in two conformations (Fig. 1g). In the direction of the repeat loop backbone, 5mC and 5fC thus present small, nonpolar methyl or methylene moieties, whereas 5hmC and 5caC are polar and can orient an oxygen atom towards the backbone (Fig. 1f-g). Anionic side chains as in D* and E* resulted in related profiles and low affinity correlating with size (Fig. 3). However, the cationic R* and K* exhibited marked differences: whereas R* showed the familiar profile and moderate affinity, K* bound to C, 5mC and 5fC with high and to 5caC with moderate affinity, but only weakly bound 5hmC. This selectivity makes repeat K* the best available sensor repeat for 5hmC (compare to N*, Fig. 3). Next, larger aromatic amino acids generally led to weak affinities in the order F* >> Y* > W*, suggesting a role of the Y phenolic hydroxyl and an overall correlation with size. To rationalize the particularly low affinity of W*, homology modelling studies were performed. In the final model with DNA containing 5mC, W12 is forced between the nucleobase and loop backbone, resulting in an unlikely downwards movement of the backbone. In contrast, W12 occupies the nucleobase position in the absence of DNA (SI Fig. 4).

Figure 3. Mutational analysis of amino acid position 12 in size-reduced TALE repeats with a deletion (*) at position 13 with respect to selective recognition of human cytosine nucleobases by the assay shown in Fig. 2b (duplicate experiments).

Other interesting correlations were found in the large group of amino acids with nonpolar side chains, i.e. I*, V*, L*, M*, A*, and C*. The latter four shared similar profiles with relatively high overall affinity and strongest binding to 5mC and 5fC, whereas I* and V* showed very low affinity. This is interesting, since L and I share similar physicochemical properties and mutual exchanges are usually tolerable in proteins. However, V and I have a β-branch in common that does not occur in the other four amino acids. In models, L*, M* and A* are placed in similar orientations when interacting with 5mC (SI Fig. 5). In contrast, V* occupies a different part of the binding pocket, suggesting that the β-branched side chain cannot be placed in a similar conformation as the other amino acids and that it may thus impose a different steric demand on 5mC (SI Fig. 5). Strikingly, the small nonpolar repeat G* as well as the hydroxyl-bearing S* and T* differed strongly from all previous repeats (Fig. 3). The removal of a methyl group from A* to G* led to strong binding of all five cytosine nucleobases29 and this effect was even more pronounced for S* and T*. Structural models suggest that the helix in these three repeats is shortened and the loop size reduced, and that the helix terminus in S* and T* is stabilized by a hydrogen bond to the A8 carbonyl group (SI Fig. 6). This is a well-known capping motif for the stabilization of helix termini40 and differs from the corresponding N12 - A8 hydrogen bond in repeat NG (Fig. 1b) by a much shorter distance (2.7 versus 3.3 Å; note that the isosteric C* without pronounced hydrogen bonding capability showed reduced affinity). G* also showed the small loop with highly similar conformation, leaving sufficient space even for the carboxyl group of 5caC (SI Fig. 6). A particular selectivity profile was obtained with repeat P* that bound C, 5mC, 5hmC and 5fC with high to moderate affinity, whereas 5caC was only weakly bound (Fig. 3). 
This tendency was more pronounced at higher TALE concentrations (Fig. 4a), making P* the first sensor repeat of 5caC (for Ki constants, see Fig. 4b and SI Fig. 7; for additional electrophoretic mobility shift assay data, see SI Fig. 9). We modeled this mutant repeat to rationalize its unusual selectivity. Models of interactions between 5caC and P* or G* (as reference with capability of binding 5caC) suggest that the pyrrolidine ring of P is facing the 5-carboxyl group and that a backbone conformation similar to G*, S* or T* is not possible for P*, since the hydrophobic P would be placed near the backbone carbonyl groups. This suggests a selective inability of 5caC binding by steric hindrance (Fig. 4c). To get insights into the binding of P* to the canonical nucleobases A, G, and T, we performed additional Ki measurements as above. Interestingly, T was bound significantly weaker than 5mC, suggesting more intricate interaction differences between P* and 5mC/T than previously found for the natural RVD NG that binds 5mC and T with equal affinities, likely by a simple accommodation of the 5-methyl group (Fig. 4b and SI Fig. 8).38 In contrast, both purines were only very poorly bound, indicating a general pyrimidine selectivity of repeat P* (Fig. 4b and SI Fig. 8).

In conclusion, we here comprehensively interrogated a new structural space of size-reduced TALE repeat loops to study the selectivity mechanisms of TALEs in recognizing human cytosine nucleobases. We performed saturation mutagenesis of two key positions in two deletion mutants and screened a total of 200 repeat-nucleobase interactions for selectivity. This provided insights into the structural requirements of repeats for affinity and selectivity that could be further rationalized by modeling studies in several cases. Importantly, the studies revealed several repeats with useful new selectivities. The most striking finding was the first sensor repeat for 5caC, which exerts its selectivity through a particular proline mutation that may result in sufficient steric demand to repel the 5-carboxyl group of 5caC, whereas all other 5-substituents can be accommodated. Given the extremely low levels of 5caC in human genomic DNA,7 a direct application of TALEs bearing P* repeats to biological samples, however, requires further investigation and potential improvements.

Figure 4. Selective Sensing of 5caC by TALE repeat P*. a.) DNA polymerase accessibility assay with 8.325 nM primer-template complex in presence of 416 nM TALE_Hey2_b_P* and 25 mU KF(exo-) (duplicate experiments). b.) Ki constants for P* binding to indicated nucleobase from DNA polymerase accessibility assay as in Fig. 4a with different TALE concentrations. c.) Structural model of repeat P* interacting with 5caC superimposed on model of repeat G* interacting with 5caC.
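Figure 4b reports Ki constants obtained from the accessibility assay at different TALE concentrations; the main text does not spell out the fitting procedure. The sketch below therefore assumes a simple one-site competition model, percent extension = 100 / (1 + [TALE]/Ki), and recovers Ki by a least-squares line through the origin. Both the model and the data are illustrative assumptions (the data points are synthetic, generated from the model itself) and should not be read as the authors' method or results.

```python
def simulated_extension(tale_nM: float, ki_nM: float) -> float:
    """Percent primer extension under the assumed one-site model."""
    return 100.0 / (1.0 + tale_nM / ki_nM)


def estimate_ki(tale_nM: list[float], percent_extension: list[float]) -> float:
    """Estimate Ki (nM) from a titration under the assumed model.

    Rearranging the model gives 100/y - 1 = [TALE]/Ki, a line through the
    origin with slope 1/Ki, which is fitted by least squares.
    """
    numerator = 0.0    # sum of x * z
    denominator = 0.0  # sum of x * x
    for x, y in zip(tale_nM, percent_extension):
        if not 0.0 < y <= 100.0:
            raise ValueError("percent extension must lie in (0, 100]")
        z = 100.0 / y - 1.0
        numerator += x * z
        denominator += x * x
    if numerator <= 0.0:
        raise ValueError("no inhibition observed; Ki cannot be estimated")
    return denominator / numerator  # 1 / slope


if __name__ == "__main__":
    # Synthetic titration generated with a hypothetical Ki of 50 nM.
    concentrations = [10.0, 50.0, 250.0]
    readings = [simulated_extension(c, 50.0) for c in concentrations]
    print(round(estimate_ki(concentrations, readings), 3))
```

Under this assumed model a smaller Ki corresponds to tighter binding, so a sensor repeat such as P* would be expected to show a markedly larger Ki for the nucleobase it discriminates against than for the others.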

In contrast, replacement of P12 by G, S, or T resulted in a particular loop backbone conformation that provides room for any 5-substituent. Such universal repeats could facilitate TALE applications that require ignoring epigenetic variability in target DNA sequences, or could be used for the detection of epigenetic cytosine nucleobases with maximized resolution.41 Another important finding is the improved 5hmC versus 5caC selectivity of repeat K* that makes it the best available 5hmC sensor and will be useful for genomic detection of this nucleobase with higher selectivity.29 Taken together, our study provides an extended repeat toolbox for the design of programmable DNA binders with expanded nucleobase selectivity with broad potential for epigenetics research.

ASSOCIATED CONTENT Supporting Information. Experimental procedures, oligonucleotide and protein sequences, data of protein expressions/purifications, biochemical assays and modelling studies. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION Corresponding Author *[email protected]

Funding Sources

No competing financial interests have been declared. This work was supported by grants from the Deutsche Forschungsgemeinschaft (Su 726/5-1 in SPP1784 and Su 726/6-1 in SPP1623).

ACKNOWLEDGMENT We acknowledge support by the TU Dortmund, the Zukunftskolleg of the University of Konstanz and the Konstanz Research School Chemical Biology. We thank A. J. Bogdanove and D. F. Voytas for TALE assembly plasmids obtained via Addgene.

REFERENCES
1. Law, J. A.; Jacobsen, S. E., Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 2010, 11, 204-220.
2. Tahiliani, M.; Koh, K. P.; Shen, Y.; Pastor, W. A.; Bandukwala, H.; Brudno, Y.; Agarwal, S.; Iyer, L. M.; Liu, D. R.; Aravind, L.; Rao, A., Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 2009, 324, 930-935.
3. Kriaucionis, S.; Heintz, N., The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 2009, 324, 929-930.
4. Ito, S.; Shen, L.; Dai, Q.; Wu, S. C.; Collins, L. B.; Swenberg, J. A.; He, C.; Zhang, Y., Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 2011, 333, 1300-1303.
5. He, Y. F.; Li, B. Z.; Li, Z.; Liu, P.; Wang, Y.; Tang, Q.; Ding, J.; Jia, Y.; Chen, Z.; Li, L.; Sun, Y.; Li, X.; Dai, Q.; Song, C. X.; Zhang, K.; He, C.; Xu, G. L., Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 2011, 333, 1303-1307.
6. Pfaffeneder, T.; Hackner, B.; Truss, M.; Munzel, M.; Muller, M.; Deiml, C. A.; Hagemeier, C.; Carell, T., The Discovery of 5-Formylcytosine in Embryonic Stem Cell DNA. Angew. Chem. Int. Ed. Engl. 2011, 50, 7008-7012.
7. Shen, L.; Song, C. X.; He, C.; Zhang, Y., Mechanism and function of oxidative reversal of DNA and RNA methylation. Annu. Rev. Biochem. 2014, 83, 585-614.
8. Lercher, L.; McDonough, M. A.; El-Sagheer, A. H.; Thalhammer, A.; Kriaucionis, S.; Brown, T.; Schofield, C. J., Structural insights into how 5-hydroxymethylation influences transcription factor binding. Chem. Commun. (Camb) 2014, 50, 1794-1796.
9. Renciuk, D.; Blacque, O.; Vorlickova, M.; Spingler, B., Crystal structures of B-DNA dodecamer containing the epigenetic modifications 5-hydroxymethylcytosine or 5-methylcytosine. Nucleic Acids Res. 2013, 41, 9891-9900.
10. Szulik, M. W.; Pallan, P. S.; Nocek, B.; Voehler, M.; Banerjee, S.; Brooks, S.; Joachimiak, A.; Egli, M.; Eichman, B. F.; Stone, M. P., Differential Stabilities and Sequence-Dependent Base Pair Opening Dynamics of Watson-Crick Base Pairs with 5-Hydroxymethylcytosine, 5-Formylcytosine, or 5-Carboxylcytosine. Biochemistry 2015, 54, 1294-1305.
11. Du, Q.; Luu, P. L.; Stirzaker, C.; Clark, S. J., Methyl-CpG-binding domain proteins: readers of the epigenome. Epigenomics 2015, 7, 1051-1073.
12. Kellinger, M. W.; Song, C. X.; Chong, J.; Lu, X. Y.; He, C.; Wang, D., 5-formylcytosine and 5-carboxylcytosine reduce the rate and substrate specificity of RNA polymerase II transcription. Nat. Struct. Mol. Biol. 2012, 19, 831-833.
13. Ngo, T. T.; Yoo, J.; Dai, Q.; Zhang, Q.; He, C.; Aksimentiev, A.; Ha, T., Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat. Commun. 2016, 7, 10813.
14. Spruijt, C. G.; Gnerlich, F.; Smits, A. H.; Pfaffeneder, T.; Jansen, P. W.; Bauer, C.; Munzel, M.; Wagner, M.; Muller, M.; Khan, F.; Eberl, H. C.; Mensinga, A.; Brinkman, A. B.; Lephikov, K.; Muller, U.; Walter, J.; Boelens, R.; van Ingen, H.; Leonhardt, H.; Carell, T.; Vermeulen, M., Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives. Cell 2013, 152, 1146-1159.
15. Iurlaro, M.; Ficz, G.; Oxley, D.; Raiber, E. A.; Bachman, M.; Booth, M. J.; Andrews, S.; Balasubramanian, S.; Reik, W., A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation. Genome Biol. 2013, 14, R119.


16. Booth, M. J.; Branco, M. R.; Ficz, G.; Oxley, D.; Krueger, F.; Reik, W.; Balasubramanian, S., Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 2012, 336, 934-937.
17. Booth, M. J.; Marsico, G.; Bachman, M.; Beraldi, D.; Balasubramanian, S., Quantitative sequencing of 5-formylcytosine in DNA at single-base resolution. Nat. Chem. 2014, 6, 435-440.
18. Yu, M.; Hon, G. C.; Szulwach, K. E.; Song, C. X.; Zhang, L.; Kim, A.; Li, X.; Dai, Q.; Shen, Y.; Park, B.; Min, J. H.; Jin, P.; Ren, B.; He, C., Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 2012, 149, 1368-1380.
19. Pastor, W. A.; Pape, U. J.; Huang, Y.; Henderson, H. R.; Lister, R.; Ko, M.; McLoughlin, E. M.; Brudno, Y.; Mahapatra, S.; Kapranov, P.; Tahiliani, M.; Daley, G. Q.; Liu, X. S.; Ecker, J. R.; Milos, P. M.; Agarwal, S.; Rao, A., Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 2011, 473, 394-397.
20. Song, C. X.; Szulwach, K. E.; Fu, Y.; Dai, Q.; Yi, C.; Li, X.; Li, Y.; Chen, C. H.; Zhang, W.; Jian, X.; Wang, J.; Zhang, L.; Looney, T. J.; Zhang, B.; Godley, L. A.; Hicks, L. M.; Lahn, B. T.; Jin, P.; He, C., Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat. Biotechnol. 2011, 29, 68-72.
21. Xia, B.; Han, D.; Lu, X.; Sun, Z.; Zhou, A.; Yin, Q.; Zeng, H.; Liu, M.; Jiang, X.; Xie, W.; He, C.; Yi, C., Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale. Nat. Methods 2015, 12, 1047-1050.
22. Samanta, B.; Seikowski, J.; Hobartner, C., Fluorogenic Labeling of 5-Formylpyrimidine Nucleotides in DNA and RNA. Angew. Chem. Int. Ed. Engl. 2016, 55, 1912-1916.
23. Wu, H.; Zhang, Y., Charting oxidized methylcytosines at base resolution. Nat. Struct. Mol. Biol. 2015, 22, 1-6.
24. Booth, M. J.; Raiber, E. A.; Balasubramanian, S., Chemical methods for decoding cytosine modifications in DNA. Chem. Rev. 2015, 115, 2240-2254.
25. Stoddart, D.; Heron, A. J.; Mikhailova, E.; Maglia, G.; Bayley, H., Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore. Proc. Natl. Acad. Sci. U S A 2009, 106, 7702-7707.
26. Wescoe, Z. L.; Schreiber, J.; Akeson, M., Nanopores Discriminate among Five C5-Cytosine Variants in DNA. J. Am. Chem. Soc. 2014, 136, 16582-16587.
27. Flusberg, B. A.; Webster, D. R.; Lee, J. H.; Travers, K. J.; Olivares, E. C.; Clark, T. A.; Korlach, J.; Turner, S. W., Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 2010, 7, 461-465.
28. Boch, J.; Bonas, U., Xanthomonas AvrBs3 family-type III effectors: discovery and function. Ann. Rev. Phytopath. 2010, 48, 419-436.
29. Rathi, P.; Maurer, S.; Kubik, G.; Summerer, D., Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity. J. Am. Chem. Soc. 2016, 138, 9910-9918.
30. Kubik, G.; Batke, S.; Summerer, D., Programmable sensors of 5-hydroxymethylcytosine. J. Am. Chem. Soc. 2015, 137, 2-5.
31. Kubik, G.; Schmidt, M. J.; Penner, J. E.; Summerer, D., Programmable and highly resolved in vitro detection of 5-methylcytosine by TALEs. Angew. Chem. Int. Ed. Engl. 2014, 53, 6002-6006.
32. Deng, D.; Yan, C.; Pan, X.; Mahfouz, M.; Wang, J.; Zhu, J. K.; Shi, Y.; Yan, N., Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 2012, 335, 720-723.
33. Mak, A. N. S.; Bradley, P.; Cernadas, R. A.; Bogdanove, A. J.; Stoddard, B. L., The Crystal Structure of TAL Effector PthXo1 Bound to Its DNA Target. Science 2012, 335, 716-719.
34. Moscou, M. J.; Bogdanove, A. J., A simple cipher governs DNA recognition by TAL effectors. Science 2009, 326, 1501.
35. Boch, J.; Scholze, H.; Schornack, S.; Landgraf, A.; Hahn, S.; Kay, S.; Lahaye, T.; Nickstadt, A.; Bonas, U., Breaking the code of DNA binding specificity of TAL-type III effectors. Science 2009, 326, 1509-1512.
36. Kubik, G.; Summerer, D., TALEored Epigenetics: A DNA-Binding Scaffold for Programmable Epigenome Editing and Analysis. Chembiochem 2016, 17, 975-980.
37. Kubik, G.; Summerer, D., Deciphering Epigenetic Cytosine Modifications by Direct Molecular Recognition. ACS Chem. Biol. 2015, 10, 1580-1589.
38. Deng, D.; Yin, P.; Yan, C.; Pan, X.; Gong, X.; Qi, S.; Xie, T.; Mahfouz, M.; Zhu, J. K.; Yan, N.; Shi, Y., Recognition of methylated DNA by TAL effectors. Cell Res. 2012, 22, 1502-1504.
39. Cermak, T.; Doyle, E. L.; Christian, M.; Wang, L.; Zhang, Y.; Schmidt, C.; Baller, J. A.; Somia, N. V.; Bogdanove, A. J.; Voytas, D. F., Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 2011, 39, e82.
40. Ermolenko, D. N.; Thomas, S. T.; Aurora, R.; Gronenborn, A. M.; Makhatadze, G. I., Hydrophobic interactions at the Ccap position of the C-capping motif of alpha-helices. J. Mol. Biol. 2002, 322, 123-135.
41. Kubik, G.; Summerer, D., Achieving single-nucleotide resolution of 5-methylcytosine detection with TALEs. Chembiochem 2015, 16, 228-231.

Table of Contents Artwork
