
Review

Deciphering Epigenetic Cytosine Modifications by Direct Molecular Recognition Grzegorz Kubik, and Daniel Summerer ACS Chem. Biol., Just Accepted Manuscript • DOI: 10.1021/acschembio.5b00158 • Publication Date (Web): 21 Apr 2015 Downloaded from http://pubs.acs.org on April 28, 2015


ACS Chemical Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Deciphering Epigenetic Cytosine Modifications by Direct Molecular Recognition

Grzegorz Kubik and Daniel Summerer* Department of Chemistry, Zukunftskolleg, and Konstanz Research School Chemical Biology, University of Konstanz, Universitätsstraße 10, 78457 Konstanz (Germany)

ABSTRACT: Epigenetic modification at the 5-position of cytosine is a key regulatory element of mammalian gene expression with important roles in genome stability, development, and disease. The repertoire of cytosine modifications was long confined to 5-methylcytosine (mC), but has recently been expanded by the discovery of 5-hydroxymethyl-, 5-formyl-, and 5-carboxylcytosine. These are key intermediates of active mC demethylation, but may additionally represent new epigenetic marks with distinct biological roles. This leap in the chemical complexity of epigenetic cytosine modifications has not only created a pressing need for analytical approaches that can unravel their functions, it has also created new challenges for such analyses with respect to sensitivity and selectivity. The crucial step that defines the analytical potential of any such approach is the strategy used to differentiate the cytosine 5-modifications from one another. This selectivity can in principle be provided either by chemoselective conversions or by selective molecular recognition events. While the former strategy has been particularly successful for accurate genomic profiling of cytosine modifications in vitro, the latter offers interesting perspectives for simplified profiling of natural, untreated DNA, as well as for emerging applications such as single-cell analysis and the monitoring of cytosine modifications in vivo. Here we review analytical techniques for deciphering epigenetic cytosine modifications, with an emphasis on approaches based on the direct molecular recognition of these modifications in DNA.

1. Introduction

The genetic information of cells is stored in the sequence of the four canonical DNA nucleobases adenine (A), cytosine (C), guanine (G), and thymine (T). This information can be decoded with high fidelity via selective Watson-Crick base pairing, enabling the faithful synthesis of the transcriptome and proteome. However, it is evident that the genomes of multicellular organisms contain additional information that is decoded differently, and that allows cells to differentiate and to respond to environmental influences by adaptive regulation of gene expression. This information is stored in the form of epigenetic DNA nucleobases, and these represent the only regulatory information of gene expression that is directly deposited onto DNA itself. Compared to regulatory mechanisms based on transcription factor binding or histone modification, this feature results in marked differences with respect to stability and inheritability, and requires unique mechanisms of introduction and removal.1

1.1 Epigenetic Modification of Mammalian DNA

The best-studied epigenetic DNA nucleobase is 5-methylcytosine (mC, Scheme 1), which is established as a major regulator of gene expression and is critically involved in developmental and pathophysiological processes. In mammalian genomes, mC is predominantly found in CpG dinucleotides, where ~70-80 % of CpGs are methylated in somatic cells. mC exhibits specific distribution patterns in genomes that depend on factors such as the genetic function and CpG density of loci, with its presence at promoters and an associated transcriptional silencing as a common setting.2, 3

Key to the dynamic regulation of the cellular functions of mC is its introduction and removal. mC is introduced into DNA by DNA-methyltransferase (DNMT)-catalyzed methylation of C using S-adenosylmethionine (SAM) as methyl donor. DNMTs can introduce mC de novo, but can also maintain it after replication, making mC a relatively stable, inheritable nucleobase.2 However, while the process of C-methylation is relatively well characterized, the demethylation of mC has previously been


thought to occur only passively, through replication in the absence of maintenance methylation (passive dilution, PD). In contrast, possible mechanisms of active demethylation have long remained elusive.

1.2 The Discovery of Oxidized 5-Methylcytosine Derivatives

Recently, the oxidized mC derivative 5-hydroxymethylcytosine (hmC, Scheme 1) was discovered in mammalian genomic DNA and was identified as the product of an enzymatic conversion catalyzed by the α-ketoglutarate- and Fe(II)-dependent ten-eleven translocation dioxygenase 1 (TET1).4, 5 In this reaction, a dioxygen molecule is transferred to an α-ketoglutarate molecule and mC via highly reactive Fe(IV)/Fe(III) oxo species, forming hmC and, via oxidative decarboxylation, succinate.6, 7 Shortly after the discovery of hmC, two additional derivatives, 5-formylcytosine (fC) and 5-carboxylcytosine (caC), were identified, both being generated by TET dioxygenases from hmC via additional, iterative oxidation steps (Scheme 1)8-10 (for recent reviews on the biology of TET dioxygenases, see refs 11, 12).

Scheme 1. Current models of active mC demethylation pathways. TET: ten-eleven translocation dioxygenase. TDG: thymine DNA glycosylase. BER: base excision repair. DNMT: DNA methyltransferase. AID: activation-induced deaminase. APOBEC: apolipoprotein B mRNA-editing enzyme complex. SMUG1: single-strand selective monofunctional uracil DNA glycosylase.

The overall genomic content of the three new nucleobases has been studied in a variety of cell types. Though variable from cell type to cell type, hmC has been found to be the most abundant and stable of the three nucleobases, with contents in the lower double-digit percent range relative to mC in several mammalian cell types. fC and caC occur less frequently, with fC contents often being one to three orders of magnitude lower than hmC, and caC being roughly one additional order of magnitude lower than fC, again with variability.8-10, 13 It is a general hallmark of oxidized C derivatives that their abundance depends on the cell type. For example, several types of cancer cells have been shown to exhibit low hmC contents, whereas brain cells have very high contents.11, 12 This can be a consequence of cell type-specific TET regulation, but has also led to the notion that differential hmC contents may result from differential kinetics of maintenance methylation and subsequent oxidation, and generally depend on the proliferation rate of cells.14

A number of genomic profiling studies have also provided first maps of the spatial distribution of hmC, fC, and caC in genomes of embryonic stem cells. These revealed specific occurrences at different promoter types, transcription start sites, and gene bodies.15-19

The discovery of hmC, fC, and caC has spurred numerous studies to unravel their biological function, with their involvement in active mC demethylation processes being a particular focus. A variety of mechanisms have been proposed for this, with the excision of fC and caC by thymine-DNA-glycosylase (TDG) under formation of abasic sites, and the subsequent restoration of C by base excision repair (BER), being the only mechanism that has been confirmed in vitro and in vivo (Scheme 1).8, 9, 20 Proposed alternative mechanisms that have not been proven are a direct decarboxylation of caC,21 an enhanced replicative PD by reduced activity of the maintenance DNMT1 on hemihydroxymethylated compared to hemimethylated CpG dinucleotides,22, 23 C restoration via BER following hmC deamination and glycosylation involving AID/APOBEC and TDG or SMUG1,24 and a DNMT3a/DNMT3b-mediated dehydroxymethylation of hmC to C in the absence of SAM (Scheme 1).25

The 5-substituents of hmC, fC, and caC create chemically unique structures in the DNA major groove26-29 that represent potential sites for protein recruitment or repulsion. Besides being intermediates in active demethylation processes, hmC, fC, and caC may thus represent epigenetic marks in their own right, with unique interaction capabilities to previously known and/or so far uncharacterized reader proteins. The generally high abundance and stability of hmC and the high levels of fC found at certain loci15 support this scenario, and recent fishing studies revealed first candidate proteins.30-32

2. New Challenges for Deciphering Epigenetic Cytosine Modifications

The need to unravel the functional involvement of hmC, fC, and caC in biological processes not only demands advanced methodologies for epigenetic nucleobase analysis, it has also significantly raised the bar with respect to their selectivity and sensitivity. Instead of


only C and mC, a total of five cytosine nucleobases must now be differentiated from one another. This increase in chemical complexity has triggered extensive efforts to further advance previously established methodologies of mC analysis, and has also motivated new approaches that had not been pursued before. At the heart of any such approach is the mechanism used for the molecular differentiation of the five nucleobases in DNA, since this step crucially defines its analytical value. An ideal approach should reveal the level of each of the five cytosine nucleobases in an unbiased, quantitative manner at user-defined nucleotide positions, and with a high dynamic range. Level information (i.e. the presence of a nucleobase at a given position in percent) is important, since DNA samples are usually a collection of genomes from a larger number of cells with heterogeneous epigenetic modification. The approach should provide full coverage of any desired genomic region, offer maximal (ideally single nucleotide) resolution, and should be strand-specific to provide information on the modification symmetry. Moreover, manipulation (e.g. chemical treatment or amplification) of the DNA sample should be minimal to avoid bias and/or destruction of the sample, and to provide technical ease.

From a chemical standpoint, current differentiation strategies can be categorized into two groups that make use of either chemoselective conversions or selective molecular recognition events. The two general strategies offer complementary inherent potentials to resolve the aforementioned analytical challenges, and no single strategy is able to fully resolve all of them. For example, approaches based on chemoselective conversion currently offer the highest resolution and level quantitativeness of all methods, and have recently been shown to be amenable to modifications that offer selectivity also in the context of all five nucleobases.12, 33 In contrast, approaches based on proteins that differentiate between epigenetic cytosine modifications by direct, selective recognition have a high potential for particularly simple analyses, and are directly applicable to natural, non-manipulated DNA.

2.1 Approaches Based on Chemoselective Conversion

A large variety of analytical methods involving one or more chemoselective conversion steps have been introduced that all utilize a relatively small number of actual reaction types, with bisulfite (BS) conversion34 being by far the most impactful one. Heating DNA with sodium bisulfite (NaHSO3) leads to deamination of C to U, but not of mC to T. This conversion of a difference in epigenetic modification into a canonical nucleobase mutation with altered base pairing selectivity can be readily detected by common sequence analysis techniques, including PCR and traditional or next-generation DNA


sequencing (NGS). Moreover, the reactivity of hmC, fC, and caC in BS reactions has been studied, revealing that hmC forms a cytosine-5-methylsulfonate (CMS) adduct that exhibits unaltered base pairing selectivity.35 In contrast, fC and caC are deaminated to U after deformylation and decarboxylation, respectively9, 16 (i.e. C, fC, and caC all read as U, whereas mC and hmC both read as C in downstream analysis). This lack of selectivity of BS conversion in the context of the full set of cytosine modifications has prompted strategies to integrate additional conversion steps prior to BS treatment that provide increased selectivity. These rely either on redox reactions for the interconversion of 5-modifications into one another and/or on selective tagging reactions. Selective oxidation of hmC to fC with potassium perruthenate, or reduction of fC to hmC with sodium borohydride, prior to BS conversion enables the determination of hmC (oxBS)16 and fC (redBS)15 levels by comparative analysis of treated and non-treated samples. Moreover, a TET dioxygenase (that oxidizes all 5-modifications to caC) has been employed in combination with a prior enzymatic glucosylation of hmC by T4 β-glucosyltransferase36 (T4 BGT). Glucosylation protects hmC from TET oxidation, as well as from decarboxylation and deamination in the subsequent BS conversion, so that hmC is the only nucleobase that reads as C (TAB-Seq).37 Complementary tagging chemistries have been used for fC and caC in combination with BS conversion to prevent their deamination and to enable the determination of their levels by comparative analysis.
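
The readouts of the BS-based conversions described above can be summarized in a small lookup table. The following Python sketch is purely illustrative: it assumes complete, error-free conversions, and the names `READS_AS_C`, `c_fraction`, and `infer_levels` are hypothetical helpers rather than part of any published pipeline. It encodes which nucleobases still read as C after each treatment and recovers levels at one position by comparing C-call fractions between assays.

```python
# Cytosine forms that still read as C after each treatment; all other
# forms end up as U and are read as T after amplification.
# Idealized assumption: 100 % conversion efficiency, no sequencing errors.
READS_AS_C = {
    "BS":    {"mC", "hmC"},        # C, fC, and caC are deaminated to U
    "oxBS":  {"mC"},               # hmC is oxidized to fC before BS
    "redBS": {"mC", "hmC", "fC"},  # fC is reduced to hmC before BS
    "TAB":   {"hmC"},              # glucosylated hmC is protected from TET
}


def c_fraction(assay, levels):
    """Expected fraction of C calls at one position for a given assay."""
    return sum(levels.get(base, 0.0) for base in READS_AS_C[assay])


def infer_levels(bs, oxbs, redbs):
    """Recover levels from C-call fractions by comparative analysis."""
    return {
        "mC": oxbs,         # only mC reads as C in oxBS
        "hmC": bs - oxbs,   # BS reads mC + hmC as C
        "fC": redbs - bs,   # redBS reads mC + hmC + fC as C
    }
```

With hypothetical levels such as `{"C": 0.50, "mC": 0.30, "hmC": 0.15, "fC": 0.04, "caC": 0.01}`, the three C-call fractions are 0.45 (BS), 0.30 (oxBS), and 0.49 (redBS), and `infer_levels` returns the mC, hmC, and fC levels that were put in. Real data would additionally require corrections for incomplete conversion and sampling noise.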
fC can be derivatized to oximes with aminooxy-functionalized reactants (fCAB-Seq),17 and caC can be derivatized with primary amines to amides via 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) activation (caCAB-Seq).38 The decisive advantages of BS-based approaches are single nucleotide resolution and quantitative level analysis of the target nucleobase in a strand-specific manner. However, BS treatment also has intrinsic drawbacks, such as sample destruction and a reduction of the sequence complexity of the sample with an associated reduced selectivity of downstream analyses, as well as comparatively complicated and costly procedures. These properties have motivated the development of simpler, non-destructive approaches based on affinity enrichment. Here, the aforementioned tagging approaches have proven highly useful, since they allow the nucleobase-selective introduction of functional groups for conjugation reactions (or of unique recognition sites), DNA affinity enrichment, and analysis by e.g. PCR or NGS.17, 18, 39-41 Finally, the natural sensitivity of restriction endonucleases for cytosine 5-modifications can be exploited for the selective nucleolytic cleavage of the DNA backbone and downstream analysis. Such approaches are used for the analysis of both mC and hmC, partially in combination with T4 BGT to achieve selectivity between the


two nucleobases.42-46 For recent reviews covering the approaches of this section in more detail, see refs 33, 47.

2.2 Approaches Based on Direct Molecular Recognition

Approaches based on proteins that are capable of selectively recognizing epigenetic nucleobases do not inherently depend on sample manipulation and thus have particular potential for simple and unbiased analyses of native DNA.

Figure 1. Geometries of Watson-Crick base pairs between C, mC, hmC, fC, or caC and G as found in crystal structures of the Dickerson-Drew dodecamer (pdb entries 436D, 4C63, 4I9V, 4QC7, and 4PWM, respectively).26-28 Cytosine 5-positions are marked with an arrow; hydrogen bonds in fC and caC are shown as dotted red lines. C-G hydrogen bonds are not shown. The figure was prepared with the PyMOL Molecular Graphics System, Version 1.7.4, Schrödinger, LLC.

The structural differences between the five cytosine nucleobases provide potential for sufficient selectivity, both in single- and double-stranded DNA. Generally, crystal structures of the Dickerson-Drew dodecamer duplex containing a single mC, hmC, fC, or caC opposite a G reveal regular Watson-Crick base pairing, with similar overall base pair geometries that position the 5-modification in the major groove (Figure 1)26-28 (for the particular case of densely fC-modified sequences, more pronounced structural changes have been observed).29 caC has been shown to increase the melting temperature of duplexes, whereas the other modifications exhibit only subtle effects.28, 48 In contrast to this relatively moderate impact of single modifications on the overall structure and stability of DNA duplexes, marked differences exist between the mC, hmC, fC, and caC moieties themselves, offering a high potential for selective recognition. Besides the inherent differences between methyl-, hydroxyl-, formyl-, and carboxyl-groups in their capability to accept or donate hydrogen bonds, there are obvious differences in steric demand. Moreover, even fC and hmC, though relatively similar in size, exhibit marked differences in their conformational flexibility. The hydroxymethyl sp3 carbon of hmC has been found in crystal structures in two different conformations, suggesting a certain flexibility,28 whereas the C=O bond of the formyl sp2 carbon of fC is forced into one plane with the cytosine heterocycle by an intramolecular hydrogen bond with the N4 exocyclic amino group (Figure 1). This type of interaction is also observed for caC and has a profound impact on the pKa of the fC and caC nucleobases.48, 49

Though a larger variety of proteins differentiate between C and mC, only a few protein scaffolds have so far proven a potential for pronounced recognition selectivity in the context of all five cytosine nucleobases, despite the aforementioned structural differences. These can be divided into two groups. One group consists of proteins that act as binders and that are typically used in affinity enrichment assays. These can bind DNA either without or with constrained sequence selectivity (i.e. in the context of short and fixed recognition sequences such as palindromes), or they bind DNA with programmable sequence selectivity (a related potential may be offered by triplex-forming oligonucleotides).50 The second group consists of proteins that act as processive readers and recognize nucleobases in a continuous manner as DNA passes through the protein. These are the central tools of recently introduced single molecule sequencing techniques.

2.2.1 Protein Binders Without or With Fixed Sequence Selectivity

A long-established approach for the typing and profiling of mC has been the use of antibodies. These can be employed as general binders for genome-wide affinity enrichment of mC-containing DNA and subsequent analysis by e.g. qPCR or NGS in methylated DNA immunoprecipitation assays (MeDIP, Scheme 2). MeDIP assays are simple and cost-effective. However, like any non-sequence-selective affinity enrichment approach, they do not provide the high resolution and quantitativeness of BS-based methods and are not strand-specific. The resolution of MeDIP is defined by the fragment size of the genomic DNA library (typically hundreds of base pairs), since an mC at any position of


a fragment will result in a positive signal. Moreover, the large size of antibodies results in a negative bias for densely modified regions. Consequently, MeDIP reveals only the (potentially biased) relative abundance of mC in rather large genomic windows. Nevertheless, their simplicity and cost-effectiveness make such assays widely used, and antibodies for the complete set of cytosine modifications have been raised and employed for their genomic profiling.19, 51 Interestingly, apart from direct recognition, antibodies have also been used in combination with chemical conversion to improve the specificity of hmC analyses. For example, BS-mediated formation of CMS adducts has been introduced for improved recognition with an anti-CMS antibody.39 An alternative to the highly adaptable antibody scaffold has been the use of unmodified, natural DNA binding proteins. A long-established approach relies on the methyl-CpG-binding domain (MBD) of MeCP2 for affinity enrichment. In contrast to anti-mC antibodies, the sequence selectivity of this MBD allows for mC analysis only in the context of CpG, but has been shown to provide better coverage.52 The binding of this and other MBDs to at least mC and hmC has been studied, and it has been shown that some MBDs selectively bind mC, whereas others bind both mC and hmC. Moreover, mutations have been identified that influence this selectivity.30, 53 However, the true potential of MBDs to provide full selectivity in the context of all five cytosine nucleobases is currently unknown.
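
The resolution limit of fragment-based enrichment such as MeDIP, described above, can be illustrated with a minimal Python sketch. It assumes an idealized pull-down in which a single mC anywhere in a fragment is sufficient for capture, and it ignores antibody bias; the function name `enriched_fragments` and the half-open `(start, end)` coordinate convention are hypothetical choices made for this example.

```python
def enriched_fragments(fragments, mc_positions):
    """Return the fragments (start, end) that contain at least one mC.

    Idealized pull-down: one mC at any position captures the whole
    fragment, so the position within the fragment is not recoverable.
    Coordinates are half-open intervals: start <= position < end.
    """
    marks = set(mc_positions)
    return [
        (start, end)
        for (start, end) in fragments
        if any(start <= pos < end for pos in marks)
    ]
```

For three 300 bp fragments `[(0, 300), (300, 600), (600, 900)]`, mC positions `[10, 290, 650]` and `[150, 899]` give the same enriched set, `[(0, 300), (600, 900)]`. The readout therefore cannot distinguish where, or how often, a fragment is modified, which is why the fragment size sets the resolution.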

Scheme 2. Workflow of the MeDIP assay. Enriched DNA can be analyzed by multiple methods, including qPCR (MeDIP-PCR) and NGS (MeDIP-Seq). mC is shown as black spheres; hmC, as an exemplary noncognate nucleobase of the antibody, is shown as a blue sphere.


A second protein that has been employed for hmC analysis in a combination approach is J-binding protein 1 (JBP1), a protein from trypanosomes that binds glucosylated 5-hydroxymethyluracil in its natural context but is cross-reactive with glucosylated hmC. This enables affinity enrichment of hmC-containing DNA after T4 BGT-mediated glucosylation.41 Besides their use for in vitro analyses, a widely exploited advantage of protein binders without sequence selectivity is their capability to visualize the content of individual cytosine modifications in cellular DNA by immunostaining.54, 55

2.2.2 Protein Binders With Programmable Sequence Selectivity

In contrast to proteins without or with limited sequence selectivity, a complementary potential lies in proteins that provide programmable sequence selectivity. These promise an increased resolution (defined by their interaction range with DNA, i.e. ~20-30 nt) and a direct enrichment of only the target sequence of interest. After a long period in which zinc finger proteins were the only option for the programmable design of DNA binding domains,56 transcription-activator-like effectors (TALEs) have recently emerged as a new alternative.57, 58 TALEs consist of concatenated repeats, each of which recognizes one nucleobase through a repeat variable di-residue (RVD). RVDs with the amino acids NI, NN (NH), NG, and HD recognize A, G, T, and C, respectively.59-61 TALEs are strand-specific DNA major groove binders that recognize pyrimidines via the face of the 5-position. RVD NG binds T (or mC) through a hydrophobic interaction between the 5-methyl group and the glycine Cα-methylene, whereas RVD HD recognizes C through a hydrogen bond between the aspartate and the cytosine 4-amino group (Figure 2A).62, 63
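
The one-repeat-per-nucleobase code described above lends itself to a simple lookup. The following Python sketch is a minimal illustration that uses only the RVD assignments stated in the text; the names `RVD_CODE`, `target_sequence`, and `design_rvds` are hypothetical, and real TALE design involves further constraints that are not modeled here.

```python
# RVD-to-nucleobase code as given in the text
# (NH is listed as an alternative G-recognizing RVD).
RVD_CODE = {"NI": "A", "NN": "G", "NH": "G", "NG": "T", "HD": "C"}


def target_sequence(rvds):
    """Return the DNA sequence recognized by a list of TALE repeat RVDs."""
    return "".join(RVD_CODE[rvd] for rvd in rvds)


def design_rvds(dna):
    """Choose one RVD per nucleobase of the target strand (NN used for G)."""
    preferred = {"A": "NI", "G": "NN", "T": "NG", "C": "HD"}
    return [preferred[base] for base in dna.upper()]
```

For example, `design_rvds("CTAG")` yields `["HD", "NG", "NI", "NN"]`, and `target_sequence` maps that list back to `"CTAG"`. Because RVD NG also accommodates mC, a plain lookup of this kind cannot by itself express selectivity between T and mC at a given position.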

Figure 2. TALEs as programmable binders for the analysis of cytosine modifications. A: Interaction of RVD HD (amino acids 12 and 13 of the TALE repeat) with C as found in a crystal structure of a TALE-DNA complex (pdb entry 3V6T).62 Hydrogen bonds are shown as dotted green lines. The figure was prepared with the PyMOL Molecular Graphics System, Version 1.7.4, Schrödinger, LLC. B: Selectivity of TALE repeats with RVDs HD, NG, and N* (* = amino acid deletion) for the differentiation between C, mC, and hmC in DNA replication assays. The figure was adapted from Kubik et al.64 C: Overall concept. Natural or engineered TALE repeats provide selectivity for cytosine modifications and can be used for their sequence-selective analysis by integration into full-length TALE proteins. Single or concatenated TALE repeats are colored according to their selectivity; hmC, as an exemplary epigenetic nucleobase, is shown as a blue sphere.

Direct differentiation between single genomic C and mC has been reported in vitro65 based on the mC sensitivity of RVD HD.66, 67 Quantitative detection of mC levels at single positions, and little dependence of mC sensitivity on the sequence or on the position of the mC (for N-terminal and central positions), was observed in model systems.65, 68 Though RVD HD differentiates only between C and mC and cannot differentiate between mC and oxidized mCs, the engineering of RVDs holds potential to provide new, individual selectivities for each of the five cytosine nucleobases, that is, an expanded programmability of nucleobase recognition. In a pilot study involving C, mC, and hmC, first RVDs with individual selectivities were identified (Figure 2B).64 Though not yet as developed as antibodies, TALEs may thus be reengineered to a stage where they provide selectivity in the context of all five cytosine modifications. They could then serve as modular binders for affinity enrichment that integrate canonical and epigenetic nucleobase selectivity (Figure 2C). Notably, programmable sequence selectivity is the prerequisite for targeting epigenetic nucleobases at user-defined loci in vivo. The combination of TALEs with functional protein domains such as fluorophores69-71 or TET dioxygenases72 thus offers potential for the in vivo detection of cytosine modifications, or for coupling their presence to the control of specific chromatin properties.

2.2.3 Nanopores

The analytical value of approaches based on BS conversion and on affinity enrichment via tagging or direct protein recognition is tightly linked to the method used for downstream DNA analysis. Consequently, these approaches have directly benefitted from the dramatic performance leap of DNA sequencing enabled by NGS techniques. Another jump in performance is anticipated from the application of third generation (i.e. single molecule) sequencing techniques to the analysis of epigenetic nucleobases. This promises the direct analysis of DNA from single cells, without any chemical conversion or amplification steps. Key to these approaches are proteins that read canonical DNA sequences in a processive manner, and that have the additional ability to directly recognize epigenetic nucleobases.

One of these approaches is based on nanopores, either introduced into solid materials or formed by proteins. When a single pore is present in an insulating barrier between two chambers filled with electrolyte, the ionic current flows exclusively through the pore in an applied electric potential (Figure 3A). However, charged polymers such as DNA can also migrate through the pore, which modulates the ionic current in a way that depends on the structure of the monomers (e.g. nucleotides) present in the pore. Protein nanopores have been particularly successful in this approach, since they can be produced with structural homogeneity by protein expression. Moreover, they can be redesigned with exquisite molecular precision by genetic engineering on the basis of available crystal structure information. The overall topology of nanopores features an entrance, a larger vestibule, and a constriction (Figure 3B).

The principle of nanopore sequencing was first demonstrated with the pore-forming toxin α-haemolysin (α-HL) from Staphylococcus aureus.73 This heptameric, mushroom-shaped pore complex comprises a large vestibule connected to a β-barrel with a hydrophilic inner surface that serves as transmembrane domain.74 A critical property of the pore channel is its overall dimension. The β-barrel of α-HL has an inner diameter of 2.6 nm, with a constriction at the connection to the vestibule with a width of only 1.4 nm. These dimensions allow ssDNA to translocate through the channel and in principle enable the differentiation of nucleobases present at the constriction.75 Biotinylated ssDNA strands have been positioned in the pore by using a bound streptavidin moiety as stopper, and measurements in this setup have revealed sufficient ability of the α-HL channel to directly differentiate between the four canonical nucleobases76 as well as between mC and hmC.77 Moreover, it has been shown that an additional tagging step (bisulfite-mediated thiol substitution) can be used to create adducts with hmC that can be differentiated from other nucleobases with improved confidence.78 However, a challenge with α-HL has been the fast kinetics of DNA translocation, which result in only small changes in current and hamper reliable, continuous sequencing of free DNA. Controlling the kinetics of DNA translocation has therefore been an emphasis of developments, with the use of the highly processive phi29 DNA polymerase as a particularly successful strategy.79 Free, binary complexes of DNA polymerase and DNA are captured at the pore entrance, with the DNA polymerase acting as stopper and the DNA threading through the pore. This enables a translocation of ssDNA through the pore in a kinetically controlled manner.

This strategy has also been used in the context of Mycobacterium smegmatis porin A (MspA), a second protein pore that has recently been introduced for single-molecule DNA sequencing (Figure 3C). MspA exhibits favorable pore dimensions with a very short channel and a constriction of only 1.2 nm width (Figure 3B).81 This channel accommodates a much smaller number of nucleotides at a time than the rather long β-barrel channel of α-HL. This results in less noise from the other nucleotides in the channel, which all modulate the current and overlay the modulation by the nucleotide of interest, that is, the one located in the constriction.82 MspA has further been engineered by replacing negatively charged with neutral amino acids in the constriction, and by introducing multiple positively charged amino acids in the vestibule, to enable efficient entry and translocation of the negatively charged DNA. Similar to α-HL, MspA has been shown to differentiate between C and mC when present in an immobilized ssDNA strand.83 However, a later combination of MspA with phi29 DNA polymerase for control of DNA translocation kinetics enabled the direct differentiation not only between the four canonical nucleobases,84 but also between C, mC, and hmC during continuous sequencing.85, 86 These breakthrough developments culminated in the proof that this strategy provides sufficient selectivity to fully discriminate all five cytosine nucleobases during continuous single-molecule sequencing.80 In reference oligonucleotides, each nucleobase caused a characteristic, distinguishable ionic current signature, resulting in call accuracies ranging from 92 to 98% (Figure 3D). Since these accuracies are comparatively low and depend on the neighboring nucleobases, it is not yet clear how this approach will perform in the de novo sequencing of complex, non-tagged DNA samples. Nevertheless, it offers an attractive perspective to resolve key challenges of epigenetic analysis, most importantly the direct detection of all five cytosine nucleobases in only one assay, and without any sample manipulation.

Figure 3. Nanopore-based single-molecule DNA sequencing. A: Cartoon of the MspA nanopore immobilized in a membrane. B: General principle of conductivity measurements with nanopores. C: Use of DNA polymerase for kinetic control of DNA translocation through a nanopore. The DNA polymerase-DNA complex is captured at the entrance of the pore and acts as a stopper. DNA polymerase is shown in dark grey, cytosine modification is shown as a small grey sphere. Direction of DNA translocation during primer extension is indicated by grey arrows. D: Ionic current signatures of single C, mC, hmC, fC, and caC observed during continuous sequencing using the setup shown in Figure 3C. Figure was adapted from Wescoe et al.80

2.2.4 DNA Polymerases

Particularly attractive sensors for epigenetic nucleobase recognition are DNA polymerases, since these are central enzymes of DNA sequencing and other fundamental DNA analysis techniques, which promises a direct implementation of additional selectivities. This potential has been exploited in single-molecule, real-time (SMRT) DNA sequencing.87 This sequencing technique offers very long read lengths and the direct analysis of DNA without any pretreatment, and has already been used successfully in whole human genome sequencing, with insights into structural variants that had not previously been provided by short-read sequencing techniques.88 In SMRT sequencing, single engineered phi29 DNA polymerase molecules are immobilized in so-called zero-mode waveguides (ZMWs), zeptolitre-sized nanostructures that allow for fluorescence detection with minimal background signal.
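To make the link between a zeptolitre-sized observation volume and low fluorescence background concrete, the following minimal Python sketch estimates the mean number of freely diffusing labeled nucleotides present in a volume at a given concentration. The concentration and both volumes are illustrative assumptions and are not values taken from this review; only the relation N = c · V · N_A is general.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mean_molecules(concentration_molar: float, volume_litres: float) -> float:
    """Mean number of solute molecules in a volume at a given concentration."""
    return concentration_molar * volume_litres * AVOGADRO


# Illustrative values only (assumptions, not taken from the text).
LABEL_CONCENTRATION = 1e-6   # 1 micromolar labeled nucleotide
ZEPTOLITRE_VOLUME = 20e-21   # 20 zL, a hypothetical ZMW observation volume
FEMTOLITRE_VOLUME = 1e-15    # 1 fL, a hypothetical conventional focal volume

in_zmw = mean_molecules(LABEL_CONCENTRATION, ZEPTOLITRE_VOLUME)
in_focus = mean_molecules(LABEL_CONCENTRATION, FEMTOLITRE_VOLUME)

print(f"Zeptolitre volume: {in_zmw:.3f} labeled molecules on average")
print(f"Femtolitre volume: {in_focus:.0f} labeled molecules on average")
```

Under these assumed numbers, the zeptolitre volume contains far less than one free labeled nucleotide on average, so a nucleotide held in the polymerase active site would dominate the signal, whereas a femtolitre volume would contain hundreds of background fluorophores.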

Figure 4. Single-molecule real-time (SMRT) sequencing. A: A single DNA polymerase is immobilized on a glass slide in a cavity (grey) and can perform processive DNA synthesis on primer-template complexes. Fluorescence signals are recorded from the bottom. B: Phospholinked dNTPs bear a fluorophore-containing moiety (red star) at the γ-phosphate. Nucleophilic attack by the 3´-hydroxyl function of the terminal primer nucleotide leads to release of the fluorophore. B = nucleobase.

The DNA polymerase performs processive DNA synthesis on single DNA molecules using four phospholinked deoxynucleotides that are individually labeled with fluorophores via the γ-phosphate (of hexaphosphate units). DNA synthesis can be continuously monitored, as fluorescent pulses are generated by the transient presence of the fluorophores during nucleotide binding and template-specific incorporation, until subsequent release of the labeled pentaphosphate. The observed fluorescence signals exhibit a unique kinetic signature that depends on the bypassed, templating nucleobase (in particular differing in the duration between two fluorescent pulses, the interpulse duration, IPD), and this has been shown to hold also for epigenetic modifications (which increase the IPD).89, 90 IPD differences are observed at the site of modification, but also several nucleotide positions after it, in agreement with the DNA contact range of the DNA polymerase.91 Analysis of the signatures of mC, hmC, fC, and caC revealed moderate IPD increases for mC and hmC compared to C, with relatively small differences between the two, whereas fC and particularly caC result in more pronounced IPD increases. This makes a direct detection of mC and hmC challenging, but it has been shown that TET-mediated oxidation of mC to caC can be used to increase the effect for higher confidence,92 and this provides full selectivity in bacterial genomes that do not contain hmC, fC, and caC. An alternative strategy has been the enlargement of hmC for more pronounced IPD increases by T4 BGT-mediated tagging. Analogously to the hMe-Seal protocol (see above),40 hmC has been glycosylated with azidoglucose, derivatized by strain-promoted azide-alkyne cycloaddition with a dibenzocyclooctyne (DIBO) moiety linked to biotin via a disulfide-containing linker, enriched by immobilization on streptavidin beads, and cleaved by disulfide reduction. The remaining large adduct at the hmC 5-position caused a dramatic increase in IPD, resulting in a unique kinetic signature.93 This enabled hmC profiling in a mouse genome at single nucleotide resolution.
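The interpulse duration described above can be illustrated with a minimal Python sketch. It assumes each fluorescent pulse is recorded as a (start, end) time pair, computes the IPD as the gap between consecutive pulses, and compares mean IPDs per template position between a sample and an unmodified control. The function names, the example timings, and the ratio threshold of 2.0 are hypothetical choices for illustration and do not reproduce the analysis pipeline of the cited studies.

```python
from statistics import mean


def interpulse_durations(pulses):
    """Gaps between consecutive pulses.

    `pulses` is a list of (start, end) times in seconds, sorted by start.
    Each IPD is the start of a pulse minus the end of the previous pulse.
    """
    return [nxt[0] - prev[1] for prev, nxt in zip(pulses, pulses[1:])]


def ipd_ratios(sample_ipds, control_ipds):
    """Per-position ratio of the mean sample IPD to the mean control IPD.

    Both arguments are lists with one entry per template position, each
    entry holding the IPDs observed at that position across several reads.
    """
    return [mean(s) / mean(c) for s, c in zip(sample_ipds, control_ipds)]


def flag_positions(ratios, threshold=2.0):
    """Indices whose IPD ratio exceeds a (hypothetical) threshold."""
    return [i for i, r in enumerate(ratios) if r > threshold]


# Hypothetical pulse timings for one read (seconds).
read = [(0.0, 0.25), (0.5, 0.75), (1.5, 1.75)]
read_ipds = interpulse_durations(read)

# Hypothetical IPDs collected at three template positions.
sample = [[0.5, 0.5], [2.0, 1.0], [0.5, 0.75]]
control = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
ratios = ipd_ratios(sample, control)
candidates = flag_positions(ratios)
print(read_ipds, ratios, candidates)
```

Because the text notes that IPD changes also appear several positions after the modified base, a realistic analysis would look at a window of positions rather than a single index; the sketch keeps one position per ratio only for brevity.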
Interestingly, introduction of azidoglucose alone already resulted in a significant, yet smaller IPD increase, indicating a size-dependence of the effect. Indeed, a related approach to hmC profiling based on the diglucosylation of hmC and SMRT sequencing observed an increase in IPDs from mono- to diglucosylation.94 This suggests that there is unexplored potential for improved IPD responses by modulating the size and potentially other structural properties of the introduced adducts. Given the availability of additional selective tagging chemistries for fC and caC for the potential introduction of structurally diverse adducts8 and of redox reactions for nucleobase interconversion,8, 16 SMRT sequencing has the potential to achieve full selectivity in the context of all five cytosine nucleobases by combination with chemical conversion steps. An additional potential for increased selectivities without chemical conversion steps may lie in the further engineering of the employed DNA polymerase itself.95

3. Conclusions and Outlook

Within only a few years, the discovery of oxidized cytosine modifications has transformed the epigenetics field and stimulated numerous efforts to unravel the function of these modifications and of TET dioxygenases in normal and disease processes. Exciting new insights into the molecular mechanisms of dynamic epigenetic regulation of cellular processes are expected from these studies, and there is a considerable long-term potential for the development of new diagnostic and/or therapeutic approaches. Methodologies for the simple and unbiased quantification of each of the cytosine modifications at the genome-wide scale and at user-defined nucleotide positions will be a cornerstone of future endeavors in this field. This will require substantial further developments to improve the performance of current methodology. Proteins capable of a direct, selective recognition of cytosine modifications have a unique potential for that, and may play a central role in future, routine sequencing techniques and intracellular applications.

AUTHOR INFORMATION
Corresponding Author
[email protected]

Funding Sources
No competing financial interests have been declared. This work was supported by a grant of the DFG (SU 726/2-1).

ACKNOWLEDGMENT
We thank the Zukunftskolleg of the University of Konstanz and the Konstanz Research School Chemical Biology for support.

KEYWORDS
xC: Epigenetic cytosine modification in DNA, constituted by the following groups at the 5-position of the pyrimidine ring: x = m: methyl-, = hm: hydroxymethyl-, = f: formyl-, = ca: carboxyl-group.
CpG: Cytosine-(phosphate)-guanine dinucleotide. Target sequence of mammalian DNMTs for cytosine-5-methylation.
DNMT: DNA-methyltransferase, an enzyme that catalyses the transfer of a CH3+ moiety from the cofactor S-adenosylmethionine (SAM) to e.g. the 5-position of cytosine.
TET: ten-eleven translocation dioxygenase, an iron- and α-ketoglutarate-dependent enzyme that catalyses the step-wise oxidation of mC to hmC, fC, and caC.
Dickerson-Drew dodecamer: DNA duplex with the sequence CGCGAATTCGCG that is a well-characterised model for B-DNA.
MeDIP: Methylated DNA immunoprecipitation. An affinity enrichment method for methylated DNA based on an anti-mC antibody that is precipitated together with bound DNA using a bead-bound secondary antibody.
MBD: methyl-CpG-binding domain. Binds to DNA containing symmetrically methylated CpGs and constitutes the DNA binding domain of key epigenetic reader proteins that mediate the methylation signal via fused effector domains that are involved in chromatin shaping.

T4 BGT: T4 β-glucosyltransferase, a phage enzyme that catalyses the transfer of a glucose moiety of uridine diphosphoglucose to the hydroxyl-group of hmC.
TALE: transcription-activator-like effector. These proteins feature a central, modular DNA-binding domain that can be programmed to bind user-defined sequences in double-stranded DNA.
Nanopore: Single pore with nanometer-scale diameter present in an insulating material that is used as barrier between two chambers filled with electrolyte. In an applied electric potential, the ionic current flows exclusively through the pore, and is modulated e.g. by other ions or molecules when passing through, resulting in a characteristic signature with analytical value.

REFERENCES
[1] Jones, P. A. (2012) Functions of DNA methylation: islands, start sites, gene bodies and beyond, Nat. Rev. Genet. 13, 484-492.
[2] Law, J. A., and Jacobsen, S. E. (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals, Nat. Rev. Genet. 11, 204-220.
[3] Rodriguez-Paredes, M., and Esteller, M. (2011) Cancer epigenetics reaches mainstream oncology, Nat. Medicine 17, 330-339.
[4] Tahiliani, M., Koh, K. P., Shen, Y., Pastor, W. A., Bandukwala, H., Brudno, Y., Agarwal, S., Iyer, L. M., Liu, D. R., Aravind, L., and Rao, A. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1, Science 324, 930-935.
[5] Kriaucionis, S., and Heintz, N. (2009) The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain, Science 324, 929-930.
[6] Loenarz, C., and Schofield, C. J. (2011) Physiological and biochemical aspects of hydroxylations and demethylations catalyzed by human 2-oxoglutarate oxygenases, Trends Biochem. Sci. 36, 7-18.
[7] Hu, L., Li, Z., Cheng, J., Rao, Q., Gong, W., Liu, M., Shi, Y. G., Zhu, J., Wang, P., and Xu, Y. (2013) Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation, Cell 155, 1545-1555.
[8] Ito, S., Shen, L., Dai, Q., Wu, S. C., Collins, L. B., Swenberg, J. A., He, C., and Zhang, Y. (2011) Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine, Science 333, 1300-1303.
[9] He, Y. F., Li, B. Z., Li, Z., Liu, P., Wang, Y., Tang, Q., Ding, J., Jia, Y., Chen, Z., Li, L., Sun, Y., Li, X., Dai, Q., Song, C. X., Zhang, K., He, C., and Xu, G. L. (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA, Science 333, 1303-1307.
[10] Pfaffeneder, T., Hackner, B., Truss, M., Munzel, M., Muller, M., Deiml, C. A., Hagemeier, C., and Carell, T. (2011) The Discovery of 5-Formylcytosine in Embryonic Stem Cell DNA, Angew. Chem. Int. Ed. Engl. 50, 7008-7012.
[11] Lu, X., Zhao, B. S., and He, C. (2015) TET Family Proteins: Oxidation Activity, Interacting Molecules, and Functions in Diseases, Chem. Rev.
[12] Shen, L., Song, C. X., He, C., and Zhang, Y. (2014) Mechanism and function of oxidative reversal of DNA and RNA methylation, Annu. Rev. Biochem. 83, 585-614.
[13] Liu, S., Wang, J., Su, Y., Guerrero, C., Zeng, Y., Mitra, D., Brooks, P. J., Fisher, D. E., Song, H., and Wang, Y. (2013) Quantitative assessment of Tet-induced oxidation products of 5-methylcytosine in cellular and tissue DNA, Nucleic Acids Res. 41, 6421-6429.
[14] Bachman, M., Uribe-Lewis, S., Yang, X. P., Williams, M., Murrell, A., and Balasubramanian, S. (2014) 5-Hydroxymethylcytosine is a predominantly stable DNA modification, Nat. Chem. 6, 1049-1055.
[15] Booth, M. J., Marsico, G., Bachman, M., Beraldi, D., and Balasubramanian, S. (2014) Quantitative sequencing of 5-formylcytosine in DNA at single-base resolution, Nat. Chem. 6, 435-440.

[16] Booth, M. J., Branco, M. R., Ficz, G., Oxley, D., Krueger, F., Reik, W., and Balasubramanian, S. (2012) Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution, Science 336, 934-937.
[17] Song, C. X., Szulwach, K. E., Dai, Q., Fu, Y., Mao, S. Q., Lin, L., Street, C., Li, Y., Poidevin, M., Wu, H., Gao, J., Liu, P., Li, L., Xu, G. L., Jin, P., and He, C. (2013) Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming, Cell 153, 678-691.
[18] Raiber, E. A., Beraldi, D., Ficz, G., Burgess, H. E., Branco, M. R., Murat, P., Oxley, D., Booth, M. J., Reik, W., and Balasubramanian, S. (2012) Genome-wide distribution of 5-formylcytosine in embryonic stem cells is associated with transcription and depends on thymine DNA glycosylase, Genome Biol. 13, R69.
[19] Shen, L., Wu, H., Diep, D., Yamaguchi, S., D'Alessio, A. C., Fung, H. L., Zhang, K., and Zhang, Y. (2013) Genome-wide analysis reveals TET- and TDG-dependent 5-methylcytosine oxidation dynamics, Cell 153, 692-706.
[20] Maiti, A., and Drohat, A. C. (2011) Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites, J. Biol. Chem. 286, 35334-35338.
[21] Schiesser, S., Hackner, B., Pfaffeneder, T., Muller, M., Hagemeier, C., Truss, M., and Carell, T. (2012) Mechanism and Stem-Cell Activity of 5-Carboxycytosine Decarboxylation Determined by Isotope Tracing, Angew. Chem. Int. Ed. Engl. 51, 6516-6520.
[22] Valinluck, V., and Sowers, L. C. (2007) Endogenous cytosine damage products alter the site selectivity of human DNA maintenance methyltransferase DNMT1, Cancer Res. 67, 946-950.
[23] Hashimoto, H., Liu, Y., Upadhyay, A. K., Chang, Y., Howerton, S. B., Vertino, P. M., Zhang, X., and Cheng, X. (2012) Recognition and potential mechanisms for replication and erasure of cytosine hydroxymethylation, Nucleic Acids Res. 40, 4841-4849.
[24] Guo, J. U., Su, Y., Zhong, C., Ming, G. L., and Song, H. (2011) Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain, Cell 145, 423-434.
[25] Chen, C. C., Wang, K. Y., and Shen, C. K. (2012) The mammalian de novo DNA methyltransferases DNMT3A and DNMT3B are also DNA 5-hydroxymethylcytosine dehydroxymethylases, J. Biol. Chem. 287, 33116-33121.
[26] Lercher, L., McDonough, M. A., El-Sagheer, A. H., Thalhammer, A., Kriaucionis, S., Brown, T., and Schofield, C. J. (2014) Structural insights into how 5-hydroxymethylation influences transcription factor binding, Chem. Commun. 50, 1794-1796.
[27] Renciuk, D., Blacque, O., Vorlickova, M., and Spingler, B. (2013) Crystal structures of B-DNA dodecamer containing the epigenetic modifications 5-hydroxymethylcytosine or 5-methylcytosine, Nucleic Acids Res. 41, 9891-9900.
[28] Szulik, M. W., Pallan, P. S., Nocek, B., Voehler, M., Banerjee, S., Brooks, S., Joachimiak, A., Egli, M., Eichman, B. F., and Stone, M. P. (2015) Differential Stabilities and Sequence-Dependent Base Pair Opening Dynamics of Watson-Crick Base Pairs with 5-Hydroxymethylcytosine, 5-Formylcytosine, or 5-Carboxylcytosine, Biochemistry.
[29] Raiber, E. A., Murat, P., Chirgadze, D. Y., Beraldi, D., Luisi, B. F., and Balasubramanian, S. (2015) 5-Formylcytosine alters the structure of the DNA double helix, Nat. Struct. Mol. Biol. 22, 44-49.
[30] Mellen, M., Ayata, P., Dewell, S., Kriaucionis, S., and Heintz, N. (2012) MeCP2 binds to 5hmC enriched within active genes and accessible chromatin in the nervous system, Cell 151, 1417-1430.
[31] Spruijt, C. G., Gnerlich, F., Smits, A. H., Pfaffeneder, T., Jansen, P. W., Bauer, C., Munzel, M., Wagner, M., Muller, M., Khan, F., Eberl, H. C., Mensinga, A., Brinkman, A. B., Lephikov, K., Muller, U., Walter, J., Boelens, R., van Ingen, H., Leonhardt, H., Carell, T., and Vermeulen, M. (2013) Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives, Cell 152, 1146-1159.
[32] Iurlaro, M., Ficz, G., Oxley, D., Raiber, E. A., Bachman, M., Booth, M. J., Andrews, S., Balasubramanian, S., and Reik, W. (2013) A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation, Genome Biol. 14, R119.

[33] Booth, M. J., Raiber, E. A., and Balasubramanian, S. (2014) Chemical Methods for Decoding Cytosine Modifications in DNA, Chem. Rev.
[34] Frommer, M., Mcdonald, L. E., Millar, D. S., Collis, C. M., Watt, F., Grigg, G. W., Molloy, P. L., and Paul, C. L. (1992) A Genomic Sequencing Protocol That Yields a Positive Display of 5-Methylcytosine Residues in Individual DNA Strands, Proc. Natl. Acad. Sci. U.S.A. 89, 1827-1831.
[35] Huang, Y., Pastor, W. A., Shen, Y., Tahiliani, M., Liu, D. R., and Rao, A. (2010) The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing, PLoS One 5, e8888.
[36] Josse, J., and Kornberg, A. (1962) Glucosylation of deoxyribonucleic acid. III. alpha- and beta-Glucosyl transferases from T4-infected Escherichia coli, J. Biol. Chem. 237, 1968-1976.
[37] Yu, M., Hon, G. C., Szulwach, K. E., Song, C. X., Zhang, L., Kim, A., Li, X., Dai, Q., Shen, Y., Park, B., Min, J. H., Jin, P., Ren, B., and He, C. (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome, Cell 149, 1368-1380.
[38] Lu, X., Song, C. X., Szulwach, K., Wang, Z., Weidenbacher, P., Jin, P., and He, C. (2013) Chemical modification-assisted bisulfite sequencing (CAB-Seq) for 5-carboxylcytosine detection in DNA, J. Am. Chem. Soc. 135, 9315-9317.
[39] Pastor, W. A., Pape, U. J., Huang, Y., Henderson, H. R., Lister, R., Ko, M., McLoughlin, E. M., Brudno, Y., Mahapatra, S., Kapranov, P., Tahiliani, M., Daley, G. Q., Liu, X. S., Ecker, J. R., Milos, P. M., Agarwal, S., and Rao, A. (2011) Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells, Nature 473, 394-397.
[40] Song, C. X., Szulwach, K. E., Fu, Y., Dai, Q., Yi, C., Li, X., Li, Y., Chen, C. H., Zhang, W., Jian, X., Wang, J., Zhang, L., Looney, T. J., Zhang, B., Godley, L. A., Hicks, L. M., Lahn, B. T., Jin, P., and He, C. (2011) Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine, Nat. Biotechnol. 29, 68-72.
[41] Robertson, A. B., Dahl, J. A., Vagbo, C. B., Tripathi, P., Krokan, H. E., and Klungland, A. (2011) A novel method for the efficient and selective identification of 5-hydroxymethylcytosine in genomic DNA, Nucleic Acids Res. 39, e55.
[42] Sun, Z., Terragni, J., Borgaro, J. G., Liu, Y., Yu, L., Guan, S., Wang, H., Sun, D., Cheng, X., Zhu, Z., Pradhan, S., and Zheng, Y. (2013) High-resolution enzymatic mapping of genomic 5-hydroxymethylcytosine in mouse embryonic stem cells, Cell Rep. 3, 567-576.
[43] Wang, H., Guan, S., Quimby, A., Cohen-Karni, D., Pradhan, S., Wilson, G., Roberts, R. J., Zhu, Z., and Zheng, Y. (2011) Comparative characterization of the PvuRts1I family of restriction enzymes and their application in mapping genomic 5-hydroxymethylcytosine, Nucleic Acids Res. 39, 9294-9305.
[44] Khulan, B., Thompson, R. F., Ye, K., Fazzari, M. J., Suzuki, M., Stasiek, E., Figueroa, M. E., Glass, J. L., Chen, Q., Montagna, C., Hatchwell, E., Selzer, R. R., Richmond, T. A., Green, R. D., Melnick, A., and Greally, J. M. (2006) Comparative isoschizomer profiling of cytosine methylation: the HELP assay, Genome Res. 16, 1046-1055.
[45] Brunner, A. L., Johnson, D. S., Kim, S. W., Valouev, A., Reddy, T. E., Neff, N. F., Anton, E., Medina, C., Nguyen, L., Chiao, E., Oyolu, C. B., Schroth, G. P., Absher, D. M., Baker, J. C., and Myers, R. M. (2009) Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver, Genome Res. 19, 1044-1056.
[46] Cohen-Karni, D., Xu, D., Apone, L., Fomenkov, A., Sun, Z., Davis, P. J., Kinney, S. R., Yamada-Mabuchi, M., Xu, S. Y., Davis, T., Pradhan, S., Roberts, R. J., and Zheng, Y. (2011) The MspJI family of modification-dependent restriction endonucleases for epigenetic studies, Proc. Natl. Acad. Sci. U.S.A. 108, 11040-11045.
[47] Plongthongkum, N., Diep, D. H., and Zhang, K. (2014) Advances in the profiling of DNA modifications: cytosine methylation and beyond, Nat. Rev. Genet. 15, 647-661.
[48] Sumino, M., Ohkubo, A., Taguchi, H., Seio, K., and Sekine, M. (2008) Synthesis and properties of oligodeoxynucleotides containing 5-carboxy-2'-deoxycytidines, Bioorg. Med. Chem. Lett. 18, 274-277.
[49] La Francois, C. J., Jang, Y. H., Cagin, T., Goddard, W. A., 3rd, and Sowers, L. C. (2000) Conformation and proton configuration of pyrimidine deoxynucleoside oxidation damage products in water, Chem. Res. Toxicol. 13, 462-470.
[50] Johannsen, M. W., Gerrard, S. R., Melvin, T., and Brown, T. (2014) Triplex-mediated analysis of cytosine methylation at CpA sites in DNA, Chem. Commun. 50, 551-553.
[51] Ficz, G., Branco, M. R., Seisenberger, S., Santos, F., Krueger, F., Hore, T. A., Marques, C. J., Andrews, S., and Reik, W. (2011) Dynamic regulation of 5-hydroxymethylcytosine in mouse ES cells and during differentiation, Nature 473, 398-402.
[52] Li, N., Ye, M. Z., Li, Y. R., Yan, Z. X., Butcher, L. M., Sun, J. H., Han, X., Chen, Q. A., Zhang, X. Q., and Wang, J. (2010) Whole genome DNA methylation analysis based on high throughput sequencing technology, Methods 52, 203-212.
[53] Valinluck, V., Tsai, H. H., Rogstad, D. K., Burdzy, A., Bird, A., and Sowers, L. C. (2004) Oxidative damage to methyl-CpG sequences inhibits the binding of the methyl-CpG binding domain (MBD) of methyl-CpG binding protein 2 (MeCP2), Nucleic Acids Res. 32, 4100-4108.
[54] Oakeley, E. J. (1999) DNA methylation analysis: a review of current methodologies, Pharmacol. & Therap. 84, 389-400.
[55] Yamagata, K. (2010) DNA methylation profiling using live-cell imaging, Methods 52, 259-266.
[56] Stains, C. I., Furman, J. L., Segal, D. J., and Ghosh, I. (2006) Site-specific detection of DNA methylation utilizing mCpG-SEER, J. Am. Chem. Soc. 128, 9761-9765.
[57] Boch, J., and Bonas, U. (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function, Ann. Rev. Phytopath. 48, 419-436.
[58] Bogdanove, A. J., and Voytas, D. F. (2011) TAL effectors: customizable proteins for DNA targeting, Science 333, 1843-1846.
[59] Moscou, M. J., and Bogdanove, A. J. (2009) A simple cipher governs DNA recognition by TAL effectors, Science 326, 1501.
[60] Boch, J., Scholze, H., Schornack, S., Landgraf, A., Hahn, S., Kay, S., Lahaye, T., Nickstadt, A., and Bonas, U. (2009) Breaking the code of DNA binding specificity of TAL-type III effectors, Science 326, 1509-1512.
[61] Yang, J., Zhang, Y., Yuan, P., Zhou, Y., Cai, C., Ren, Q., Wen, D., Chu, C., Qi, H., and Wei, W. (2014) Complete decoding of TAL effectors for DNA recognition, Cell Res. 24, 628-631.
[62] Deng, D., Yan, C., Pan, X., Mahfouz, M., Wang, J., Zhu, J. K., Shi, Y., and Yan, N. (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors, Science 335, 720-723.
[63] Mak, A. N. S., Bradley, P., Cernadas, R. A., Bogdanove, A. J., and Stoddard, B. L. (2012) The Crystal Structure of TAL Effector PthXo1 Bound to Its DNA Target, Science 335, 716-719.
[64] Kubik, G., Batke, S., and Summerer, D. (2015) Programmable sensors of 5-hydroxymethylcytosine, J. Am. Chem. Soc. 137, 2-5.
[65] Kubik, G., Schmidt, M. J., Penner, J. E., and Summerer, D. (2014) Programmable and Highly Resolved In Vitro Detection of 5-Methylcytosine by TALEs, Angew. Chem. Int. Ed. Engl. 53, 6002-6006.
[66] Bultmann, S., Morbitzer, R., Schmidt, C. S., Thanisch, K., Spada, F., Elsaesser, J., Lahaye, T., and Leonhardt, H. (2012) Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers, Nucleic Acids Res. 40, 5368-5377.
[67] Valton, J., Dupuy, A., Daboussi, F., Thomas, S., Marechal, A., Macmaster, R., Melliand, K., Juillerat, A., and Duchateau, P. (2012) Overcoming Transcription Activator-like Effector (TALE) DNA Binding Domain Sensitivity to Cytosine Methylation, J. Biol. Chem. 287, 38427-38432.
[68] Kubik, G., and Summerer, D. (2015) Achieving Single-Nucleotide Resolution of 5-Methylcytosine Detection with TALEs, Chembiochem 16, 228-231.
[69] Thanisch, K., Schneider, K., Morbitzer, R., Solovei, I., Lahaye, T., Bultmann, S., and Leonhardt, H. (2014) Targeting and tracing of specific DNA sequences with dTALEs in living cells, Nucleic Acids Res. 42, e38.
[70] Ma, H., Reyes-Gutierrez, P., and Pederson, T. (2013) Visualization of repetitive DNA sequences in human chromosomes with transcription activator-like effectors, Proc. Natl. Acad. Sci. U.S.A. 110, 21048-21053.
[71] Miyanari, Y., Ziegler-Birling, C., and Torres-Padilla, M. E. (2013) Live visualization of chromatin dynamics with fluorescent TALEs, Nat. Struct. Mol. Biol. 20, 1321-1324.

[72] Maeder, M. L., Angstman, J. F., Richardson, M. E., Linder, S. J., Cascio, V. M., Tsai, S. Q., Ho, Q. H., Sander, J. D., Reyon, D., Bernstein, B. E., Costello, J. F., Wilkinson, M. F., and Joung, J. K. (2013) Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins, Nat. Biotechnol. 31, 1137-1142.
[73] Kasianowicz, J. J., Brandin, E., Branton, D., and Deamer, D. W. (1996) Characterization of individual polynucleotide molecules using a membrane channel, Proc. Natl. Acad. Sci. U.S.A. 93, 13770-13773.
[74] Song, L., Hobaugh, M. R., Shustak, C., Cheley, S., Bayley, H., and Gouaux, J. E. (1996) Structure of staphylococcal alpha-hemolysin, a heptameric transmembrane pore, Science 274, 1859-1866.
[75] Akeson, M., Branton, D., Kasianowicz, J. J., Brandin, E., and Deamer, D. W. (1999) Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid, and polyuridylic acid as homopolymers or as segments within single RNA molecules, Biophys. J. 77, 3227-3233.
[76] Stoddart, D., Heron, A. J., Mikhailova, E., Maglia, G., and Bayley, H. (2009) Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore, Proc. Natl. Acad. Sci. U.S.A. 106, 7702-7707.
[77] Wallace, E. V., Stoddart, D., Heron, A. J., Mikhailova, E., Maglia, G., Donohoe, T. J., and Bayley, H. (2010) Identification of epigenetic DNA modifications with a protein nanopore, Chem. Commun. 46, 8195-8197.
[78] Li, W. W., Gong, L., and Bayley, H. (2013) Single-molecule detection of 5-hydroxymethylcytosine in DNA through chemical modification and nanopore analysis, Angew. Chem. Int. Ed. Engl. 52, 4350-4355.
[79] Lieberman, K. R., Cherf, G. M., Doody, M. J., Olasagasti, F., Kolodji, Y., and Akeson, M. (2010) Processive replication of single DNA molecules in a nanopore catalyzed by phi29 DNA polymerase, J. Am. Chem. Soc. 132, 17961-17972.
[80] Wescoe, Z. L., Schreiber, J., and Akeson, M. (2014) Nanopores Discriminate among Five C5-Cytosine Variants in DNA, J. Am. Chem. Soc.
[81] Faller, M., Niederweis, M., and Schulz, G. E. (2004) The structure of a mycobacterial outer-membrane channel, Science 303, 1189-1192.
[82] Derrington, I. M., Butler, T. Z., Collins, M. D., Manrao, E., Pavlenok, M., Niederweis, M., and Gundlach, J. H. (2010) Nanopore DNA sequencing with MspA, Proc. Natl. Acad. Sci. U.S.A. 107, 16060-16065.
[83] Manrao, E. A., Derrington, I. M., Pavlenok, M., Niederweis, M., and Gundlach, J. H. (2011) Nucleotide discrimination with DNA immobilized in the MspA nanopore, PLoS One 6, e25723.
[84] Manrao, E. A., Derrington, I. M., Laszlo, A. H., Langford, K. W., Hopper, M. K., Gillgren, N., Pavlenok, M., Niederweis, M., and Gundlach, J. H. (2012) Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase, Nat. Biotechnol. 30, 349-353.
[85] Laszlo, A. H., Derrington, I. M., Brinkerhoff, H., Langford, K. W., Nova, I. C., Samson, J. M., Bartlett, J. J., Pavlenok, M., and Gundlach, J. H. (2013) Detection and mapping of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore MspA, Proc. Natl. Acad. Sci. U.S.A. 110, 18904-18909.
[86] Schreiber, J., Wescoe, Z. L., Abu-Shumays, R., Vivian, J. T., Baatar, B., Karplus, K., and Akeson, M. (2013) Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands, Proc. Natl. Acad. Sci. U.S.A. 110, 18910-18915.
[87] Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F., Cicero, R., Clark, S., Dalal, R., Dewinter, A., Dixon, J., Foquet, M., Gaertner, A., Hardenbol, P., Heiner, C., Hester, K., Holden, D., Kearns, G., Kong, X., Kuse, R., Lacroix, Y., Lin, S., Lundquist, P., Ma, C., Marks, P., Maxham, M., Murphy, D., Park, I., Pham, T., Phillips, M., Roy, J., Sebra, R., Shen, G., Sorenson, J., Tomaney, A., Travers, K., Trulson, M., Vieceli, J., Wegener, J., Wu, D., Yang, A., Zaccarin, D., Zhao, P., Zhong, F., Korlach, J., and Turner, S. (2009) Real-time DNA sequencing from single polymerase molecules, Science 323, 133-138.

[88] Chaisson, M. J. P., Huddleston, J., Dennis, M. Y., Sudmant, P. H., Malig, M., Hormozdiari, F., Antonacci, F., Surti, U., Sandstrom, R., Boitano, M., Landolin, J. M., Stamatoyannopoulos, J. A., Hunkapiller, M. W., Korlach, J., and Eichler, E. E. (2015) Resolving the complexity of the human genome using single-molecule sequencing, Nature 517, 608-611.
[89] Flusberg, B. A., Webster, D. R., Lee, J. H., Travers, K. J., Olivares, E. C., Clark, T. A., Korlach, J., and Turner, S. W. (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing, Nat. Methods 7, 461-465.
[90] Summerer, D. (2010) High-throughput DNA sequencing beyond the four-letter code: epigenetic modifications revealed by single-molecule bypass kinetics, Chembiochem 11, 2499-2501.
[91] Berman, A. J., Kamtekar, S., Goodman, J. L., Lazaro, J. M., de Vega, M., Blanco, L., Salas, M., and Steitz, T. A. (2007) Structures of phi29 DNA polymerase complexed with substrate: the mechanism of translocation in B-family polymerases, EMBO J. 26, 3494-3505.
[92] Clark, T. A., Lu, X., Luong, K., Dai, Q., Boitano, M., Turner, S. W., He, C., and Korlach, J. (2013) Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation, BMC Biol. 11, 4.
[93] Song, C. X., Clark, T. A., Lu, X. Y., Kislyuk, A., Dai, Q., Turner, S. W., He, C., and Korlach, J. (2012) Sensitive and specific single-molecule sequencing of 5-hydroxymethylcytosine, Nat. Methods 9, 75-77.
[94] Chavez, L., Huang, Y., Luong, K., Agarwal, S., Iyer, L. M., Pastor, W. A., Hench, V. K., Frazier-Bowers, S. A., Korol, E., Liu, S., Tahiliani, M., Wang, Y., Clark, T. A., Korlach, J., Pukkila, P. J., Aravind, L., and Rao, A. (2014) Simultaneous sequencing of oxidized methylcytosines produced by TET/JBP dioxygenases in Coprinopsis cinerea, Proc. Natl. Acad. Sci. U.S.A. 111, E5149-E5158.
[95] Aschenbrenner, J., Drum, M., Topal, H., Wieland, M., and Marx, A. (2014) Direct sensing of 5-methylcytosine by polymerase chain reaction, Angew. Chem. Int. Ed. Engl. 53, 8154-8158.
