
Perspective

Insights into the biochemistry, evolution, and biotechnological applications of the ten-eleven translocation (TET) enzymes
Lana Saleh, Mackenzie J. Parker, and Peter R. Weigele
Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.8b01185 • Publication Date (Web): 20 Dec 2018
Downloaded from http://pubs.acs.org on December 22, 2018



Insights into the biochemistry, evolution, and biotechnological applications of the ten-eleven translocation (TET) enzymes
Mackenzie J. Parker, Peter R. Weigele, and Lana Saleh*

Research Department, New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938


ABSTRACT

A tight link exists between patterns of DNA methylation at carbon 5 of cytosine and differential gene expression in mammalian tissues. Indeed, aberrant DNA methylation results in various human diseases, including neurologic and immune disorders, and contributes to the initiation and progression of various cancers. Proper DNA methylation depends on the fidelity and control of the underlying mechanisms that write, maintain, and erase these epigenetic marks. In this perspective, we address one of the key players in active demethylation: the ten-eleven translocation enzymes or TETs. These enzymes belong to the Fe2+/α-ketoglutarate-dependent dioxygenase superfamily and iteratively oxidize 5-methylcytosine in DNA to produce 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxycytosine. The latter three bases may convey additional layers of epigenetic information in addition to being intermediates in active demethylation. Despite the intense interest in understanding the physiological roles TETs play in active demethylation and cell regulation, less has been done, in comparison, to illuminate details of the chemistry and factors involved in regulating the three-step oxidation mechanism. Herein, we focus on what is known about the biochemical features of TETs and explore questions whose answers will lead to a more detailed understanding of the in vivo modus operandi of these enzymes. We also summarize the membership and evolutionary history of the TET/JBP family and highlight the prokaryotic homologs as a reservoir of potentially diverse functionalities awaiting discovery. Finally, we spotlight sequencing methods that utilize TET for mapping 5mC and its oxidation products in genomic DNA and comment on possible improvements to these approaches.


Introduction

In mammals, 5-methylcytosine (5mC) is found in approximately 1.5 % of genomic DNA (gDNA).1 5mC plays a role in several epigenetic processes, such as silencing of repetitive elements, genomic imprinting, X-chromosome inactivation, and regulation of gene expression during development and cellular specialization.2,3 Tissue- and cell-specific DNA methylation patterns are established early during embryogenesis and during primordial germ cell (PGC) maturation by two related DNA-(C5-cytosine)-methyltransferases (C5-cytosine-MT), DNMT3A and DNMT3B (Figure 1),4-6 with the help of the stimulatory factor DNMT3L.7-9 During replication, another C5-cytosine-MT, DNMT1, is guided to hemi-methylated sites on nascent coding strands by E3 ubiquitin-protein ligase (UHRF1) to restore symmetrical methylation (Figure 1).10,11 Guidance of DNMT1 by UHRF1 is dependent on the ability of the latter to cooperatively bind both hemi-methylated DNA and methylated histone H3K9.12-14


Figure 1. Main pathways for DNA methylation and demethylation. Red lines indicate modified DNA strands and black lines nascently synthesized strands. DNMT3A/3B catalyze de novo methylation while UHRF1/DNMT1 complex maintains methylation after replication. Blue arrows indicate passive demethylation, which results in dilution of 5mC or its oxidized forms during replication. Gold arrows indicate TET-TDG-BER-mediated active demethylation.

The reverse process, erasure of methylation, is known to occur by both passive and active mechanisms. Both mechanisms contribute to the demethylation events in pre-implanted embryos preceding re-establishment of methylation patterns by DNMT3A and 3B, whereas only active demethylation is thought to occur during PGC maturation.15-20 At these two developmental junctures, the entire genome undergoes global demethylation in a process that is essential for cellular-lineage determination and resetting of the life cycle. Downregulation of UHRF1 during PGC arrest in G2 prior to epigenetic remodeling21 is suggested to result in inefficient recruitment of DNMT1 to DNA during repeated rounds of replication, thus leading to passive demethylation (Figure 1).17,19,22,23 In contrast, active demethylation is replication-independent and relies on enzymatic activity. A major player in active demethylation pathways is the ten-eleven translocation (TET) dioxygenase, which catalyzes three iterative Fe2+- and α-ketoglutarate (aKG)-dependent oxidations of 5mC to yield 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC) (Figure 2A).24-27 This enzyme and thymine-DNA glycosylase (TDG) are currently thought to compose the main pathway for active demethylation in mammals (Figure 1). The latter excises 5fC and 5caC from DNA far more efficiently than its long-known substrate, thymine (T) in T:G mismatches,28 producing an abasic site that is then replaced with cytosine (C) via the base-excision repair (BER) pathway.24,28 Consistent with this model, disruption of mouse tdg results in increased levels of DNA methylation at certain genomic loci,29,30 while its overexpression in human embryonic kidney (HEK) 293 cells results in lower levels of 5fC and 5caC with little or no effect on 5mC and 5hmC.31


Figure 2. Reactions catalyzed by members of the TET/JBP family. (A) Iterative 5mC oxidation catalyzed by mammalian TET1/2/3, Naegleria gruberi TET1 (NgTET1) (major activity), and Coprinopsis cinerea TET (CcTET). (B) JBP1/2 catalyze the oxidation of T to 5hmU in the first step of base J biosynthesis. (C) Iterative T oxidation catalyzed by NgTET1 (minor activity). aKG = α-ketoglutarate; O2 = molecular oxygen; Suc = succinate; CO2 = carbon dioxide; UDPGlu = uridine diphosphoglucose.

The TET-TDG-BER-mediated demethylation pathway (Figure 1) does not account for all events of active demethylation that occur in the cell. Other enzymes involved in active demethylation include the activation-induced cytidine deaminase (AID), which has a critical function in epigenetic reprogramming in mouse PGCs and whose deficiency interferes with genome-wide erasure of DNA methylation patterns.32 The growth arrest DNA-damage-inducible protein 45a (Gadd45a) has also been implicated in DNA demethylation by stimulating nucleotide excision repair in Xenopus laevis embryos and mammalian cells,33-36 but these findings are challenged by results reporting neither global- nor locus-specific methylation increases in Gadd45a-/- mice.37 TET has also been suggested to affect passive demethylation at certain sites by interfering with 5mC maintenance via oxidation of hemi-methylated intermediates to hemi-hydroxymethylated forms, thus interfering with the activity of DNMT1 (DNMT1 was suggested to have low activity on these sites).38-40 Recent data, however, conflicted with these observations and showed that DNMT3a and DNMT3b exhibit activity toward hemi-hydroxymethylated sites.38,40 In addition, UHRF1 is shown to bind 5hmC and target DNMT1 to these sites.41-43 It was further suggested that UHRF1, DNMT1, and TET may function together as a complex to maintain 5hmC during DNA replication.44

Several excellent reviews discussing the seminal advancements over the past decade in the biology of decoding active demethylation and the role of TET in this process are available.45,46

However, key questions regarding the chemical mechanism of TET, factors that control its iterative oxidation, how it targets specific genomic sites with the goal of active demethylation versus depositing epigenetic marks for gene regulation, and how it differentiates 5mC from T, are not very well addressed. Furthermore, experimental efforts to understand the functional diversity of the TET family and to elucidate possible sequence-structure-function relationships by biochemically characterizing various TETs from different organisms would almost certainly lead to improved function-prediction methods and the discovery of novel functionalities. In this perspective, we highlight literature that addresses aspects of these outstanding issues and offer our outlook on questions that need attention from biochemists in the community.

Discovery of TET’s biochemical activity

TETs were originally named after a chromosomal translocation identified in patients with acute myeloid or lymphocytic leukemia in the early 2000s that fuses the mixed-lineage leukemia 1 gene located on chromosome 11 with the tet1 gene on chromosome 10.47,48 TET’s function, however, remained unknown until 2009, when emerging bioinformatic clues implicated these enzymes as potential 5-methylpyrimidine dioxygenases. Iterative PSI-BLAST searches seeded initially with the predicted oxygenase domains of Trypanosoma brucei base J-binding proteins, JBP1 and JBP2, as queries revealed homologous regions within three human proteins, TET1, TET2, and TET3, and their orthologs in the genomes of other metazoans (i.e. animals that undergo development from an embryo).27,49 JBP1 and JBP2 were previously shown to oxidize T to form 5-hydroxymethyluracil (5hmU), an intermediate in the biosynthesis of base J (5-(β-D-glucosyl)methyluracil) (Figure 2B),50,51 which acts as an RNA polymerase II termination factor in Leishmania and perhaps T. brucei.52 At the time, oxidized T derivatives had not been observed in mammalian genomes outside of the context of DNA damage; therefore, TETs seemed likely candidates as catalysts of 5mC oxidation instead.

The first evidence for the 5mC dioxygenase activity of mammalian TETs was presented by Tahiliani et al. in 2009.27 In vivo immunostaining experiments in HEK 293 cells overexpressing human TET1 (hTET1) demonstrated a positive correlation between the enzyme’s presence and reduced 5mC levels in gDNA. The disappearance of 5mC also coincided with the formation of a new base, which was identified as 5hmC by thin-layer chromatography (TLC) and mass spectrometry. TET1’s involvement in these correlations was confirmed by purifying recombinant hTET1 catalytic domain (CD) from Sf9 insect cells and showing by TLC that it converted 5mC in synthetic oligonucleotides to 5hmC with an absolute dependency on Fe2+ and aKG. The discovery of TET’s activity coincided with emerging reports that 5hmC was actually a stable component of gDNA in a variety of vertebrate cell types,26,27 leading to speculations about the roles these enzymes play in active demethylation and/or in generating new layers of epigenetic control.


A more definitive role for TETs in active demethylation was established two years later, when two groups independently demonstrated, using a variety of analytical techniques, that the three mouse TET (mTET) paralogs could catalyze further oxidation of 5hmC in oligonucleotides and gDNA to produce 5fC and 5caC in an Fe2+- and aKG-dependent manner.24,25 The results of these studies rationalized the observation of 5fC in mouse embryonic stem cell (mESC) gDNA in an earlier report by Pfaffeneder et al.53 The discovery that TETs could produce 5caC triggered relatively fruitless searches for a decarboxylase, similar to the thymine salvage enzyme isoorotate decarboxylase,54-56 that would catalyze decarboxylation of 5caC and re-establish C on genomic sites without resorting to the resource-intensive TDG-BER pathway. One study found that treatment of [1,3-15N]-5caC-labeled DNA with mESC extract resulted in product containing isotopically-labeled C, suggesting that such a decarboxylase may exist.57 However, the enzyme responsible for this activity awaits identification. Another study reported that mammalian DNMTs catalyze decarboxylation of 5caC to yield C in vitro.58 Whether or not this occurs in vivo has yet to be demonstrated.

Subsequent to the discovery of mammalian TETs, the activities of homologs from other eukaryotes such as Naegleria gruberi (NgTET1),59,60 Coprinopsis cinerea (CcTET),61 Apis mellifera (AmTET),62 and Drosophila melanogaster (droTET)63,64 were investigated. Like their mammalian counterparts, NgTET1 and CcTET iteratively oxidize 5mC to produce 5hmC, 5fC, and 5caC (Figure 2A).59-61 Interestingly, NgTET1 also harbors minor iterative T-oxygenase activity in vitro, producing 5hmU, 5-formyluracil (5fU), and 5-carboxyuracil (5caU) (Figure 2C).60 Evidence suggesting that mTET can also oxidize T to form 5hmU in ESCs has been reported,65 but this activity could not be reproduced in vitro.60 In contrast, AmTET is reported to produce only 5hmC,62 while conflicting reports suggest that droTET is either a DNA-specific N6-methyl-2'-deoxyadenosine (6mA) demethylase64 or an RNA-specific 5mC dioxygenase.63 These results are surprising considering both AmTET and droTET exhibit absolute conservation of active-site residues27,66 that are involved in hydrogen-bonding or stacking interactions with the pyrimidine ring of 5mC in mammalian TETs (discussed in more detail in later sections). Structural studies of AmTET and droTET will be invaluable for illuminating, respectively, the main elements that dictate a single-step oxidation outcome or an alternative substrate specificity.

Domain architecture and functional divergence of metazoan TETs

Metazoan TETs are large, multi-domain proteins with architectures resembling that of DNMT1 and other chromatin-binding proteins (Figure 3A). The N-terminal end typically contains a CXXC DNA-binding domain and nuclear localization signal sequences, whereas the C-terminal end houses the CD,27 which adopts a double-stranded beta-helix (DSBH) fold characteristic of members of the Fe2+/aKG-dependent dioxygenase superfamily (Figure 3B).67 The CD of metazoan TETs is notably interrupted by a large low complexity insert of unknown function that splits the core DSBH in two (Figure 3A).27 Moreover, relative to lower eukaryotic TETs like NgTET1, the CD of metazoan enzymes is elongated at the N-terminal end to include a cysteine-rich region27 (Figure 3A) that binds three Zn2+ ions to help stabilize the enzyme fold and allow it to properly interact with DNA substrate (Figure 3B).67


Figure 3. (A) Domain architectures of the hTET paralogs and NgTET1. (B) Crystal structure of hTET2 truncated CD in complex with 5mC-containing DNA (PDB 4NM6).67 The layers of the DSBH are colored as follows: α-helical layer = green, major β-sheet = dark blue, minor β-sheet = magenta. Random coil regions between elements of the DSBH are colored light blue and the low complexity insert orange. All metals are indicated. Left inset: blow up of the Zn3 binding site. Right inset: blow up of the active site showing ligands to Fe2+ and NOG (w = water molecule). (C) Crystal structure of NgTET1 in complex with 5mC-containing DNA (PDB 4LT5).59 The layers of the DSBH and random coil regions outside of these layers are colored as in (B). Inset: blow up of the active site showing ligands to Mn2+ and NOG (w = water molecule). In both structures, only the base of the targeted 5mC is shown for clarity. Metal-ligand interactions and hydrogen bonds are shown as dashed lines. Atoms of residue sidechains and DNA are shown in stick representation and are colored according to heteroatom: red = O, blue = N, orange = P, yellow = S, rust = Fe, purple = Mn, grey = Zn.

Within the gnathostome (jawed) vertebrate clade of metazoans, either a gene triplication event or two independent duplications occurred during the course of their evolution that resulted in three TET paralogs (TET1, TET2, and TET3) being present in these organisms.68 Different expression patterns have arisen among the paralogs, signaling distinct biological functions for each in developmental processes and the routine maintenance of methylation patterns in various tissues. For example, TET1 and TET2 are highly expressed in mESCs during the blastocyst stage, while TET3 is highly enriched in mouse oocytes and early zygotes.69 Remarkably, TET3 seems to have experienced different evolutionary pressures than TET1 and TET2 since it has a lower number of divergent amino-acid substitutions in comparison.70 Furthermore, tet1 and tet2 exhibit more frequent codon diversification in coding regions outside of the CXXC domain and the DSBH and Cys-rich components of the CD. However, strong selective constraints are observed within the CXXC and CDs of all three paralogs, emphasizing their functional importance to the enzyme.70

The functional divergence of gnathostome TET paralogs can also be inferred from their unique motifs and/or domains. TET1s have recently been shown to have a “Before CXXC” or BC domain that is involved in chromatin binding (Figure 3A).71 TET2s have three unique motifs: an approximately 380-amino acid long glutamine-rich region (20 % Gln content) upstream of the CD; a moderately conserved short Gln-rich region within the low complexity insert; and an N-terminal proline-rich region containing short poly-Pro repeats of about 20 amino acids in length (Figure 3A).72 The roles of these three motifs in TET2’s function are unknown and require further study. A chromosomal inversion that occurred during evolution also resulted in the CXXC domain of TET2s being severed to form an adjacent gene called Idax (Figure 3A).73 TET3s contain two previously unidentified sequence motifs of unknown function termed Element 1 and Element 2 (Figure 3A). Element 1 is situated downstream of the CXXC domain and is highly conserved in all mammalian TET3 proteins, whereas Element 2 is located within the DSBH region of the CD.72 The diverse combination of unique motifs in the various TET paralogs likely dictates their function in the cell.

Further functional diversity of gnathostome TET paralogs can be generated through the expression of alternative splicing isoforms. Full-length TET1 is detected only during early embryonic development, whereas adult somatic tissues weakly express an isoform, called TET1s, which is missing the CXXC domain. The presence versus absence of the CXXC domain on TET1 is predicted to control epigenetic memory erasure.71 A non-enzymatic isoform of TET2, which lacks the CD, is implicated in mast cell proliferation in humans, while the full-length protein is reported to be not as efficient at this function.74 Lastly, full-length TET3 and TET3s, which also lacks the CXXC domain, are important for neuronal differentiation, while another isoform called TET3o, which differs from TET3s by only one exon, is involved in oocyte generation and fertilization.75

The modus operandi of mammalian TETs

The global demethylation events that occur in pre-implanted embryos and PGCs are severely attenuated in tet1/2/3 knockouts, clearly demonstrating the role of mammalian TETs in active demethylation.76 In somatic tissues, however, isotope-labeling experiments show that the vast majority of 5hmC, 5fC, and 5caC detected in mammalian gDNA exists for prolonged periods as stable marks,77,78 and that their levels in various tissues do not correlate with their precursors.79,80 Furthermore, many reader proteins that can recognize and interpret 5hmC, 5fC, and 5caC marks as epigenetic information have been identified including DNA glycosylases, repair proteins, transcription factors, and chromatin regulators.42,43,81-83

These observations thus show that mammalian TETs, in addition to erasing methylation marks, deposit 5hmC, 5fC, and possibly 5caC on the genome to add new layers of epigenetic control in transcriptional regulation, cellular development, and lineage specification (Figure 1).

The opposing nature of TET’s dual functionality requires cells to implement strict regulatory measures to control the enzyme’s activity in generating oxidized derivatives of 5mC. A few of these measures have been uncovered during the past decade, including chemical and structural properties inherent to TETs (discussed in the next section), putative allosteric effects on their activity by small molecules,84 their recruitment/exclusion by interacting proteins,85 and their localization in accessible regions of the chromatin.86 Many aspects of these regulatory mechanisms, however, are still poorly understood.

There are two distinct physical modes by which TETs can locate 5mC sites in DNA: (i) a distributive mode, in which the enzyme differentiates target versus non-target sites by either random, three-dimensional diffusion or a combined one-dimensional sliding and three-dimensional hopping mechanism; or (ii) a processive mode, in which it slides along DNA between methylated sites. Once a site is found, TETs can utilize one of two chemical modes in oxidizing their substrate: (i) iterative oxidation, where TET oxidizes 5mC to 5caC without releasing the substrate; or (ii) distributive oxidation, where it releases oxidized product from the active site after a single turnover. We have unequivocally shown in vitro that the physical and chemical modus operandi of NgTET1 and the CDs of mTET1 and mTET2 are distributive, meaning that these enzymes fully dissociate from product and DNA after a single oxidation step.87 This behavior was revealed by examining the distribution of products formed at specific times during the course of the reaction, which was found to be dependent on the corresponding substrate abundance and the kinetics of a particular oxidation step.87 A contrary conclusion was argued by Crawford et al.;88 however, careful investigation of their results shows that intermediate products, specifically 5hmC, accumulate to concentrations greater than the input enzyme concentration (Figure 2C of Crawford et al.88), confirming that TET indeed releases the oxidized products after each turnover.

Our proof for the distributive physical and chemical behaviors of TETs explains in part how these enzymes allow different 5mC oxidized forms to stably accumulate at different locales in the genome. This process will also be influenced by the enzyme’s localized concentration in accessible regions of chromatin. At sites where active demethylation is expected to occur, TETs are speculated to be present in large amounts to drive the removal of errant methylation. Conversely, sites with relatively stable 5hmC marks are speculated to have low local enzyme concentrations.86 If this hypothesis is accurate, then TET’s distributive mode of action coupled with local enzyme concentration will dictate the occurrence of iterative- versus single-oxidation events.

TETs may be recruited to or excluded from specific genomic sites by interacting with other chromatin-associated proteins. For example, TET1 initially targets 5mC sites in gene enhancer regions of mESCs to produce 5hmC. TET1 then interacts with the protein SALL4A and subsequently recruits TET2 to complete the iterative oxidation of 5hmC to 5fC and 5caC.89 Furthermore, the N-terminal CXXC domain could play a role in localizing TETs to specific genomic sites. Typically, CXXC domains are thought to anchor chromatin-modifying enzymes to DNA by binding to non-methylated CpG islands. However, a study of the CXXC domain of mTET3 revealed that it had a higher affinity for 5caCpG sites relative to non-methylated sites, and it was postulated that anchoring TET3 to 5caCpG keeps the enzyme localized in areas of the genome where C methylation is undesirable, such as at transcriptional start sites.75 Although there is no current evidence to support this intriguing proposal, it raises questions of whether there could be other variants of the CXXC domain that preferentially interact with other 5xC (x = m, hm, f, or ca) derivatives and, thus, help regulate TET activity through chromatin localization.

An intriguing observation regarding the ability of mammalian TETs, as well as CcTET, to generate 5fC and 5caC is the dependence of these oxidation reactions on the presence of ascorbate and ATP.24,61,90-92

The mechanism by which ascorbate enhances the three-step oxidation reaction is not well understood. One study suggests that activation of TET by ascorbate results from a direct interaction of this small molecule with its C-terminus, evidenced by intrinsic fluorescence changes of TET CD upon increasing ascorbate concentration.92 However, the experiments in this study were performed at equilibrium binding conditions for iron, introducing the possibility that changes in the protein’s fluorescence were merely due to conformational events occurring from iron in the bound and unbound states. Therefore, this conclusion needs further experimental support. We believe that the most probable explanations for the enhancing effect of ascorbate on TET activity are (1) it keeps free Fe in a reduced and, therefore, kinetically labile state for binding to the enzyme, and (2) it reactivates the Fe3+-OH form of the enzyme in cases where the main substrate (R-CH3 in Figure 4) is absent or not properly positioned at the active site, similar to what is seen with prolyl-4-hydroxylase.93, 94

In contrast to ascorbate, there have been no reported studies on how ATP stimulates 5fC and 5caC production with mammalian TETs and CcTET. It is tempting to speculate that hydrolysis or allosteric binding of the nucleotide triggers structural rearrangements in the enzyme’s active site that make 5hmC and 5fC better substrates.

Figure 4. Consensus hydroxylation mechanism for Fe2+/aKG-dependent dioxygenases.95-97

Structural and mutational studies of TETs

Crystal structures of two TETs, NgTET1 and hTET2 truncated CD (hTET2-TCD), in complex with double-stranded DNA oligonucleotides containing an internal fully- (x = m or hm) or hemi-modified (x = f) 5xCpG site have provided insights into their substrate specificity and catalytic mechanism.59, 67, 98, 99

NgTET1, which consists of a minimally decorated CD and short, unstructured N-terminal extension, adopts a three-layered jelly-roll fold consisting of a distorted DSBH (eight-stranded major sheet and four-stranded minor sheet) and an α-helical layer that packs against the outer surface of the major sheet (Figure 3C).100 The open end of the DSBH, enlarged by the unequal number of strands in the major and minor sheets, serves as the active site and entrance to the Fe2+ and aKG binding sites located deeper within the β-helical core. In comparison, the CD of hTET2, although similar, is expanded through eight insertions, including the large low complexity insert noted in the previous section, and one deletion. To facilitate crystallization of the hTET2 CD, the low complexity insert, which is ~300 residues long and predicted to be unstructured, was replaced with a 15-residue GS linker to yield hTET2-TCD (Figure 3B).67 Although largely disordered, the location of the GS linker in the hTET2-TCD structures is quite intriguing and suggests that the low complexity insert may play a role in DNA binding and/or regulation of the enzyme. The linker appears to reach into the major groove of the DNA substrate on the side of the duplex opposite the active site, suggesting that the low complexity insert may do the same in the full-length enzyme. It has been noted previously that the insert bears homology to the C-terminal domain of Saccharomyces cerevisiae RNA polymerase II.101 Mammalian RNA polymerase II bears a similar C-terminal domain, and studies suggest that post-translational modifications (phosphorylation, Arg methylation, sumoylation) in this domain help control the enzyme’s activity.49, 102, 103 It is unknown whether or not similar modifications can occur in the C-terminal domain of S. cerevisiae RNA polymerase II and, by extension, the low complexity insert of mammalian TETs, but such similarities suggest that this element at the very least will play a role in regulating metazoan TET activity.

The Cys-rich region of metazoan TETs is often referred to in the literature as a domain. This description is likely a misnomer given that in the hTET2-TCD structure, the Cys-rich region does not form a distinct domain and instead wraps around the DSBH core and introduces three Cys3His Zn2+ binding sites (Zn1 – Zn3) (Figure 3B).67, 99

The Zn2 and Zn3 (Figure 3B inset) sites have ligands from both the Cys-rich region and DSBH and appear to be important in stabilizing two flexible loops that interact with the DNA substrate and, in the case of Zn2, another loop containing the His (H1382) and Asp (D1384) residues that coordinate the active site Fe2+. Zn1 is located distal to the active site, but its removal by mutational truncation of hTET2-TCD abolishes enzymatic activity.67 Therefore, the zinc sites are thought to confer stability to hTET2-TCD’s tertiary structure, with Zn2 and Zn3 further playing a potential role in substrate binding and catalysis.

Despite the structural differences, the active sites of NgTET1 and hTET2-TCD are nearly superimposable, indicating that their chemical mechanisms are highly similar, if not identical. The Fe2+-binding site has the HX(D/E)XnH motif characteristic of most non-halogenating Fe2+/aKG-dependent dioxygenases that coordinates the metal ion in a 2-His-1-carboxylate facial triad (Figure 3B, C). The co-substrate aKG (or N-oxalylglycine (NOG), an unreactive analog) is bound to Fe2+ in the “off-line” configuration,104 with the 1-carboxylate coordinated opposite of the distal His residue (h = H1881; Ng = H279) and the 2-keto oxygen opposite of Asp (h = D1384; Ng = D231) (Figure 3B, C). Co-substrate binding is further stabilized by hydrophobic, van der Waals, and hydrogen-bonding interactions with the protein, including two Arg residues that neutralize the 1- and 5-carboxylate groups (Figure 3B, C). The octahedral coordination geometry of Fe2+ is completed by a water molecule, which is proposed to be displaced by O2 in the TET•Fe2+•aKG•substrate complex to initiate turnover (Figure 4). These waters are positioned away from the C5-substituents of the targeted base (C-O distances = 4.3 – 4.8 Å), a situation that is similar to structures of other Fe2+/aKG-dependent dioxygenase•substrate complexes with aKG bound in the off-line mode.105-107 In the latter cases, it has been proposed that either aKG or the oxo-ligand of Fe4+=O reorients during the course of the reaction mechanism to position the activated oxygen close to the substrate for H-atom abstraction.104 Along these lines, a computational study of hTET2’s reaction mechanism suggests that the peroxysuccinate bridge (Figure 4) reorients concomitantly with aKG decarboxylation to position the latent oxo-group closer to the substrate (C-O distances = 2.5 – 3.7 Å).108 As expected from studies of other Fe2+/aKG-dependent dioxygenases, ligands to Fe2+ and aKG are completely conserved and their mutation abolishes 5xC oxidation activity.59, 67

NgTET1 and hTET2-TCD interact with the minor groove of their DNA substrate primarily through hydrogen bonding/electrostatic interactions that occur between flexible loops surrounding the active site and the DNA’s phosphate backbone. For hTET2-TCD, the substrate-interacting loop buttressed by Zn2 also contributes a patch of mostly hydrophobic residues (1290-1296) that pack against the interior of the DNA duplex 3ʹ from the 5xC being oxidized. These hydrophobic residues are essential for activity since their mutation to Ala nearly eliminates 5mC oxidation, but only mildly affects the KD of the enzyme-DNA interaction.67 The interactions induce significant distortions from B-form DNA by introducing kinks of 40° (hTET2-TCD) (Figure 3B) or 65° (NgTET1) (Figure 3C), which cause the 5xC base targeted for oxidation to flip out of the duplex and insert into the active site. In NgTET1, the base-stacking interactions surrounding the orphaned guanine are maintained, and the protein hydrogen bonds with N2 of this base via a serine (S148) on a hairpin loop that is inserted into the widened minor groove.59 In contrast, the orphaned guanine is pushed out of the duplex in hTET2-TCD by a Tyr (Y1294) and Met (M1293) occupying the space vacated by the flipped 5xC.67 Consistent with activity data,60, 67 fully- and hemi-modified DNA appear to be indistinguishable to both enzymes since no protein-base interactions are observed with the (un)modified C in the CpG of the opposite strand (Figure 3B, C).

In vitro activity assays clearly demonstrate a preference of both TETs for CpG sites.60, 67 In the case of NgTET1, this preference may be explained by the observed interaction of the protein with the guanine 3ʹ of the targeted 5xC via hydrogen bonds between a Gln (Q310) and the base’s N1 and N2 atoms. Although mutation of Q310 reduced NgTET1-mediated 5mC oxidation by 60 %, the effects on the enzyme’s specificity for methylated non-CpG dinucleotides were not explored.59 In the case of hTET2-TCD, the enzyme appears to make no specific interactions with the adjacent G:C base pair other than a general stacking interaction with the inserted Y1294.67 This interaction was proposed to be the mechanism by which hTET2 distinguishes CpG from non-CpG sites,67 but the substrate specificity of an M1293A/Y1294A double mutant was not explored. Biochemical studies on TET variants with mutations in regions proximal to DNA may reveal interactions and/or factors not readily observed in the crystal structure that could narrow their specificity to the 5xCpG dinucleotide.

In both TETs, the targeted 5xC is stabilized by stacking interactions with an aromatic residue (h = Y1902; Ng = F295) and the guanidino group of the Arg that is hydrogen bonded to C1 of aKG/NOG (h = R1261; Ng = R224) (Figure 5).59, 67, 98, 99 Hydrogen bonds between polar residues and the Watson-Crick base-pairing face of the pyrimidine supply additional binding energy and might provide the determinants that select 5mC over T (discussed in more detail in a later section). Specifically, a His residue from both proteins (h = H1904; Ng = H297) donates a hydrogen bond to N3 of the base, and an Asn (h = N1387) or Asp (Ng = D234) accepts a hydrogen bond from the exocyclic N4 amine (Figure 5). Mutation of either residue severely diminishes the ability of both TETs to oxidize 5xC.59, 67

Figure 5. Structures of the active site of hTET2-TCD and NgTET1 with different 5xC-containing oligonucleotides. Top row: hTET2-TCD structures with (A) 5mC (PDB 4NM6), (B) 5hmC (PDB 5DEU), and (C) 5fC (PDB 5D9Y).67, 99 Bottom row: NgTET1 structures with (D) 5mC (PDB 4LT5) and (E) 5hmC (PDB 5CG8).59, 98 Residues proposed to be important in substrate recognition and binding are shown, and hydrogen-bond interactions are indicated with dashed lines. Only the substrate nucleotide inserted into the active site is shown, and the 2-His-1-carboxylate facial triad ligating Fe2+/Mn2+ is omitted for clarity. Atoms are shown in stick representation. Residue side chains are colored by heteroatom, whereas 5xC, aKG/NOG, metal ions, and water molecules are colored according to element: grey = C, red = O, blue = N, dark orange = P, rust = Fe, purple = Mn.

The proteins also exhibit distinct additional interactions with 5xC that may depend, at least in the case of hTET2-TCD, on the oxidation state of the base. In NgTET1, N147 donates a hydrogen bond to the exocyclic O2 of both 5mC and 5hmC (Figure 5D, E).59, 98 This interaction is important for catalysis, as mutation of N147 to D reduces 5mC oxidation activity by over 40 %.59 In contrast, interactions between exocyclic O2 and hTET2-TCD appear to depend on the oxidation state of the C5 substituent. A fourth His residue in the enzyme’s active site (H1386) adopts three different conformations with 5mC-, 5hmC-, and 5fC-containing substrates, and only with 5hmC does it appear capable of donating a hydrogen bond to the base’s O2 atom (N – O distance = 2.8 Å) (Figure 5B).67, 99 Another potential oxidation state-dependent interaction in hTET2-TCD structures is a water-mediated hydrogen bond between T1393 and the exocyclic amino (N4) nitrogen of 5hmC and 5fC (Figure 5B, C).99 Mutational studies exploring the roles of H1386 and T1393 in 5hmC and 5fC oxidation by hTET2-TCD are needed.

5xC substrate preference of TETs

Although TETs can oxidize 5mC, 5hmC, and 5fC, the efficiency of each reaction differs significantly. Oxidation of 5mC to 5hmC is about 3.5-fold faster than 5hmC to 5fC, which in turn is about 1.5-fold faster than 5fC to 5caC for mammalian TETs,25, 67 NgTET1,60 and CcTET.61 Electrophoretic mobility-shift assays, fluorescence polarization, and surface plasmon resonance measurements suggest that the substrate preference of hTET2 CD is not a result of different substrate binding affinity.99 Furthermore, similar substrate preferences were reported regardless of sequence content (AT versus CG rich) and length of the DNA substrate.99 Closer examination of these data, however, reveals lower 5hmC and 5fC oxidation rates with CG-rich substrates, indicating a possible effect of unmethylated CpG sites on the rates of oxidation. Furthermore, a noticeable inverse correlation between substrate length and oxidation rates for all three bases is also seen in these data, which is interesting considering the distributive behavior of TET binding. This might imply that hTET2 CD follows a sliding and hopping mechanism in search of its substrate.

Examination of the NgTET1 (refs 59, 98) and hTET2-TCD (refs 67, 99) structures reveals no apparent direct contacts between the protein and the C5 substituent of the base inserted into the active site (Figure 5). This observation is consistent with the enzyme’s ability to accommodate and oxidize 5mC, 5hmC, and 5fC, but raises the question of how TETs properly orient a C-H bond of the substituent towards the Fe center for H-atom abstraction during turnover (Figure 4). The absence of any steric or non-covalent bonding restraints allows for free rotation about the C5-C(substituent) bond, at least in the cases of 5mC and 5hmC. Unless 5fC is hydrated to form the gem-diol, rotation about the C5-C(formyl) bond will likely be restricted due to (1) its partial double-bond character resulting from conjugation with the pyrimidine ring’s π electrons, and (2) an intramolecular hydrogen bond between the formyl group’s oxygen atom and the exocyclic N4 amine (Figure 5C). The three H atoms of a methyl group are chemically equivalent; thus, C5-CH3 bond rotation in 5mC should minimally affect its oxidation, as a C-H bond will always be oriented towards the Fe center. In contrast, 5hmC and 5fC have, respectively, one and two fewer C-H bonds that can be activated by the enzyme. Factoring in the distinct C-H bond-dissociation energies of a methyl, hydroxymethyl, and formyl group, these differences may explain the observed substrate preferences of TETs for 5mC-containing substrates over those with 5hmC and 5fC.

An unexplored complexity that could influence the oxidation of 5hmC and 5fC is the fact that their C5-substituents contain polar functional groups that may hydrogen bond with components of the metal center and, therefore, affect its chemistry. This issue is evident in the structures of hTET2-TCD and NgTET1 complexed with 5hmC-containing oligonucleotides.

In the case of hTET2-TCD, the OH group of the targeted base is oriented such that it is within hydrogen bonding distance of both the 1-carboxylate group of NOG (2.7 – 3.4 Å) and R1261 (3.3 – 3.4 Å) and, thus, could affect the binding and positioning of the co-substrate (Figure 5B).99 In the case of NgTET1, the orientation of the OH group allows it to hydrogen bond with the water (3.3 Å) coordinated to the Fe2+ and, thus, possibly interfere with O2 binding and activation (this structure has been suggested to alternatively reflect the product state of the enzyme after 5mC oxidation). The OH group is also within hydrogen bonding distance of the 1-carboxylate group of aKG (2.9 Å) but not R224 (4.4 Å) (Figure 5E).98 In silico analysis of the hTET2 reaction mechanism suggests altered reactivity of the metal center by showing a potential hydrogen bond between the OH group of 5hmC and the Fe3+-peroxy bridge affecting reorientation of the metal ligands after the decarboxylation step.108 Spectroscopic and crystallographic studies focused on Fe oxidation intermediate states should provide insight into the effects of polar interactions on the reactivity of TET’s Fe2+ center.

A hydrophobic pocket identified in the active sites of both NgTET1 and hTET2 may also be key in dictating the substrate preferences of these enzymes. This pocket consists of the aromatic residue stacking with the inserted base (h = Y1902; Ng = F295), a valine (h = V1900; Ng = V293), and either A212 (Ng) or T1372 (h) (Figure 5).98, 109 The high conservation of these residues among various TET homologs from other organisms109 strongly supports their involvement in the enzyme’s function. Mutation of the small residues (Val and Ala/Thr) in either TET to ones possessing bulkier side chains causes the enzyme’s activity to stall after one round of oxidation with a 5mC-containing substrate.98, 109

It was speculated in both cases that the larger side chains reduce the pocket’s volume, leading to steric clashes between the protein and the C5-substituents of 5hmC and 5fC and, consequently, excluding them from the active site. Following this logic, Gly substitutions at these positions should allow both TETs to oxidize 5hmC and 5fC at rates similar to that of 5mC; this prediction turned out to be incorrect. It is notable in these studies that all mutations reduced the overall 5mC oxidation activity of NgTET1 and hTET2-TCD,98, 109 indicating that other factors not immediately apparent in the crystal structures contribute to the substrate preferences of these enzymes.

Evolutionary history and phylogenetic classification of TETs

As a result of their shared catalytic properties and detectable sequence similarity, TETs and JBPs have been grouped by Aravind and co-workers into the so-called TET/JBP family.49 Homologs of these enzymes can be found in all domains of life, from viruses to humans. Sequence and phylogenetic analyses performed by this group suggest that the TET/JBPs of bacteriophages are the most ancestral of the 5-methylpyrimidine dioxygenases.110 Interestingly, the reactions catalyzed by these enzymes (Figure 2) also resemble those of thymine-7-hydroxylase (T7H), a fungal salvage enzyme that can iteratively oxidize thymine to 5-hydroxymethyluracil, 5-formyluracil, and 5-carboxyuracil.111 The mechanistic similarities suggest that TETs, JBPs, and T7Hs may share a common ancestor.

Prokaryotic and viral TET/JBPs show conserved neighborhood linkages to several distinct sets of genes predicted to encode other DNA-modifying enzymes, including methyltransferases (MTs), glycosyltransferases, and enzymes with known112 or tentative T hypermodification110 activity. One feature many of these co-occurring genes share is that their gene products are predicted to require a hydroxyl as an acceptor group in their reactions. As a result, the product of a TET/JBP-like activity could provide the chemical handle for further elaboration. These groupings are reminiscent of biosynthetic gene clusters, and their observed combinatorial permutations hint at a large diversity of nucleotide modifications that await discovery. We carefully examined these prokaryotic homologs and found that they can be further classified into defined subgroups based on gene-neighborhood associations (Figure 6). A tempting prediction that arises from our classification is that a TET-like function (5mC-oxidizing) versus JBP-like function (T-oxidizing) is contingent on the presence of a C5-cytosine-MT-like gene.

For example, the TET/JBP homolog of Proteobacteria bacterium TMED261 is associated with a predicted C5-cytosine-MT and glycosyltransferase, suggestive of a pathway in which C is first methylated and then oxidized to 5hmC for subsequent glycosylation. In contrast, the absence of neighboring C5-cytosine-MTs in Mycobacterium chelonae strain D16R7 and Mycobacterium phage Nigel (Nigel) suggests that T may be targeted for oxidation. In both systems, the 5hmU formed is hypothesized to be further modified by products of neighboring genes. Other MTs, such as a predicted DNA-N6A-MT in Persicivirga phage P12024L and an FkbM-like MT in Cyanophage MED4-184, are also observed to cluster with TET/JBP homologs. The only characterized FkbM-like MT to date catalyzes the methylation of an oxo-group during the biosynthesis of a macrolide antibiotic;113 thus, we predict that the pairing of such a gene with a TET/JBP homolog in the absence of a C5-cytosine-MT will result in the formation of 5-methoxymethyluridine as the product of this pathway.
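The gene-neighborhood prediction described above can be written as a simple rule. The following is a minimal sketch, not an annotation pipeline: the `Locus` class, the `predicted_substrate` function, and the neighbor labels are hypothetical names introduced here for illustration, and the neighbor sets contain only the co-associations mentioned in the text rather than full locus annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A TET/JBP locus and the predicted functions of its neighboring genes."""
    name: str
    neighbors: frozenset


def predicted_substrate(locus: Locus) -> str:
    """Proposed rule: a neighboring C5-cytosine-MT implies 5mC; otherwise T."""
    return "5mC" if "C5-cytosine-MT" in locus.neighbors else "T"


# Only the co-associations stated in the text are listed (assumption: the
# remaining neighbors do not change the prediction).
LOCI = [
    Locus("Proteobacteria bacterium TMED261",
          frozenset({"C5-cytosine-MT", "glycosyltransferase"})),
    Locus("Mycobacterium phage Nigel", frozenset()),  # no neighboring C5-cytosine-MT
]

for locus in LOCI:
    print(f"{locus.name}: predicted substrate {predicted_substrate(locus)}")
```

The rule encodes a hypothesis that still requires biochemical confirmation; it is useful mainly for flagging which uncharacterized homologs to test against 5mC versus T.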

Figure 6. Gene maps of prokaryotic loci encoding TET/JBP homologs. Representative prokaryotic gene clusters containing a TET/JBP homolog are grouped and labeled according to their co-associations with other predicted DNA-modifying enzymes. TET/JBP co-associations may be predictive of their substrate choice with respect to 5mC versus T. For example, only those homologs associating with a predicted C5-cytosine-MT are anticipated to oxidize 5mC. Phage and bacterial names are indicated. Sequence names consisting of three alpha-numeric strings uniquely identify metagenome sequence contigs obtained from the Joint Genome Institute’s (JGI) Integrated Microbial Genomes-Virus (IMG/VR) dataset: the first number corresponds to the IMG Genome ID, the second to the GOLD Analysis Project ID, and the third number identifies the contig from within that project’s dataset.114 Gene maps were generated using Geneious 10.2.6 (https://www.geneious.com).

What role could these base modifications play in the physiology of a bacterium or virus? In cells, one obvious possibility is that they are a component of a bacterial restriction-modification system that protects the host’s gDNA against resident restriction endonucleases. In phages, such modifications could be used to block restriction endonucleases expressed by their hosts or, alternatively, could play a role in their morphogenesis. It has been noted that phage TET/JBP homologs are often flanked by a parB-like homolog upstream and a large terminase subunit gene downstream (e.g. Nigel (Figure 6)).110 The ParB protein family has been shown to be involved in chromosome and plasmid segregation in cellular organisms, and the terminase subunit is part of a DNA packaging motor used during assembly of the virus particle.115 Therefore, it was suggested that DNA modification by TET/JBP homologs and the associated base-modifying enzymes might be used to define packaging start/end points during viral morphogenesis.68 Further studies are warranted to determine the extent and sequence specificity of base modification in the DNA of phages encoding TET/JBP homologs, and whether such modifications coincide with the termini of viral genomic DNA.

Aravind and co-workers propose that eukaryotes commandeered TET/JBPs from bacteriophages and, through the course of evolution, repurposed these enzymes as generators of epigenetic marks.68 Acquisition of these genes likely occurred through two distinct phyletic patterns:49 (i) lateral gene transfer as observed in animals, Acanthamoeba, Naegleria, kinetoplastids, bacteria, phages, and certain algae; and (ii) a massive gene expansion, often with ten or more copies, as seen in Coprinopsis and Laccaria.116 In the latter case, the TET/JBP homolog is frequently coupled with active transposons that are predicted to have played important roles in speciation during evolution.116 Interestingly, TET/JBP genes from Coprinopsis, Laccaria, and metazoans are strongly correlated with a C5-cytosine-MT, suggesting that the corresponding enzymes likely act on 5mC. In contrast, the acquired element in kinetoplastids appears to have strictly included only the genes (a JBP and glucosyltransferase) necessary for base J synthesis. In summary, if a correlation indeed exists between the spread of these nucleic-acid modifying enzymes and the evolution of DNA methylation in eukaryotes, then a future focus on prokaryotic TET/JBP characterization might uncover novel biochemistry and yield a deeper understanding of the evolution of regulatory base modifications in all organisms.

Selectivity of 5mC over T for oxidation by TET

Residues interacting with N3 and the exocyclic functional group bonded to C4 of the pyrimidine ring are expected to play a key role in dictating the substrate specificity of TETs and JBPs.60 The hydrogen-bond donating/accepting capacity is distinct at these two positions of the ring for T (N3 = donating, O4 = accepting) versus 5xC (N3 = accepting, N4 = donating). Therefore, a protein, in selecting one pyrimidine over the other, should provide functional groups that match the hydrogen-bond ability at each position. Thymidylate synthases (TSs) are good examples of this chemical logic.117 Canonical TSs that produce 2ʹ-deoxythymidine monophosphate from 2ʹ-deoxyuridine monophosphate (dUMP) use the amido NH2 of an Asn side chain to donate a hydrogen bond to O4 of the pyrimidine ring (Figure 7A).118, 119 In contrast, the TS homolog gp42 of phage T4, characterized to hydroxymethylate 2ʹ-deoxycytidine monophosphate (dCMP), harbors Asp, which accepts a hydrogen bond from N4 of cytosine, at the corresponding position (Figure 7B).120 In fact, replacing this Asn with Asp in canonical TSs increases the rate of turnover for dCMP by over seven orders of magnitude relative to the wild-type enzyme.118 Similarly, replacing the corresponding Asp in gp42 with Asn results in an enzyme that prefers dUMP over its natural substrate, dCMP.118
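The donor/acceptor matching described above for thymidylate synthases can be expressed as a small lookup. This is a deliberately simplified sketch: the table names and the `is_complementary` function are hypothetical, each group is assigned a single role, and geometry, protonation state, and the ability of the Asn amide oxygen to act as an acceptor are all ignored.

```python
# Hydrogen-bond role presented by the base at each position, as stated in the text.
BASE_ROLES = {
    "T/U": {"N3": "donor", "C4 substituent": "acceptor"},  # N3-H and O4
    "C": {"N3": "acceptor", "C4 substituent": "donor"},    # N3 and the N4 amine
}

# Role assumed for each protein group (simplification: one role per group).
GROUP_ROLES = {
    "Asn amide NH2": "donor",
    "Asp carboxylate": "acceptor",
}


def is_complementary(base: str, position: str, group: str) -> bool:
    """A hydrogen bond requires one donor and one acceptor."""
    return BASE_ROLES[base][position] != GROUP_ROLES[group]


for base in BASE_ROLES:
    for group in GROUP_ROLES:
        ok = is_complementary(base, "C4 substituent", group)
        print(f"{group} with {base}: {'match' if ok else 'mismatch'}")
```

Under these assumptions, the Asn NH2 matches O4 of dUMP and the Asp carboxylate matches N4 of dCMP, which is the pairing pattern the residue swaps in the text exploit.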

Figure 7. Base selectivity in thymidylate synthases (TSs), TETs, and JBPs. (A) In canonical TSs, selectivity for dUMP is dictated by the use of the amido NH2 group of Asn to donate a hydrogen bond to O4 of uracil. (B) The selectivity of TS homolog gp42 from phage T4 for dCMP is dictated by the use of the sidechain carboxylate of Asp to accept a hydrogen bond from N4 of cytosine. (C) In TETs, selectivity for 5mC may be dictated by having a hydrogen bond acceptor (h = amido O of Asn, Ng = Asp sidechain carboxylate) to interact with N4 and a hydrogen bond donating His to interact with N3 of the pyrimidine ring. (D) Residues thought to interact with T in JBPs based on the sequence alignments shown in Figure 8.

Sequence alignments of representative TET/JBPs (Figure 8) reveal some distinct trends in the known or predicted base-interacting residues of these proteins. 5mC dioxygenases, such as mammalian TET1/2/3 and NgTET1, utilize His and Asn/Asp to respectively interact with N3 and N4 of the base (Figure 7C). As shown in the structures of hTET2-TCD, the sidechain of N1387 is oriented such that the amido O can accept a hydrogen bond from N4 of the 5xC substrates (Figure 5A-C).67, 99

Similarly, H1904 serves as a hydrogen-bond donor to N3 of the pyrimidine ring. These two positions are almost completely conserved in all predicted 5mC dioxygenases from multicellular organisms (Figure 8). On the basis of sequence alignments between TETs and JBPs, the corresponding positions in T dioxygenases (e.g. Leishmania and T. brucei JBP1/2) are predicted to use Arg to interact with N3 and Asp with O4 (Figure 7D). These observations are surprising given that the hydrogen-bonding potential of the residues (R = donating, D = accepting) seems discordant with the positions of the pyrimidine ring with which they are proposed to interact. It may be possible that the positioning of T and the orientation of the Arg and Asp side chains in the active sites of these enzymes allow these residues to essentially swap roles, resulting in Arg hydrogen bonding with O4 and Asp with N3. A structure of a T dioxygenase should shed more light on this mystery.

Biochemical evidence indicates, however, that the specificity for 5mC or T is more complicated than suggested by our predictions. NgTET1, which exhibits both 5mC- and T-oxidizing activities,60 uses D234 to interact with N4 of 5xC (Figure 5D, E).59, 98

Mutation of

this residue to Asn or Ala increases the efficiency of T oxidation by NgTET1;60 we think that 42 ACS Paragon Plus Environment

Page 43 of 92 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Asn would be the better substitution to achieve this aim, as it could hydrogen bond with O4, but the data shows that D234A was more efficient at oxidizing T.60 Another example is droTET, which has the same base-interacting residues (Asn and His) as mammalian TETs (Figure 8), yet is reported to have 6mA dioxygenase activity.64 Furthermore, many prokaryotic homologs shown in the sequence alignment (Figure 8) are outliers from the general trends noted above, adding further complexity to predicting TET/JBP specificity. Establishing a sequence-structure-function relationship for these enzymes will require biochemical and structural investigations of additional members of the TET/JBP family.
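The general trend noted above can be written down as a simple lookup. The following Python sketch is purely illustrative: the names `TREND`, `REPORTED_EXCEPTIONS`, and `predict_substrate` are hypothetical, the rule table encodes nothing beyond the residue pairs discussed in the text, and, as the NgTET1 and droTET cases show, such a rule should not be read as a reliable predictor of specificity.

```python
# Illustrative sketch only: encodes the residue trends described in the text.
# Keys are (residue facing N3, residue facing N4/O4), in one-letter codes.
TREND = {
    ("H", "N"): "5mC",  # His + Asn, e.g. mammalian TETs
    ("H", "D"): "5mC",  # His + Asp, e.g. NgTET1
    ("R", "D"): "T",    # Arg + Asp, predicted for JBP1/2
}

# Cases from the text where reported activity departs from the simple trend.
REPORTED_EXCEPTIONS = {
    "NgTET1": "oxidizes both 5mC and T",
    "droTET": "reported to have 6mA dioxygenase activity",
}


def predict_substrate(n3_residue, exocyclic_residue):
    """Return the substrate suggested by the alignment trend, or 'unknown'."""
    key = (n3_residue.upper(), exocyclic_residue.upper())
    return TREND.get(key, "unknown")


if __name__ == "__main__":
    print(predict_substrate("H", "N"))
    print(predict_substrate("R", "D"))
    print(predict_substrate("A", "A"))
```

Residue pairs outside the table deliberately return "unknown", which mirrors the point that many prokaryotic homologs fall outside the general trends.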

Figure 8. Sequence alignment of hTET1, 2, 3 (Q8NFU7, Q6N021, and O43151), mTET1, 2, 3 (XP_011241810, XP_011241810, and XP_006505839), AmTET (XM_006561197.3), droTET (AAF47691.4), CcTET (XP_001831108.2), T. brucei JBP1 and 2 (XP_829420.1 and Q57X81.1), L. major JBP1 and 2 (XP_001681321.1 and YP_007674071.1), phage Med4-184 (YP_007674071.1), phage Nigel (YP_002003841.1), TMED261 (OUX44518.1), and NgTET1 (XP_002667965). Highlighted in red are ligands to Fe2+ and aKG; in green are residues with potential hydrogen bonds to the pyrimidine ring of 5xC or T; and in cyan are residues that constitute the active-site scaffold. Open boxes correspond to residues identified structurally to have the function indicated by their color code but that do not align properly with the respective residues from other organisms (e.g. hTET1 R1261 and NgTET R224 are both ligands to aKG but are not in alignment with each other). Secondary structural elements parsed and numbered from PDB files 4LT5 (NgTet1)59 and 4NM6 (hTET2-TCD)67 are displayed for the indicated sequences in the alignment. The figure was generated using ESPript 3.0.121

TET as a biotechnological tool for 5(h)mC sequencing

Gaining an understanding of the diverse roles that 5mC and its oxidized derivatives play in epigenetic regulation requires methods to selectively enrich, detect, and quantify these bases in gDNA samples. It was recognized soon after the discovery of their catalytic properties that TETs hold great potential both for enhancing pre-existing methylome sequencing technologies and for developing new ones for profiling genomic 5hmC, 5fC, and 5caC content. Here we briefly review some of the advances that have been made in 5mC and 5hmC sequencing applications through the incorporation of TET as a reagent in the procedural workflow. These methods are listed in Table 1, along with the pros and cons of their use.

Table 1: Comparison of key features of TET-dependent 5(h)mC-sequencing methods.

| Method | C-read | Type of method | Advantages | Disadvantages |
| BS-seq | 5mC + 5hmC | Chemical deamination followed by NGS sequencing | Foundation of many other methods; single-base resolution | DNA degradation; 5mC and 5hmC are indistinguishable |
| TAB-seq | 5hmC | Enzymatic glucosylation and oxidation followed by BS-seq | Direct detection of 5hmC; single-base resolution | DNA degradation; TET and BGT sequence biases |
| TAmC-seq | 5mC | Enrichment followed by NGS sequencing | Outperforms antibody-based sequencing methods | TET and BGT sequence biases; no absolute quantification of DNA methylation status |
| TET-mediated SMRT-seq | 5mC and 5hmC | Single-molecule; dependent on TET activity and polymerase kinetics | Single-molecule; long reads; direct detection of 5mC and 5hmC; single-base resolution | Hard to distinguish IPDs of (C and 5mC) and (5fC and 5caC); TET sequence bias |
| Enzyme-dependent cytosine-modification sequencing | 5mC and 5hmC | Enzymatic glucosylation/oxidation/deamination | Low DNA input; detection of 5mC and 5hmC; single-base resolution | Sequence biases for TET, BGT, and APOBEC3A |

The current gold standard for profiling genomic 5mC and 5hmC content is bisulfite sequencing (BS-seq). Treatment of denatured DNA with sodium bisulfite results in deamination of C, 5fC, and 5caC to form uracil (U), 5fU, and 5caU, respectively, all of which are read as T during sequencing.122 In contrast, 5mC and 5hmC are resistant to this chemical deamination and are, therefore, read as C.24, 122 As can be inferred from this description, standard BS-seq allows 5mC+5hmC to be identified by comparing the sequencing results of a bisulfite-treated sample to those of an untreated control, but the individual 5mC and 5hmC sites are indistinguishable (Table 1).

TET-assisted bisulfite sequencing (TAB-seq) was developed as a means to selectively maintain 5hmC in samples for sequencing detection by coupling the activities of TET and T4 β-glucosyltransferase (BGT).122 BGT transfers a glucosyl group from UDP-glucose to 5hmC to generate 5-(β-D-glucosyl)methylcytosine (5gmC), which is resistant to both oxidation by TET and bisulfite-catalyzed deamination. This enzymatic reaction is used to quantitatively protect all pre-existing 5hmC in a DNA sample during the first step of TAB-seq. Subsequent introduction of excess TET results in the oxidation of 5mC to 5fC and 5caC, which are deaminated, along with unmodified Cs, upon bisulfite treatment. Sequencing of these samples will result in 5gmC being read as C, whereas all other formerly (un)modified Cs will be read as T, thus allowing for base-resolution detection of 5hmC sites in the original DNA sample (Table 1).

While powerful, BS-seq can be laborious and expensive, and it requires large amounts of sample to offset the degradation of DNA that occurs during bisulfite treatment. This issue has led to the development of many alternatives to BS-seq for profiling genome-wide and/or locus-specific 5(h)mC. The typical workflow of these methods involves affinity enrichment of DNA fragments containing a particular base followed by deep sequencing. In TET-assisted 5mC sequencing (TAmC-seq),123 the ability of BGT to accept the analog UDP-6-azido glucose is exploited in mapping 5mC sites. After inactivating pre-existing 5hmC sites with glucose, TET, BGT, and UDP-6-azido glucose are combined with the glucosylated DNA sample in a one-pot reaction. TET oxidizes 5mC to 5hmC, and the latter is then rapidly converted to 6N3-β-glucosyl-5-hydroxymethyl-2-deoxycytosine (N3-5gmC) by BGT. A biotin tag is covalently attached to N3-5gmC via click chemistry for subsequent pull-down and sequencing to map the original 5mC content of the DNA sample. Cross-validation experiments revealed that TAmC-seq outperformed 5mC immunoprecipitation-based sequencing and attained genomic 5mC coverage that approached levels observed in BS-seq (Table 1).123
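The read-outs described above for BS-seq and TAB-seq can be summarized in a few lines of Python. This is a minimal sketch under idealized assumptions that go beyond the text: every enzymatic and chemical step is treated as going to completion, TET oxidation of 5mC is modeled as ending at 5caC, and the names `BISULFITE_READ`, `bs_seq_read`, and `tab_seq_read` are hypothetical.

```python
# Base called by the sequencer after bisulfite treatment, per the text:
# C, 5fC and 5caC are deaminated (read as T); 5mC, 5hmC and 5gmC resist (read as C).
BISULFITE_READ = {
    "C": "T",
    "5mC": "C",
    "5hmC": "C",
    "5fC": "T",
    "5caC": "T",
    "5gmC": "C",
}


def bs_seq_read(base):
    """Read-out of a cytosine form in standard BS-seq."""
    return BISULFITE_READ[base]


def tab_seq_read(base):
    """Read-out of a cytosine form in TAB-seq (idealized, complete reactions)."""
    # Step 1: BGT protects pre-existing 5hmC as 5gmC.
    if base == "5hmC":
        base = "5gmC"
    # Step 2: excess TET oxidizes 5mC; modeled here as full conversion to 5caC.
    if base == "5mC":
        base = "5caC"
    # Step 3: bisulfite treatment and sequencing.
    return BISULFITE_READ[base]


if __name__ == "__main__":
    for form in ("C", "5mC", "5hmC", "5fC", "5caC"):
        print(form, bs_seq_read(form), tab_seq_read(form))
```

Within this idealized model, a position read as C in both experiments corresponds to 5hmC, whereas a position read as C in BS-seq but as T in TAB-seq corresponds to 5mC.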

One proven technique for profiling many base modifications without enrichment or bisulfite treatment is TET-mediated single-molecule, real-time sequencing (SMRT-seq). In SMRT-seq, DNA polymerase kinetics are measured to determine the length of time, or interpulse duration (IPD), between two successive nucleotide incorporation events. When the polymerase encounters a modified base on the template strand, the IPD changes in a manner that is characteristic of the modification and its sequence context. Sites of base modification in a gDNA sample can be determined, sometimes at base resolution, by comparing the IPD measured using modified DNA with that measured using an unmodified control (the IPD ratio). While SMRT-seq works particularly well in detecting 6mA and 4-methylcytosine, the effects of 5mC on the IPD ratio are subtle, making it very challenging to confidently call a position as methylated (Table 1).85 TET-mediated oxidation of 5mC in short oligonucleotides to 5hmC, 5fC, and 5caC was shown to produce similarly patterned, yet readily detectable, IPD ratios with magnitudes that scale with the oxidation state of the base (5hmC < 5fC ≈ 5caC), thereby providing a means of improving 5mC detection by SMRT-seq (Table 1).85 This beneficial effect was further validated by successfully mapping 95%, 77%, and 90% of the expected 5mC positions in the genomes of Escherichia coli MG1655, Bacillus halodurans C-125, and Helicobacter pylori strain 2295, respectively.60, 85 TET-mediated SMRT-seq may also be useful for identifying 5hmC and 5fC by using techniques that specifically label these bases to produce distinctive IPD ratios.124

Another alternative to BS-seq is an enzyme-dependent 5(h)mC-sequencing method in which the activities of TET, cytosine deaminase (APOBEC3A), and BGT are orchestrated for mapping of 5mC and 5hmC sites.125, 126 APOBEC3A is proficient in deaminating C and 5mC,127, 128 but displays significantly reduced activity on 5hmC and almost no activity on 5fC, 5caC, and 5gmC.126, 129 Therefore, using TET to oxidize all 5mC to higher oxidation states and BGT to capture any remaining 5hmC as 5gmC will block APOBEC3A’s activity on these sites. Since APOBEC3A is specific to single-stranded DNA, gDNA samples need to be denatured, by heating in the presence of formamide, after treatment with TET and BGT.125 As with other sequencing methods, distinguishing 5mC from C is achieved by comparing the sequencing results for gDNA samples with and without treatment with TET, BGT, and APOBEC3A. In the treated samples, C will be deaminated to U as a result of APOBEC3A’s activity, while former 5mC sites, rendered inert to APOBEC3A by oxidation/glucosylation, will read as C. Distinguishing 5mC and 5hmC sites is achieved by introducing a third sample that is only subjected to the activities of BGT and APOBEC3A. The latter results in deamination of C and 5mC to U and T, respectively, while 5gmC is left intact and is, therefore, read as C.126 A recent method, termed APOBEC-coupled epigenetic sequencing, also uses BGT and APOBEC3A for single-base resolution of 5hmC.130 Since these methods are solely dependent on the activities of DNA-modifying enzymes, they hold great promise in obtaining single-base resolution detection of 5(h)mC with lower DNA input and longer reads as compared to BS-based approaches (Table 1).

The methods described above highlight some of the biotechnological potential of using TETs as reagents in the workflows of 5(h)mC sequencing techniques. However, there are certain aspects of the biochemistry of these enzymes that are undesirable for sequencing applications and require further investigation in order to attenuate or abolish them. Perhaps the largest issue is that currently characterized TETs show a strong preference for 5mC in the context of CpG dinucleotides, which likely results in data sets being biased against non-CpG methylation sites in DNA. Overcoming such biases may come through characterizing new TET homologs from bacteriophages or prokaryotes. Alternatively, if the factors dictating a preference for CpG sites can be identified, it may be possible to engineer well-studied TETs to possess relaxed sequence specificity. For some applications, it may also be desirable to have TET catalyze only one oxidation and then stop. The 5hmC-stalling mutations noted
earlier98, 109 are perhaps a step in the right direction, but further optimization is needed in order to maintain robust enzymatic activity.

Conclusions and Perspective

Despite the substantial knowledge gained in the past decade regarding the role that TETs play in active demethylation, many key biological questions remain unanswered. The current model of active demethylation, TET-TDG-BER, seems like a rather costly process in terms of cellular resources for removing epigenetic marks. Could DNMTs be linked to the 5caC decarboxylation activity observed with mESC extract,57, 58 or is there a 5caC decarboxylase awaiting discovery? How is the TET-TDG-BER pathway invoked versus cytosine deaminase and/or other mechanisms? We have also highlighted evidence implicating TETs as depositors of new layers of epigenetic information, leading to the question of what exact roles 5hmC, 5fC, and 5caC have in vivo. Intense research is currently in progress to address these questions.

In our opinion, an important area of the field that seems to be severely lacking is an understanding of the biochemical fundamentals of the TET catalytic reaction. A few biochemical studies and the X-ray crystal structures of TETs with 5xC substrates have illuminated some aspects of catalysis. However, a detailed kinetic examination of the Fe2+-oxidation mechanism with each of the 5xC substrates and an exploration of the inherent regulatory mechanisms that TETs employ to control each oxidation step are of great importance for full comprehension of enzyme function. Studies of TET homologs from phage and bacteria, which arguably may be simpler biochemical systems to examine, can help in attaining this comprehension as well as provide a stepping stone to understanding the functional evolution and diversity of the TET/JBP superfamily.

AUTHOR INFORMATION

Corresponding Author

*Email: [email protected]

ORCiD

Lana Saleh, ORCiD: 0000-0003-2629-9795

Mackenzie J. Parker, ORCiD: 0000-0001-7174-0485

Peter R. Weigele, ORCiD: 0000-0003-3696-4541

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources

This work was supported entirely by internal funding from New England Biolabs, Inc.

Notes

The authors declare the following competing financial interests: Mackenzie J. Parker, Peter R. Weigele, and Lana Saleh are employees of New England Biolabs, a manufacturer and vendor of molecular biology reagents, including enzyme reagents for epigenetics research. This affiliation does not affect the authors’ impartiality, adherence to journal standards and policies, and availability of data.

ACKNOWLEDGMENT

We would like to thank Zhiyi Sun, Bill Jack, Andy Gardner, Tom Evans, and Rich Roberts for critical feedback on this manuscript.

ABBREVIATIONS

5mC, 5-methylcytosine; gDNA, genomic DNA; PGC, primordial germ cell; C5-cytosine-MT, C5-cytosine-methyltransferases; UHRF1, E3 ubiquitin-protein ligase; TET, ten-eleven translocation; aKG, α-ketoglutarate; 5hmC, 5-hydroxymethylcytosine; 5fC, 5-formylcytosine; 5caC, 5-carboxycytosine; TDG, thymine-DNA glycosylase; T, thymine; BER, base excision repair; C, cytosine; HEK, human embryonic kidney; AID, activation-induced cytidine deaminase; Gadd45a, growth arrest DNA-damage-inducible protein 45a; JBP, base J-binding protein; 5hmU, 5-hydroxymethyluracil; base J, 5-(β-D-glucosyl)methyluracil; hTET, human TET; TLC, thin-layer chromatography; CD, catalytic domain; mTET, mouse TET; mESC, mouse embryonic stem cell; NgTET1, Naegleria gruberi TET1; CcTET, Coprinopsis cinerea TET; AmTET, Apis mellifera TET; droTET, Drosophila melanogaster TET; 5fU, 5-formyluracil; 5caU, 5-carboxyuracil; 6mA, N6-methyladenosine; DSBH, double-stranded beta-helix; 5xC, 5mC, 5hmC, 5fC, or 5caC (x = m; hm; f; or ca); hTET2-TCD, hTET2 truncated CD; MT, methyltransferase; BS-seq, bisulfite sequencing; TAB-seq, TET-assisted bisulfite sequencing; BGT, T4 β-glucosyltransferase; TAmC-seq, TET-assisted 5mC sequencing; SMRT-seq, single-molecule real-time sequencing; IPD, interpulse duration; APOBEC3A, a cytosine deaminase from human.

REFERENCES

[1] Lister, R., Pelizzola, M., Dowen, R. H., Hawkins, R. D., Hon, G., Tonti-Filippini, J., Nery, J. R., Lee, L., Ye, Z., Ngo, Q. M., Edsall, L., Antosiewicz-Bourget, J., Stewart, R., Ruotti, V., Millar, A. H., Thomson, J. A., Ren, B., and Ecker, J. R. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences, Nature 462, 315-322.
[2] Bogdanović, O., and Lister, R. (2017) DNA methylation and the preservation of cell identity, Curr. Opin. Genet. Dev. 46, 9-14.
[3] Smith, Z. D., and Meissner, A. (2013) DNA methylation: roles in mammalian development, Nat. Rev. Genet. 14, 204-220.
[4] Li, E., Bestor, T. H., and Jaenisch, R. (1992) Targeted mutation of the DNA methyltransferase gene results in embryonic lethality, Cell 69, 915-926.
[5] Okano, M., Bell, D. W., Haber, D. A., and Li, E. (1999) DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development, Cell 99, 247-257.
[6] Ravichandran, M., Jurkowska, R. Z., and Jurkowski, T. P. (2018) Target specificity of mammalian DNA methylation and demethylation machinery, Org. Biomol. Chem. 16, 1419-1435.
[7] Chédin, F., Lieber, M. R., and Hsieh, C. L. (2002) The DNA methyltransferase-like protein Dnmt3L stimulates de novo methylation by Dnmt3a, Proc. Natl. Acad. Sci. U. S. A. 99, 16916-16921.
[8] Hata, K., Okano, M., Lei, H., and Li, E. (2002) Dnmt3L cooperates with the Dnmt3 family of de novo DNA methyltransferases to establish maternal imprints in mice, Development 129, 1983-1993.
[9] Jeltsch, A., and Jurkowska, R. Z. (2013) Multimerization of the Dnmt3a DNA methyltransferase and its functional implications, Prog. Mol. Biol. Transl. Sci. 117, 445-464.
[10] Jeltsch, A. (2006) On the enzymatic properties of Dnmt1: specificity, processivity, mechanism of linear diffusion and allosteric regulation of the enzyme, Epigenetics 1, 63-66.
[11] Jeltsch, A., and Jurkowska, R. Z. (2014) New concepts in DNA methylation, Trends Biochem. Sci. 39, 310-318.
[12] Bostick, M., Kim, J. K., Esteve, P. O., Clark, A., Pradhan, S., and Jacobsen, S. E. (2007) UHRF1 plays a role in maintaining DNA methylation in mammalian cells, Science 317, 1760-1764.

[13] Liu, X., Gao, Q., Li, P., Zhao, Q., Zhang, J., Li, J., Koseki, H., and Wong, J. (2013) UHRF1 targets DNMT1 for DNA methylation through cooperative binding of hemi-methylated DNA and methylated H3K9, Nat. Commun. 4, 1563.
[14] Sharif, J., Muto, M., Takebayashi, S., Suetake, I., Iwamatsu, A., Endo, T. A., Shinga, J., Mizutani-Koseki, Y., Toyoda, T., Okamura, K., Tajima, S., Mitsuya, K., Okano, M., and Koseki, H. (2007) The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA, Nature 450, 908-912.
[15] Hajkova, P., Erhardt, S., Lane, N., Haaf, T., El-Maarri, O., Reik, W., Walter, J., and Surani, M. A. (2002) Epigenetic reprogramming in mouse primordial germ cells, Mech. Dev. 117, 15-23.
[16] Oswald, J., Engemann, S., Lane, N., Mayer, W., Olek, A., Fundele, R., Dean, W., Reik, W., and Walter, J. (2000) Active demethylation of the paternal genome in the mouse zygote, Curr. Biol. 10, 475-478.
[17] Rougier, N., Bourc'his, D., Gomes, D. M., Niveleau, A., Plachot, M., Pàldi, A., and Viegas-Péquignot, E. (1998) Chromosome methylation patterns during mammalian preimplantation development, Genes Dev. 12, 2108-2113.

[18] Sasaki, H., and Matsui, Y. (2008) Epigenetic events in mammalian germ-cell development: reprogramming and beyond, Nat. Rev. Genet. 9, 129-140.
[19] Weber, M., Hellmann, I., Stadler, M. B., Ramos, L., Pääbo, S., Rebhan, M., and Schübeler, D. (2007) Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome, Nat. Genet. 39, 457-466.
[20] Yamazaki, Y., Mann, M. R., Lee, S. S., Marh, J., McCarrey, J. R., Yanagimachi, R., and Bartolomei, M. S. (2003) Reprogramming of primordial germ cells begins before migration into the genital ridge, making these cells inadequate donors for reproductive cloning, Proc. Natl. Acad. Sci. U. S. A. 100, 12207-12212.
[21] Seki, Y., Yamaji, M., Yabuta, Y., Sano, M., Shigeta, M., Matsui, Y., Saga, Y., Tachibana, M., Shinkai, Y., and Saitou, M. (2007) Cellular dynamics associated with the genome-wide epigenetic reprogramming in migrating primordial germ cells in mice, Development 134, 2627-2638.
[22] Jones, P. A., and Taylor, S. M. (1980) Cellular differentiation, cytidine analogs and DNA methylation, Cell 20, 85-93.

[23] Smith, Z. D., and Meissner, A. (2013) The simplest explanation: passive DNA demethylation in PGCs, EMBO J. 32, 318-321.
[24] He, Y. F., Li, B. Z., Li, Z., Liu, P., Wang, Y., Tang, Q., Ding, J., Jia, Y., Chen, Z., Li, L., Sun, Y., Li, X., Dai, Q., Song, C. X., Zhang, K., He, C., and Xu, G. L. (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA, Science 333, 1303-1307.
[25] Ito, S., Shen, L., Dai, Q., Wu, S. C., Collins, L. B., Swenberg, J. A., He, C., and Zhang, Y. (2011) Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine, Science 333, 1300-1303.
[26] Kriaucionis, S., and Heintz, N. (2009) The nuclear DNA base, 5-hydroxymethylcytosine is present in brain and enriched in Purkinje neurons, Science 324, 929-930.
[27] Tahiliani, M., Koh, K. P., Shen, Y., Pastor, W. A., Bandukwala, H., Brudno, Y., Agarwal, S., Iyer, L. M., Liu, D. R., Aravind, L., and Rao, A. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL Partner TET1, Science 324, 930-935.

[28] Maiti, A., and Drohat, A. C. (2011) Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites, J. Biol. Chem. 286, 35334-35338.
[29] Cortázar, D., Kunz, C., Selfridge, J., Lettieri, T., Saito, Y., MacDougall, E., Wirz, A., Schuermann, D., Jacobs, A. L., Siegrist, F., Steinacher, R., Jiricny, J., Bird, A., and Schär, P. (2011) Embryonic lethal phenotype reveals a function of TDG in maintaining epigenetic stability, Nature 470, 419-423.
[30] Cortellino, S., Xu, J., Sannai, M., Moore, R., Caretti, E., Cigliano, A., Le Coz, M., Devarajan, K., Wessels, A., Soprano, D., Abramowitz, L. K., Bartolomei, M. S., Rambow, F., Bassi, M. R., Bruno, T., Fanciulli, M., Renner, C., Klein-Szanto, A. J., Matsumoto, Y., Kobi, D., Davidson, I., Alberti, C., Larue, L., and Bellacosa, A. (2011) Thymine DNA glycosylase is essential for active DNA demethylation by linked deamination-base excision repair, Cell 146, 67-79.
[31] Nabel, C. S., Jia, H., Ye, Y., Shen, L., Goldschmidt, H. L., Stivers, J. T., Zhang, Y., and Kohli, R. M. (2012) AID/APOBEC deaminases disfavor modified cytosines implicated in DNA demethylation, Nat. Chem. Biol. 8, 751-758.

[32] Popp, C., Dean, W., Feng, S., Cokus, S. J., Andrews, S., Pellegrini, M., Jacobsen, S. E., and Reik, W. (2010) Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency, Nature 463, 1101-1105.
[33] Barreto, G., Schafer, A., Marhold, J., Stach, D., Swaminathan, S. K., Handa, V., Doderlein, G., Maltry, N., Wu, W., Lyko, F., and Niehrs, C. (2007) Gadd45a promotes epigenetic gene activation by repair-mediated DNA demethylation, Nature 445, 671-675.
[34] Ma, D. K., Jang, M. H., Guo, J. U., Kitabatake, Y., Chang, M. L., Pow-Anpongkul, N., Flavell, R. A., Lu, B., Ming, G. L., and Song, H. (2009) Neuronal activity-induced Gadd45b promotes epigenetic DNA demethylation and adult neurogenesis, Science 323, 1074-1077.
[35] Rai, K., Huggins, I. J., James, S. R., Karpf, A. R., Jones, D. A., and Cairns, B. R. (2008) DNA demethylation in zebrafish involves the coupling of a deaminase, a glycosylase, and gadd45, Cell 135, 1201-1212.
[36] Schmitz, K. M., Schmitt, N., Hoffmann-Rohrer, U., Schafer, A., Grummt, I., and Mayer, C. (2009) TAF12 recruits Gadd45a and the nucleotide excision repair complex to the promoter of rRNA genes leading to active DNA demethylation, Mol. Cell 33, 344-353.

[37] Engel, N., Tront, J. S., Erinle, T., Nguyen, N., Latham, K. E., Sapienza, C., Hoffman, B., and Liebermann, D. A. (2009) Conserved DNA methylation in Gadd45a(-/-) mice, Epigenetics 4, 98-99.
[38] Hashimoto, H., Liu, Y., Upadhyay, A. K., Chang, Y., Howerton, S. B., Vertino, P. M., Zhang, X., and Cheng, X. (2012) Recognition and potential mechanisms for replication and erasure of cytosine hydroxymethylation, Nucleic Acids Res. 40, 4841-4849.
[39] Otani, J., Kimura, H., Sharif, J., Endo, T. A., Mishima, Y., Kawakami, T., Koseki, H., Shirakawa, M., Suetake, I., and Tajima, S. (2013) Cell cycle-dependent turnover of 5-hydroxymethyl cytosine in mouse embryonic stem cells, PLoS One 8, e82961.
[40] Ji, D., Lin, K., Song, J., and Wang, Y. (2014) Effects of Tet-induced oxidation products of 5-methylcytosine on Dnmt1- and DNMT3a-mediated cytosine methylation, Mol. Biosyst. 10, 1749-1752.
[41] Frauer, C., Hoffmann, T., Bultmann, S., Casa, V., Cardoso, M. C., Antes, I., and Leonhardt, H. (2011) Recognition of 5-hydroxymethylcytosine by the Uhrf1 SRA domain, PLoS One 6, e21306.

[42] Iurlaro, M., Ficz, G., Oxley, D., Raiber, E.-A., Bachman, M., Booth, M. J., Andrews, S., Balasubramanian, S., and Reik, W. (2013) A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation, Genome Biol. 14, R119.
[43] Spruijt, C. G., Gnerlich, F., Smits, A. H., Pfaffeneder, T., Jansen, P. W. T. C., Bauer, C., Münzel, M., Wagner, M., Müller, M., Khan, F., Eberl, H. C., Mensinga, A., Brinkman, A. B., Lephikov, K., Müller, U., Walter, J., Boelens, R., van Ingen, H., Leonhardt, H., Carell, T., and Vermeulen, M. (2013) Dynamic Readers for 5-(Hydroxy)Methylcytosine and Its Oxidized Derivatives, Cell 152, 1146-1159.
[44] Shen, L., and Zhang, Y. (2013) 5-Hydroxymethylcytosine: generation, fate, and genomic distribution, Curr. Opin. Cell Biol. 25, 289-296.
[45] Scott-Browne, J. P., Lio, C. J., and Rao, A. (2017) TET proteins in natural and induced differentiation, Curr. Opin. Genet. Dev. 46, 202-208.
[46] Wu, X., and Zhang, Y. (2017) TET-mediated active DNA demethylation: mechanism, function and beyond, Nat. Rev. Genet. 18, 517-534.

[47] Lorsbach, R. B., Moore, J., Mathew, S., Raimondi, S. C., Mukatira, S. T., and Downing, J. R. (2003) TET1, a member of a novel protein family, is fused to MLL in acute myeloid leukemia containing the t(10;11)(q22;q23), Leukemia 17, 637-641.
[48] Ono, R., Taki, T., Taketani, T., Taniwaki, M., Kobayashi, H., and Hayashi, Y. (2002) LCX, leukemia-associated protein with a CXXC domain, is fused to MLL in acute myeloid leukemia with trilineage dysplasia having t(10;11)(q22;q23), Cancer Res. 62, 4075-4080.
[49] Iyer, L. M., Tahiliani, M., Rao, A., and Aravind, L. (2009) Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids, Cell Cycle 8, 1698-1710.
[50] Cliffe, L. J., Kieft, R., Southern, T., Birkeland, S. R., Marshall, M., Sweeney, K., and Sabatini, R. (2009) JBP1 and JBP2 are two distinct thymidine hydroxylases involved in J biosynthesis in genomic DNA of African trypanosomes, Nucleic Acids Res. 37, 1452-1462.
[51] Yu, Z., Genest, P. A., ter Riet, B., Sweeney, K., DiPaolo, C., Kieft, R., Christodoulou, E., Perrakis, A., Simmons, J. M., Hausinger, R. P., van Luenen, H. G. A. M., Rigden, D. J., Sabatini, R., and Borst, P. (2007) The protein that binds to DNA base J in trypanosomatids has features of a thymidine hydroxylase, Nucleic Acids Res. 35, 2107-2115.


[52] van Luenen, H. G. A. M., Farris, C., Jan, S., Genest, P. A., Tripathi, P., Velds, A., Kerkhoven, R. M., Nieuwland, M., Haydock, A., Ramasamy, G., Vainio, S., Heidebrecht, T., Perrakis, A., Pagie, A., van Steensel, B., Myler, P. J., and Borst, P. (2012) Glucosylated hydroxymethyluracil, DNA base J, prevents transcriptional readthrough in Leishmania, Cell 150, 909-921.
[53] Pfaffeneder, T., Hackner, B., Truß, M., Münzel, M., Müller, M., Deiml, C. A., Hagemeier, C., and Carell, T. (2011) The discovery of 5-formylcytosine in embryonic stem cell DNA, Angew. Chem. Int. Ed. 50, 7008-7012.
[54] Fink, R. M., and Fink, K. (1962) Utilization of radiocarbon from thymidine and other precursors of ribonucleic acid in Neurospora crassa, J. Biol. Chem. 237, 2289-2290.
[55] Palmatier, R. D., McCroskey, R. P., and Abbott, M. T. (1970) The enzymatic conversion of uracil 5-carboxylic acid to uracil and carbon dioxide, J. Biol. Chem. 245, 6706-6710.
[56] Smiley, J. A., Angelot, J. M., Cannon, R. C., Marshall, E. M., and Asch, D. K. (1999) Radioactivity-based and spectrophotometric assays for isoorotate decarboxylase: identification of the thymidine salvage pathway in lower eukaryotes, Anal. Biochem. 266, 85-92.


[57] Schiesser, S., Hackner, B., Pfaffeneder, T., Müller, M., Hagemeier, C., Truß, M., and Carell, T. (2012) Mechanism and stem-cell activity of 5-carboxycytosine decarboxylation determined by isotope tracing, Angew. Chem. Int. Ed. 51, 6516-6520.
[58] Liutkevičiūtė, Z., Kriukienė, E., Ličytė, J., Rudytė, M., Urbanavičiūtė, G., and Klimašauskas, S. (2014) Direct decarboxylation of 5-carboxylcytosine by DNA C5-methyltransferases, J. Am. Chem. Soc. 136, 5884-5887.
[59] Hashimoto, H., Pais, J. E., Zhang, X., Saleh, L., Fu, Z. Q., Dai, N., Corrêa, I. R., Jr., Zheng, Y., and Cheng, X. (2014) Structure of a Naegleria Tet-like dioxygenase in complex with 5-methylcytosine DNA, Nature 506, 391-395.
[60] Pais, J. E., Dai, N., Tamanaha, E., Vaisvila, R., Fomenkov, A. I., Bitinaite, J., Sun, Z., Guan, S., Corrêa, I. R., Jr., Noren, C. J., Cheng, X., Roberts, R. J., Zheng, Y., and Saleh, L. (2015) Biochemical characterization of a Naegleria TET-like oxygenase and its application in single molecule sequencing of 5-methylcytosine, Proc. Natl. Acad. Sci. U. S. A. 112, 4316-4321.

[61] Zhang, L., Chen, W., Iyer, L. M., Hu, J., Wang, G., Fu, Y., Yu, M., Dai, Q., Aravind, L., and He, C. (2014) A TET homologue protein from Coprinopsis cinerea (CcTET) that biochemically converts 5-methylcytosine to 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine, J. Am. Chem. Soc. 136, 4801-4804.
[62] Wojciechowski, M., Rafalski, D., Kucharski, R., Misztal, K., Maleszka, J., Bochtler, M., and Maleszka, R. (2014) Insights into DNA hydroxymethylation in the honeybee from in-depth analyses of TET dioxygenase, Open Biol. 4, 140110.
[63] Delatte, B., Wang, F., Ngoc, L. V., Collignon, E., Bonvin, E., Deplus, R., Calonne, E., Hassabi, B., Putmans, P., Awe, S., Wetzel, C., Kreher, J., Soin, R., Creppe, C., Limbach, P. A., Gueydan, C., Kruys, V., Brehm, A., Minakhina, S., Defrance, M., Steward, R., and Fuks, F. (2016) Transcriptome-wide distribution and function of RNA hydroxymethylcytosine, Science 351, 282-285.
[64] Zhang, G., Huang, H., Liu, D., Cheng, Y., Liu, X., Zhang, W., Yin, R., Zhang, D., Zhang, P., Liu, J., Li, C., Liu, B., Luo, Y., Zhu, Y., Zhang, N., He, S., He, C., Wang, H., and Chen, D. (2015) N6-methyladenine DNA modification in Drosophila, Cell 161, 893-906.
[65] Pfaffeneder, T., Spada, F., Wagner, M., Brandmayr, C., Laube, S. K., Eisen, D., Truss, M., Steinbacher, J., Hackner, B., Kotljarova, O., Schuermann, D., Michalakis, S., Kosmatchev, O., Schiesser, S., Steigenberger, B., Raddaoui, N., Kashiwazaki, G., Muller, U., Spruijt, C. G., Vermeulen, M., Leonhardt, H., Schar, P., Muller, M., and Carell, T. (2014) Tet oxidizes thymine to 5-hydroxymethyluracil in mouse embryonic stem cell DNA, Nat. Chem. Biol. 10, 574-581.
[66] Hashimoto, H., Zhang, X., Vertino, P. M., and Cheng, X. (2015) The Mechanisms of Generation, Recognition, and Erasure of DNA 5-Methylcytosine and Thymine Oxidations, J. Biol. Chem. 290, 20723-20733.
[67] Hu, L., Li, Z., Cheng, J., Rao, Q., Gong, W., Liu, M., Shi, Y. G., Zhu, J., Wang, P., and Xu, Y. (2013) Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation, Cell 155, 1545-1555.
[68] Iyer, L. M., Abhiman, S., and Aravind, L. (2011) Natural history of eukaryotic DNA methylation systems, Prog. Mol. Biol. Transl. Sci. 101, 25-104.
[69] Shukla, A., Sehgal, M., and Singh, T. R. (2015) Hydroxymethylation and its potential implication in DNA repair system: a review and future perspectives, Gene 564, 109-118.
[70] Akahori, H., Guindon, S., Yoshizaki, S., and Muto, Y. (2015) Molecular evolution of the TET gene family in mammals, Int. J. Mol. Sci. 16, 28472-28485.


[71] Zhang, W., Xia, W., Wang, Q., Towers, A. J., Chen, J., Gao, R., Zhang, Y., Yen, C. A., Lee, A. Y., Li, Y., Zhou, C., Liu, K., Zhang, J., Gu, T. P., Chen, X., Chang, Z., Leung, D., Gao, S., Jiang, Y. H., and Xie, W. (2016) Isoform switch of TET1 regulates DNA demethylation and mouse development, Mol. Cell 64, 1062-1073.
[72] Liu, D., Li, G., and Zuo, Y. (2018) Function determinants of TET proteins: the arrangements of sequence motifs with specific codes, Brief Bioinform., bby053.
[73] Ko, M., An, J., Bandukwala, H. S., Chavez, L., Äijö, T., Pastor, W. A., Segal, M. F., Li, H., Koh, K. P., Lähdesmäki, H., Hogan, P. G., Aravind, L., and Rao, A. (2013) Modulation of TET2 expression and 5-methylcytosine oxidation by the CXXC domain protein IDAX, Nature 497, 122-126.
[74] Montagner, S., Leoni, C., Emming, S., Della Chiara, G., Balestrieri, C., Barozzi, I., Piccolo, V., Togher, S., Ko, M., Rao, A., Natoli, G., and Monticelli, S. (2016) TET2 regulates mast cell differentiation and proliferation through catalytic and non-catalytic activities, Cell Rep. 15, 1566-1579.
[75] Jin, S. G., Zhang, Z. M., Dunwell, T. L., Harter, M. R., Wu, X., Johnson, J., Li, Z., Liu, J., Szabó, P. E., Lu, Q., Xu, G. L., Song, J., and Pfeifer, G. P. (2016) Tet3 reads 5-carboxylcytosine through its CXXC domain and is a potential guardian against neurodegeneration, Cell Rep. 14, 493-505.
[76] Dawlaty, M. M., Breiling, A., Le, T., Barrasa, M. I., Raddatz, G., Gao, Q., Powell, B. E., Cheng, A. W., Faull, K. F., Lyko, F., and Jaenisch, R. (2014) Loss of Tet enzymes compromises proper differentiation of embryonic stem cells, Dev. Cell 29, 102-111.
[77] Bachman, M., Uribe-Lewis, S., Yang, X., Burgess, H. E., Iurlaro, M., Reik, W., Murrell, A., and Balasubramanian, S. (2015) 5-Formylcytosine can be a stable DNA modification in mammals, Nat. Chem. Biol. 11, 555-557.
[78] Bachman, M., Uribe-Lewis, S., Yang, X., Williams, M., Murrell, A., and Balasubramanian, S. (2014) 5-Hydroxymethylcytosine is a predominantly stable DNA modification, Nat. Chem. 6, 1049-1055.
[79] Booth, M. J., Branco, M. R., Ficz, G., Oxley, D., Krueger, F., Reik, W., and Balasubramanian, S. (2012) Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution, Science 336, 934-937.


[80] Song, C. X., Szulwach, K. E., Dai, Q., Fu, Y., Mao, S. Q., Lin, L., Street, C., Li, Y., Poidevin, M., Wu, H., Gao, J., Liu, P., Li, L., Xu, G. L., Jin, P., and He, C. (2013) Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming, Cell 153, 678-691.
[81] Hashimoto, H., Olanrewaju, Y. O., Zheng, Y., Wilson, G. G., Zhang, X., and Cheng, X. (2014) Wilms tumor protein recognizes 5-carboxylcytosine within a specific DNA sequence, Genes Dev. 28, 2304-2313.
[82] Mellen, M., Ayata, P., Dewell, S., Kriaucionis, S., and Heintz, N. (2012) MeCP2 binds to 5hmC enriched within active genes and accessible chromatin in the nervous system, Cell 151, 1417-1430.
[83] Yildirim, O., Li, R., Hung, J. H., Chen, P. B., Dong, X., Ee, L. S., Weng, Z., Rando, O. J., and Fazzio, T. G. (2011) Mbd3/NURD complex regulates expression of 5-hydroxymethylcytosine marked genes in embryonic stem cells, Cell 147, 1498-1510.
[84] Ko, M., An, J., Pastor, W. A., Koralov, S. B., Rajewsky, K., and Rao, A. (2015) TET proteins and 5-methylcytosine oxidation in hematological cancers, Immunol. Rev. 263, 6-21.


[85] Clark, T. A., Lu, X., Luong, K., Dai, Q., Boitano, M., Turner, S. W., He, C., and Korlach, J. (2013) Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation, BMC Biol. 11, 4.
[86] Rasmussen, K. D., and Helin, K. (2016) Role of TET enzymes in DNA methylation, development, and cancer, Genes Dev. 30, 733-750.
[87] Tamanaha, E., Guan, S., Marks, K., and Saleh, L. (2016) Distributive processing by the iron(II)/alpha-ketoglutarate-dependent catalytic domains of the TET enzymes is consistent with epigenetic roles for oxidized 5-methylcytosine bases, J. Am. Chem. Soc. 138, 9345-9348.
[88] Crawford, D. J., Liu, M. Y., Nabel, C. S., Cao, X.-J., Garcia, B. A., and Kohli, R. M. (2016) Tet2 catalyzes stepwise 5-methylcytosine oxidation by an iterative and de novo mechanism, J. Am. Chem. Soc. 138, 730-733.
[89] Xiong, J., Zhang, Z., Chen, J., Huang, H., Xu, Y., Ding, X., Zheng, Y., Nishinakamura, R., Xu, G. L., Wang, H., Chen, S., Gao, S., and Zhu, B. (2016) Cooperative action between SALL4A and TET proteins in stepwise oxidation of 5-methylcytosine, Mol. Cell 64, 913-925.


[90] Blaschke, K., Ebata, K. T., Karimi, M. M., Zepeda-Martínez, J. A., Goyal, P., Mahapatra, S., Tam, A., Laird, D. J., Hirst, M., Rao, A., Lorincz, M. C., and Ramalho-Santos, M. (2013) Vitamin C induces Tet-dependent DNA demethylation and a blastocyst-like state in ES cells, Nature 500, 222-226.
[91] Minor, E. A., Court, B. L., Young, J. I., and Wang, G. (2013) Ascorbate induces ten-eleven translocation (Tet) methylcytosine dioxygenase-mediated generation of 5-hydroxymethylcytosine, J. Biol. Chem. 288, 13669-13674.
[92] Yin, R., Mao, S. Q., Zhao, B., Chong, Z., Yang, Y., Zhao, C., Zhang, D., Huang, H., Gao, J., Li, Z., Jiao, Y., Li, C., Liu, S., Wu, D., Gu, W., Yang, Y. G., Xu, G. L., and Wang, H. (2013) Ascorbic acid enhances Tet-mediated 5-methylcytosine oxidation and promotes DNA demethylation in mammals, J. Am. Chem. Soc. 135, 10396-10403.
[93] de Jong, L., Albracht, S. P., and Kemp, A. (1982) Prolyl 4-hydroxylase activity in relation to the oxidation state of enzyme-bound iron. The role of ascorbate in peptidyl proline hydroxylation, Biochim. Biophys. Acta 704, 326-332.


[94] Myllylä, R., Majamaa, K., Günzler, V., Hanauske-Abel, H. M., and Kivirikko, K. I. (1984) Ascorbate is consumed stoichiometrically in the uncoupled reactions catalyzed by prolyl 4-hydroxylase and lysyl hydroxylase, J. Biol. Chem. 259, 5403-5405.
[95] Hanauske-Abel, H. M., and Günzler, V. (1982) A stereochemical concept for the catalytic mechanism of prolylhydroxylase: applicability to classification and design of inhibitors, J. Theor. Biol. 94, 421-455.
[96] Mitchell, A. J., Dunham, N. P., Martinie, R. J., Bergman, J. A., Pollock, C. J., Hu, K., Allen, B. D., Chang, W. C., Silakov, A., Bollinger, J. M., Jr., Krebs, C., and Boal, A. K. (2017) Visualizing the Reaction Cycle in an Iron(II)- and 2-(Oxo)-glutarate-Dependent Hydroxylase, J. Am. Chem. Soc. 139, 13830-13836.
[97] Ye, S., Riplinger, C., Hansen, A., Krebs, C., Bollinger, J. M., Jr., and Neese, F. (2012) Electronic structure analysis of the oxygen-activation mechanism by Fe(II)- and alpha-ketoglutarate (alphaKG)-dependent dioxygenases, Chem. Eur. J. 18, 6555-6567.
[98] Hashimoto, H., Pais, J. E., Dai, N., Corrêa, I. R., Jr., Zhang, X., Zheng, Y., and Cheng, X. (2015) Structure of Naegleria Tet-like dioxygenase (NgTet1) in complexes with a reaction intermediate 5-hydroxymethylcytosine DNA, Nucleic Acids Res. 43, 10713-10721.


[99] Hu, L., Lu, J., Cheng, J., Rao, Q., Li, Z., Hou, H., Lou, Z., Zhang, L., Li, W., Gong, W., Liu, M., Sun, C., Yin, X., Li, J., Tan, X., Wang, P., Wang, Y., Fang, D., Cui, Q., Yang, P., He, C., Jiang, H., Luo, C., and Xu, Y. (2015) Structural insight into substrate preference for TET-mediated oxidation, Nature 527, 118-122.
[100] Aik, W., McDonough, M. A., Thalhammer, A., Chowdhury, R., and Schofield, C. J. (2012) Role of the jelly-roll fold in substrate binding by 2-oxoglutarate oxygenases, Curr. Opin. Struct. Biol. 22, 691-700.
[101] Upadhyay, A. K., Horton, J. R., Zhang, X., and Cheng, X. (2011) Coordinated methyllysine erasure: structural and functional linkage of a Jumonji demethylase domain and a reader domain, Curr. Opin. Struct. Biol. 21, 750-760.
[102] Egloff, S., and Murphy, S. (2008) Cracking the RNA polymerase II CTD code, Trends Genet. 24, 280-288.
[103] Sims, R. J., III, Rojas, L. A., Beck, D., Bonasio, R., Schüller, R., Drury, W. J., III, Eick, D., and Reinberg, D. (2011) The C-terminal domain of RNA polymerase II is modified by site-specific methylation, Science 332, 99-103.


[104] Hausinger, R. P. (2004) FeII/alpha-ketoglutarate-dependent hydroxylases and related enzymes, Crit. Rev. Biochem. Mol. Biol. 39, 21-68.
[105] Chang, W. C., Guo, Y., Wang, C., Butch, S. E., Rosenzweig, A. C., Boal, A. K., Krebs, C., and Bollinger, J. M., Jr. (2014) Mechanism of the C5 stereoinversion reaction in the biosynthesis of carbapenem antibiotics, Science 343, 1140-1144.
[106] Clifton, I. J., Doan, L. X., Sleeman, M. C., Topf, M., Suzuki, H., Wilmouth, R. C., and Schofield, C. J. (2003) Crystal structure of carbapenem synthase (CarC), J. Biol. Chem. 278, 20843-20850.
[107] Wilmouth, R. C., Turnbull, J. J., Welford, R. W., Clifton, I. J., Prescott, A. G., and Schofield, C. J. (2002) Structure and mechanism of anthocyanidin synthase from Arabidopsis thaliana, Structure 10, 93-103.
[108] Lu, J., Hu, L., Cheng, J., Fang, D., Wang, C., Yu, K., Jiang, H., Cui, Q., Xu, Y., and Luo, C. (2016) A computational investigation on the substrate preference of ten-eleven translocation 2 (TET2), Phys. Chem. Chem. Phys. 18, 4728-4738.


[109] Liu, M. Y., Torabifard, H., Crawford, D. J., DeNizio, J. E., Cao, X. J., Garcia, B. A., Cisneros, G. A., and Kohli, R. M. (2017) Mutations along a TET2 active site scaffold stall oxidation at 5-hydroxymethylcytosine, Nat. Chem. Biol. 13, 181-187.
[110] Iyer, L. M., Zhang, D., Burroughs, A. M., and Aravind, L. (2013) Computational identification of novel biochemical systems involved in oxidation, glycosylation and other complex modifications of bases in DNA, Nucleic Acids Res. 41, 7635-7655.
[111] Liu, C. K., Hsu, C. A., and Abbott, M. T. (1973) Catalysis of three sequential dioxygenase reactions by thymine 7-hydroxylase, Arch. Biochem. Biophys. 159, 180-187.
[112] Lee, Y. J., Dai, N., Walsh, S. E., Müller, S., Fraser, M. E., Kauffman, K. M., Guan, C., Corrêa, I. R., Jr., and Weigele, P. R. (2018) Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses, Proc. Natl. Acad. Sci. U. S. A. 115, E3116-E3125.
[113] Shafiee, A., Motamedi, H., and Chen, T. (1994) Enzymology of FK-506 biosynthesis: purification and characterization of 31-O-desmethylFK-506 O:methyltransferase from Streptomyces sp. MA6858, Eur. J. Biochem. 225, 755-764.


[114] Paez-Espino, D., Chen, I. A., Palaniappan, K., Ratner, A., Chu, K., Szeto, E., Pillay, M., Huang, J., Markowitz, V. M., Nielsen, T., Huntemann, M., Reddy, T. B. K., Pavlopoulos, G. A., Sullivan, M. B., Campbell, B. J., Chen, F., McMahon, K., Hallam, S. J., Denef, V., Cavicchioli, R., Caffrey, S. M., Streit, W. R., Webster, J., Handley, K. M., Salekdeh, G. H., Tsesmetzis, N., Setubal, J. C., Pope, P. B., Liu, W. T., Rivers, A. R., Ivanova, N. N., and Kyrpides, N. C. (2017) IMG/VR: a database of cultured and uncultured DNA viruses and retroviruses, Nucleic Acids Res. 45, D457-D465.
[115] Salje, J. (2010) Plasmid segregation: how to survive as an extra piece of DNA, Crit. Rev. Biochem. Mol. Biol. 45, 296-317.
[116] Iyer, L. M., Zhang, D., de Souza, R. F., Pukkila, P. J., Rao, A., and Aravind, L. (2014) Lineage-specific expansions of TET/JBP genes and a new class of DNA transposons shape fungal genomic and epigenetic landscapes, Proc. Natl. Acad. Sci. U. S. A. 111, 1676-1683.
[117] Weigele, P., and Raleigh, E. A. (2016) Biosynthesis and function of modified bases in bacteria and their viruses, Chem. Rev. 116, 12655-12687.


[118] Hardy, L. W., and Nalivaika, E. (1992) Asn177 in Escherichia coli thymidylate synthase is a major determinant of pyrimidine specificity, Proc. Natl. Acad. Sci. U. S. A. 89, 9725-9729.
[119] Matthews, D. A., Appelt, K., Oatley, S. J., and Xuong, N. H. (1990) Crystal structure of Escherichia coli thymidylate synthase containing bound 5-fluoro-2'-deoxyuridylate and 10-propargyl-5,8-dideazafolate, J. Mol. Biol. 214, 923-936.
[120] Graves, K. L., Butler, M. M., and Hardy, L. W. (1992) Roles of Cys148 and Asp179 in catalysis by deoxycytidylate hydroxymethylase from bacteriophage T4 examined by site-directed mutagenesis, Biochemistry 31, 10315-10321.
[121] Robert, X., and Gouet, P. (2014) Deciphering key features in protein structures with the new ENDscript server, Nucleic Acids Res. 42, W320-W324.
[122] Yu, M., Hon, G. C., Szulwach, K. E., Song, C.-X., Zhang, L., Kim, A., Li, X., Dai, Q., Shen, Y., Park, B., Min, J.-H., Jin, P., Ren, B., and He, C. (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome, Cell 149, 1368-1380.


[123] Zhang, L., Szulwach, K. E., Hon, G. C., Song, C. X., Park, B., Yu, M., Lu, X., Dai, Q., Wang, X., Street, C. R., Tan, H., Min, J. H., Ren, B., Jin, P., and He, C. (2013) Tet-mediated covalent labeling of 5-methylcytosine for its genome-wide detection and sequencing, Nat. Commun. 4, 1517.
[124] Song, C. X., Clark, T. A., Lu, X. Y., Kislyuk, A., Dai, Q., Turner, S. W., He, C., and Korlach, J. (2011) Sensitive and specific single-molecule sequencing of 5-hydroxymethylcytosine, Nat. Methods 9, 75-77.
[125] Vaisvila, R., Sun, Z., Guan, S., Saleh, L., Ettwiller, L., and Davis, T. B. Compositions and methods for analyzing modified nucleotides. Pat. Appl. 15/441431, July 13, 2017.
[126] Vaisvila, R., Davis, T. B., Guan, S., Sun, Z., Ettwiller, L., and Saleh, L. Compositions and methods for analyzing modified nucleotides. Pat. Appl. 15/893373, June 21, 2018.
[127] Carpenter, M. A., Li, M., Rathore, A., Lackey, L., Law, E. K., Land, A. M., Leonard, B., Shandilya, S. M., Bohn, M. F., Schiffer, C. A., Brown, W. L., and Harris, R. S. (2012) Methylcytosine and normal cytosine deamination by the foreign DNA restriction enzyme APOBEC3A, J. Biol. Chem. 287, 34801-34808.


[128] Wijesinghe, P., and Bhagwat, A. S. (2012) Efficient deamination of 5-methylcytosines in DNA by human APOBEC3A, but not by AID or APOBEC3G, Nucleic Acids Res. 40, 9206-9217.
[129] Schutsky, E. K., Nabel, C. S., Davis, A. K. F., DeNizio, J. E., and Kohli, R. M. (2017) APOBEC3A efficiently deaminates methylated, but not TET-oxidized, cytosine bases in DNA, Nucleic Acids Res. 45, 7655-7665.
[130] Schutsky, E. K., DeNizio, J. E., Hu, P., Liu, M. Y., Nabel, C. S., Fabyanic, E. B., Hwang, Y., Bushman, F. D., Wu, H., and Kohli, R. M. (2018) Nondestructive, base-resolution sequencing of 5-hydroxymethylcytosine using a DNA deaminase, Nat. Biotechnol. 36, 1083-1090.


For TOC use only.

Insights into the biochemistry, evolution, and biotechnological applications of the ten-eleven translocation (TET) enzymes

Mackenzie J. Parker, Peter R. Weigele, and Lana Saleh*


Figure 1. Main pathways for DNA methylation and demethylation. Red lines indicate modified DNA strands and black lines indicate newly synthesized strands. DNMT3A/3B catalyze de novo methylation, while the UHRF1/DNMT1 complex maintains methylation after replication. Blue arrows indicate passive demethylation, in which 5mC or its oxidized forms are diluted during replication. Gold arrows indicate TET-TDG-BER-mediated active demethylation.


Figure 2. Reactions catalyzed by members of the TET/JBP family. (A) Iterative 5mC oxidation catalyzed by mammalian TET1/2/3, Naegleria gruberi TET1 (NgTET1) (major activity), and Coprinopsis cinerea TET (CcTET). (B) JBP1/2 catalyze the oxidation of T to 5hmU in the first step of base J biosynthesis. (C) Iterative T oxidation catalyzed by NgTET1 (minor activity). aKG = α-ketoglutarate; O2 = molecular oxygen; Suc = succinate; CO2 = carbon dioxide; UDP-Glu = uridine diphosphoglucose.


Figure 3. (A) Domain architectures of the hTET paralogs and NgTET1. (B) Crystal structure of hTET2 truncated CD in complex with 5mC-containing DNA (PDB 4NM6). The layers of the DSBH are colored as follows: α-helical layer = green, major β-sheet = dark blue, minor β-sheet = magenta. Random coil regions between elements of the DSBH are colored light blue and the low complexity insert orange. All metals are indicated. Left inset: blow up of the Zn3 binding site. Right inset: blow up of the active site showing ligands to Fe2+ and NOG (w = water molecule). (C) Crystal structure of NgTET1 in complex with 5mC-containing DNA (PDB 4LT5). The layers of the DSBH and random coil regions outside of these layers are colored as in (B). Inset: blow up of the active site showing ligands to Mn2+ and NOG (w = water molecule). In both structures, only the base of the targeted 5mC is shown for clarity. Metal-ligand interactions and hydrogen bonds are shown as dashed lines. Atoms of residue sidechains and DNA are shown in stick representation and are colored according to heteroatom: red = O, blue = N, orange = P, yellow = S, rust = Fe, purple = Mn, grey = Zn.


Figure 4. Consensus hydroxylation mechanism for Fe2+/aKG-dependent dioxygenases.


Figure 5. Structures of the active sites of hTET2-TCD and NgTET1 with different 5xC-containing oligonucleotides. Top row: hTET2-TCD structures with (A) 5mC (PDB 4NM6), (B) 5hmC (PDB 5DEU), and (C) 5fC (PDB 5D9Y). Bottom row: NgTET1 structures with (D) 5mC (PDB 4LT5) and (E) 5hmC (PDB 5CG8). Residues proposed to be important in substrate recognition and binding are shown, and hydrogen-bond interactions are indicated with dashed lines. Only the substrate nucleotide inserted into the active site is shown, and the 2-His-1-carboxylate facial triad ligating Fe2+/Mn2+ is omitted for clarity. Atoms are shown in stick representation. Residue side chains are colored by heteroatom, whereas 5xC, aKG/NOG, metal ions, and water molecules are colored according to element: grey = C, red = O, blue = N, dark orange = P, rust = Fe, purple = Mn.


Figure 6. Gene maps of prokaryotic loci encoding TET/JBP homologs. Representative prokaryotic gene clusters containing a TET/JBP homolog are grouped and labeled according to their co-associations with other predicted DNA-modifying enzymes. TET/JBP co-associations may be predictive of their substrate choice with respect to 5mC versus T. For example, only those homologs associating with a predicted C5-Cytosine-MT are anticipated to oxidize 5mC. Phage and bacterial names are indicated. Sequence names consisting of three alpha-numeric strings uniquely identify metagenome sequence contigs obtained from the Joint Genome Institute’s (JGI) Integrated Microbial Genomes-Virus (IMG/VR) dataset: the first number corresponds to the IMG Genome ID, the second to the Gold Analysis Project ID, and the third number identifies the contig from within that project’s dataset. Gene maps were generated using Geneious 10.2.6 (https://www.geneious.com).


Figure 7. Base selectivity in thymidylate synthases (TSs), TETs, and JBPs. (A) In canonical TSs, selectivity for dUMP is dictated by the use of the amido NH2 group of Asn to donate a hydrogen bond to O4 of uracil. (B) The selectivity of TS homolog gp42 from phage T4 for dCMP is dictated by the use of the sidechain carboxylate of Asp to accept a hydrogen bond from N4 of cytosine. (C) In TETs, selectivity for 5mC may be dictated by having a hydrogen bond acceptor (h = amido O of Asn, Ng = Asp sidechain carboxylate) to interact with N4 and a hydrogen bond donating His to interact with N3 of the pyrimidine ring. (D) Residues thought to interact with T in JBPs based on the sequence alignments shown in Figure 8.


Figure 8. Sequence alignment of hTET1, 2, 3 (Q8NFU7, Q6N021, and O43151), mTET1, 2, 3 (XP_011241810, XP_011241810, and XP_006505839), AmTET (XM_006561197.3), droTET (AAF47691.4), CcTET (XP_001831108.2), T. brucei JBP1 and 2 (XP_829420.1 and Q57X81.1), L. major JBP1 and 2 (XP_001681321.1 and YP_007674071.1), phage Med4-184 (YP_007674071.1), phage Nigel (YP_002003841.1), TMED261 (OUX44518.1), and NgTET1 (XP_002667965). Ligands to Fe2+ and aKG are highlighted in red, residues with potential hydrogen bonds to the pyrimidine ring of 5xC or T in green, and residues that constitute the active-site scaffold in cyan. Open boxes mark residues whose function, as indicated by their color code, was identified structurally but which do not align with the corresponding residues from other organisms (e.g., hTET1 R1261 and NgTET R224 are both ligands to aKG but are not in alignment with each other). Secondary structural elements parsed and numbered from PDB files 4LT5 (NgTet1) and 4NM6 (hTET2-TCD) are displayed for the indicated sequences in the alignment. The figure was generated using ESPript 3.0.
