

Single-nucleotide resolution analysis of 5-hydroxymethylcytosine in DNA by enzyme-mediated deamination in combination with sequencing Qiao-Ying Li, Neng-Bin Xie, Jun Xiong, Bi-Feng Yuan, and Yu-Qi Feng Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04833 • Publication Date (Web): 20 Nov 2018 Downloaded from http://pubs.acs.org on November 21, 2018



Analytical Chemistry

Single-nucleotide resolution analysis of 5-hydroxymethylcytosine in DNA by enzyme-mediated deamination in combination with sequencing

Qiao-Ying Li, Neng-Bin Xie, Jun Xiong, Bi-Feng Yuan,* Yu-Qi Feng

Key Laboratory of Analytical Chemistry for Biology and Medicine (Ministry of Education), Department of Chemistry, Wuhan University, Wuhan 430072, P.R. China.

* Corresponding author: Bi-Feng Yuan. Tel: +86-27-68755595; Fax: +86-27-68755595; E-mail: [email protected]

ABSTRACT: The report of the existence of 5-hydroxymethylcytosine (hm5C) in mammalian genomes is a milestone discovery. hm5C is now generally viewed as the sixth base of DNA, with important functions in epigenetic regulation. In-depth investigation of the biological functions of hm5C requires elucidating its distribution patterns in genomes, preferably at single-nucleotide resolution. The cytosine deaminases of the APOBEC (apolipoprotein B mRNA-editing catalytic polypeptide-like) family are nucleic acid editing enzymes that can deaminate cytosine (C) to form uracil (U). In particular, one member of the family, APOBEC3A, can efficiently deaminate both C and 5-methylcytosine (m5C). In the current study, using a restriction enzyme-based assay and liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS) analysis, we found that APOBEC3A protein can effectively deaminate C, m5C, and hm5C, but shows no observable deamination activity toward glycosylated hm5C (β-glucosyl-5-hydroxymethyl-2’-deoxycytidine, ghm5C). By virtue of the differential deamination activity of APOBEC3A toward C, m5C and ghm5C, in conjunction with sequencing, we developed a method for single-nucleotide resolution analysis of hm5C in DNA. In this analytical strategy, the original C and m5C in DNA are deaminated by APOBEC3A to form U and thymine (T), both of which read as T during sequencing, whereas ghm5C is resistant to deamination and reads as C during sequencing.
Therefore, any C remaining in the sequence can only come from original hm5C, which affords single-nucleotide resolution analysis of hm5C in DNA. This APOBEC3A-mediated deamination sequencing (AMD-seq) is straightforward and involves no bisulfite treatment, which avoids the substantial degradation of DNA. In future applications, this strategy could be used for reliable genome-wide mapping of hm5C at single-nucleotide resolution.

Introduction

Cytosine methylation (5-methylcytosine, m5C), a naturally occurring base in DNA, is the most important epigenetic modification and plays pivotal roles in many physiological and pathological processes.1,2 DNA methylation is reversible and undergoes dynamic changes.3 The discovery of 5-hydroxymethylcytosine (hm5C) in the genomic DNA of mammalian cells in 2009 set off an upsurge of work to elucidate the mechanism of DNA demethylation in mammals.4,5 These reports demonstrated that the Ten-Eleven Translocation (TET) proteins can oxidize m5C to form hm5C.4,5 Moreover, TET proteins can further oxidize hm5C to produce 5-formylcytosine (f5C) and 5-carboxylcytosine (ca5C), both of which can be recognized and cleaved by thymine-DNA glycosylase, followed by replacement with unmodified cytosine via the base-excision repair machinery.6 The report of the existence of hm5C in mammalian genomes is a milestone discovery, and hm5C is

now generally viewed as the sixth base of genomic DNA.7 In addition to mediating the DNA demethylation process, hm5C has been established as an important epigenetic marker that can regulate gene expression.8 Moreover, hm5C plays important roles in embryogenesis,9 cellular differentiation,10 and tumorigenesis.11,12 Thus, hm5C is increasingly recognized as a biomarker in disease-related diagnostics.13 To date, several methods have been developed for the detection of hm5C in genomic DNA, including thin-layer chromatography,4,5 immunohistochemistry,14 and liquid chromatography-mass spectrometry (LC-MS) analysis.15-23 In these methods, DNA is enzymatically digested to nucleotides or nucleosides and then analyzed on different platforms. Thus, these methods mainly measure the overall content of hm5C, while the positional information of hm5C in DNA is lost. To better understand the biological roles of hm5C, it is important to know the distribution patterns of hm5C in


the genome. Several techniques have been developed to specifically map hm5C in genomic DNA. Affinity purification-based enrichment followed by high-throughput sequencing was established to decipher the distribution of hm5C in the genome.24-30 These affinity purification-based profiling methods have advanced our understanding of hm5C, but they cannot precisely localize hm5C, and their mapping resolution is normally restricted by the size of the immunoprecipitated or chemically captured DNA fragments.31 Single-nucleotide resolution analysis of m5C has been achieved by whole-genome bisulfite sequencing (BS-seq).32 However, bisulfite-based sequencing cannot discriminate m5C from hm5C, since both are resistant to deamination by bisulfite treatment and still read as C in high-throughput sequencing.33 Detection of hm5C through single-molecule, real-time (SMRT) sequencing or nanopore sequencing is possible,34,35 but both have a relatively high rate of sequencing errors.36 Currently, nanopore sequencing technology is still under development to achieve better sequencing performance. Yu et al.37 reported the TET-assisted bisulfite sequencing (TAB-seq) method to map hm5C in genomes. In TAB-seq, β-glucosyltransferase (β-GT) is used to selectively add a glucosyl moiety to hm5C to form β-glucosyl-5-hydroxymethyl-2’-deoxycytidine (ghm5C). TET1 is then used to convert m5C to ca5C, whereas ghm5C is resistant to oxidation by TET1. Subsequent bisulfite treatment converts all C and ca5C to uracil, which reads as T during sequencing, while ghm5C still reads as C. Therefore, TAB-seq can locate hm5C at single-base resolution. In the same year, Booth et al.38 developed oxidative bisulfite sequencing (oxBS-seq) for the analysis of hm5C in genomes. KRuO4 is used to specifically oxidize hm5C to f5C, which undergoes deamination upon bisulfite treatment.
Therefore, hm5C reads as T, whereas m5C still reads as C in oxBS-seq. Comparison of the BS-seq data and the oxBS-seq data allows indirect identification of hm5C at single-nucleotide resolution. However, a limitation of these methods is the use of bisulfite, since the harsh conditions for chemical deamination can degrade as much as 99.9% of input DNA.39 Moreover, the procedures of these analytical strategies are complex and relatively tedious. Previous studies have reported the enzymatic deamination of cytosine.40 The cytosine deaminases of the APOBEC (apolipoprotein B mRNA-editing catalytic polypeptide-like) family are DNA editing enzymes that function in the immune response.41 It was reported that APOBEC proteins can deaminate cytosine to form uracil in DNA.40 In particular, APOBEC3A, a member of the APOBEC family, can efficiently deaminate both cytosine and m5C but shows low activity toward hm5C,42,43 which raises the possibility of mapping hm5C in the genome by


utilizing this unique property of APOBEC3A. In the current study, we developed the single-nucleotide resolution analysis of hm5C in DNA by virtue of the differential deamination activity of APOBEC3A toward different cytosine modifications.
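The read-outs of the sequencing strategies surveyed in the Introduction can be collected in a small lookup table. The sketch below is only an illustration: the dictionary layout and the helper name `direct_hm5c_methods` are assumptions made for this example, and the AMD-seq row reflects the behaviour reported in this work (β-GT glucosylation followed by APOBEC3A treatment).

```python
# Base that is read after conversion, PCR and sequencing, following the
# descriptions in the text. Keys are the ORIGINAL bases in the sample.
READOUT = {
    "BS-seq":   {"C": "T", "m5C": "C", "hm5C": "C"},
    "TAB-seq":  {"C": "T", "m5C": "T", "hm5C": "C"},
    "oxBS-seq": {"C": "T", "m5C": "C", "hm5C": "T"},
    "AMD-seq":  {"C": "T", "m5C": "T", "hm5C": "C"},
}


def direct_hm5c_methods(readout=READOUT):
    """Return the methods in which hm5C reads differently from both C and m5C.

    Such methods identify hm5C from a single data set, without subtracting
    one experiment from another.
    """
    return sorted(
        name
        for name, bases in readout.items()
        if bases["hm5C"] != bases["C"] and bases["hm5C"] != bases["m5C"]
    )


print(direct_hm5c_methods())
```

In this simplified picture, BS-seq cannot separate hm5C from m5C, and oxBS-seq gives hm5C the same read-out as C, which is why it relies on a comparison with BS-seq data.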

Experimental Section

Chemicals and reagents

2’-Deoxyguanosine (dG), 2’-deoxyadenosine (dA), 2’-deoxycytidine (dC), thymidine (T), and phosphodiesterase I were purchased from Sigma-Aldrich (St. Louis, MO, USA). 5-Methyl-2’-deoxycytidine (m5C) and 5-hydroxymethyl-2’-deoxycytidine (hm5C) were purchased from Berry & Associates (Dexter, MI, USA). 2’-Deoxyuridine (dU) was purchased from Meryer Chemical Technology Co., Ltd (Shanghai, China). The β-glucosyl-5-hydroxymethyl-2’-deoxycytidine (ghm5C) standard was prepared according to a previous method.15 Recombinant T4 phage β-glucosyltransferase (β-GT) and SwaI restriction enzyme were obtained from New England Biolabs (Ipswich, MA, USA). S1 nuclease and alkaline phosphatase were from Takara Biotechnology Co., Ltd. (Dalian, China). Chromatographic grade methanol was purchased from Tedia Co. Inc. (Fairfield, OH, USA). Boric acid, ethylenediaminetetraacetic acid (EDTA) and tris(hydroxymethyl)aminomethane (Tris) were purchased from Sinopharm Chemical Reagent Co., Ltd. (Shanghai, China).

DNA substrates

All the oligonucleotides with modified bases were purchased from Takara Biotechnology Co., Ltd. (Dalian, China). The detailed sequences of these DNA are listed in Table 1.

Expression and purification of APOBEC3A protein

To express recombinant APOBEC3A protein in E. coli cells, the pET-28(b+)-APOBEC3A plasmid was constructed by inserting the full-length coding sequence of APOBEC3A into the pET-28(b+) vector, which carries the glutathione S-transferase (GST) tag at the N-terminus of the recombinant protein. The coding sequence of APOBEC3A was synthesized at TsingKe Co., Ltd. (Wuhan, China) and inserted into the vector at the Not I/Xba I cloning sites. The constructs were confirmed by DNA sequencing and expressed in the BL21 (DE3) strain. Protein expression was induced using 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) for 24 h at 16 °C. Recombinant APOBEC3A protein was purified with Glutathione Sepharose™ 4B (GE Healthcare) following the manufacturer’s protocol. The purified protein was then concentrated using a 30 kDa ultrafiltration unit (Millipore) and stored in the buffer containing 20 mM Tris-HCl (pH 8.0),


150 mM NaCl, 0.01 mM EDTA, 0.5 mM dithiothreitol and 0.01% Tween-20 at −80 °C.

Characterization of the deaminase activity of APOBEC3A using a restriction enzyme-based method

A restriction enzyme-based method was employed to characterize the deaminase activity of APOBEC3A. Two 29-mer DNA (DNA-T and DNA-U) and two FAM-labeled complementary DNA (DNA-comp-A and DNA-comp-G) were used to evaluate the performance of SwaI in digesting mismatched duplex DNA. Four 29-mer DNA (DNA-C, DNA-m5C, DNA-hm5C, and DNA-ghm5C) were used to examine the deaminase activity of APOBEC3A on C, m5C, hm5C, and ghm5C. Detailed sequence information for all of these DNA can be found in Table 1. DNA-ghm5C was prepared by glucosylation of the duplex DNA-hm5C using β-GT according to the manufacturer’s recommended protocol. Briefly, the glucosylation reaction was performed in a 10-μL solution containing 50 mM potassium acetate (pH 7.9), 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, 10 U β-GT, 10 pmol DNA and 20 μM UDP-glucose. The reaction was incubated at 37 °C for 1 h. The deamination reaction was carried out in a 20-μL solution with 25 mM HEPES (pH 6.5), 10 pmol DNA substrate (DNA-C, DNA-m5C, DNA-hm5C, or DNA-ghm5C), and varied amounts of APOBEC3A enzyme at 37 °C for 2 h. Deamination was terminated by incubating the mixture at 90 °C for 10 min. Then 10 pmol (1 μL) of the FAM-labeled complementary strand (DNA-comp-G) was added and annealed to form duplex DNA. Next, 8 μL of SwaI (10 U/μL), 4 μL of NEBuffer 3.1 and 7 μL of H2O were added to the mixture to a final volume of 40 μL, which was then incubated at 25 °C for 12 h. The resulting products were analyzed by 15% polyacrylamide gel electrophoresis (PAGE, acrylamide/bisacrylamide = 19/1). All the bands were visualized using a Tanon 4600SF (Tanon Science & Technology Co., Ltd., Shanghai, China).

Enzymatic digestion of DNA

The enzymatic digestion was carried out under neutral conditions according to a previous report.44 Briefly, 100 ng of APOBEC3A-treated or untreated DNA (in 22 μL of H2O) was digested with 360 U (2 μL) of S1 nuclease, 0.002 U (2 μL) of venom phosphodiesterase I, and 30 U (1 μL) of alkaline phosphatase in a buffer of 50 mM Tris-HCl (pH 7.0), 10 mM NaCl, 1 mM MgCl2, and 1 mM ZnSO4. The mixture (30 μL) was incubated at 37 °C for 3 h. Then, 170 μL of sterilized water was added to the above solution, followed by extraction with 200 μL of chloroform three times. The resulting aqueous layer was collected and dried at 37 °C for subsequent LC-ESI-MS/MS analysis.

Analysis of nucleosides by LC-ESI-MS/MS

Analysis of dA, dG, T, dC, m5C, hm5C, and ghm5C was performed on an LC-ESI-MS/MS system consisting of an AB 3200 QTRAP mass spectrometer (Applied Biosystems, Foster City, CA, USA) and a Shimadzu LC-20AD HPLC (Tokyo, Japan). Data acquisition and processing were performed using AB SCIEX Analyst 1.5 Software (Applied Biosystems, Foster City, CA, USA). The HPLC separation was performed on a Hisep C18-T column (150 mm × 2.1 mm i.d., 5 μm, Weltech Co., Ltd., Wuhan, China) at 35 °C. Water (solvent A) and methanol (solvent B) were used as the mobile phases. A gradient of 5-40% B over 25 min was used. The flow rate of the mobile phase was set at 0.2 mL/min. The mass spectrometry detection was performed in positive ESI mode. The nucleosides were monitored using the multiple reaction monitoring (MRM) mode. The mass transitions (precursor ions → product ions) of dC (228.1 → 112.1), T (243.1 → 127.1), dA (252.1 → 136.1), dG (268.1 → 152.1), m5C (242.1 → 126.1), hm5C (258.1 → 142.1), and ghm5C (268.1 → 142.1) were used. The MRM parameters were optimized to achieve maximal detection sensitivity.

Sequencing of APOBEC3A-treated DNA

Three synthesized 150-mer DNA (L-DNA-C, L-DNA-m5C, and L-DNA-hm5C, detailed sequence information can be found in Table 1) as well as the glucosylated 150-mer L-DNA-ghm5C were used to establish the single-nucleotide resolution analysis of hm5C in DNA by APOBEC3A-mediated deamination. L-DNA-ghm5C was prepared by glucosylation of L-DNA-hm5C using β-GT in the same way as for the preparation of the 29-mer DNA-ghm5C. L-DNA-C, L-DNA-m5C, L-DNA-hm5C, or L-DNA-ghm5C (100 ng) was treated with APOBEC3A at 37 °C for 2 h. The deaminated DNA substrates were then used directly as templates for PCR amplification. The primers 5’-GAGATGTGGTGAGTAGAGTGGAGTG-3’ and 5’-CACTACACTCCACTCACTACATCCC-3’ were used for the amplification. Amplification was performed for 25 cycles, each consisting of 30 s at 95 °C, 30 s at 56 °C, and 30 s at 72 °C, with a final extension at 72 °C for 5 min. The PCR products were then subjected to sequencing (TsingKe Co., Ltd., Wuhan, China).

Results and discussion

Evaluation of deaminase activity of APOBEC3A on C, m5C, hm5C and ghm5C

APOBEC3A is well known for deaminating cytosine to form uracil in DNA (Figure 1A). In addition to C, a previous study demonstrated that APOBEC3A also shows differential deamination activity on m5C and hm5C in DNA (Figure 1A).43 In this respect, the different deamination performance of APOBEC3A


on C, m5C, and hm5C raises the possibility that APOBEC3A could be exploited to map hm5C in DNA without the involvement of bisulfite treatment.

Figure 1. Schematic illustration of the deamination of cytosine and cytosine modifications by APOBEC3A protein. (A) Deamination of C, m5C, and hm5C by APOBEC3A protein. The deaminated C, m5C, and hm5C bases pair with dA. (B) Upon glycosylation of hm5C, the resulting ghm5C is resistant to deamination by APOBEC3A protein. The ghm5C base pairs with dG.

Table 1. The sequences of the oligodeoxynucleotides (ODNs) used in this study.

ODN           Sequence (from 5’ to 3’)
DNA-T         5’-TGAGGAATGAAGTTGATTTAAATGTGATG-3’
DNA-U         5’-TGAGGAATGAAGTTGATTUAAATGTGATG-3’
DNA-comp-A    5’-CATCACATTTAAATCAACTTCATTCCTCA-FAM-3’
DNA-comp-G    5’-CATCACATTTGAATCAACTTCATTCCTCA-FAM-3’
DNA-C         5’-TGAGGAATGAAGTTGATTCAAATGTGATG-3’
DNA-m5C       5’-TGAGGAATGAAGTTGATTm5CAAATGTGATG-3’
DNA-hm5C      5’-TGAGGAATGAAGTTGATThm5CAAATGTGATG-3’
Primer-fwd    5’-GAGATGTGGTGAGTAGAGTGGAGTG-3’
Primer-rev    5’-CACTACACTCCACTCACTACATCCC-3’
DNA-E         5’-AAAACCGTCGCCATCTCTTCCTATAGTGAGTCGTATTA-3’
L-DNA-C       5’-GAGATGTGGTGAGTAGAGTGGAGTGTAGATATCACATC
              ATACAGTCATACATACGATTCAAATGTACATTACAATAAC
              GTATCTAATCATATCGATTAACTAATCGACATAATAGTGAT
              GGATTAGGGATGTAGTGAGTGGAGTGTAGTG-3’
L-DNA-m5C     5’-GAGATGTGGTGAGTAGAGTGGAGTGTAGATATCACATC
              ATACAGTCATACATACGATTm5CAAATGTACATTACAATAA
              CGTATCTAATCATATCGATTAACTAATCGACATAATAGTGA
              TGGATTAGGGATGTAGTGAGTGGAGTGTAGTG-3’
L-DNA-hm5C    5’-GAGATGTGGTGAGTAGAGTGGAGTGTAGATATCACATC
              ATACAGTCATACATACGATThm5CAAATGTACATTACAATA
              ACGTATCTAATCATATCGATTAACTAATCGACATAATAGTG
              ATGGATTAGGGATGTAGTGAGTGGAGTGTAGTG-3’

We first examined the deamination activity of the prepared recombinant APOBEC3A protein. Polyacrylamide gel electrophoresis analysis showed the expected band (Figure S1 in Supporting Information), indicating the successful preparation of APOBEC3A protein. We then used a synthesized DNA strand (DNA-E, detailed sequence information can be found in Table 1) to initially evaluate the deamination activity of APOBEC3A protein toward C in DNA. The APOBEC3A-treated DNA and untreated DNA were enzymatically digested, followed by HPLC analysis. The result showed that the signal intensity of cytosine gradually decreased with increasing amounts of APOBEC3A protein (Figure S2 in Supporting Information). The newly formed uridine from the deamination of cytosine was also detected (Figure S2 in Supporting Information), indicating that the prepared recombinant APOBEC3A protein has good deamination activity.

Next, we evaluated the deamination activity of APOBEC3A protein on different cytosine modifications. SwaI is a restriction enzyme that normally cuts at the 5’-ATTTAAAT-3’ recognition sequence. When the complementary strand (DNA-comp-A) has the fully matched complementary restriction site (5’-ATTTAAAT-3’/3’-TAAATTTA-5’), SwaI can cleave the duplex DNA at the T:A site (lanes 1 and 2 in Figure 2A). In addition, when the complementary strand (DNA-comp-G) has a mismatched G (5’-ATTTAAAT-3’/3’-TAAGTTTA-5’; 5’-ATTUAAAT-3’/3’-TAAGTTTA-5’), SwaI can still cleave the duplex DNA at the T:G or U:G site (lanes 3-6 in Figure 2A), but not at the fully matched C:G site (5’-ATTCAAAT-3’/3’-TAAGTTTA-5’, lane 1 in Figure 2B). Deamination of C and m5C by APOBEC3A forms U and T, respectively. Since both U:G and T:G sites can be efficiently cleaved by SwaI, cleavage of the duplex DNA can serve as an indicator of the deaminase activity of APOBEC3A.


Figure 2. Characterization of the deaminase activity of APOBEC3A by the restriction enzyme-based method. (A) Evaluation of the digestion performance of SwaI on fully matched duplex DNA (5’-ATTTAAAT-3’/3’-TAAATTTA-5’) and mismatched duplex DNA (5’-ATTTAAAT-3’/3’-TAAGTTTA-5’; 5’-ATTUAAAT-3’/3’-TAAGTTTA-5’). Two 29-mer DNA (DNA-T and DNA-U) and two FAM-labeled complementary DNA (DNA-comp-A and DNA-comp-G) were used for the evaluation. (B) Evaluation of the deaminase activity of APOBEC3A on C, m5C, hm5C, and ghm5C. Four 29-mer DNA substrates (DNA-C, DNA-m5C, DNA-hm5C, and DNA-ghm5C) were used for the examination. DNA substrates were treated with various amounts of APOBEC3A protein. After the reaction, DNA substrates were annealed to the complementary FAM-labeled DNA strand (DNA-comp-G), followed by cleavage with SwaI and analysis with 15% PAGE.
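The logic of the SwaI read-out can be sketched in a few lines. This is a simplified model under stated assumptions: only the top strand is inspected, U is treated like T for site recognition (as the cleavage of the U:G duplex suggests), and the helper names `has_cleavable_site` and `deaminate` are hypothetical. The mismatch with G on DNA-comp-G is not modelled.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence given in the text


def has_cleavable_site(top_strand: str) -> bool:
    """Return True if the strand carries the SwaI site, reading U as T."""
    return SWAI_SITE in top_strand.upper().replace("U", "T")


def deaminate(top_strand: str, position: int) -> str:
    """Replace the C at a 0-based position by U (illustrative helper)."""
    if top_strand[position] != "C":
        raise ValueError("no cytosine at the given position")
    return top_strand[:position] + "U" + top_strand[position + 1:]


dna_c = "TGAGGAATGAAGTTGATTCAAATGTGATG"  # DNA-C from Table 1
pos = dna_c.index("ATTCAAAT") + 3  # the C inside the would-be SwaI site

print(has_cleavable_site(dna_c))                  # intact C: no site
print(has_cleavable_site(deaminate(dna_c, pos)))  # deaminated: site present
```

Under these assumptions, cleavage appears only once the cytosine has been deaminated, which is what makes the band pattern in Figure 2B an indicator of deaminase activity.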

Using the SwaI-based cleavage assay, we then qualitatively examined the deaminase activity of APOBEC3A on the cytosine modifications in DNA (DNA-C, DNA-m5C and DNA-hm5C). A total of 10 pmol of each DNA substrate was treated with various amounts of APOBEC3A protein. After the reaction, DNA substrates were annealed to the complementary FAM-labeled DNA strand (DNA-comp-G), followed by cleavage with SwaI. The results showed that the native C substrate (DNA-C) was efficiently deaminated using 0.1 μg to 1 μg of APOBEC3A protein (Figure 2B). The m5C substrate (DNA-m5C) showed partial deamination with 1.5 μg of APOBEC3A protein and complete deamination with 2.0 μg of APOBEC3A protein (Figure 2B). However, we also observed deamination of the hm5C substrate (DNA-hm5C) with 2.0 μg of APOBEC3A protein (Figure 2B), which suggested that APOBEC3A cannot distinguish m5C from hm5C well in terms of deamination. A previous study reported that hm5C in DNA can be efficiently glycosylated by β-GT. We speculated that glycosylation of hm5C might protect it from deamination by APOBEC3A protein. We therefore prepared the DNA substrate DNA-ghm5C to assess the deamination of ghm5C by APOBEC3A protein. The SwaI-based cleavage assay showed no cleavage of the duplex DNA with ghm5C (Figure 2B), indicating that glycosylation of hm5C can indeed prevent the APOBEC3A-mediated deamination of hm5C (Figure 1B). Therefore, the results reveal that APOBEC3A protein can effectively deaminate C and m5C, but shows no observable deamination activity toward glycosylated hm5C (ghm5C). The deaminated C, m5C and hm5C pair with dA instead of dG (Figure 1A), while ghm5C still pairs with dG (Figure 1B), which inspired us to develop the single-nucleotide resolution analysis of hm5C in DNA by virtue of the differential deamination activity of APOBEC3A protein on C, m5C and ghm5C.

Examination of the effect of APOBEC3A on deaminating C, m5C, hm5C and ghm5C by LC-ESI-MS/MS

In addition to characterizing the deamination activity of APOBEC3A by the SwaI-based cleavage assay, we further examined the deamination effect of APOBEC3A by LC-ESI-MS/MS. For this purpose, we used four synthesized 150-mer DNA (L-DNA-C, L-DNA-m5C, L-DNA-hm5C, and L-DNA-ghm5C, detailed sequence information can be found in Table 1). The LC-ESI-MS/MS results showed that the original C, m5C and hm5C disappeared after APOBEC3A treatment (3.0 μg) (Figure 3A-3F), indicating highly efficient deamination of C, m5C and hm5C by APOBEC3A protein.

However, ghm5C remained almost unchanged upon APOBEC3A treatment (3.0 μg) (Figure 3G and 3H). These results suggest that APOBEC3A protein is capable of efficiently deaminating C, m5C and hm5C, but not ghm5C, which is consistent with the results obtained by the SwaI-based cleavage assay. We also observed the newly formed uridine from the deamination of cytosine (data not shown). In contrast, no obvious changes were observed for the other nucleosides dA, dG and T upon APOBEC3A treatment (Figure 4), indicating that APOBEC3A protein has high selectivity and only deaminates cytosine and cytosine modifications.

Figure 3. Examination of the effect of APOBEC3A on deaminating C, m5C, hm5C and ghm5C by LC-ESI-MS/MS. Four synthesized 150-mer DNA (L-DNA-C, L-DNA-m5C, L-DNA-hm5C, and L-DNA-ghm5C) were used for the examination. The signal intensity of dC from L-DNA-C without APOBEC3A (A) or with APOBEC3A (B) treatment. The signal intensity of m5C from L-DNA-m5C without APOBEC3A (C) or with APOBEC3A (D) treatment. The signal intensity of hm5C from L-DNA-hm5C without APOBEC3A (E) or with APOBEC3A (F) treatment. The signal intensity of ghm5C from L-DNA-ghm5C without APOBEC3A (G) or with APOBEC3A (H) treatment.
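The disappearance of a cytosine signal in the LC-ESI-MS/MS traces goes together with the appearance of a product about 1 Da heavier. The following sketch works through that arithmetic. It assumes standard monoisotopic atomic masses and takes the precursor m/z values from the MRM transitions listed in the Experimental Section; the label `hm5U` for the deaminated hm5C nucleoside and the function name are introduced here for illustration only.

```python
# Monoisotopic atomic masses in Da (standard values, rounded).
MASS_H = 1.00783
MASS_N = 14.00307
MASS_O = 15.99491

# Hydrolytic deamination replaces the exocyclic NH2 by a carbonyl O,
# a net change of +O -N -H in the elemental formula.
DEAMINATION_SHIFT = MASS_O - MASS_N - MASS_H

# [M+H]+ precursor ions quoted in the Experimental Section.
PRECURSOR_MZ = {"dC": 228.1, "m5C": 242.1, "hm5C": 258.1}
PRODUCT_LABEL = {"dC": "dU", "m5C": "T", "hm5C": "hm5U"}


def deaminated_precursor(nucleoside):
    """Return (product label, expected [M+H]+ m/z) after deamination."""
    mz = PRECURSOR_MZ[nucleoside] + DEAMINATION_SHIFT
    return PRODUCT_LABEL[nucleoside], round(mz, 1)


for name in PRECURSOR_MZ:
    print(name, "->", *deaminated_precursor(name))
```

For m5C the predicted precursor coincides with the m/z monitored for T, in line with the statement that deaminated m5C is thymine.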

Figure 4. Examination of the effect of APOBEC3A on dA, dG and T by LC-ESI-MS/MS analysis. (A) The signal intensities of dA, dG and T without APOBEC3A treatment. (B) The signal intensities of dA, dG and T with APOBEC3A treatment.

Figure 5. Schematic illustration of the single-nucleotide resolution analysis of hm5C in DNA by APOBEC3A-mediated deamination in combination with sequencing. DNA is first treated with β-GT, which selectively adds a glycosyl group to hm5C, forming ghm5C. The resulting DNA is then treated with APOBEC3A, followed by sequencing. The original C and m5C are deaminated by APOBEC3A to form U and T, both of which read as T during sequencing, whereas ghm5C is resistant to deamination and reads as C during sequencing.
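The strategy in Figure 5 can be mimicked with a toy simulation. The sketch below is a minimal model under explicit assumptions: a template is written as a list of base tokens, conversion is taken to be complete, only one strand is considered, and the function names (`glucosylate`, `amd_seq_read`, `call_hm5c`) are hypothetical. It is not the authors' analysis pipeline.

```python
# Read-out of each base token after APOBEC3A treatment, PCR and sequencing.
# Unprotected hm5C is deaminated as well, so it also reads as T.
CONVERSION = {"C": "T", "m5C": "T", "hm5C": "T", "ghm5C": "C"}


def glucosylate(template):
    """Model the beta-GT step: every hm5C becomes ghm5C."""
    return ["ghm5C" if base == "hm5C" else base for base in template]


def amd_seq_read(template):
    """Return the sequence that would be read after deamination."""
    return "".join(CONVERSION.get(base, base) for base in template)


def call_hm5c(reference, read):
    """Return 0-based positions where a reference C still reads as C."""
    return [
        i for i, (ref, obs) in enumerate(zip(reference, read))
        if ref == "C" and obs == "C"
    ]


# Toy template: one hm5C (position 4), one C (9) and one m5C (10).
template = list("GATT") + ["hm5C"] + list("AAATC") + ["m5C", "G"]
reference = "GATTCAAATCCG"

read = amd_seq_read(glucosylate(template))
print(read, call_hm5c(reference, read))
```

Skipping the glucosylation step in this model leaves no C in the read at all, which mirrors the observation that unprotected hm5C is deaminated together with C and m5C.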

Single-nucleotide resolution analysis of hm5C in DNA by APOBEC3A-mediated deamination in combination with sequencing

The above results demonstrated that APOBEC3A exhibits highly efficient deamination activity toward C, m5C, and hm5C, but does not deaminate glycosylated hm5C. This differential deamination activity of APOBEC3A can therefore be used to establish the single-nucleotide resolution analysis of hm5C in DNA by APOBEC3A-mediated deamination in combination with sequencing. The proposed analytical strategy is shown in Figure 5. DNA is first treated with β-GT, which selectively adds a glycosyl group to hm5C, forming ghm5C. The resulting DNA is then treated with APOBEC3A, followed by sequencing. In this strategy, the original C and m5C are deaminated by APOBEC3A to form U and T, both of which read as T during sequencing, whereas ghm5C is resistant to deamination and reads as C during sequencing. In this respect, any C remaining in the sequence can in theory only come from original hm5C, which therefore affords single-nucleotide resolution analysis of hm5C in DNA (Figure 5).

Figure 6. Sequencing analysis of APOBEC3A-treated DNA. Four synthesized 150-mer DNA (L-DNA-C, L-DNA-m5C, L-DNA-hm5C, and L-DNA-ghm5C) were used for sequencing. (A) The sequencing results of L-DNA-C with or without APOBEC3A treatment. (B) The sequencing results of L-DNA-m5C with or without APOBEC3A treatment. (C) The sequencing results of L-DNA-hm5C with or without APOBEC3A treatment. (D) The sequencing results of L-DNA-ghm5C with or without APOBEC3A treatment.

We used four synthesized 150-mer DNA (L-DNA-C, L-DNA-m5C, L-DNA-hm5C, and L-DNA-ghm5C) to demonstrate the single-nucleotide resolution analysis of hm5C. After APOBEC3A treatment, the resulting DNA was amplified by PCR. The PCR products were then subjected to sequencing analysis. The results showed that all the C in the four DNA substrates (L-DNA-C, L-DNA-m5C, L-DNA-hm5C, and L-DNA-ghm5C) read as T (Figure 6A-6D), indicating that C was efficiently deaminated by APOBEC3A to form U. In addition, the m5C in L-DNA-m5C and the hm5C in L-DNA-hm5C also read as T (Figure 6B and 6C), indicating that both m5C and hm5C were also efficiently deaminated by APOBEC3A. In contrast, ghm5C still read as C after APOBEC3A treatment (Figure 6D), suggesting that glycosylation of hm5C protects hm5C from deamination by APOBEC3A protein. Moreover, a dilution series of the 150-mer L-DNA-ghm5C was treated with APOBEC3A and then subjected to PCR amplification and sequencing analysis. Gel electrophoresis analysis showed that distinct PCR products were observed with only 1 × 10^-22 moles of DNA template (~ 1-2 molecules, Figure S3 in Supporting Information). Similar sequencing results were observed using a low amount of DNA template (1 × 10^-22 moles, Figure S4 in Supporting Information) as with 100 ng of DNA template (~ 2 × 10^-12 moles), which indicated that the method can potentially be used for single cell analysis.
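The conversion between template mass and amount of substance quoted above can be checked with a short calculation. The sketch assumes single-stranded DNA with an average mass of about 330 g/mol per nucleotide, a common rule of thumb rather than a value taken from this work; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
AVG_NT_MASS = 330.0       # g/mol per nucleotide, rough ssDNA average (assumption)


def moles_from_mass(mass_ng, length_nt, nt_mass=AVG_NT_MASS):
    """Convert a DNA mass in nanograms to moles for a strand of given length."""
    return mass_ng * 1e-9 / (length_nt * nt_mass)


def copies_from_moles(moles):
    """Convert an amount of substance in moles to a number of molecules."""
    return moles * AVOGADRO


template_moles = moles_from_mass(100, 150)  # 100 ng of a 150-mer
print(f"{template_moles:.1e} mol")
```

With these assumptions, 100 ng of the 150-mer corresponds to roughly 2 × 10^-12 moles, in agreement with the figure given in the text. The same helpers can be applied to any point of the dilution series.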
A recent study showed that APOBEC3A has substrate sequence selectivity.45 However, our sequencing results demonstrated that all the C in the DNA template read as T without sequence bias upon APOBEC3A treatment, suggesting that the substrate sequence selectivity of APOBEC3A can be overcome by optimizing the reaction conditions, such as the concentration of APOBEC3A and the reaction time. Collectively, these results demonstrated that the proposed strategy of APOBEC3A-mediated deamination in combination with sequencing (AMD-seq) is capable of analyzing hm5C at single-nucleotide resolution with high sensitivity. We took advantage of the substrate selectivity of APOBEC3A to develop an enzyme-mediated method for single-nucleotide resolution localization of hm5C. This analytical strategy is straightforward and involves no bisulfite treatment. In contrast, both TAB-seq and oxBS-seq require bisulfite treatment, which is carried out under harsh conditions and can cause substantial degradation of input DNA. The APOBEC3A-mediated deamination is performed under mild conditions, which can avoid the degradation of DNA.

In addition, AMD-seq is easy to perform, and the procedure for mapping hm5C is simpler than that of the TAB-seq and oxBS-seq strategies (Figures S5 and S6 in Supporting Information). Overall, our results reveal the inherent preference of APOBEC3A in the deamination of cytosine and cytosine modifications. Unmodified C and m5C can be deaminated, but glycosylated hm5C is not subject to significant deamination. Given the underlying mechanism of AMD-seq, this strategy could in future be applied to reliable genome-wide mapping of hm5C at the single-nucleotide level.

Conclusions

In the current study, we found by SwaI-based cleavage assay and LC-ESI-MS/MS analysis that APOBEC3A protein is capable of efficiently deaminating C, m5C and hm5C, but not ghm5C. No obvious changes were observed for dA, dG and T upon APOBEC3A treatment. We then developed the single-nucleotide resolution analysis of hm5C in DNA by utilizing the differential deamination activity of APOBEC3A toward cytosine modifications in conjunction with sequencing. The sequencing results showed that all the C and m5C read as T, while only ghm5C, which originates from hm5C, read as C, demonstrating that the proposed analytical strategy is capable of analyzing hm5C at the single-nucleotide level. Enzyme-mediated deamination coupled with sequencing can prevent the substantial degradation of DNA that typically occurs in bisulfite-based methods. In addition, the procedure of the developed AMD-seq method is simpler than those of the previously established TAB-seq and oxBS-seq analytical strategies. Taken together, we proposed a straightforward method for the single-nucleotide resolution analysis of hm5C. We envision that this method should be convenient for the reliable genome-wide mapping of hm5C and should promote the functional study of hm5C.

ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI:xxxxx.

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes
The authors declare no competing financial interest. The authors noted that one similar work was just recently published in Nat. Biotechnol., 2018, DOI: 10.1038/nbt.4204.


Acknowledgements

We acknowledge financial support from the National Natural Science Foundation of China (21522507, 21672166, 21728802, 21721005). We thank Prof. Laixin Xia (Southern Medical University, China) for help with the preparation of the APOBEC3A plasmid.

References

(1) Jones, P. A. Nat Rev Genet 2012, 13, 484-492.
(2) Liu, T.; Ma, C. J.; Yuan, B. F.; Feng, Y. Q. Sci China Chem 2018, 61, 381-392.
(3) Luo, C.; Hajkova, P.; Ecker, J. R. Science 2018, 361, 1336-1340.
(4) Kriaucionis, S.; Heintz, N. Science 2009, 324, 929-930.
(5) Tahiliani, M.; Koh, K. P.; Shen, Y.; Pastor, W. A.; Bandukwala, H.; Brudno, Y.; Agarwal, S.; Iyer, L. M.; Liu, D. R.; Aravind, L.; Rao, A. Science 2009, 324, 930-935.
(6) Wu, X.; Zhang, Y. Nat Rev Genet 2017, 18, 517-534.
(7) Munzel, M.; Globisch, D.; Carell, T. Angew Chem Int Ed Engl 2011, 50, 6460-6468.
(8) Wu, H.; Zhang, Y. Cell 2014, 156, 45-68.
(9) Stroud, H.; Feng, S.; Morey Kinney, S.; Pradhan, S.; Jacobsen, S. E. Genome Biol 2011, 12, R54.
(10) Ito, S.; D'Alessio, A. C.; Taranova, O. V.; Hong, K.; Sowers, L. C.; Zhang, Y. Nature 2010, 466, 1129-1133.
(11) Scourzic, L.; Mouly, E.; Bernard, O. A. Genome Med 2015, 7, 9.
(12) Chen, M. L.; Shen, F.; Huang, W.; Qi, J. H.; Wang, Y.; Feng, Y. Q.; Liu, S. M.; Yuan, B. F. Clin Chem 2013, 59, 824-832.
(13) Wang, J.; Tang, J.; Lai, M.; Zhang, H. Mutat Res Rev Mutat Res 2014, 762C, 167-175.
(14) Inoue, A.; Zhang, Y. Science 2011, 334, 194.
(15) Tang, Y.; Chu, J. M.; Huang, W.; Xiong, J.; Xing, X. W.; Zhou, X.; Feng, Y. Q.; Yuan, B. F. Anal Chem 2013, 85, 6129-6135.
(16) Tang, Y.; Xiong, J.; Jiang, H. P.; Zheng, S. J.; Feng, Y. Q.; Yuan, B. F. Anal Chem 2014, 86, 7764-7772.
(17) Tang, Y.; Zheng, S. J.; Qi, C. B.; Feng, Y. Q.; Yuan, B. F. Anal Chem 2015, 87, 3445-3452.
(18) Yin, R.; Mo, J.; Lu, M.; Wang, H. Anal Chem 2015, 87, 1846-1852.
(19) Li, Q. Y.; Yuan, B. F.; Feng, Y. Q. Chem Lett 2018, 47, 1453-1459.
(20) Lan, M. D.; Yuan, B. F.; Feng, Y. Q. Chin Chem Lett 2018, DOI: 10.1016/j.cclet.2018.04.021
(21) Liu, S.; Wang, J.; Su, Y.; Guerrero, C.; Zeng, Y.; Mitra, D.; Brooks, P. J.; Fisher, D. E.; Song, H.; Wang, Y. Nucleic Acids Res 2013, 41, 6421-6429.
(22) Yu, Y.; Wang, P.; Cui, Y.; Wang, Y. Anal Chem 2018, 90, 556-576.
(23) Chen, B.; Yuan, B. F.; Feng, Y. Q. Anal Chem 2018, DOI: 10.1021/acs.analchem.8b04078
(24) Ficz, G.; Branco, M. R.; Seisenberger, S.; Santos, F.; Krueger, F.; Hore, T. A.; Marques, C. J.; Andrews, S.; Reik, W. Nature 2011, 473, 398-402.
(25) Pastor, W. A.; Pape, U. J.; Huang, Y.; Henderson, H. R.; Lister, R.; Ko, M.; McLoughlin, E. M.; Brudno, Y.; Mahapatra, S.; Kapranov, P.; Tahiliani, M.; Daley, G. Q.; Liu, X. S.; Ecker, J. R.; Milos, P. M.; Agarwal, S.; Rao, A. Nature 2011, 473, 394-397.


(26) Robertson, A. B.; Dahl, J. A.; Vagbo, C. B.; Tripathi, P.; Krokan, H. E.; Klungland, A. Nucleic Acids Res 2011, 39, e55.
(27) Song, C. X.; Szulwach, K. E.; Fu, Y.; Dai, Q.; Yi, C.; Li, X.; Li, Y.; Chen, C. H.; Zhang, W.; Jian, X.; Wang, J.; Zhang, L.; Looney, T. J.; Zhang, B.; Godley, L. A.; Hicks, L. M.; Lahn, B. T.; Jin, P.; He, C. Nat Biotechnol 2011, 29, 68-72.
(28) Williams, K.; Christensen, J.; Pedersen, M. T.; Johansen, J. V.; Cloos, P. A.; Rappsilber, J.; Helin, K. Nature 2011, 473, 343-348.
(29) Wu, H.; D'Alessio, A. C.; Ito, S.; Wang, Z.; Cui, K.; Zhao, K.; Sun, Y. E.; Zhang, Y. Genes Dev 2011, 25, 679-684.
(30) Xu, Y.; Wu, F.; Tan, L.; Kong, L.; Xiong, L.; Deng, J.; Barbera, A. J.; Zheng, L.; Zhang, H.; Huang, S.; Min, J.; Nicholson, T.; Chen, T.; Xu, G.; Shi, Y.; Zhang, K.; Shi, Y. G. Mol Cell 2011, 42, 451-464.
(31) Peng, J.; Xia, B.; Yi, C. Sci China Life Sci 2016, 59, 219-226.
(32) Lister, R.; Pelizzola, M.; Dowen, R. H.; Hawkins, R. D.; Hon, G.; Tonti-Filippini, J.; Nery, J. R.; Lee, L.; Ye, Z.; Ngo, Q. M.; Edsall, L.; Antosiewicz-Bourget, J.; Stewart, R.; Ruotti, V.; Millar, A. H.; Thomson, J. A.; Ren, B.; Ecker, J. R. Nature 2009, 462, 315-322.
(33) Jin, S. G.; Kadam, S.; Pfeifer, G. P. Nucleic Acids Res 2010, 38, e125.
(34) Song, C. X.; Clark, T. A.; Lu, X. Y.; Kislyuk, A.; Dai, Q.; Turner, S. W.; He, C.; Korlach, J. Nat Methods 2012, 9, 75-77.
(35) Li, W. W.; Gong, L.; Bayley, H. Angew Chem Int Ed Engl 2013, 52, 4350-4355.
(36) Eid, J.; Fehr, A.; Gray, J.; Luong, K.; Lyle, J.; Otto, G.; Peluso, P.; Rank, D.; Baybayan, P.; Bettman, B.; Bibillo, A.; Bjornson, K.; Chaudhuri, B.; Christians, F.; Cicero, R.; Clark, S.; Dalal, R.; Dewinter, A.; Dixon, J.; Foquet, M.; Gaertner, A.; Hardenbol, P.; Heiner, C.; Hester, K.; Holden, D.; Kearns, G.; Kong, X.; Kuse, R.; Lacroix, Y.; Lin, S.; Lundquist, P.; Ma, C.; Marks, P.; Maxham, M.; Murphy, D.; Park, I.; Pham, T.; Phillips, M.; Roy, J.; Sebra, R.; Shen, G.; Sorenson, J.; Tomaney, A.; Travers, K.; Trulson, M.; Vieceli, J.; Wegener, J.; Wu, D.; Yang, A.; Zaccarin, D.; Zhao, P.; Zhong, F.; Korlach, J.; Turner, S. Science 2009, 323, 133-138.
(37) Yu, M.; Hon, G. C.; Szulwach, K. E.; Song, C. X.; Zhang, L.; Kim, A.; Li, X.; Dai, Q.; Shen, Y.; Park, B.; Min, J. H.; Jin, P.; Ren, B.; He, C. Cell 2012, 149, 1368-1380.
(38) Booth, M. J.; Branco, M. R.; Ficz, G.; Oxley, D.; Krueger, F.; Reik, W.; Balasubramanian, S. Science 2012, 336, 934-937.
(39) Tanaka, K.; Okamoto, A. Bioorg Med Chem Lett 2007, 17, 1912-1915.
(40) Conticello, S. G. Genome Biol 2008, 9, 229.
(41) Siriwardena, S. U.; Chen, K.; Bhagwat, A. S. Chem Rev 2016, 116, 12688-12710.
(42) Wijesinghe, P.; Bhagwat, A. S. Nucleic Acids Res 2012, 40, 9206-9217.
(43) Schutsky, E. K.; Nabel, C. S.; Davis, A. K. F.; DeNizio, J. E.; Kohli, R. M. Nucleic Acids Res 2017, 45, 7655-7665.
(44) Lan, M. D.; Xiong, J.; You, X. J.; Weng, X. C.; Zhou, X.; Yuan, B. F.; Feng, Y. Q. Chem-Eur J 2018, 24, 9949-9956.
(45) Silvas, T. V.; Hou, S.; Myint, W.; Nalivaika, E.; Somasundaran, M.; Kelch, B. A.; Matsuo, H.; Kurt Yilmaz, N.; Schiffer, C. A. Sci Rep 2018, 8, 7511.


