
A. Emilia Arguello, Robert W. Leach, and Ralph E. Kleiner
Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.9b00485 • Publication Date (Web): 09 Jul 2019

Biochemistry is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Biochemistry

In vitro selection with a site-specifically modified RNA library reveals the binding preferences of N6-methyladenosine (m6A) reader proteins

A. Emilia Arguello1, Robert W. Leach2, Ralph E. Kleiner1*

1Department of Chemistry, Princeton University, Princeton, NJ 08544, USA
2Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA

*[email protected]

ACS Paragon Plus Environment

ABSTRACT

Epitranscriptomic RNA modifications can serve as recognition elements for the recruitment of effector proteins (i.e., “readers”) to modified transcripts. While these interactions play an important role in mRNA regulation, there is a major gap in our understanding of the sequence determinants critical for binding of readers to modified sequence motifs. Here, we develop a high-throughput platform, relying upon in vitro selection with a site-specifically modified random sequence RNA library and next-generation sequencing, to profile the binding specificity of RNA modification reader proteins. We apply our approach to interrogate the effect of sequence context on the interactions of YTH-domain proteins with N6-methyladenosine (m6A)-modified RNA. We find that while the in vitro binding preferences of YTHDC1 strongly overlap with the well-characterized DR(m6A)CH motif, the related YTH-domain proteins YTHDF1 and YTHDF2 can bind tightly to non-canonical m6A-containing sequences. Our results reveal the principles underlying substrate selection by m6A reader proteins and provide a powerful approach for investigating protein-modified RNA interactions in an unbiased manner.

INTRODUCTION

RNA behavior in the cell is regulated by its interactions with a large complement of RNA-binding proteins1. These proteins recognize specific RNA molecules and affect gene expression by controlling processes including splicing, turnover, trafficking, and translation. Characterizing the molecular determinants of RNA-protein binding, including how RNA sequence and structure influence these interactions, is therefore an important step toward a unified understanding of gene expression regulation. Recently, a growing number of chemical modifications on eukaryotic mRNA2, collectively termed the “epitranscriptome,” have emerged as a new modality for post-transcriptional gene regulation. The most abundant of these modifications, N6-methyladenosine (m6A), is found at ~10,000 sites in the human transcriptome3, 4. The m6A modification affects mRNA stability5, translation6-9, splicing10, and nuclear export11, 12, and has been implicated in diverse biological processes including development13-15, innate immunity16, 17, DNA damage signaling18, and cellular proliferation19. Emerging evidence suggests that other epitranscriptomic marks such as 5-methylcytidine20, N1-methyladenosine21, pseudouridine22, 23, N4-acetylcytidine24, 2’-O-methylation25, and N6,2′-O-dimethyladenosine26, 27 can also affect mRNA behavior in the cell. New approaches are needed to study the molecular mechanisms underlying the interpretation of this RNA modification code.

How do post-transcriptional modifications affect mRNA properties in the cell? On the one hand, modifications may influence RNA structure by modulating inter- or intramolecular RNA-RNA interactions. Such a mechanism has been demonstrated for m6A28 and has many precedents among tRNA modifications29. On the other hand, modifications can serve as a binding platform to recruit modification-specific RNA-binding proteins (or “readers”) to modified RNA transcripts30. Indeed, numerous m6A readers and anti-readers have now been identified and functionally characterized3, 5, 31, 32. Most prominent among these are the YTH-domain proteins, which bind to m6A-modified sequences and are broadly conserved in eukaryotes33. The human genome encodes five YTH-domain proteins with distinct biological functions5, 6, 9, 10, 34. While we now have structural models for how these proteins bind to methylated RNA35-37, we lack a comprehensive understanding of how m6A readers select their mRNA substrates, including the sequence determinants underlying these interactions.

RNA-protein interactions can be studied in cells using cross-linking and immunoprecipitation combined with high-throughput sequencing (e.g., HITS-CLIP, CLIP-Seq, PAR-CLIP, CRAC)38. While these methods are easily generalizable and report on native RNA-protein interactions in a highly parallel fashion, they have not been adapted to specifically interrogate RNA modification-dependent interactions. Moreover, these methods can introduce sequence bias through their reliance upon base-specific
photocrosslinking chemistry or non-canonical nucleotides, and these methods do not typically provide insight into binding affinity.

An alternative approach to investigate RNA-protein interactions involves the SELEX/in vitro selection strategy described by Gold39 and Szostak40. In this approach, an unbiased random sequence RNA library is subjected to affinity-based selection against a protein target of interest. Iterated cycles of in vitro transcription, selection, reverse transcription, and amplification enable the identification of tight-binding sequences and reveal the sequence-binding preferences of the target protein. Indeed, in vitro selection has been applied to query the substrate binding preferences of numerous RNA-binding proteins in an unbiased manner39, 41-43. Modified bases have been incorporated into random sequence RNA libraries used for in vitro selection44, 45, but because these libraries are made by in vitro transcription of a randomized DNA template, modification location and stoichiometry cannot be controlled, except in the case where one of the native NTPs is replaced entirely with a modified NTP.

In this manuscript, we interrogate the substrate preferences of three mammalian m6A reader proteins using in vitro selection with a random-sequence, site-specifically modified RNA library and high-throughput sequencing (Fig. 1). First, we develop conditions for the direct synthesis of random-sequence RNA containing a single, defined m6A site. Next, we perform affinity selection of an m6A-modified RNA library against the readers YTHDF1, YTHDF2, and YTHDC1, and identify bound sequences using Illumina
sequencing. Finally, bioinformatic analysis and sequence clustering, together with biophysical validation of enriched k-mer motifs centered around the m6A residue, provide a fingerprint of the binding preferences of each m6A reader protein.
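To make the logic of affinity selection concrete, the Python sketch below uses a deliberately simple model that is not taken from this study: each RNA binds the reader at a single site with dissociation constant Kd, protein is in excess so the bound fraction is [P]/([P] + Kd), washing keeps every bound molecule and removes every unbound one, and amplification between rounds is unbiased. Under these assumptions the tight:weak ratio grows by the same factor each round, which is why iterated cycles can pull rare tight binders out of a random pool. The function names and all Kd values are hypothetical.

```python
def fraction_bound(protein_um: float, kd_um: float) -> float:
    """Equilibrium fraction of an RNA bound by a single-site protein in excess."""
    return protein_um / (protein_um + kd_um)


def enrichment(protein_um: float, kd_tight_um: float, kd_weak_um: float,
               rounds: int = 1) -> float:
    """Tight:weak binder ratio after `rounds` of selection, starting from 1:1.

    Toy assumptions: bound RNA is fully retained, unbound RNA is fully washed
    away, and amplification between rounds is unbiased.
    """
    per_round = (fraction_bound(protein_um, kd_tight_um)
                 / fraction_bound(protein_um, kd_weak_um))
    return per_round ** rounds


if __name__ == "__main__":
    # ~50 pmol protein in a 100 µL selection is roughly 0.5 µM, treating
    # bead-bound protein as if it were free in solution (a simplification).
    protein = 0.5
    # Hypothetical affinities: a tight binder (0.1 µM) and a weak binder (10 µM).
    for n in (1, 2, 3):
        print(n, f"{enrichment(protein, 0.1, 10.0, n):.1f}")
```

In this model the per-round factor equals ([P] + Kd,weak)/([P] + Kd,tight), so it approaches the full Kd ratio only when protein is well below both Kd values; a large protein excess trades discrimination for yield.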

Figure 1. In vitro selection strategy to reveal the sequence preferences of m6A reader proteins. A large pool of random m6A-containing RNA sequences is incubated with a bead-bound reader, and enriched sequences are selectively eluted and identified by high-throughput sequencing.

MATERIALS AND METHODS

Oligonucleotide synthesis
Solid-phase synthesis of random sequence libraries and fluorescein-labeled RNA probes was performed on an ABI 394 oligonucleotide synthesizer (Applied Biosystems) using standard conditions and commercial phosphoramidites and oligosynthesis reagents (Glen Research). For synthesis of random sequence RNA, a custom mix was created by combining each TBDMS-protected RNA phosphoramidite in the appropriate ratio. After cleavage and deprotection, RNA and RNA-DNA hybrid libraries were purified by denaturing PAGE. 5’-Fluorescein-labeled RNA probes were purified by reverse-phase HPLC and validated by high-resolution ESI-MS (Table S1). The sequence of the hybrid DNA-RNA libraries 1 and 2 is as follows: 5’-AAGCTTCCCGGGCTGCAGGGATCC-NNNNNA*NNNNGCCGCGGGAATTCTCCCT-3’, where the constant regions consist of DNA and the central region is randomized RNA with either m6A (library 1) or adenosine (library 2) at position 6. The sequence of RNA library 3 is 5’-NNNNNNN-m6A-NNNNNNNU-3’.

Reverse transcription and Sanger sequencing
Following the manufacturer’s protocol, each library (1 fmol) was reverse-transcribed in a 10 µL volume with SuperScript II reverse transcriptase (Invitrogen, 0.25 µL) using reverse primer 1 (Table S2). The incomplete first-strand cDNA was extended with the Klenow fragment of DNA polymerase I (NEB, 0.5 µL) at 37 °C for 1 hour, and the original DNA-RNA template was digested with RNase H (NEB, 0.5 µL) at 37 °C for 20 minutes. Each enzyme was heat-inactivated at 75 °C for 15 minutes before the next was added. A small aliquot of the RT reaction (2-5 µL) was PCR-amplified under standard conditions for OneTaq DNA polymerase (NEB) using reverse primer 1, forward primer 2 (Table S2), a 60.5 °C annealing temperature, and a 30-second elongation. The size of the amplicon was checked by electrophoresis on a 3% agarose gel, and a small aliquot of
the unpurified amplified mixture (5-8 µL), along with the forward and reverse primers, was submitted for Sanger sequencing (Genewiz) to assess base content across the random region. Results were analyzed using SnapGene software.

Illumina sequencing
Library 1 was reverse-transcribed and amplified as described above, and the amplicon was purified by 3% agarose gel electrophoresis. The amount of double-stranded template was then quantified by Quant-iT PicoGreen assay (Invitrogen) following the supplier’s directions, and 20 µL of amplicon (5 ng/µL) was submitted for Illumina sequencing. To assess the percent incorporation of each base at the random positions, reads were uploaded to the Galaxy workflow system (https://galaxy.princeton.edu) and processed with the FastQC tool. For post-selection sequencing of library 3, a small portion of each selection elution (6 µL) was converted to cDNA using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB) following the supplier’s instructions. Given the small amount of material in each elution, the kit’s 3’ adapter, RT primer, and 5’ adapter were diluted 1:1 in RNase-free water prior to use. Differently indexed primers (indices 4, 5, and 6) were used for each sample at the PCR stage. After amplification, barcoded PCR amplicons were gel-purified, combined, and submitted for Illumina sequencing on a MiSeq Micro flow cell (Illumina) as paired-end 2 x 150 nt reads following the manufacturer’s protocol.

Bioinformatics
Sequences were uploaded to and demultiplexed on the Princeton HTSeq database system and transferred to the Princeton Galaxy instance, where read quality was assessed using FastQC. Adapters were trimmed using Cutadapt (Galaxy Version 1.16.4), and the 15-base randomized regions were excised using "Trim sequences" (Galaxy Version 1.0.2). Results were then downloaded and further analyzed on a SLURM compute cluster. Sequences were filtered for the methylated (or unmethylated control) adenosine at the template-designed position, and sequence case was then changed to lowercase to mark the target adenosine. Positional base frequencies were calculated, and overall flat transfac motif logos were generated using WebLogo 3.6.0. Observed/expected ratios were calculated based on the wikiselev/bioinformaticsalgorithms GitHub page (https://github.com/wikiselev/bioinformaticsalgorithms/wiki/Kmer-expected-number-of-occurrences-in-a-DNA-string). Case-sensitive k-mers were counted and sorted by descending abundance. They were then greedily clustered by 70% identity (without replacement). Logos were generated per cluster using ceqlogo from MEME suite 4.10.0_1. Unless otherwise noted, all steps were performed using custom in-house Perl scripts executed on the cluster in batch per k-mer size.

Protein expression and purification
Plasmids encoding cDNA for YTH proteins were obtained from Addgene: YTHDC1 (NP_001026902.1) (#85167)46, YTHDF1 (NP_060268.2) (#70087)5, and YTHDF2
(NP_057342.2) (#52300)5. The YTH domains of these proteins were cloned into pGEX-6P1 (YTHDF2) or pET28a (YTHDC1 and YTHDF1) for protein expression in Escherichia coli. All sequence-verified constructs were transformed into E. coli strain BL21 (Rosetta). His6-YTHDC1 (residues 345-509) and His6-YTHDF1 (residues 389-523) were expressed overnight at 18 °C with 0.2 mM isopropyl-β-D-thiogalactopyranoside (IPTG). Cells were lysed by sonication in lysis buffer (20 mM HEPES pH 7.4, 200 mM NaCl, 5 mM mercaptoethanol, 5 mM imidazole, 0.5% Triton X-100, supplemented with 1 mM PMSF, EDTA-free protease inhibitor tablet (Roche), and benzonase), and the lysate was purified by affinity chromatography with Ni-NTA resin (ThermoFisher) according to the manufacturer’s recommendations. GST-YTHDF2 (residues 383-553) was overexpressed overnight at 18 °C with 0.2 mM IPTG. Lysis by sonication was performed in buffer containing 1X TBS, 150 mM NaCl, 5 mM EDTA, 1 mM DTT, 0.2 mg/mL lysozyme, and 1% Triton X-100, supplemented with 1 mM PMSF, protease inhibitor tablet, and benzonase. The lysate was purified using Pierce glutathione agarose resin (ThermoFisher) following the manufacturer’s instructions. Following affinity purification, all proteins were fractionated on a HiLoad 16/600 Superdex 200 pg preparative size exclusion column (GE Healthcare) in buffer containing 20 mM HEPES pH 7.4, 220 mM NaCl, and 1 mM DTT. The most concentrated fractions were combined and further concentrated to 10-12 mg/mL.

In vitro selection protocol
For selections with libraries 1 and 2, His6-tagged YTH domains (50 pmol) were immobilized on pre-equilibrated magnetic His-Tag Dynabeads (Invitrogen) in binding buffer (200 µL; 50 mM sodium phosphate pH 8.0, 300 mM NaCl, 0.01% Tween 20) by overnight incubation at 4 °C with end-over-end rotation (experimental binding capacity: YTHDC1 = 4 µg protein/µL bead slurry; YTHDF1 = 3.5 µg protein/µL bead slurry). Bead-bound proteins were washed three times with washing buffer (200 µL; 50 mM sodium phosphate pH 8.0, 300 mM NaCl, 0.05% Tween 20), with the last wash supplemented with 100 µg/mL salmon sperm DNA as a blocking agent. The RNA library (10 pmol) was then applied to the washed beads in selection buffer (100 µL; 50 mM sodium phosphate pH 8.0, 300 mM NaCl, 0.05% Tween 20, 100 µg/mL salmon sperm DNA), and the mixture was incubated at room temperature for 1 hour with shaking. The unbound flow-through was discarded, and the beads were washed three times with washing buffer, with the last wash used to transfer the beads to a clean tube. Protein-library complexes were eluted with elution buffer (50 µL; 50 mM sodium phosphate pH 8.0, 300 mM NaCl, 300 mM imidazole) at room temperature for 10 minutes. The eluate was isolated and desalted using an Illustra MicroSpin G-25 column (GE Healthcare) following the manufacturer’s protocol. Selections with the GST-tagged YTHDF2 YTH domain were performed on Pierce glutathione magnetic agarose beads (Invitrogen; experimental binding capacity = 16 µg protein/µL settled beads) following the above protocol with the following buffers: binding buffer (1X TBS pH 7.4, 3 mM DTT, 0.01% Tween 20), washing
buffer (1X TBS pH 7.4, 1 mM DTT, 0.05% Tween 20), selection buffer (1X TBS pH 7.4, 1 mM DTT, 0.05% Tween 20, 100 µg/mL salmon sperm DNA), and elution buffer adjusted to pH 8 (1X TBS pH 7.4, 1 mM DTT, 50 mM reduced glutathione). The selection of library 3 against YTH proteins was carried out using the same protocol, but the library and protein-bead amounts were increased 10-fold; washing, application, and elution volumes were kept the same. Prior to selection, library 3 was 5’-phosphorylated using T4 polynucleotide kinase (NEB) following the manufacturer’s recommendations.

Quantitative reverse-transcription PCR
All measurements were carried out in triplicate on a ViiA 7 Real-Time PCR System (Applied Biosystems) using a MicroAmp Fast Optical 96-well plate (Applied Biosystems). Following selection, a small portion (1 µL) of each elution containing the enriched sequences was converted to first-strand cDNA with SuperScript II as described above (see Reverse transcription and Sanger sequencing) using primer 1 as the reverse primer (Table S2). The qPCR reactions were prepared using PowerUp SYBR Green Master Mix (Invitrogen) according to the manufacturer’s recommendations, with primer 2 as the forward primer (Table S2). The amount recovered in each elution was determined by fitting the experimental Ct value to the corresponding standard curve (Figure S3).

Binding assays
All MST experiments were conducted in triplicate at 25 °C on a Monolith NT.115 instrument (NanoTemper) using standard Monolith NT.115 capillaries. The following
parameters were used in Expert Mode: blue laser excitation, 60% excitation power, 30 seconds MST on, and 5 seconds MST off. Experiments were performed in 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 0.05% Tween 20, which was also used to prepare the 2X probe and 2X protein stocks. GGACU and GGm6ACU control probes (50 nM working concentration) were titrated with decreasing concentrations of the purified YTH domains of YTHDC1 and YTHDF2 (12-point titration in 2-fold dilutions), starting from an initial protein concentration of 50 µM. Probes 1/2 (30 nM) and 3/4 (50 nM) were titrated against the YTH domain of YTHDC1 (16-point titration in 2-fold dilutions; 100 µM initial protein concentration for probes 1 and 3, 176 µM for probes 2 and 4). Probes 5/6 (50 nM) were titrated against the YTH domain of YTHDF1 (16-point titration in 2-fold dilutions; 140 µM initial concentration for probe 5, 192 µM for probe 6). Probes 7/8 (50 nM) were titrated against the YTH domain of YTHDF2 (16-point titration in 2-fold dilutions; 100 µM initial concentration for both probes). Mixtures were incubated for 15 minutes at room temperature before being loaded into capillaries for MST measurement. Data were recorded in the MO.Control software (NanoTemper) and analyzed in the MO.Affinity Analysis software (NanoTemper) using TJump analysis. MST values were normalized, plotted against protein concentration, and fit to a four-parameter dose-response equation (Hill model) to determine the dissociation constant (Kd). Graphs in the main text and SI were generated with GraphPad Prism.
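The titration design and curve fit described above can be illustrated in plain Python. This is a sketch under stated assumptions, not the MO.Affinity Analysis or GraphPad Prism algorithm: `dilution_series`, `hill`, and `fit_hill` are hypothetical helpers, the grid ranges are arbitrary choices, and the "signal" is synthetic, generated from made-up parameters rather than taken from the paper. The fit uses a coarse grid over Kd and the Hill coefficient n and, for each grid point, solves the bottom and top plateaus exactly by linear least squares.

```python
from __future__ import annotations


def dilution_series(top_conc: float, points: int, factor: float = 2.0) -> list[float]:
    """Serial-dilution concentrations, highest first (same units as top_conc)."""
    return [top_conc / factor**i for i in range(points)]


def hill(x: float, bottom: float, top: float, kd: float, n: float) -> float:
    """Four-parameter dose-response (Hill) curve; kd is the midpoint concentration."""
    return bottom + (top - bottom) * x**n / (kd**n + x**n)


def fit_hill(xs: list[float], ys: list[float],
             kd_grid: list[float], n_grid: list[float]) -> dict[str, float]:
    """Grid search over (kd, n); bottom and top solved by linear least squares."""
    best = None
    my = sum(ys) / len(ys)
    for kd in kd_grid:
        for n in n_grid:
            b = [x**n / (kd**n + x**n) for x in xs]  # fractional saturation
            mb = sum(b) / len(b)
            var_b = sum((bi - mb) ** 2 for bi in b)
            if var_b == 0.0:
                continue  # curve is flat over the data; span is undefined
            span = sum((bi - mb) * (yi - my) for bi, yi in zip(b, ys)) / var_b
            bottom = my - span * mb
            sse = sum((bottom + span * bi - yi) ** 2 for bi, yi in zip(b, ys))
            if best is None or sse < best[0]:
                best = (sse, bottom, bottom + span, kd, n)
    if best is None:
        raise ValueError("no grid point produced a usable fit")
    _, bottom, top, kd, n = best
    return {"bottom": bottom, "top": top, "kd": kd, "n": n}


if __name__ == "__main__":
    # 16-point, 2-fold series from 100 µM, as for probes 7/8 with YTHDF2.
    conc_um = dilution_series(100.0, 16)
    # Synthetic, noise-free signal from made-up parameters (not paper data).
    signal = [hill(c, 800.0, 830.0, 2.0, 1.0) for c in conc_um]
    kd_grid = [10 ** (e / 50) for e in range(-150, 151)]  # 1 nM to 1 mM, in µM
    n_grid = [0.5 + 0.05 * i for i in range(51)]          # Hill slope 0.5 to 3.0
    fit = fit_hill(conc_um, signal, kd_grid, n_grid)
    print(f"lowest concentration: {conc_um[-1] * 1000:.2f} nM")
    print(f"fitted Kd ~ {fit['kd']:.2f} µM, n ~ {fit['n']:.2f}")
```

The 16-point series from 100 µM bottoms out near 3 nM, below the 30-50 nM probe concentrations. Note that the fitted midpoint approximates the true Kd only when the probe concentration is well below Kd; otherwise ligand depletion shifts the midpoint upward.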

RESULTS

Synthesis of RNA libraries
To profile the binding preferences of m6A reader proteins using in vitro selection (Fig. 1), we needed to generate a library of m6A-containing RNA sequences. Since RNA-binding proteins have been shown to recognize short sequence motifs1 and library complexity increases rapidly with sequence length, we designed a short 10-nucleotide random RNA region with a single m6A residue at position 6, resulting in a pool of 4^9 (~260,000) unique m6A-containing sequences (library 1, Fig. 2A). Additionally, we flanked the random RNA region with defined DNA primer binding sites to facilitate reverse transcription, PCR amplification, and sequencing analysis after the in vitro binding selection. While random sequence RNA libraries containing modified and unmodified nucleotides can be prepared by in vitro transcription of the corresponding random sequence DNA44, 45, enzymatic synthesis cannot install modified bases site-specifically. We therefore prepared the library directly by solid-phase oligonucleotide synthesis using an m6A phosphoramidite, a custom mix of A, C, G, and U phosphoramidites designed to generate equal proportions of each unmodified base at the randomized positions, and deoxynucleotide phosphoramidites for the flanking DNA primer binding sites.
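The library sizes quoted in this paper follow from 4^n for n random positions. The short Python sketch below checks them and estimates how many copies of each sequence a selection contains. It assumes every sequence is equally represented, uses the 10 pmol library input from the Methods for library 1, and takes 100 pmol for library 3 (the stated 10-fold scale-up); the helper names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def n_sequences(random_positions: int) -> int:
    """Distinct sequences when every random position is one of A, C, G, U."""
    return 4 ** random_positions


def copies_per_sequence(pmol: float, random_positions: int) -> float:
    """Mean copies of each sequence, assuming uniform representation."""
    molecules = pmol * 1e-12 * AVOGADRO
    return molecules / n_sequences(random_positions)


if __name__ == "__main__":
    print(n_sequences(9))    # libraries 1/2: NNNNN-A*-NNNN -> 262144
    print(n_sequences(14))   # library 3: 7 + 7 random positions -> 268435456
    print(n_sequences(10))   # centered 11-mers: 10 variable positions -> 1048576
    # 10 pmol of library 1 per selection vs 100 pmol (10-fold more) of library 3
    print(f"{copies_per_sequence(10, 9):.2e}")    # -> 2.30e+07
    print(f"{copies_per_sequence(100, 14):.2e}")  # -> 2.24e+05
```

Each library 1 sequence is therefore present in roughly 10^7 copies per selection. Library 3 is about 1,000-fold more complex, yet each of its sequences is still represented by ~10^5 copies.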

[Figure 2 image. (A) Library schematic: 5’-DNA primer binding site / NNNNN-A*-NNNN (RNA) / DNA primer binding site-3’, where A* is N6-R-adenosine (library 1: R = Me; library 2: R = H). (B) Library 1: … GATCC NNNNN m6A NNNN GCCGC … (C) Library 2: … GATCC NNNNN A NNNN GCCGC … (D) Base-incorporation plot.]

Figure 2. Design and characterization of random sequence libraries used in this study. (A) Structure of libraries 1 and 2. The libraries consist of a random RNA region with a central m6A (1) or adenosine (2) residue flanked by constant DNA sequences as primer binding sites. (B-C) Relative abundance of each ribonucleobase across the random regions of libraries 1 and 2. The purified libraries were reverse-transcribed, PCR-amplified, and analyzed by Sanger sequencing (see Fig. S2 for gel characterization). (D) Percent incorporation of each base across the random sequence region in library 1. The library was prepared as in panel B and subjected to high-throughput sequencing (Illumina) to quantify base content in the random region.
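Panel D reports the percent incorporation of each base at each random position. The Python sketch below shows one way to compute such a table from reads. It is not the FastQC/Galaxy workflow named in the Methods. It hypothetically assumes that every read begins with the 5-nt GATCC flank, so the random region occupies a fixed window, and the example reads are invented.

```python
from __future__ import annotations

from collections import Counter


def base_composition(reads: list[str], start: int, length: int) -> list[dict[str, float]]:
    """Percent A/C/G/U at each position of the random region across reads.

    `start` is the 0-based offset of the random region in each read. DNA reads
    report U as T, so T is counted as U; reads too short to cover the region
    are skipped.
    """
    columns = [Counter() for _ in range(length)]
    for read in reads:
        region = read[start:start + length].upper().replace("T", "U")
        if len(region) < length:
            continue
        for i, base in enumerate(region):
            columns[i][base] += 1
    table = []
    for col in columns:
        total = sum(col[b] for b in "ACGU")  # ignores N or other calls
        table.append({b: (100.0 * col[b] / total if total else 0.0) for b in "ACGU"})
    return table


if __name__ == "__main__":
    # Hypothetical reads: 5-nt GATCC flank, 10-nt random region, GCCGC flank.
    regions = ["ACGTAAGGCC", "CGTACATTAA", "GTACGACCGG", "TACGTAAATT"]
    reads = ["GATCC" + r + "GCCGC" for r in regions]
    for pos, row in enumerate(base_composition(reads, start=5, length=10), start=1):
        print(pos, {b: round(v, 1) for b, v in row.items()})
```

Because m6A is read as A, the fixed position 6 appears as 100% A in such a table, while a well-balanced library shows roughly 25% of each base elsewhere.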

First, we developed reaction conditions for direct chemical synthesis of random sequence RNA. We started by synthesizing a test library using a custom mix of TBDMS-protected RNA phosphoramidites (A:C:G:U = 0.26:0.25:0.29:0.20) based upon conditions reported by Bartel and co-workers for random sequence DNA synthesis47. After synthesis and purification, we subjected the library to reverse transcription, PCR amplification, and Sanger sequencing to measure the relative abundance of each base at the randomized positions. Our initial library showed over-incorporation of G at the expense of C (Fig. S1), suggesting that the relative reactivities of RNA phosphoramidites differ from those of the corresponding DNA monomers. We therefore prepared several additional libraries using phosphoramidite mixes of varying composition (decreasing the proportion of G monomer while increasing that of C monomer). Sanger sequencing analysis of these libraries enabled the selection of optimal reaction conditions (A:C:G:U = 0.28:0.35:0.15:0.22) that produced comparable amounts of the four bases across the random sequence region (Fig. S1). This custom mix was then used to synthesize library 1, containing the RNA motif NNNNN-m6A-NNNN (Fig. 2B, Fig. S2), and the corresponding unmodified library 2, containing the motif NNNNN-A-NNNN (Fig. 2C, Fig. S2). We subjected library 1 to high-throughput sequencing (Illumina) and observed a comparable distribution of all bases at the random sites (between 22% and 27%) and exclusively adenosine signal at the modified position (m6A is read as A during reverse transcription), further validating our synthetic protocol (Fig. 2D).

Validation of selection platform with YTH-domain proteins
Next, we evaluated the ability of m6A reader proteins to bind and enrich bona fide substrate sequences by affinity selection (Fig. 3A). For this purpose, we chose the YTH-domain proteins YTHDC1, YTHDF1, and YTHDF2, which have all been characterized as m6A readers, and tested their binding to our m6A-modified and unmodified random sequence libraries. Briefly, library 1 or 2 was incubated with an excess of bead-immobilized YTH domain, and bound sequences were eluted after stringent washing. Quantitative reverse-transcription PCR (RT-qPCR), using standard curves generated from libraries 1 and 2 (Fig. S3), was then used to measure the amount of library present in the elution. For all three YTH-domain proteins, only a minor fraction of the input library, ranging from 10^-7 to 10^-4, interacted with the bead-bound protein (Fig. 3B). Notably, for all three m6A readers tested, we observed ~100-fold higher recovery of the m6A-modified library than of the unmodified library (Fig. 3B), indicating that our in vitro selection conditions can distinguish sequences on the basis of binding affinity and confirming the preference of YTH-domain proteins for m6A-modified RNA substrates.
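The recoveries above come from reading Ct values off a standard curve. The Python sketch below illustrates that arithmetic under stated assumptions. It fits Ct = slope x log10(amount) + intercept to dilution standards, inverts the line for unknown samples, and divides by the 10 pmol (10,000 fmol) input from the Methods. All Ct values are illustrative, chosen only to reproduce the orders of magnitude reported here; they are not the Fig. S3 data, and the helper names are hypothetical.

```python
from __future__ import annotations

import math


def fit_standard_curve(amounts: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares line Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    mx, my = sum(xs) / len(xs), sum(cts) / len(cts)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, cts)) / sxx
    return slope, my - slope * mx


def amount_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: amount = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical standards (fmol) with Ct values from an ideal line
    # (slope -3.32, intercept 20); not the published standard curves.
    standards_fmol = [0.001, 0.01, 0.1, 1.0, 10.0]
    standard_cts = [-3.32 * math.log10(a) + 20.0 for a in standards_fmol]
    slope, intercept = fit_standard_curve(standards_fmol, standard_cts)

    input_fmol = 10_000.0  # 10 pmol of library applied per selection
    ct_modified, ct_unmodified = 23.36, 30.0  # illustrative Ct values only
    rec_mod = amount_from_ct(ct_modified, slope, intercept) / input_fmol
    rec_unmod = amount_from_ct(ct_unmodified, slope, intercept) / input_fmol
    print(f"efficiency ~ {amplification_efficiency(slope):.1%}")
    print(f"recovered fractions: {rec_mod:.1e} vs {rec_unmod:.1e}")
    print(f"fold difference ~ {rec_mod / rec_unmod:.0f}")
```

With a slope near -3.32 (about 100% efficiency), a Ct difference of about 6.6 cycles corresponds to a ~100-fold difference in recovered material.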

[Figure 3 image. (A) Schematic: library 1 (R = Me) or library 2 (R = H) is incubated with an immobilized m6A reader, and bound sequences are quantified by RT-qPCR. (B) Recovery plot.]

Figure 3. Recovery of sequences in libraries 1 and 2 after affinity selection with YTH-domain reader proteins. (A) Workflow for qPCR validation of selection. Libraries 1 and 2 are probed for binding to immobilized YTH readers, bound targets are enriched, and recovery is quantified by qPCR. (B) Recovery of libraries 1 and 2 upon binding to YTH-domain proteins as determined by qPCR (see Fig. S3 for qPCR standard curves). Libraries were incubated with excess immobilized YTH reader, and after stringent washing, bound sequences were selectively eluted. The eluted targets were then reverse-transcribed, and the resulting first-strand cDNA was amplified by qPCR. The amount of material in each elution was extrapolated from the experimental threshold cycles (Ct) and the respective qPCR standard curve. Values represent mean ± s.d. (n = 3).

High-throughput sequencing analysis of YTH-domain protein selections

Having validated our proposed selection strategy, we sought to identify each m6A reader's preferred sequences through high-throughput sequencing. To avoid any potential bias that could be introduced by the DNA priming regions of libraries 1 and 2 (Fig. 2A), we synthesized a new 15-mer RNA-only library, 3, consisting of a single m6A nucleotide flanked on each side by seven fully randomized positions (Fig. S4). We then performed in vitro selection against YTHDC1, YTHDF1, and YTHDF2, and prepared bound library sequences for high-throughput sequencing by adaptor ligation, reverse transcription, and PCR amplification. Illumina-based sequencing of these samples yielded 1 to 2 million sequence reads per selection. To analyze the selection results, we developed a custom script to extract the abundance of k-mers centered on the m6A residue. Based on our sequencing coverage and the assumption that enrichment after one round of selection is likely to be modest, we considered k-mers up to length 11 (4^10 ≈ 10^6 possible sequences, since the central m6A is fixed). We first focused on the 10 most abundant 5-mer motifs from each selection (Table 1), because the consensus sequence motif for m6A modification sites, as mapped by antibody-based sequencing, is of the form DR(m6A)CH (D = A/G/U; R = A/G; H = A/C/U) [3, 4, 48]. Together, these top 5-mer motifs comprise ~8-10% of all sequence reads in their respective selections. Consistent with the structural homology between YTHDC1, YTHDF1, and YTHDF2 [33], we observed considerable overlap in their enriched 5-mer motifs: 4 of the top 10 motifs in each selection (GGACG, GGACA, GUAGA, and GGACU) (Table 1, yellow) are shared by all three proteins, and 8 of the top 10 5-mer motifs are shared by YTHDF1 and YTHDF2 (which share 90% sequence identity in their YTH domains, compared to only ~30% sequence identity with YTHDC1). Among the top 5-mer motifs enriched in all three selections, we found GG(m6A)CU, the most abundant m6A-containing 5-mer in the mammalian transcriptome and a validated substrate of YTH-domain proteins [3, 4, 35, 48], suggesting that our in vitro selection approach can effectively capture known binding sequences. We also found GG(m6A)CA [48], another abundant DR(m6A)CH-matching motif (Table 1), as well as GG(m6A)CG, which has been reported as the most abundant non-DR(m6A)CH m6A-containing pentamer in mammalian cells [48].
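The centered k-mer counting described above can be sketched as follows. This is a minimal illustration, not the authors' custom script: it assumes reads have already been trimmed to the 15-nt random region of library 3, with the m6A at 0-based index 7 and read out as A after reverse transcription. The function names and the example reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed layout of a trimmed library 3 read: 15 nt, m6A at 0-based index 7
# (seven randomized positions on each side). Reverse transcription reads
# m6A as A, so the modified position appears as "A" in the sequence.
M6A_INDEX = 7


def centered_kmer(read: str, k: int, center: int = M6A_INDEX) -> Optional[str]:
    """Return the odd-length k-mer centred on the m6A position, or None."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that m6A sits in the middle")
    half = k // 2
    start, end = center - half, center + half + 1
    if start < 0 or end > len(read):
        return None
    return read[start:end]


def kmer_percent_abundance(reads: Iterable[str], k: int) -> Dict[str, float]:
    """Percent abundance of each centred k-mer (count / total count x 100),
    ordered from most to least abundant."""
    counts: Counter = Counter()
    for read in reads:
        # Normalize DNA-style reads (T) to RNA notation (U)
        kmer = centered_kmer(read.upper().replace("T", "U"), k)
        if kmer is not None:
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: 100.0 * n / total for kmer, n in counts.most_common()}


def n_possible_centered_kmers(k: int) -> int:
    """Distinct centred k-mers: the m6A is fixed, so 4^(k - 1) sequences."""
    return 4 ** (k - 1)


if __name__ == "__main__":
    reads = ["UCAAAGGACGUGGAU", "GGUUUGGACGUGGCC",
             "UCAAAGGACGUGGAU", "ACCGGCUAUAGAAGU"]
    print(kmer_percent_abundance(reads, 5))   # GGACG 75.0, CUAUA 25.0
    print(kmer_percent_abundance(reads, 11))
    print(n_possible_centered_kmers(11))      # 1048576
```

Fixing the m6A position is what keeps the k-mer space at 4^(k-1): for k = 11 there are 1,048,576 possible centred sequences, which is why deeper k-mers quickly outgrow 1 to 2 million reads per selection.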


5-mers

| Rank | YTHDC1 | YTHDF1 | YTHDF2 |
|---|---|---|---|
| 1 | GGACG (1.35)^a | CUAGA (1.30) | GUAGA (1.15) |
| 2 | GGACA (0.98) | GUAGA (1.23) | GGACG (0.94) |
| 3 | AGACG (0.93) | GGACG (1.08) | CUAGA (0.91) |
| 4 | GUAGA (0.88) | CGAUC (1.03) | GGACA (0.87) |
| 5 | GGACU (0.85) | CUAGU (0.94) | GUAGG (0.82) |
| 6 | GAACG (0.82) | GGACA (0.91) | CGAUC (0.74) |
| 7 | AGACU (0.72) | GCAGA (0.91) | GCAGA (0.73) |
| 8 | GUAGG (0.71) | GGACU (0.85) | GGACU (0.71) |
| 9 | GAACU (0.59) | GGAGA (0.82) | GGAGA (0.70) |
| 10 | GAACA (0.59) | CGACU (0.79) | AGACG (0.67) |

11-mers

| Rank | YTHDC1 | YTHDF1 | YTHDF2 |
|---|---|---|---|
| 1 | AAAGGACGUGG (0.0050) | CGGCUAUAGAA (0.0080) | GUGCUAUAGAA (0.0062) |
| 2 | UUUGGACGUGG (0.0039) | CGGCUAGAAUA (0.0079) | GUGCUAUAGAU (0.0046) |
| 3 | AAAGGACGUGA (0.0037) | GGCCGAUCUGA (0.0078) | CGGCUAGAAUA (0.0039) |
| 4 | GAUCUACUGAA (0.0036) | CGGCGAUCUUU (0.0076) | CGGCUAGAACA (0.0037) |
| 5 | AAUGGACGUGA (0.0035) | CGGCUAUUUGA (0.0075) | CGGCUAUGAAA (0.0036) |
| 6 | UGACGAUCUGA (0.0034) | CGGCUAGAAUU (0.0071) | GGCCGAUCUGA (0.0035) |
| 7 | UGAAGACGUGG (0.0032) | CGGCUAUUGAA (0.0069) | CGGCUAGAAUU (0.0035) |
| 8 | AAAAGACUGGG (0.0032) | CGGCUAUGAAU (0.0068) | GGCGUAGAAAA (0.0035) |
| 9 | GGGCAAAAGAG (0.0032) | CGGCUAGAGAA (0.0067) | CGCCGAUCUGA (0.0035) |
| 10 | AAAAUAAAGGG (0.0031) | CGACGAUCUGA (0.0066) | CGGCUAGUAGA (0.003) |

^a Percent abundance of each sequence (sequence counts/total sequence counts for the selection) is shown in parentheses.

Table 1: The 10 most abundant m6A-centered 5-mers and 11-mers enriched upon in vitro binding selections with YTH readers. 5-mers and 11-mers shared between selections are highlighted in yellow and purple, respectively.

The enriched 5-mer motifs in our library selection against YTHDC1 show clear overlap with the canonical m6A-containing sites identified in transcriptome-wide modification-sequencing studies. There is a strong preference for C after the m6A residue (8 of the top 10 motifs possess this feature) (Table 1) and clear selection of purines at the n-1 and n-2 positions, consistent with reported m6A modification sites [3, 4, 48], but apparently weaker selection at the n+2 position. In contrast, our selections with YTHDF1 and YTHDF2 appear to enrich a greater variety of motifs, including several non-canonical sequences (e.g., CUAGA and CUAGU; Table 1) that bear little resemblance to the DR(m6A)CH consensus motif.

To examine the wider sequence context of enriched m6A-containing motifs, we next analyzed 11-mers centered on m6A (Table 1). Among the 10 most abundant 11-mers in the YTHDC1, YTHDF1, and YTHDF2 selections, sequence abundances ranged from 0.003 to 0.008% of all sequence counts, corresponding to enrichment factors of ~30-80-fold if all sequences were equally abundant in the starting pool. Again, we saw enrichment of similar motifs in the YTHDF1 and YTHDF2 selections (Table 1; 3 of 10 11-mers shared, in purple), further demonstrating the biochemical similarity between these two proteins. Notably, none of the enriched 11-mers in these two selections contained a DR(m6A)CH sequence around the m6A residue, suggesting that this motif is not required for recognition by YTHDF1/2. In contrast, in the YTHDC1 selection we found strong enrichment for 11-mers containing a central GG(m6A)CG 5-mer or the DR(m6A)CH-matching AG(m6A)CU motif (Table 1). As with its enriched 5-mers, YTHDC1 maintains a preference for C at the +1 position (Table 1; 6/10 sequences). Taken together, the similar selection results for YTHDF1 and YTHDF2, and their divergence from those of YTHDC1, support the notion that protein sequence (and presumably structural) similarity underlies binding preference for distinct m6A-containing sequences.
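The ~30-80-fold figure follows from the size of the 11-mer space: with the central m6A fixed, a uniform pool contains each of the 4^10 centred 11-mers at 100/4^10 ≈ 9.5 × 10^-5 percent, so dividing an observed percent abundance from Table 1 by that value gives the enrichment factor. The sketch below assumes a perfectly uniform starting pool, as the text does, and also checks whether the 5-mer around the central m6A matches DR(m6A)CH (with m6A written as A). The helper names are illustrative only.

```python
import re

# DR(m6A)CH consensus with m6A written as A: D = A/G/U, R = A/G, H = A/C/U
DRACH = re.compile(r"[AGU][AG]AC[ACU]")


def enrichment_factor(percent_abundance: float, n_random_positions: int) -> float:
    """Observed / expected abundance, assuming every sequence is equally
    represented in the starting pool."""
    expected_percent = 100.0 / 4 ** n_random_positions
    return percent_abundance / expected_percent


def central_pentamer(kmer: str) -> str:
    """The 5-mer centred on the middle (m6A) position of an odd-length k-mer."""
    mid = len(kmer) // 2
    return kmer[mid - 2 : mid + 3]


def has_central_drach(kmer: str) -> bool:
    """True if the 5-mer around the central m6A matches DR(m6A)CH."""
    return DRACH.fullmatch(central_pentamer(kmer)) is not None


if __name__ == "__main__":
    # Range of top-10 11-mer abundances reported in Table 1 (percent)
    for pct in (0.003, 0.008):
        print(pct, round(enrichment_factor(pct, 10)))   # 31 and 84
    ythdf1_top = ["CGGCUAUAGAA", "CGGCUAGAAUA", "GGCCGAUCUGA", "CGGCGAUCUUU",
                  "CGGCUAUUUGA", "CGGCUAGAAUU", "CGGCUAUUGAA", "CGGCUAUGAAU",
                  "CGGCUAGAGAA", "CGACGAUCUGA"]
    print(any(has_central_drach(s) for s in ythdf1_top))   # False
    print(has_central_drach("AAAAGACUGGG"))                # True (AGACU)
```

Applied to the YTHDF1 11-mers in Table 1, the check finds no central DR(m6A)CH match, while the YTHDC1 11-mer AAAAGACUGGG carries AG(m6A)CU and matches.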

In vitro binding analysis of YTH-domain proteins with selection motifs

To validate and further expand upon our selection results, we characterized the affinity of enriched 11-mer sequences for their relevant protein targets. Rather than picking individual k-mers to evaluate, we clustered k-mers by sequence similarity to generate sequence logos representing the predominant 11-mer motif in each family of related sequences (Fig. 4A, 4B, and 4C). Sequence logos were then ranked by the total number of sequence counts they contain. Next, we synthesized four m6A-containing 11-mer sequences (probes 1, 3, 5, and 7) encompassing highly enriched sequence logos from all three selections, as well as the corresponding unmethylated control sequences (probes 2, 4, 6, and 8) (Fig. 4A, 4B, and 4C; Table S1), and measured binding using microscale thermophoresis (MST) [49, 50]. Gratifyingly, methylated probe 1, which represents the top sequence logo (Fig. 4A) and the most strongly enriched 11-mer motif from the selection against YTHDC1 (Table 1), bound to the protein with a dissociation constant of 0.39 ± 0.071 µM (Fig. 4D). Probe 2, which has the same sequence but lacks methylation, bound with ~40-fold lower affinity (Kd = 16.9 ± 0.8 µM) (Fig. 4D), demonstrating the importance of the methyl group for the interaction. Similarly, probe 3, which was designed based on the second most abundant sequence logo, bound to YTHDC1 in an m6A-dependent manner, exhibiting a ~40-fold lower Kd than the corresponding unmethylated sequence, probe 4 (Fig. 4E). For comparison, we also measured binding between YTHDC1 and 10-mer oligonucleotides containing the methylated or unmethylated GGACU motif and observed affinity and m6A selectivity similar to those of our selection sequences (Fig. S5A, Table S1).
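The text does not spell out the clustering algorithm, so the following is only a sketch of one plausible approach, not the authors' method: greedy clustering by Hamming distance, in which the most abundant unassigned 11-mer seeds a cluster that absorbs every unassigned 11-mer within a hypothetical cutoff of two mismatches, followed by count-weighted per-position information content in bits (the quantity plotted in a sequence logo, here without any small-sample correction that logo software may apply). Clusters are ranked by total sequence counts, as described for the logos in Fig. 4; the counts in the example are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_clusters(kmer_counts: Dict[str, int], max_dist: int = 2) -> List[Dict[str, int]]:
    """Greedy Hamming-distance clustering (hypothetical cutoff max_dist).

    The most abundant unassigned k-mer seeds a cluster that absorbs every
    unassigned k-mer within max_dist mismatches of the seed.
    """
    remaining = sorted(kmer_counts, key=kmer_counts.get, reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]
        members = {s for s in remaining if hamming(seed, s) <= max_dist}
        clusters.append({s: kmer_counts[s] for s in remaining if s in members})
        remaining = [s for s in remaining if s not in members]
    # Rank clusters by the total number of sequence counts they contain
    clusters.sort(key=lambda c: sum(c.values()), reverse=True)
    return clusters


def information_content(cluster: Dict[str, int]) -> List[float]:
    """Count-weighted information content (bits) at each position:
    2 - Shannon entropy for a four-letter RNA alphabet."""
    length = len(next(iter(cluster)))
    total = sum(cluster.values())
    bits = []
    for i in range(length):
        freq: Counter = Counter()
        for seq, n in cluster.items():
            freq[seq[i]] += n
        entropy = -sum((c / total) * math.log2(c / total) for c in freq.values())
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Hypothetical read counts for four 11-mers (m6A at index 5)
    counts = {"CGGCUAGAAUA": 79, "GGCCGAUCUGA": 78,
              "CGGCUAGAAUU": 71, "CGACGAUCUGA": 66}
    for cluster in greedy_clusters(counts):
        ic = information_content(cluster)
        print(sum(cluster.values()), sorted(cluster), [round(b, 2) for b in ic])
```

Because every sequence carries m6A at the centre, that position always scores the full 2 bits; the informative part of each logo is how conserved the flanking positions remain within a cluster.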


[Figure 4 image: (A-C) clustered 11-mer sequence logos from the YTHDC1 (probes 1/2 and 3/4), YTHDF1 (probes 5/6), and YTHDF2 (probes 7/8) selections; (D-G) MST binding curves.]

Figure 4: Enriched families of 11-mer sequence motifs identified in YTH-domain selections. (A-C) The 5 most abundant clustered 11-mer logos obtained from the selection of library 3 with YTH-domain m6A readers. m6A is centered at position 6 of the logo, and U is represented as T. Values above the logos represent the percentage of sequences in each selection encompassed by the logo. Sequence logos from which probes were synthesized are boxed. Briefly, library 3 was selected against excess immobilized YTH reader as in Figure 3B, and the eluate was reverse-transcribed and PCR-amplified with barcoded primers before high-throughput sequencing. (D-G) Binding of methylated (m6A) and unmethylated (adenosine) probes derived from the logos in A-C to YTH-domain reader proteins. Binding assays were performed by microscale thermophoresis (MST) with 30 nM (probes 1 and 2) or 50 nM (probes 3-8) probe and increasing concentrations of protein (see Table S1 for probe sequences). Values represent mean ± s.d. (n = 3). Kd values were calculated by fitting the data points to a four-parameter dose-response curve.

We next measured the affinity of YTHDF1 and YTHDF2 for methylated sequences identified in their selections. For this purpose, we chose the third most abundant logo from the YTHDF1 selection (Fig. 4B) (used for probes 5/6) and the most abundant logo from the YTHDF2 selection (Fig. 4C) (used for probes 7/8), which differs by only one nucleotide from the most abundant logo selected by YTHDF1 (Fig. 4B). Although these sequences do not resemble the DR(m6A)CH motif, both methylated probes bound tightly (YTHDF1: probe 5, Kd = 0.51 ± 0.045 µM; YTHDF2: probe 7, Kd = 0.79 ± 0.018 µM) and in an m6A-dependent fashion to their cognate proteins, exhibiting 25-40-fold selectivity for methylated over unmethylated sequences (Fig. 4F and 4G). Indeed, these non-canonical m6A-containing sequences bound to YTHDF1/2 with 2-3-fold higher affinity and greater m6A specificity than a GG(m6A)CU-containing 10-mer oligonucleotide (Fig. S5B). Finally, we asked whether sequences that were poorly enriched in a given selection would bind poorly to that protein. For this purpose, we tested methylated probes 1 and 3 (identified in the YTHDC1 selection) against YTHDF1/2, and methylated probe 7 (identified in the YTHDF2 selection) against YTHDC1; none of these sequences was strongly enriched in all three protein selections. Consistent with our selection results, we indeed observed little binding of these non-selected sequences, as indicated by irregular dose-response traces, non-saturating binding curves, or weak thermophoretic changes even at high protein concentrations (Fig. S6). Taken together, our results demonstrate that a single round of in vitro selection combined with high-throughput sequencing can identify tight-binding, methylation-specific RNA-protein interactions from a site-specifically m6A-modified random sequence library and generate substrate binding profiles for distinct m6A reader proteins.
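The Kd values above come from fitting MST dose-response data to a four-parameter curve. A minimal sketch of such a fit is shown below, assuming the common four-parameter logistic form in which the midpoint concentration is reported as the apparent Kd and depletion of free protein by the 30-50 nM probe is ignored; it uses SciPy's `curve_fit` on synthetic data rather than the authors' measurements, and the helper names are hypothetical. A second helper computes methylation selectivity as Kd(unmethylated)/Kd(m6A) with first-order propagation of the reported standard deviations, assuming independent errors; for probes 1 and 2 this gives roughly 43 ± 8-fold, in line with the ~40-fold difference reported above.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_pl(conc, bottom, top, kd, hill):
    """Four-parameter logistic; with hill = 1 this reduces to simple 1:1
    binding, bottom + (top - bottom) * conc / (kd + conc)."""
    return bottom + (top - bottom) / (1.0 + (kd / conc) ** hill)


def fit_kd(conc, response):
    """Fit four_pl to dose-response data; returns (params, standard errors)
    for (bottom, top, kd, hill)."""
    conc = np.asarray(conc, dtype=float)
    response = np.asarray(response, dtype=float)
    p0 = [response.min(), response.max(), float(np.median(conc)), 1.0]
    # Keep kd positive and the Hill slope in a plausible range
    bounds = ([-np.inf, -np.inf, 1e-6, 0.1], [np.inf, np.inf, np.inf, 10.0])
    popt, pcov = curve_fit(four_pl, conc, response, p0=p0, bounds=bounds)
    return popt, np.sqrt(np.diag(pcov))


def fold_selectivity(kd_unmod, sd_unmod, kd_m6a, sd_m6a):
    """Kd(unmethylated) / Kd(m6A) with first-order error propagation,
    assuming independent errors."""
    ratio = kd_unmod / kd_m6a
    rel_err = np.hypot(sd_unmod / kd_unmod, sd_m6a / kd_m6a)
    return ratio, ratio * rel_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conc = np.logspace(-3, 1.5, 16)             # protein, in µM (synthetic)
    signal = four_pl(conc, 0.0, 10.0, 0.5, 1.0) + rng.normal(0, 0.1, conc.size)
    params, errors = fit_kd(conc, signal)
    print(f"Kd = {params[2]:.2f} +/- {errors[2]:.2f} µM")
    ratio, err = fold_selectivity(16.9, 0.8, 0.39, 0.071)   # probes 2 and 1
    print(f"selectivity = {ratio:.0f} +/- {err:.0f}-fold")  # 43 +/- 8-fold
```

Leaving bottom and top free lets the same function handle MST traces whose normalized fluorescence rises or falls with protein concentration.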

DISCUSSION

In this manuscript, we develop a strategy based on in vitro selection and high-throughput sequencing to profile the sequence-binding preferences of RNA modification reader proteins. We apply our approach using a site-specifically m6A-modified random sequence RNA library to investigate the effect of sequence context on the binding of YTH-domain reader proteins to m6A-containing RNA. Our results reveal distinct m6A-binding preferences among different families of YTH-domain proteins and provide a general strategy for characterizing modification-dependent RNA-protein interactions.

Interactions between m6A-modified mRNA and its reader proteins, most prominently the YTH-domain proteins, play an important role in the biological function of this epitranscriptomic modification. In mammals, the five YTH-domain proteins regulate distinct aspects of the mRNA lifecycle, including splicing [10], nuclear export [11, 12], translation [6-9], and degradation [5]. How do these proteins find their relevant RNA substrates in the cell? In our study, we aimed to investigate in an unbiased fashion the sequence determinants underlying recognition of m6A residues by the established m6A reader proteins YTHDF1, YTHDF2, and YTHDC1. Our work reveals several insights into these modification-dependent interactions. YTHDC1 prefers canonical DR(m6A)CH-like m6A-containing sequences, with a strong preference for G(m6A)C-containing sequences, consistent with prior biochemical and structural studies of this protein [35, 37]. Interestingly, our data also strongly support a preference for G at the +2 position. In contrast, YTHDF1 and YTHDF2, which exhibit very similar binding profiles to one another, do not show strong selection for DR(m6A)CH-like m6A sequences. Instead, our selection data identified 11-mer motifs containing pyrimidine bases at the -1 and -2 positions and lacking C at the +1 position, which we showed bind more tightly to YTHDF1/2 than GG(m6A)CU-containing oligonucleotides of similar length. Because these proteins are also known to bind canonical DR(m6A)CH sequences [5, 6], our data suggest that YTHDF1/2 can recognize a more diverse collection of m6A-modified RNA sequences than YTHDC1, and demonstrate the importance of the sequences flanking the m6A residue for recognition. The strong similarity in sequence binding preference between YTHDF1 and YTHDF2 also raises the question of whether these proteins compete for the same RNA binding sites in cells; since these two proteins have different effects on mRNA behavior, additional mechanisms regulating RNA-protein binding over space and time could function to ensure proper recruitment to m6A-modified RNAs.

Our method surveys the interactions of m6A reader proteins with all possible singly modified m6A-containing RNA sequences. Of course, only a subset of these sequences may exist in vivo (presumably determined by the substrate preferences of m6A writer and eraser enzymes), and therefore our in vitro binding results must be interpreted in the context of validated m6A-modified sequences. Nevertheless, the finding that YTHDF1/2 can bind tightly to non-canonical m6A-containing motifs suggests that related m6A sites may exist in the transcriptome. Indeed, single-nucleotide m6A sequencing approaches have revealed the presence of such sequences and indicate that the m6A-modified transcriptome is more diverse than previously appreciated [48, 51]. Finally, we envision that site-specifically modified random RNA libraries can be applied to probe the effect of diverse epitranscriptomic marks and sequence context on fundamental nucleic acid-related processes, including protein-RNA binding, catalysis by modification writer and eraser enzymes, and templated polymerization. Combining this approach with nucleic acid indexing and massively parallel sequencing strategies would allow different library chemistries and experimental conditions to be interrogated in a single experiment. Such efforts are currently underway in our laboratory.

CONCLUSION


Herein, we develop an in vitro selection approach to interrogate modification-dependent RNA-protein interactions with a site-specifically modified random sequence RNA library. We apply our strategy to characterize the effects of sequence context on the binding of YTH-domain proteins, established m6A reader proteins, to m6A-modified RNA. Our results demonstrate that YTHDC1 and YTHDF1/2 possess distinct sequence-binding preferences, suggesting a mechanism for their recruitment to different m6A-modified mRNA substrates in the cell. Taken together, our study provides insight into m6A-dependent protein-RNA interactions and a general, unbiased approach for investigating the effect of RNA modifications on diverse biochemical processes.

ACCESSION CODES

YTHDC1: NP_001026902.1
YTHDF1: NP_060268.2
YTHDF2: NP_057342.2

ASSOCIATED CONTENT

Supporting Information

Characterization of oligonucleotides/libraries, RT-qPCR measurements, and microscale thermophoresis. The Supporting Information is available free of charge on the [insert website name] at [insert DOI].

AUTHOR INFORMATION

Corresponding Author
[email protected]

Funding Sources
No competing financial interests have been declared.

ACKNOWLEDGMENTS

The authors thank Wei Wang at the Princeton University Genomics Core Facility for assistance with Illumina sequencing and library preparation. R.E.K. is a Sidney Kimmel Foundation Scholar. This research was supported by the NIH (R01GM132189 to R.E.K.). A.E.A. was supported by a generous gift from the Edward C. Taylor 3rd Year Graduate Fellowship in Chemistry. All authors thank Princeton University for financial support.

REFERENCES

[1] Gerstberger, S., Hafner, M., and Tuschl, T. (2014) A census of human RNA-binding proteins, Nat Rev Genet 15, 829-845.


[2] Roundtree, I. A., Evans, M. E., Pan, T., and He, C. (2017) Dynamic RNA Modifications in Gene Expression Regulation, Cell 169, 1187-1200.

[3] Dominissini, D., Moshitch-Moshkovitz, S., Schwartz, S., Salmon-Divon, M., Ungar, L., Osenberg, S., Cesarkas, K., Jacob-Hirsch, J., Amariglio, N., Kupiec, M., Sorek, R., and Rechavi, G. (2012) Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq, Nature 485, 201-206.

[4] Meyer, K. D., Saletore, Y., Zumbo, P., Elemento, O., Mason, C. E., and Jaffrey, S. R. (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons, Cell 149, 1635-1646.

[5] Wang, X., Lu, Z., Gomez, A., Hon, G. C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., Ren, B., Pan, T., and He, C. (2014) N6-methyladenosine-dependent regulation of messenger RNA stability, Nature 505, 117-120.

[6] Wang, X., Zhao, B. S., Roundtree, I. A., Lu, Z., Han, D., Ma, H., Weng, X., Chen, K., Shi, H., and He, C. (2015) N(6)-methyladenosine Modulates Messenger RNA Translation Efficiency, Cell 161, 1388-1399.

[7] Zhou, J., Wan, J., Gao, X., Zhang, X., Jaffrey, S. R., and Qian, S. B. (2015) Dynamic m(6)A mRNA methylation directs translational control of heat shock response, Nature 526, 591-594.

[8] Meyer, K. D., Patil, D. P., Zhou, J., Zinoviev, A., Skabkin, M. A., Elemento, O., Pestova, T. V., Qian, S. B., and Jaffrey, S. R. (2015) 5' UTR m(6)A Promotes Cap-Independent Translation, Cell 163, 999-1010.

[9] Li, A., Chen, Y. S., Ping, X. L., Yang, X., Xiao, W., Yang, Y., Sun, H. Y., Zhu, Q., Baidya, P., Wang, X., Bhattarai, D. P., Zhao, Y. L., Sun, B. F., and Yang, Y. G. (2017) Cytoplasmic m(6)A reader YTHDF3 promotes mRNA translation, Cell Res 27, 444-447.

[10] Xiao, W., Adhikari, S., Dahal, U., Chen, Y. S., Hao, Y. J., Sun, B. F., Sun, H. Y., Li, A., Ping, X. L., Lai, W. Y., Wang, X., Ma, H. L., Huang, C. M., Yang, Y., Huang, N., Jiang, G. B., Wang, H. L., Zhou, Q., Wang, X. J., Zhao, Y. L., and Yang, Y. G. (2016) Nuclear m(6)A Reader YTHDC1 Regulates mRNA Splicing, Mol Cell 61, 507-519.

[11] Lesbirel, S., Viphakone, N., Parker, M., Parker, J., Heath, C., Sudbery, I., and Wilson, S. A. (2018) The m(6)A-methylase complex recruits TREX and regulates mRNA export, Sci Rep 8, 13827.

[12] Roundtree, I. A., Luo, G. Z., Zhang, Z., Wang, X., Zhou, T., Cui, Y., Sha, J., Huang, X., Guerrero, L., Xie, P., He, E., Shen, B., and He, C. (2017) YTHDC1 mediates nuclear export of N(6)-methyladenosine methylated mRNAs, Elife 6.

[13] Wen, J., Lv, R., Ma, H., Shen, H., He, C., Wang, J., Jiao, F., Liu, H., Yang, P., Tan, L., Lan, F., Shi, Y. G., He, C., Shi, Y., and Diao, J. (2018) Zc3h13 Regulates Nuclear RNA m(6)A Methylation and Mouse Embryonic Stem Cell Self-Renewal, Mol Cell 69, 1028-1038 e1026.


[14] Ivanova, I., Much, C., Di Giacomo, M., Azzi, C., Morgan, M., Moreira, P. N., Monahan, J., Carrieri, C., Enright, A. J., and O'Carroll, D. (2017) The RNA m(6)A Reader YTHDF2 Is Essential for the Post-transcriptional Regulation of the Maternal Transcriptome and Oocyte Competence, Mol Cell 67, 1059-1067 e1054.

[15] Zhang, C., Chen, Y., Sun, B., Wang, L., Yang, Y., Ma, D., Lv, J., Heng, J., Ding, Y., Xue, Y., Lu, X., Xiao, W., Yang, Y. G., and Liu, F. (2017) m(6)A modulates haematopoietic stem and progenitor cell specification, Nature 549, 273-276.

[16] Rubio, R. M., Depledge, D. P., Bianco, C., Thompson, L., and Mohr, I. (2018) RNA m(6)A modification enzymes shape innate responses to DNA by regulating interferon beta, Genes Dev 32, 1472-1484.

[17] Winkler, R., Gillis, E., Lasman, L., Safra, M., Geula, S., Soyris, C., Nachshon, A., Tai-Schmiedel, J., Friedman, N., Le-Trilling, V. T. K., Trilling, M., Mandelboim, M., Hanna, J. H., Schwartz, S., and Stern-Ginossar, N. (2019) m(6)A modification controls the innate immune response to infection by targeting type I interferons, Nat Immunol 20, 173-182.

[18] Xiang, Y., Laurent, B., Hsu, C. H., Nachtergaele, S., Lu, Z., Sheng, W., Xu, C., Chen, H., Ouyang, J., Wang, S., Ling, D., Hsu, P. H., Zou, L., Jambhekar, A., He, C., and Shi, Y. (2017) RNA m(6)A methylation regulates the ultraviolet-induced DNA damage response, Nature 543, 573-576.

[19] Vu, L. P., Pickering, B. F., Cheng, Y., Zaccara, S., Nguyen, D., Minuesa, G., Chou, T., Chow, A., Saletore, Y., MacKay, M., Schulman, J., Famulare, C., Patel, M., Klimek, V. M., Garrett-Bakelman, F. E., Melnick, A., Carroll, M., Mason, C. E., Jaffrey, S. R., and Kharas, M. G. (2017) The N6-methyladenosine (m6A)-forming enzyme METTL3 controls myeloid differentiation of normal hematopoietic and leukemia cells, Nat Med 23, 1369-1376.

[20] Yang, X., Yang, Y., Sun, B. F., Chen, Y. S., Xu, J. W., Lai, W. Y., Li, A., Wang, X., Bhattarai, D. P., Xiao, W., Sun, H. Y., Zhu, Q., Ma, H. L., Adhikari, S., Sun, M., Hao, Y. J., Zhang, B., Huang, C. M., Huang, N., Jiang, G. B., Zhao, Y. L., Wang, H. L., Sun, Y. P., and Yang, Y. G. (2017) 5-methylcytosine promotes mRNA export - NSUN2 as the methyltransferase and ALYREF as an m(5)C reader, Cell Res 27, 606-625.

[21] Li, X., Xiong, X., Zhang, M., Wang, K., Chen, Y., Zhou, J., Mao, Y., Lv, J., Yi, D., Chen, X. W., Wang, C., Qian, S. B., and Yi, C. (2017) Base-Resolution Mapping Reveals Distinct m(1)A Methylome in Nuclear- and Mitochondrial-Encoded Transcripts, Mol Cell 68, 993-1005 e1009.

[22] Li, X., Zhu, P., Ma, S., Song, J., Bai, J., Sun, F., and Yi, C. (2015) Chemical pulldown reveals dynamic pseudouridylation of the mammalian transcriptome, Nat Chem Biol 11, 592-597.


[23] Carlile, T. M., Rojas-Duran, M. F., Zinshteyn, B., Shin, H., Bartoli, K. M., and Gilbert, W. V. (2014) Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells, Nature 515, 143-146.

[24] Arango, D., Sturgill, D., Alhusaini, N., Dillman, A. A., Sweet, T. J., Hanson, G., Hosogane, M., Sinclair, W. R., Nanan, K. K., Mandler, M. D., Fox, S. D., Zengeya, T. T., Andresson, T., Meier, J. L., Coller, J., and Oberdoerffer, S. (2018) Acetylation of Cytidine in mRNA Promotes Translation Efficiency, Cell 175, 1872-1886 e1824.

[25] Ayadi, L., Galvanin, A., Pichot, F., Marchand, V., and Motorin, Y. (2019) RNA ribose methylation (2'-O-methylation): Occurrence, biosynthesis and biological functions, Biochim Biophys Acta Gene Regul Mech 1862, 253-269.

[26] Mauer, J., Luo, X., Blanjoie, A., Jiao, X., Grozhik, A. V., Patil, D. P., Linder, B., Pickering, B. F., Vasseur, J. J., Chen, Q., Gross, S. S., Elemento, O., Debart, F., Kiledjian, M., and Jaffrey, S. R. (2017) Reversible methylation of m(6)Am in the 5' cap controls mRNA stability, Nature 541, 371-375.

[27] Akichika, S., Hirano, S., Shichino, Y., Suzuki, T., Nishimasu, H., Ishitani, R., Sugita, A., Hirose, Y., Iwasaki, S., Nureki, O., and Suzuki, T. (2019) Cap-specific terminal N(6)-methylation of RNA by an RNA polymerase II-associated methyltransferase, Science 363.

[28] Liu, N., Dai, Q., Zheng, G., He, C., Parisien, M., and Pan, T. (2015) N(6)-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions, Nature 518, 560-564.

[29] Pan, T. (2018) Modifications and functional genomics of human transfer RNA, Cell Res 28, 395-404.

[30] Kleiner, R. E. (2018) Reading the RNA Code, Biochemistry 57, 11-12.

[31] Arguello, A. E., DeLiberto, A. N., and Kleiner, R. E. (2017) RNA Chemical Proteomics Reveals the N(6)-Methyladenosine (m(6)A)-Regulated Protein-RNA Interactome, J Am Chem Soc 139, 17249-17252.

[32] Edupuganti, R. R., Geiger, S., Lindeboom, R. G. H., Shi, H., Hsu, P. J., Lu, Z., Wang, S. Y., Baltissen, M. P. A., Jansen, P., Rossa, M., Muller, M., Stunnenberg, H. G., He, C., Carell, T., and Vermeulen, M. (2017) N(6)-methyladenosine (m(6)A) recruits and repels proteins to regulate mRNA homeostasis, Nat Struct Mol Biol 24, 870-878.

[33] Zhang, Z. Y., Theler, D., Kaminska, K. H., Hiller, M., de la Grange, P., Pudimat, R., Rafalska, I., Heinrich, B., Bujnicki, J. M., Allain, F. H. T., and Stamm, S. (2010) The YTH Domain Is a Novel RNA Binding Domain, J Biol Chem 285, 14701-14710.

[34] Wojtas, M. N., Pandey, R. R., Mendel, M., Homolka, D., Sachidanandam, R., and Pillai, R. S. (2017) Regulation of m(6)A Transcripts by the 3' -> 5' RNA Helicase YTHDC2 Is Essential for a Successful Meiotic Program in the Mammalian Germline, Mol Cell 68, 374.


[35] Xu, C., Wang, X., Liu, K., Roundtree, I. A., Tempel, W., Li, Y., Lu, Z., He, C., and Min, J. (2014) Structural basis for selective binding of m6A RNA by the YTHDC1 YTH domain, Nat Chem Biol 10, 927-929.

[36] Li, F., Zhao, D., Wu, J., and Shi, Y. (2014) Structure of the YTH domain of human YTHDF2 in complex with an m(6)A mononucleotide reveals an aromatic cage for m(6)A recognition, Cell Res 24, 1490-1492.

[37] Xu, C., Liu, K., Ahmed, H., Loppnau, P., Schapira, M., and Min, J. (2015) Structural Basis for the Discriminative Recognition of N6-Methyladenosine RNA by the Human YT521-B Homology Domain Family of Proteins, J Biol Chem 290, 24902-24913.

[38] Lee, F. C. Y., and Ule, J. (2018) Advances in CLIP Technologies for Studies of Protein-RNA Interactions, Mol Cell 69, 354-369.

[39] Tuerk, C., and Gold, L. (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase, Science 249, 505-510.

[40] Ellington, A. D., and Szostak, J. W. (1990) In vitro selection of RNA molecules that bind specific ligands, Nature 346, 818-822.

[41] Levine, T. D., Gao, F., King, P. H., Andrews, L. G., and Keene, J. D. (1993) Hel-N1: an autoimmune RNA-binding protein with specificity for 3' uridylate-rich untranslated regions of growth factor mRNAs, Mol Cell Biol 13, 3494-3504.

[42] Galarneau, A., and Richard, S. (2005) Target RNA motif and target mRNAs of the Quaking STAR protein, Nat Struct Mol Biol 12, 691-698.

[43] Buckanovich, R. J., and Darnell, R. B. (1997) The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo, Mol Cell Biol 17, 3194-3201.

[44] Keefe, A. D., and Cload, S. T. (2008) SELEX with modified nucleotides, Curr Opin Chem Biol 12, 448-456.

[45] Lauridsen, L. H., Rothnagel, J. A., and Veedu, R. N. (2012) Enzymatic recognition of 2'-modified ribonucleoside 5'-triphosphates: towards the evolution of versatile aptamers, Chembiochem 13, 19-25.

[46] Patil, D. P., Chen, C. K., Pickering, B. F., Chow, A., Jackson, C., Guttman, M., and Jaffrey, S. R. (2016) m(6)A RNA methylation promotes XIST-mediated transcriptional repression, Nature 537, 369-373.

[47] Unrau, P. J., and Bartel, D. P. (1998) RNA-catalysed nucleotide synthesis, Nature 395, 260-263.

[48] Linder, B., Grozhik, A. V., Olarerin-George, A. O., Meydan, C., Mason, C. E., and Jaffrey, S. R. (2015) Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome, Nat Methods 12, 767-772.


[49] Moon, M. H., Hilimire, T. A., Sanders, A. M., and Schneekloth, J. S., Jr. (2018) Measuring RNA-Ligand Interactions with Microscale Thermophoresis, Biochemistry 57, 4638-4643.

[50] Jerabek-Willemsen, M., Wienken, C. J., Braun, D., Baaske, P., and Duhr, S. (2011) Molecular Interaction Studies Using Microscale Thermophoresis, Assay Drug Dev Techn 9, 342-353.

[51] Garcia-Campos, M. A., Edelheit, S., Toth, U., Shachar, R., Nir, R., Lasman, L., Brandis, A., Hanna, J. H., Rossmanith, W., and Schwartz, S. (2019) Deciphering the ‘m6A code’ via quantitative profiling of m6A at single-nucleotide resolution, bioRxiv 571679.

[Schematic of the selection workflow: an m6A-modified RNA library is incubated with an m6A-binding protein; non-binding sequences are removed, and bound sequences are eluted and analyzed by (1) cDNA generation and (2) next-generation sequencing to reveal enriched sequences, summarized as a sequence logo.]

[Figure 2 image: (A) design of libraries 1 and 2: 5'-DNA primer binding site, RNA region NNNNN-A*-NNNN, DNA primer binding site-3', where A* bears R = Me (library 1, m6A) or R = H (library 2, A); (B) library 1: ...GATCCNNNNN(m6A)NNNNGCCGC...; (C) library 2: ...GATCCNNNNNANNNNGCCGC...]

[Figure 3 image: (A) library 1 (R = Me) and library 2 (R = H) are incubated with an immobilized m6A reader, and bound sequences are quantified by RT-qPCR; (B) qPCR recovery of libraries 1 and 2.]
