
Article

Identification of G-Quadruplex-Binding Protein from the Exploration of RGG Motif/G-Quadruplex Interactions

Zhou-Li Huang, Jing Dai, Wen-Hua Luo, Xiang-Gui Wang, Jia-Heng Tan, Shuo-Bin Chen, and Zhi-Shu Huang

J. Am. Chem. Soc., Just Accepted Manuscript • DOI: 10.1021/jacs.8b09329 • Publication Date (Web): 05 Dec 2018


Journal of the American Chemical Society

Zhou-Li Huang‡, Jing Dai‡, Wen-Hua Luo, Xiang-Gui Wang, Jia-Heng Tan, Shuo-Bin Chen* and Zhi-Shu Huang*

School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, People’s Republic of China.

KEYWORDS: G-quadruplex; Nucleic Acid-Binding Protein; RGG Motif; Cold-Inducible RNA-Binding Protein (CIRBP)

ABSTRACT: The arginine/glycine-rich region termed the RGG domain is commonly found in G-quadruplex (G4)-binding proteins and is important in G4-protein interactions. Studies on the binding mechanism of RGG domains found that small segments (RGG motifs) inside the domain contribute greatly to the G4 binding affinity. However, unlike entire RGG domains, which have been broadly explored, the role of the RGG motif remains obscure and has received very limited study. Herein, to clarify the role of the RGG motif in G4-protein interactions, we systematically investigated the binding affinity and mode between RGG-motif peptides and G4s. The internal arrangement of RGG repeats and gap amino acids played a more crucial role in the G4 binding mechanism than the number of RGG repeats. Arginines and phenylalanines at specific positions of the RGG motif might enable additional hydrogen bonding and π stacking interactions with nucleobases and strengthen the binding of G4. Notably, starting from the G4-binding RGG peptide 12 identified in this screen, we identified the cold-inducible RNA-binding protein (CIRBP) as a new G4 DNA-binding protein both in vitro and in cells. In addition, we found that the key amino acids for G4 binding in peptide 12 and CIRBP were highly similar, and peptide 12 clearly played a key role in the G4 binding of CIRBP. This report is the first in which a G4-binding protein was identified from exploration of the G4-binding RGG motif. Our findings suggest a novel strategy for discovering new G4-binding proteins by exploring key peptide segments.

INTRODUCTION The G-quadruplex (G4) is a noncanonical nucleic acid secondary structure formed within guanine-rich (G-rich) sequences in both DNA and RNA and is widely found in promoter regions, telomeres and the transcriptome.1-4 Its compact four-stranded structure is held together by Hoogsteen hydrogen bonds between guanines and π-π stacking between G-quartets and is further stabilized by monovalent cations, typically K+ or Na+.5-6 G4s play an important role in a variety of cellular processes, including DNA transcription, translation, and telomere maintenance. These sophisticated biological processes necessitate the participation of G4-binding proteins.7-8 For example, binding of nucleolin to the c-MYC NHE (nuclease hypersensitive element) III1 element induces folding of the G4 and reduces the transcription of the oncogene c-myc, while binding of NM23-H2 unfolds a G4 and promotes transcription.9-10 At the translational level, human Fragile X Mental Retardation Protein (FMRP) binds RNA G4s in the 5′-UTR of various mRNAs such as FMR1 and represses translation initiation.11-12 The G4-protein interaction is also significant for telomere maintenance.13-14 Translocated in liposarcoma (TLS) protein binds to both human telomere G4 DNA and RNA, subsequently regulating histone modifications and telomere length in vivo.15 Because these important biological events are involved, studying the G4-protein interaction is a research area of intensive investigation, and identifying new

G4-binding proteins is still of great interest and would contribute to a more global understanding of G4 functions and the biological events that involve G4s. In recent years, an increasing number of proteins with binding affinity for G-quadruplexes have been identified. For example, DEAH-Box Helicase 36 (DHX36) was found to bind G4s by affinity chromatography, and SRA stem-loop-interacting RNA-binding protein (SLIRP) was identified as a G4-binding protein by a quantitative mass spectrometry-based approach.16-17 In G4-binding proteins, various motifs have been found to participate in G4 recognition. Notably, a recent study based on the crystal structure of the protein-DNA complex found that the N-terminal motif and the OB-fold-like subdomain of the helicase DHX36 mediate its G4 binding.18 On the other hand, arginine/glycine-rich regions termed RGG domains are frequently found among G4-binding proteins.19-20 In the RGG domain, a sequence of closely spaced arginine-glycine-glycine (RGG) repeats is interspersed with other, often aromatic, amino acids. A recent analysis revealed that this domain is the second most common RNA-binding domain, emphasizing its important role in mediating protein-nucleic acid interactions.21 Its low complexity facilitates interaction with a variety of nucleic acid substrates, including G4s. The interaction of the RGG domain with the G4 has attracted wide interest, and the significance of elucidating RGG-mediated G4 recognition is underscored by the observation that this domain is indispensable

ACS Paragon Plus Environment


for protein function. For example, the C-terminal region of nucleolin, consisting of RNA-binding domains (RBDs) 3 and 4 as well as the RGG domain, is critical for the initial recognition of the c-MYC NHE III1 sequence and responsible for promoting G4 formation.22 Research has been conducted to shed light on the role of the RGG domain in mediating protein-G4 interactions. It has been demonstrated that the RGG domain alone is both essential and sufficient for G4 binding in situations involving Ewing’s sarcoma (EWS) and TLS, strongly suggesting that this domain is the core interaction element for G4-protein recognition.23-27 Studies on the binding mechanism of RGG domains found that small segments (RGG motif) inside the domain contribute greatly to the G4 binding affinity. Interestingly, recent studies found that a small peptide of the RGG motif alone was also capable of G4 binding.28-29 A peptide of the RGG motif from FMRP comprises 3 RGG repeats and 18 amino acids in total. Both the solution and crystal structures of this peptide with the in vitro-selected G-rich RNA Sc1 were obtained, and the originally unstructured RGG peptide folds into a β-turn conformation upon binding a duplex-quadruplex junction. The above information suggests that peptides as small as the RGG peptide are capable of binding G4, and this result suggests how the RGG motif mediates a domain or whole protein binding to G4s. Most studies of the RGG sequence were carried out on the entire domain, while a smaller unit, the RGG motif, within the domain has received much less attention. The role of the RGG motif remains obscure and has been the subject of very limited study. These studies have been restricted to the known RGG motif, leaving a large number of others excluded. Whether these motifs bind G4 and how this recognition happens is worth exploring. 
In addition, because the function of a protein is largely determined by the motifs in its amino acid sequence, studying RGG motifs can reveal previously unknown properties of proteins and provide clues about how a motif mediates G4-protein interactions. Herein, to clarify the role of the RGG motif in G4-protein interactions, we systematically investigated the binding affinity and mode between the RGG-motif peptides and G4s. Our investigation started with RGG peptides that were devised on the basis of nucleic acid-binding proteins. To further detail the interactions, the key amino acids in G4-binding peptides were determined by making point mutations, and the binding modes of peptides with G4 were studied by biophysical methods. Furthermore, to determine the role of the RGG motif in proteins, the G4 binding ability of the original protein was studied both in vitro and in cells.

METHODS AND MATERIALS

Peptides and Oligonucleotides. All peptides used were purchased from ChinaPeptides, dissolved in water according to the powder weight and stored at -80 °C (Table S1). All oligonucleotides (Table S2) used in this study were purchased from Sangon and Takara. All the oligonucleotides were dissolved in the relevant buffer, and their concentrations were determined from the absorbance at 260 nm on the basis of their molar extinction coefficients using a NanoDrop 1000 Spectrophotometer (Thermo Scientific). To obtain G-quadruplex formation, oligonucleotides were annealed in the relevant buffer containing KCl by heating to


95 °C for 5 min, followed by gradual cooling to room temperature. Further dilutions of samples to working concentrations were made with the relevant buffer immediately prior to use.

Circular Dichroism Studies. CD studies were performed on a Chirascan circular dichroism spectrophotometer (Applied Photophysics). A quartz cuvette with a 1 cm path length was used for the recording of spectra with a 1 nm bandwidth, a 1 nm step size and a time of 0.5 s per point. The melting data were recorded over a range of 25–95 °C, with a heating rate of 1.0 °C/min. A buffer baseline was collected in the same cuvette and was subtracted from the sample spectra. The final analysis of the data was conducted using Origin (OriginLab Corp.).

Surface Plasmon Resonance. SPR measurements were performed on a ProteOn XPR36 Protein Interaction Array system (Bio-Rad Laboratories, Hercules, CA) using a streptavidin-coated sensor chip. Biotinylated oligonucleotides were prefolded in filtered and degassed running buffer (50 mM Tris-HCl, 150 mM KCl, pH 7.4, 0.05% Tween 20) and then attached to the chip. The biotinylated oligonucleotides were captured (~1000 RU) in five flow cells, leaving one flow cell as a blank. Solutions of peptides or proteins were prepared with running buffer by serial dilution from stock solutions. The samples were then injected at a flow rate of 50 µL/min during the association phase, which was followed by a 400 s dissociation phase at 25 °C. The chip was regenerated with a short injection of 2 M KCl between consecutive measurements. Data were analyzed with ProteOn Manager software.

Filter-Binding Assay. Filter-binding assays were performed as follows. In brief, a nylon membrane was placed directly below the nitrocellulose membrane to trap any DNA not retained on the nitrocellulose. The nitrocellulose membrane was then treated with 0.5 M KOH for 10 min at 4 °C and washed with 0.5×TB prior to use. 
Biotinylated DNAs (5 nM) were incubated with peptides or proteins at 37 °C for 30 min in binding buffer (10 mM Tris-HCl, 100 mM KCl, pH 7.4). All samples were applied to the membrane under vacuum and washed with binding buffer. The cross-linking reaction was carried out under UV irradiation at 265 nm for 120 s. The detection of biotinylated DNA was carried out using a Chemiluminescent Nucleic Acid Detection Module Kit (Thermo Scientific). The gray levels of the dots were measured using Quantity One. The EC50 (half maximal effective concentration for binding) value was evaluated via a Hill model. The obtained data were analyzed using GraphPad Prism (GraphPad Software, San Diego, CA).

Fluorescence Studies. Fluorescence studies were performed on a Fluoromax-4 luminescence spectrophotometer (HORIBA). A quartz cuvette with a 3 mm × 3 mm path length was used to record the spectra. For TO displacement, 0.25 μM G-quadruplexes were pre-incubated with 0.5 μM thiazole orange for 1 hour at 37 °C (10 mM Tris-HCl, 100 mM KCl, pH 7.4) to form a stable complex, and then different concentrations of peptides were added. Fluorescence was measured with excitation at 480 nm. For the 2-Ap titration, peptides or proteins were added to a solution containing Ap-labeled oligonucleotides at a fixed concentration (1 μM) in buffer (10 mM Tris-HCl, 100 mM KCl, pH 7.4). After each addition, the reaction was stirred and allowed to equilibrate for at least 1 min, and fluorescence was measured with excitation at 305 nm.
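The Hill-model evaluation of EC50 mentioned for the filter-binding assay can be illustrated with a minimal Python sketch. It assumes that the retained signal follows top·c^n / (EC50^n + c^n) with a fixed plateau, and it uses a plain grid search instead of the nonlinear regression performed in GraphPad Prism. The function names, grid ranges and synthetic data are illustrative assumptions and are not taken from the original analysis.

```python
def hill(conc, top, ec50, n):
    """Hill binding curve: signal at ligand concentration `conc`."""
    return top * conc ** n / (ec50 ** n + conc ** n)


def fit_hill(concs, signals, top=1.0):
    """Least-squares grid search for EC50 and Hill slope.

    Assumptions (hypothetical, for illustration only): the plateau `top`
    is known, EC50 lies between the lowest and highest tested
    concentration, and the Hill slope lies between 0.5 and 4.0.
    """
    lo, hi = min(concs), max(concs)
    best = None
    for i in range(401):                      # log-spaced EC50 candidates
        ec50 = lo * (hi / lo) ** (i / 400)
        for j in range(5, 41):                # slopes 0.5, 0.6, ..., 4.0
            n = j / 10
            sse = sum((hill(c, top, ec50, n) - s) ** 2
                      for c, s in zip(concs, signals))
            if best is None or sse < best[0]:
                best = (sse, ec50, n)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free data (nM) generated from the model itself.
    concs = [10, 30, 100, 300, 1000, 3000]
    signals = [hill(c, 1.0, 244.0, 1.0) for c in concs]
    ec50, slope = fit_hill(concs, signals)
    print(f"EC50 ~ {ec50:.0f} nM, Hill slope ~ {slope:.1f}")
```

A grid search is used here only to keep the sketch dependency-free; with real, noisy dot intensities a proper nonlinear regression with confidence intervals would be preferable.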


Table 1. The RGG peptides from nucleic acid-binding proteins.

| Peptide | Motif sequence | Protein name | Protein ID | Residues | Length | RGGs | Aromatic | Basic | Acidic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | RGGRGQNSASRGG | MRE11 | P49959 | 577–589 | 13 | 3 | 0 | 3 | 0 |
| 2 | RGGSGGTRGPPSRGG | hnRNP G | P38159 | 113–126 | 15 | 3 | 0 | 3 | 0 |
| 3 | RGGNFSGRGGFGGSRGG | ROA0 | Q13151 | 192–202 | 15 | 3 | 0 | 3 | 0 |
| 4 | RGGLGGGMRGPPRGG | G3BP1 | Q13283 | 435–449 | 15 | 3 | 2 | 3 | 0 |
| 5 | RGGGGGFHRRGGGGRGG | SFPQ | P23246 | 9–27 | 17 | 3 | 2 | 3 | 0 |
| 6 | RGGFAGRARGRGG | hnRNP D0 | Q14103 | 272–284 | 13 | 3 | 1 | 4 | 0 |
| 7 | RGGYRGRGGFQGRGG | RB56 | Q92804 | 337–351 | 15 | 4 | 1 | 3 | 0 |
| 8 | RGGRGSFRGCRGG | DDX4 | Q9NQI0 | 147–159 | 13 | 4 | 1 | 4 | 0 |
| 9 | KGGRGGARGSARGGVRGG | SPRN | Q5BIV9 | 25–42 | 18 | 4 | 0 | 5 | 0 |
| 10 | RGGHEQGGGRGGRGGYDHGGRGG | FA98A | Q8NCA5 | 352–374 | 23 | 4 | 1 | 6 | 2 |
| 11 | RGGGHRGRGGFNMRGGNFRGGAPGNRGG | hnRNP U | Q00839 | 702–729 | 28 | 6 | 3 | 6 | 2 |
| 12 | RGGSAGGRGFFRGGRGRGRGFSRGG | CIRBP | Q14011 | 94–118 | 25 | 7 | 3 | 7 | 0 |

NMR Spectroscopy. Samples for nuclear magnetic resonance (NMR) were incubated in phosphate buffer (25 mM KH2PO4, 70 mM KCl, 10% D2O, pH 7.4) at 25 °C before measurement. The final concentration of DNA was 150 μM and that of peptide 12 was 300 μM. Experiments were performed on a 400 MHz spectrometer (Bruker) at 25 °C.

Protein Expression and Purification. The CIRBP cDNA coding region was synthesized and cloned into the BamHI and EcoRI sites of pGEX-4T-1 (expression of GST-tagged proteins). Expression was achieved in Escherichia coli BL21(DE3) with an induction temperature of 30 °C for 6 h. The proteins were purified using GSH magnetic beads (BeaverBeads) and eluted with freshly prepared elution buffer. The purified proteins were verified by gel electrophoresis and western blotting. pSANG103F-BG4 was a gift from Shankar Balasubramanian (Addgene plasmid # 55756).30 The G-quadruplex antibodies BG4 and D1 were prepared following a previous report.31

Electrophoretic Mobility Shift Assay (EMSA). Binding experiments of fluorescein-labeled DNAs and proteins were performed in a buffer containing 10 mM Tris-HCl (pH 7.4) and 100 mM KCl. The samples were loaded onto a nondenaturing polyacrylamide gel (10%) after incubation at 37 °C for 30 min. Electrophoresis was performed in 1×TBE buffer with the addition of 20 mM KCl at 4 °C. The gel was photographed on a 4500SF Gel-Imaging System (TANON).

ChIP Assay. Chromatin immunoprecipitation (ChIP) experiments were performed using the Pierce Agarose ChIP Kit (Thermo Scientific) following the standard protocol. CIRBP antibody (ab191885, Abcam) was used to trap CIRBP and its substrate. Normal rabbit IgG was used as a negative control, and the total genomic DNA (input) was used as a positive control. DNA samples were amplified by PCR using primers for telomeres. The amplified products were separated on a 1.5% agarose gel and photographed on a 4500SF Gel-Imaging System (TANON).

Confocal Imaging. HeLa cells were first seeded on a glass-bottom plate and transfected with pEGFP-N3 containing the CIRBP coding region (expression of GFP-tagged CIRBP). The cells were then fixed with 4% paraformaldehyde, permeabilized with 0.5% Triton X-100/PBS, and blocked with 5% BSA/PBS, sequentially. For visualization of DNA G-quadruplexes, cells were treated with RNase A at 37 °C for 2 h, and then incubated with G-quadruplex antibodies (BG4 or D1) at 37 °C for 30 min,

FLAG antibody (#8146, Cell Signaling Technology) at 4 °C overnight. For visualization of telomeres, cells were incubated with TRF2 antibody (ab13579, Abcam) at 4 °C overnight. Cells were then incubated with Alexa 647-conjugated secondary antibody (A28181 or A21206, Thermo Scientific) at 37 °C for 1 h. After rinsing with PBS, cells were stained with DAPI. Fluorescence signals were recorded using an FV3000 laser scanning confocal microscope (Olympus).

RESULTS

Selection of RGG motifs. To identify representative RGG motifs among more than 1000 RGG-containing proteins, we first reduced the scope on the basis of the following principles. First, the peptides should be from well-reported nucleic acid-binding proteins, including transcription factors and helicases. These proteins function through binding nucleic acids, so RGG motifs may be involved in nucleic acid binding. Second, the peptides should contain at least 3 repeats of RGG with a short residue gap and preferentially contain aromatic residues, which are reported to contribute to G4 binding.28, 32 According to the above principles, we finally chose 12 peptides from a collected database, and their sequence information and features are listed in Table 1.33 The amino acid number of these peptides ranged from 13 to 28, and there was more than one peptide of each amino acid length so that we could analyze the relationship between their nucleic acid affinity and amino acid components. The proteins from which we chose these peptides include transcription factors (hnRNP D0, SFPQ, RB56, hnRNP U), helicases (DDX4, G3BP1), DNA repair factors (MRE11) and RNA processing factors (hnRNP G, ROA0, CIRBP).

Table 2. The dissociation constants (KD, μM) of RGG peptides binding G-quadruplexes.

| G4 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Htg22 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 17.3 | 14.3 | n.d. | n.d. | 7.2 |
| Pu22 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 5.2 |

n.d.: no binding was detected at concentrations less than 20 μM.
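The sequence features tabulated for the selected motifs (length, number of RGG repeats, aromatic, basic and acidic residues) can be tallied directly from a peptide sequence. The sketch below is one possible way to do so and is not the authors' procedure: it assumes that an "RGG repeat" is counted as an arginine followed by glycine ("RG"), that aromatic residues are F, W and Y, basic residues are R and K, and acidic residues are D and E. Under these assumed conventions it reproduces the Table 1 entries for peptides 9 and 12; other rows may follow different conventions.

```python
# Residue classes are assumptions made for this illustration.
AROMATIC = set("FWY")
BASIC = set("RK")
ACIDIC = set("DE")


def motif_features(seq):
    """Return simple composition features of an RGG-motif peptide."""
    seq = seq.upper()
    return {
        "length": len(seq),
        # Count every arginine that is immediately followed by glycine.
        "rg_repeats": sum(1 for i in range(len(seq) - 1)
                          if seq[i:i + 2] == "RG"),
        "aromatic": sum(aa in AROMATIC for aa in seq),
        "basic": sum(aa in BASIC for aa in seq),
        "acidic": sum(aa in ACIDIC for aa in seq),
    }


if __name__ == "__main__":
    # Peptide 12 (CIRBP motif) from Table 1.
    print(motif_features("RGGSAGGRGFFRGGRGRGRGFSRGG"))
```

Counting "RG" rather than the literal tripeptide "RGG" is a deliberate choice here, because several repeats in peptide 12 are of the form RG followed by another arginine.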

Binding of G-quadruplexes by the selected RGG peptides. Nucleic acid binding is the most important function of the RGG motif, but only limited studies on its G4 affinity have been reported.24-26, 28, 34 Therefore, we investigated the binding affinity of the chosen peptides for G4 using surface plasmon resonance (SPR). The G4s we used were the classic telomeric G4 Htg22 and the c-myc promoter G4 Pu22 (Figure S1). Dissociation constants (KDs) were obtained using an equilibrium fitting mode (Table 2).
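An equilibrium (steady-state) analysis of SPR data can be sketched as follows. This is a minimal illustration that assumes a 1:1 Langmuir isotherm, Req = Rmax·C / (KD + C); it is not the algorithm implemented in ProteOn Manager, and the function name, the search range and the synthetic responses are hypothetical. For each candidate KD the best Rmax has a closed-form least-squares solution, so only KD is searched on a logarithmic grid.

```python
def fit_equilibrium_kd(concs, responses, kd_min=0.1, kd_max=1000.0, steps=2000):
    """Fit KD and Rmax of Req = Rmax * C / (KD + C) by least squares.

    KD is scanned on a log-spaced grid between kd_min and kd_max (same
    units as `concs`); Rmax is solved analytically for each candidate.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        occupancy = [c / (kd + c) for c in concs]
        rmax = (sum(o * r for o, r in zip(occupancy, responses))
                / sum(o * o for o in occupancy))
        sse = sum((rmax * o - r) ** 2 for o, r in zip(occupancy, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic steady-state responses (RU) for a two-fold dilution series (uM).
    concs = [0.625, 1.25, 2.5, 5.0, 10.0, 20.0]
    responses = [100.0 * c / (7.2 + c) for c in concs]
    kd, rmax = fit_equilibrium_kd(concs, responses)
    print(f"KD ~ {kd:.1f} uM, Rmax ~ {rmax:.0f} RU")
```

When the highest injected concentration is well below KD the curve barely bends, and KD and Rmax become poorly determined. This is consistent with reporting weak binders only as "n.d." or ">20" rather than as a fitted value.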


Table 3. Sequence information of peptide 12 mutants and their binding ability with G-quadruplexes.

| Name | Sequence of mutants | KD^a (μM), Htg22 | KD^a (μM), Pu22 | EC50^b (nM), Htg22 | DR^c (%), Htg22 | DR^c (%), Pu22 |
| --- | --- | --- | --- | --- | --- | --- |
| 12 | RGGSAGGRGFFRGGRGRGRGFSRGG | 7.2 | 5.2 | 244 | 65 | 58 |
| 13 | AGGSAGGRGFFRGGRGRGRGFSRGG | >20 | >20 | 1327 | 39 | 40 |
| 14 | RGGSAGGAGFFRGGRGRGRGFSRGG | 11.2 | 13.9 | 808 | 38 | 36 |
| 15 | RGGSAGGRGFFAGGRGRGRGFSRGG | 13.4 | 15.2 | 629 | 42 | 48 |
| 16 | RGGSAGGRGFFRGGAGRGRGFSRGG | >20 | >20 | 1159 | 27 | 21 |
| 17 | RGGSAGGRGFFRGGRGAGRGFSRGG | >20 | >20 | 2159 | 26 | 24 |
| 18 | RGGSAGGRGFFRGGRGRGAGFSRGG | 8.9 | 12.4 | 730 | 35 | 37 |
| 19 | RGGSAGGRGFFRGGRGRGRGFSAGG | 7.0 | 8.5 | 604 | 43 | 34 |
| 20 | KGGSAGGKGFFKGGKGKGKGFSKGG | >20 | 14.1 | 1051 | 42 | 39 |
| 21 | RGGSAGGRGAARGGRGRGRGASRGG | >20 | >20 | 1893 | 36 | 34 |
| 22 | RGGSAGGRGAFRGGRGRGRGFSRGG | 15.5 | 7.2 | 884 | 40 | 40 |
| 23 | RGGSAGGRGFARGGRGRGRGFSRGG | 16.0 | 7.6 | 883 | 57 | 35 |
| 24 | RGGSAGGRGFFRGGRGRGRGASRGG | 13.3 | 6.8 | 713 | 61 | 33 |
| 25 | RGGSAGGRGLLRGGRGRGRGLSRGG | >20 | 10.3 | 1522 | 55 | 34 |
| 26 | RGGAAGGRGFFRGGRGRGRGFARGG | 9.5 | 5.2 | 352 | 55 | 45 |
| 27 | RGGDDGGRGFFRGGRGRGRGFDRGG | >20 | >20 | 3675 | 33 | 40 |
| 28 | RGGKKGGRGFFRGGRGRGRGFKRGG | 4.1 | 3.7 | 241 | 76 | 63 |

Position of RGG repeats: the seven RGG repeats of peptide 12 are numbered 1st to 7th along the sequence. a: dissociation constants (KD) were determined by SPR assay. b: effective binding concentrations (EC50) were determined by filter-binding assay. c: displacement ratios (DR) were determined by TO displacement assay. >20: binding constant was not smaller than 20 μM.
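The displacement ratio (DR) reported for the TO displacement assay can be illustrated with a short calculation. The exact formula is not stated in the text; the sketch below assumes the common definition DR = (F0 - F) / F0 × 100, where F0 is the fluorescence of the TO-G4 complex before peptide addition and F is the fluorescence after peptide addition. The function name and the example intensities are hypothetical.

```python
def displacement_ratio(f_initial, f_final):
    """Percentage of TO fluorescence lost after peptide addition.

    Assumed definition: DR (%) = (F0 - F) / F0 * 100, with F0 the
    fluorescence of the TO-G4 complex alone and F the fluorescence
    in the presence of peptide.
    """
    if f_initial <= 0:
        raise ValueError("initial fluorescence must be positive")
    return (f_initial - f_final) / f_initial * 100.0


if __name__ == "__main__":
    # Illustrative intensities (arbitrary units), not measured data.
    print(f"DR = {displacement_ratio(1000.0, 350.0):.0f}%")
```

Under this definition a higher DR means that more of the dye has been displaced from the G4, which matches the qualitative reading of the DR columns in Table 3.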

Only 3 of the 12 RGG peptides bound DNA G4s (Figure S2). Among them, peptide 12 showed efficient binding with KD values under 10 μM, while peptides 8 and 9 exhibited weak binding to G4s, with KD values greater than 10 μM. Peptide 12, with the strongest G4 affinity, has the greatest number of continuous RGG repeats among these motifs, indicating that the RGG repeats might be important for G4 recognition. However, peptide 11, with 6 RGG repeats but two negatively charged amino acids, did not effectively bind DNA G4s. Peptides 1 to 10 shared similar lengths and numbers of RGG repeats but had different gap amino acids. Although positively charged and aromatic amino acids are believed to participate in peptide-DNA binding, the G4-binding peptides 8 and 9 contain neither the most basic nor the most aromatic amino acids among these peptides.35 The ability of an RGG peptide to bind G4 was therefore not simply related to the number of amino acids that could contribute to DNA binding. Instead, the internal arrangement of the RGG repeats and the gap amino acids might be crucial for peptide-G4 binding.

Interaction between peptide 12 and G-quadruplexes. To obtain more insight into the relationship between the G4 binding affinity and particular RGG peptide sequences, peptide 12, with good G4 binding affinity, was selected for further study. Before studying the structural basis of peptide 12 binding, we confirmed its binding to G4s using a filter-binding assay. In such an assay, peptide-bound nucleic acids are retained on the upper membrane, and the lower membrane holds the rest. To obtain a quantitative analysis, the gray levels of the dots on the upper and lower membranes were quantified, and the effective binding concentration (EC50) was determined. As shown in Figure 1 and Figure S3, peptide 12 effectively bound G4 Htg22 DNA with an EC50 value of 244 nM, while no dots were observed for peptide 12 with the single-stranded DNA mHtg22 or the hairpin DNA Hp18, and only faint dots were observed with the G-rich single-stranded DNA mG14. It is clear that peptide 12 selectively bound the G4 DNA structure over single-stranded or duplex DNA. To understand the details of the peptide 12 interaction with G4 DNA, we delved into the structural basis for its binding. Since RGG domains do not adopt a single, stable secondary structure and have an intrinsically disordered region (Figure S4), we focused on the key amino acids involved in G4 binding. A previous investigation of RGG domains determined that the arginine in RGG repeats was highly conserved and might be the key residue. The positively charged guanidine group enables electrostatic interactions with the negatively charged phosphate backbone and might enable hydrogen bond formation and π stacking interactions with the nucleobases. To investigate the role of arginine, we designed peptide 20, in which all the arginine residues were replaced by another cationic amino acid, lysine. In addition, arginines at different positions may contribute differently to substrate binding. Thus, point mutations of peptide 12 were made to determine which RGG repeat was more important, and substitution of arginine with alanine at different positions gave peptides 13 to 19. The mutant sequences are listed in Table 3. The binding affinity of mutants 13 to 20 for G4 DNA was studied by SPR and filter binding (Table 3, Figure S5-9). Peptide 20, which has a positive charge state similar to that of peptide 12, shows a significant decrease in G4 binding affinity. Thus, arginine residues in the RGG repeats of peptide 12 participate in binding not only via electrostatic interactions but also via other types of interactions, such as hydrogen bonding. Among the arginine-to-alanine-substituted peptides, peptides 13, 16 and 17, with arginine mutated in the first, fourth and fifth RGG repeats, respectively, had much weaker G4 binding affinity than the others. Their binding constants (KD) for G4 were all beyond the highest


sample concentration in SPR, and their EC50 value with telomeric G4 was one order of magnitude higher than that of peptide 12. Meanwhile, other arginine-to-alanine-substituted peptides exhibited only a slight change in binding constants. This result suggests that the arginines do not contribute equally and that the three arginines in the first, fourth and fifth RGG repeats might be key arginines that mediate G4 binding.
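The mutant design described above (single arginine-to-alanine substitutions, plus replacement of all residues of one type) is easy to express programmatically. The sketch below regenerates the sequences of peptides 13 to 19 and peptide 20 from peptide 12 as listed in Table 3; the helper names are hypothetical and the code only illustrates the design logic, not how the peptides were actually specified for synthesis.

```python
PEPTIDE_12 = "RGGSAGGRGFFRGGRGRGRGFSRGG"


def point_mutants(seq, target, replacement):
    """One mutant per occurrence of `target`, each with a single substitution."""
    return [seq[:i] + replacement + seq[i + 1:]
            for i, aa in enumerate(seq) if aa == target]


def substitute_all(seq, target, replacement):
    """Replace every occurrence of `target` with `replacement`."""
    return seq.replace(target, replacement)


if __name__ == "__main__":
    # Arginine-to-alanine scan: peptides 13 to 19, in order of RGG repeat.
    for number, mutant in enumerate(point_mutants(PEPTIDE_12, "R", "A"), start=13):
        print(number, mutant)
    # All arginines replaced by lysine: peptide 20.
    print(20, substitute_all(PEPTIDE_12, "R", "K"))
```

The same two helpers also cover the phenylalanine series: `point_mutants(PEPTIDE_12, "F", "A")` gives the sequences of peptides 22 to 24, and `substitute_all` with "F" replaced by "A" or "L" gives peptides 21 and 25.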

Figure 1. Filter-binding assay of peptide 12 with different DNA structures. (A) Filter binding dots of peptide 12 with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA; (B) quantification of DNA on filter binding dots.

On the other hand, aromatic residues are frequently observed within the G4-binding RGG motif and are proposed to be an important component for the specific recognition of G4.34 Therefore, a phenylalanine-to-alanine-substituted peptide, 21, and a phenylalanine-to-leucine-substituted peptide, 25, were designed to evaluate the role of phenylalanine in the motif-G4 interaction. Point mutations of peptide 12 were also made to determine the possible role of each phenylalanine. Substitutions of phenylalanine with alanine at different positions were made and named peptides 22 to 24. As shown in Table 3, peptide 21 had one of the weakest binding affinities among the peptide mutants, only slightly higher than that of peptide 17. Thus, the phenylalanines might also be important residues in RGG peptide 12. Peptide 25, whose hydrophobicity is close to that of peptide 12 but in which all the phenylalanines are replaced by leucine, shows a significant decrease in G4 binding affinity. Therefore, we assumed that the phenylalanines in peptide 12 were more likely to participate in G4 binding via π-π interactions. The single phenylalanine-to-alanine peptides 22, 23 and 24 exhibited binding constants (KD) for telomeric G4 all beyond 10 μM, and their EC50 values with telomeric G4 were approximately 800 nM. Their G4 binding affinities were similar to one another and weaker than that of peptide 12, indicating that all these phenylalanines in peptide 12 participate in G4 binding. Interestingly, the arginines in RGG repeats bind in a site-specific manner, which corresponds to the observations for peptides 1 to 12, indicating that the internal arrangement of RGG repeats and gap amino acids might be crucial for G4 binding. 
Besides the RGG repeats and the aromatic gap amino acids (phenylalanines), we also examined the influence of other gap amino acids on the G4 binding of peptide 12. We designed peptide 26, in which all the serine residues were replaced by alanine. The G4 binding affinity of peptide 26 was similar to that of peptide 12, so the serines in peptide 12 seem not to contribute to G4 binding. Peptides 27 and 28, in which all the serines and alanines were replaced by an acidic amino acid, aspartate, or a basic amino acid, lysine, respectively, were designed to evaluate the possible influence of non-aromatic amino acids on the G4 binding of RGG peptide 12. The G4 binding affinity of peptide 27 was much weaker than that of peptide 12, while that of peptide 28 was slightly stronger than that of peptide 12. It is clear that negatively charged acidic amino acids could hamper the G4 binding affinity of peptide 12, while positively charged basic amino acids could enhance it. In addition, we varied the internal arrangement of RGG repeats, and peptides R1 to R7 were obtained from peptide 12 with the RGG repeats and gap amino acids in different orders (Table S3, Figure S11-13). Their binding affinities for G4 were studied by SPR. Notably, the G4 binding affinity of all these rearranged RGG peptides decreased to different extents, suggesting the importance of the internal arrangement of the RGG motif sequence. To corroborate the binding of peptide 12 to the G4, thiazole orange (TO) displacement of G4 DNA was also carried out. In the TO displacement assay, TO fluoresces upon binding G4, and the fluorescence intensity decreases upon addition of peptides that compete for its binding site on the G4.36 A higher displacement ratio represents a stronger affinity of the peptide for G4. The addition of peptide 12 effectively decreased TO fluorescence with Htg22 and Pu22 (Table 3 and Figure S10). In addition, the results of TO displacement for the mutants were in good agreement with those of SPR and filter binding. Taken together, our results indicate that the arginines in RGG repeats and the phenylalanines throughout the sequence of peptide 12 are all important for G4 binding. The arginines in this RGG motif not only mediate electrostatic interactions but also are involved in possible hydrogen bonding and π-π interactions. Meanwhile, the role of the phenylalanines in peptide 12 in G4 binding might involve π-π interactions. The internal arrangement of RGG repeats and gap amino acids also has an effect on the G4 binding of RGG peptides.

Investigation of the peptide 12 binding mode in the G4-peptide 12 complex. 
The great difference in G4 binding affinity among the mutants also suggests a specific binding mode of peptide 12. Having established which amino acids in peptide 12 are used for binding, we next asked how the RGG motif binds to G4s with site-specific features. Before investigating the binding site on G4 for peptide 12, we first evaluated the possible impact of peptide 12 on G4 formation by circular dichroism (CD) and melting assays (Figure S14-15). The addition of peptide 12 to G4s with different topologies (hybrid-type G4 Htg22, parallel G4 Pu22 and antiparallel G4 Hras) did not change the global characteristics of the CD signals but increased the Tm value of the G4s by 3.5 to 20 °C. Thus, the binding of peptide 12 stabilized the G4s without inducing any conformational change. A 1H-NMR titration was carried out to further explore the mode of peptide 12 binding to G4s in detail. Telomere G4 Htg25 and c-myc promoter G4 Pu22 were used to study the G4 interaction with peptide 12. The signals of the imino protons (10–13 ppm) in these two G4s were well resolved, which enabled the binding sites of the peptide to be determined.34, 37-38 As shown in the 1H-NMR data of Htg25 (Figure 2A), among all the imino protons, the G3, G17 and G21 imino signals shifted significantly upon addition of peptide 12. Only the signals of these three guanines, which belong to the 5′-terminal G-quartet of Htg25, were changed. This result indicated that the binding site of peptide 12 is specific for the 5′-terminal region of Htg25. On the other hand, the 1H-NMR data of Pu22 (Figure 2B) showed distinct chemical shift changes for the imino peaks of G6, G10, G17 and G19 upon addition of peptide 12, while the other imino peaks showed little or no shift. Interestingly, these four guanines belong to the terminal G-quartet, suggesting that peptide 12 may also bind on the G-quartet of Pu22. All these data suggested that peptide 12 binds to G4 upon the G-quartet plane through a specific interaction rather than via random binding.
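The chemical shift perturbation reasoning above (which imino protons move when the peptide is added) can be sketched in a few lines of Python. This is a minimal illustration only: the chemical shift values, the 0.03 ppm cutoff and the function name `perturbed_iminos` are assumptions made for the example, not data or parameters from this study.

```python
def perturbed_iminos(free_ppm, bound_ppm, threshold_ppm=0.03):
    """Return residues whose imino 1H shift moves by more than the threshold.

    free_ppm and bound_ppm map residue labels to chemical shifts in ppm.
    The threshold is an arbitrary, hypothetical cutoff.
    """
    shifts = {}
    for residue, free in free_ppm.items():
        delta = abs(bound_ppm[residue] - free)
        if delta > threshold_ppm:
            shifts[residue] = round(delta, 3)
    return shifts


# Hypothetical imino shifts (ppm), NOT measured values from the paper.
free = {"G3": 11.60, "G9": 11.85, "G17": 11.20, "G21": 11.45, "G10": 10.90}
bound = {"G3": 11.52, "G9": 11.86, "G17": 11.31, "G21": 11.50, "G10": 10.91}

print(perturbed_iminos(free, bound))
```

Residues that pass the cutoff would then be mapped onto the known G4 fold to see whether they cluster in a single G-quartet, which is the argument made in the text.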

Figure 2. Chemical shift perturbation analysis of G-quadruplexes with peptide 12. (A) The imino proton regions of the 1H-NMR spectra of Htg25 either alone (bottom) or with peptide 12 at a ratio of 1:2 (top). (B) The imino proton regions of the 1D 1H-NMR spectra of Pu22 either alone (bottom) or with peptide 12 at a ratio of 1:2 (top). The assays were performed in 25 mM KH2PO4 buffer (70 mM KCl, 10% D2O, pH 7.4) using 400 MHz Bruker spectrometers at 25 °C.

Meanwhile, modification with 2-aminopurine (2-Ap) in different loops has been widely used to estimate the binding modes of molecules with G4s.39 To obtain more detail on the binding mode, fluorescence experiments were performed using the telomere G4 Htg22 and Pu22 with 2-Ap substitutions at different positions in the G-quartet or in the loop. Upon titration with peptide 12, the fluorescence intensity of Ap1 in the 5′-terminal region of Htg22 was significantly perturbed and that of Ap7 in the first loop was slightly perturbed (Figure 3A). However, the fluorescence of Ap13 and Ap19 was not affected. This result indicated that peptide 12 bound the 5′-terminal region and the propeller loop of Htg22. On the other hand, the fluorescence intensity of Ap21 in the 3′-terminal region of Pu22 was significantly perturbed upon titration with peptide 12, and that of Ap7 and Ap16 in the loops was slightly perturbed (Figure 3B). This result indicated that peptide 12 bound the terminal region and two propeller loops of Pu22 simultaneously. Combining the results of the NMR and 2-Ap titrations, we deduced that peptide 12 was restricted to binding on the G-quartet plane of G4s. Recent studies have found that the G-quartet is important for G4 binding by the RGG domain.34 An NMR-based binding assay revealed that the aromatic amino acids phenylalanine and tyrosine were critical for this binding. We therefore sought to identify the key amino acids that modulate G-quartet binding of the RGG peptide.
1H-NMR titrations of Htg25 with the mutated peptides were carried out to reveal the amino acids responsible for G-quartet binding. Since RGG peptide 12 exhibits a binding mode specific to the G-quartet, we focused on the imino signals belonging to the 5′-terminal G-quartet (G3, G9, G17 and G21) upon the addition of each mutant peptide and compared them with those observed with peptide 12. The difference in signal change upon substitution in peptide 12 could, to some extent, reflect the contribution of the substituted amino acids to G-quartet binding. As shown in Figure S16 and S17, the signal changes caused by the mutant peptides all differed, to varying degrees, from those caused by peptide 12. Notably, upon addition of the all-phenylalanine-substituted peptides 21 and 25, the signal change of G17 decreased, while those of G3, G9 and G21 increased, suggesting that loss of the phenylalanines in peptide 12 hampers binding to G17. A slight decrease in the change of the G17 signal and an increase in the change of G9 were observed for peptides 22 and 23, while decreases in the changes of all these guanine signals were observed for peptide 24. The two phenylalanines in the center of peptide 12 might interact specifically with G17, while the others might bind G4 in another manner. On the other hand, the changes of the G3, G17 and G21 signals decreased upon the addition of arginine-substituted peptides, suggesting that all the arginines contribute to G-quartet binding of peptide 12. Among the single arginine-substituted peptides, peptide 16 caused the smallest change in the imino signals, indicating that the arginine in the fourth RGG repeat is likely the most important arginine involved in the G-quartet binding of peptide 12. Moreover, the G-quartet binding mode is likely similar to that seen in the NMR structures of the peptide segments of DHX36 (Rhau18) and prion (P16) with G4, where both preferentially bind upon the G-quartet plane (Figure S18).40-41 Therefore, the structural principles of the RGG motif-G4 complex reported in our article might also be applicable to other peptide-G4 complexes. These results implied a new way in which the RGG motif mediates protein-G4 recognition.
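The 2-Ap titrations described above are reported as normalized fluorescence intensities (Figure 3). A minimal sketch of that normalization is shown below, assuming each series is simply divided by its peptide-free reading; the intensities, the site-to-series mapping and the function names are hypothetical and are not taken from the paper.

```python
def normalize_titration(intensities):
    """Normalize a fluorescence titration series to its first (peptide-free) point."""
    f0 = intensities[0]
    if f0 <= 0:
        raise ValueError("initial fluorescence must be positive")
    return [f / f0 for f in intensities]


def max_relative_change(intensities):
    """Largest fractional deviation from the peptide-free signal."""
    return max(abs(f - 1.0) for f in normalize_titration(intensities))


# Hypothetical raw intensities at 375 nm (arbitrary units), NOT data from the paper.
titrations = {
    "Ap1": [200.0, 170.0, 140.0, 120.0],
    "Ap7": [180.0, 176.0, 171.0, 167.4],
    "Ap13": [150.0, 150.0, 149.0, 150.0],
}

for site, series in titrations.items():
    print(site, round(max_relative_change(series), 2))
```

Comparing the maximum relative change across labeled positions gives a simple way to rank which sites respond to the peptide, in the spirit of the "significantly" versus "slightly" perturbed distinction used in the text.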

Figure 3. Fluorescence titrations of 1 μM 2-Ap-labeled G-quadruplexes with stepwise addition of peptide 12 in 10 mM Tris-HCl buffer, 100 mM KCl, pH 7.4. (A) Plot of normalized fluorescence intensity at 375 nm of Htg22 labeled with individual 2-Ap versus the binding ratio of [peptide 12]/[Htg22]. (B) Plot of normalized fluorescence intensity at 375 nm of Pu22 labeled with individual 2-Ap versus the binding ratio of [peptide 12]/[Pu22]. The 2-Ap-labeled sites in the G-quadruplexes are shown in the chart.

Identification of CIRBP as a G4-binding protein. RGG domains have been reported to mediate G4 binding in several G4-binding proteins. The effective binding of RGG peptide 12 might likewise mediate protein-G4 recognition. Thus, we looked back to the protein from which this peptide originates, the cold-inducible RNA-binding protein (CIRBP). CIRBP has an RNA-binding function in the stress response.42-43 A recent study reported that knocking out CIRBP led to a shortened telomere length.44 Interestingly, the telomere region contains an abundance of repeated G-rich DNA sequences that fold into G4s and are involved in the regulation of telomere length. Peptide 12 also effectively bound telomere G4s. Taken together, this evidence suggests a correlation between CIRBP and telomere G4s. To determine whether RGG peptide 12 mediates the protein-G4 interaction, we investigated the binding of CIRBP to G4 both in vitro and in cells.

CIRBP was purified with a glutathione S-transferase (GST) tag, following a previous report (Figure S19). The affinity of CIRBP for different types of nucleic acids was determined by SPR to investigate whether CIRBP binds to G4 in a structure-dependent manner. In addition to G4 (Htg22 and Pu22), hairpin (Hp18) and single-stranded (mG14) DNA, G4 RNA (TERRA) and single-stranded RNA (mTERRA) were included because CIRBP has been reported to bind RNA. As shown in Table S4 and Figure S20-21, CIRBP exhibited strong binding to G4s (Htg22, Pu22 and TERRA), with KD values of 35.8 nM, 44.1 nM and 86.9 nM, respectively. Meanwhile, CIRBP also showed good binding to single-stranded RNA (mTERRA), with a KD value of 110.0 nM, but no binding to hairpin or single-stranded DNA, consistent with its role as an RNA-binding protein. Thus, the selectivity of CIRBP for the G4 structure appears to apply only to DNA. Filter binding and electrophoretic mobility shift assays (EMSAs) were then performed to confirm the selective binding of CIRBP to G4s. The filter binding results were in good agreement with SPR: CIRBP clearly bound G4s with an EC50 value of 73.8 nM, and no distinct binding was observed with single-stranded DNA (mHtg22 and mG14) or double-stranded hairpin DNA (Hp18) (Figure 4 and Figure S22). The sequences were then expanded in EMSAs, and similar selectivity in DNA binding was observed for CIRBP (Figure 4C and Figure S23-24). Biolayer interferometry (BLI) indicated that CIRBP bound telomeric G4 Htg22 DNA with a KD of 27.1 nM, and a competition assay also confirmed its selectivity for telomeric G4 DNA (Figure S25).
Taken together, these data identify CIRBP as a G4-binding protein for the first time. In addition, CIRBP bound different nucleic acid substrates, a behavior known as promiscuous binding or degenerate specificity. It was also interesting to see that CIRBP showed binding selectivity similar to that of peptide 12, implying that the RGG motif is crucial for CIRBP binding to G4. Therefore, it could be deduced that its binding flexibility was mostly due to the plasticity of the RGG motif.

Figure 4. CIRBP binds to the G-quadruplex. (A) Filter binding dots of CIRBP with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA; (B) Quantification of DNA on filter binding dots. (C) EMSA binding bands of CIRBP with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA.

The RGG motif is essential for binding G-quadruplexes. As discussed above, the similarity of the binding selectivity of CIRBP and peptide 12 prompted us to speculate that the RGG motif within CIRBP was responsible for the binding affinity and selectivity for its nucleic acid substrates. To test this hypothesis, we constructed a mutant protein in which the RGG motif (peptide 12) was deleted, termed ΔRGG (Figure 5A). The KD values of ΔRGG for G4s were around 10 μM, more than two orders of magnitude (~250 times) higher than those of CIRBP, demonstrating the necessity of the RGG motif for CIRBP binding to G4 (Table S5, Figure S26-S27). In addition, the results of filter binding and EMSAs corroborated this conclusion, as no distinct band corresponding to G4 binding was observed with ΔRGG (Figure S28).
In the studies of RGG peptides, the arginines in the fourth and fifth RGG repeats played a key role in G4 binding. We next explored whether these arginine residues in the RGG motif play important roles in CIRBP binding to G4. Therefore, two mutant proteins were designed, each with one arginine substituted by an alanine (R108A for mRGG1 and R110A for mRGG2) (Figure 5A). Both mRGG1 and mRGG2 exhibited weaker binding to telomeric G4 than CIRBP, with KD values of 825.2 nM and 355.3 nM, respectively (Table S5, Figure S27). The EMSAs of these two mutants with G4s also exhibited a dramatic reduction in nucleic acid affinity (Figure 5B, Figure S28), confirming that these two amino acid residues played an important role in mediating the CIRBP-G4 interaction. The lack of binding observed for ΔRGG and the seriously impaired binding observed for mRGG1 and mRGG2 suggested that the RGG motif (peptide 12) in CIRBP was essential for its G4 binding and that this peptide sequence acts as a core mediator of binding to G4.

Figure 5. (A) Alterations in the RGG motif of CIRBP. (B) Filter-binding assays of RGG motif deletion (ΔRGG) or mutation (mRGG1 and mRGG2) constructs with telomeric G-quadruplex Htg22 DNA. (C) Quantification of DNA on filter binding dots.

The interaction mode between G-quadruplexes and CIRBP. It has been reported that the RGG domain of G4-binding proteins effectively recognizes the loops of G4s. However, peptide 12 alone was more likely to bind upon the G-quartet plane. It was therefore of interest to examine how this RGG motif participates in G4 binding by CIRBP. To investigate this behavior, we used human telomeric DNA as a study model. The telomere G4 DNA, with different numbers of TTAGGG repeats, can form different topologies containing different loops, including a tetramolecular G4 without a loop, a bimolecular G4 with two propeller loops, and a unimolecular G4 with two basket loops and one propeller loop.45-47 The formation of the G4 structures was confirmed by CD spectra (Figure S1). We first compared the affinity of CIRBP for these three G4s by competition assays using BLI.48 CIRBP showed weaker binding affinity for the dimeric and tetrameric telomere G4s than for the G4s examined above (Figure 6A). This result suggested that CIRBP might interact with the loops of G4. Then, 2-Ap titration was carried out to directly probe the possible CIRBP binding sites on telomeric G4 DNA. In contrast to the RGG peptide alone, CIRBP affected all 2-Aps, both in the G-quartet (Ap1) and in the loop sites (Ap7, Ap13, Ap19). CIRBP might therefore bind both the G-quartet and the loop sites (Figure 6B). The binding stoichiometry between CIRBP and G-quadruplex DNA was also determined by isothermal titration calorimetry (ITC). The binding enthalpy was well fitted with a 1:1 binding mode, suggesting that one CIRBP molecule might interact with both the G-quartet and the loops of G4 simultaneously (Figure S29). Considering these results, it could be concluded that CIRBP efficiently binds telomeric G4 DNA by surrounding it. In addition to the RGG motif, which could bind the G-quartet, other components of CIRBP might also participate in G4 recognition, so that all loops could be reached by CIRBP. The effect of CIRBP on the G4 structure was then evaluated by fluorescence resonance energy transfer (FRET) and CD; because the G4 signal was only slightly affected, CIRBP might bind G4 directly without helicase activity (Figure S30-S31).
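To make the 1:1 binding picture concrete, the sketch below evaluates the fraction of G4 bound under a simple one-site equilibrium, using the KD values quoted in the text (35.8 nM for CIRBP with Htg22 by SPR, about 10 μM for the deletion mutant) and the 50 nM protein concentration of the BLI competition assay. It assumes that free protein is approximately equal to total protein, which is a simplification and not the fitting procedure used by the authors; the function names are hypothetical.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """Fraction of DNA bound in a 1:1 model, assuming free protein ~ total protein."""
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return protein_nM / (kd_nM + protein_nM)


def fold_change(kd_a_nM, kd_b_nM):
    """Ratio of two KD values and its base-10 logarithm (orders of magnitude)."""
    ratio = kd_a_nM / kd_b_nM
    return ratio, math.log10(ratio)


KD_CIRBP_HTG22 = 35.8    # nM, SPR value quoted in the text
KD_DELTA_RGG = 10_000.0  # nM, "around 10 uM" as quoted in the text

theta_wt = fraction_bound(50.0, KD_CIRBP_HTG22)
theta_del = fraction_bound(50.0, KD_DELTA_RGG)
ratio, orders = fold_change(KD_DELTA_RGG, KD_CIRBP_HTG22)
print(f"wild type: {theta_wt:.2f}, deletion mutant: {theta_del:.3f}")
print(f"KD ratio: {ratio:.0f}-fold ({orders:.1f} orders of magnitude)")
```

Under these assumptions, the wild-type protein occupies a substantial fraction of the G4 at 50 nM while the deletion mutant occupies almost none, and the KD ratio falls between two and three orders of magnitude.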

Figure 6. (A) BLI competition assay of 50 nM CIRBP binding to Htg22 in the presence of various DNAs. (B) Plot of normalized fluorescence intensity at 375 nm of Htg22 labeled with individual 2-Ap versus the binding ratio of [CIRBP]/[Htg22].

CIRBP binds to telomeric G-quadruplex DNA in cells. Because CIRBP selectively binds DNA G4 structures in vitro, the next question was whether CIRBP interacts with G4 DNA in cells. To address this question, an immunofluorescence assay was conducted. Endogenous CIRBP was expressed at a low level, making it difficult to observe its location. Therefore, we transfected HeLa cells with a plasmid that expresses GFP-tagged CIRBP. DNA G4s were visualized with the G4 antibody BG4 or D1 after RNase A treatment, which removed the RNA.30-31 In the images in Figure 7, partial colocalization of GFP-CIRBP and DNA G4s was observed. Because CIRBP does not bind G4 exclusively and because other nucleic acids are far more abundant than G4s, this extent of colocalization is plausible. On the other hand, colocalization of the RGG motif-deleted mutant (GFP-ΔRGG) with DNA G4s was not observed (Figure S32), suggesting that the RGG motif in CIRBP might also be essential for G4 binding in cells.

Telomeric G4 DNA is one of the most abundant G4s in the nucleus, and the formation of telomeric DNA G4s could play an important role in telomere maintenance. CIRBP has been reported to play a role in telomere maintenance and effectively bound telomeric DNA G4s in vitro in our study. Thus, CIRBP might also participate in telomere regulation as a telomeric G4-binding protein in vivo. We then investigated the interaction between CIRBP and telomeric G4s in cells. The binding of CIRBP to telomeric DNA was first evaluated by chromatin immunoprecipitation (ChIP). The results indicated that CIRBP could indeed bind to telomeric repeat DNA (Figure S33). The binding of CIRBP to the telomere region was confirmed by immunofluorescence.49 In Figure S34, a large portion of GFP-tagged CIRBP colocalized with the telomeric protein TRF2, which marks the location of telomere regions. In contrast, colocalization with telomeres was not observed for GFP-tagged ΔRGG. Taken together, the ChIP and immunofluorescence data illustrate an association of CIRBP with G4s and telomeres in cells, which is possibly driven by the G4-binding RGG motif.

Figure 7. Immunofluorescence image of GFP-tagged CIRBP (green) and G-quadruplex antibody BG4 or D1 (red) in HeLa cells after RNase A treatment. The nucleus was stained with DAPI (blue). The colocalized foci are indicated by white arrows and shown in the enlarged image of the yellow box.

DISCUSSION
G4 has long been recognized as a regulatory element because of its involvement in telomere maintenance and gene expression. These processes are regulated by numerous G4-binding proteins. However, how a protein recognizes G4 remains obscure. In this study, we focused on the mechanism by which the RGG motif, which is found in a subset of G4-binding proteins, binds to G4s. To clarify the role of the RGG motif in G4-protein interactions, we systematically investigated the binding affinity and binding mode between RGG-motif peptides and G4s. From the SPR results of peptides 1 to 12, we found that RGG peptides with similar lengths and numbers of RGG repeats but different gap amino acids and internal arrangements exhibited great differences in G4 binding ability. Among these peptides, peptide 12, with 7 RGG repeats, was found to bind G4 DNA effectively and in preference to single-stranded and hairpin DNA. Mutation studies of peptide 12 revealed that both arginine and phenylalanine residues are crucial for the G4 binding of peptide 12 and that the positions of the residues are site specific. The arginine in the RGG repeat is more capable of binding G4s than lysine or alanine, and it is likely that arginine not only provides positive charges for electrostatic interactions but also provides hydrogen bonding or π stacking interaction sites with substrates. This may explain why arginine is conserved in the RGG motif. Moreover, phenylalanine is more capable of binding G4s than leucine or alanine, suggesting a role in G4 binding via π-π interactions. In addition, the internal arrangement of RGG repeats and gap amino acids, rather than a critical number of RGG repeats, plays a role in the G4 binding mechanism. We also found that RGG peptide 12 binds mainly upon one G-quartet plane of G4s. The arginine in the fourth RGG repeat and the phenylalanines in the middle of the sequence were found to be the most important amino acids in G-quartet binding. This G-quartet binding mode was also observed for the RGG3 domain of TLS/FUS and for the peptide Rhau18.34, 40 Interestingly, a recent crystal structure showed that Rhau18 mediated the binding between the protein DHX36 and a G4 with a similar binding mode (Figure S18).18 These intriguing results brought our focus to the protein from which peptide 12 originates, CIRBP. According to the SPR, BLI, filter binding and EMSA results, CIRBP bound both G4 and single-stranded structures in RNA but only G4 structures in DNA. As CIRBP has been reported to be an RNA-binding protein whose action is mediated in part through an RNA recognition motif (RRM), it is not surprising that CIRBP has a broad substrate-binding scope in RNA. However, the DNA binding of CIRBP is specific for the G4 structure, which might be mediated by the G4-binding RGG motif. The cellular interaction between CIRBP and G4 DNA was confirmed by an immunofluorescence assay in HeLa cells. In addition, the binding of CIRBP to DNA is associated with telomeres. This observation suggests that CIRBP might bind telomeric G4 DNA in vivo. Considering the effect of CIRBP on telomeric DNA maintenance, CIRBP might also exert this function by binding telomeric G4 DNA.
The G4 DNA-binding domain of CIRBP was confirmed by the deletion and mutation of the RGG motif. The deletion of the RGG motif (peptide 12) completely abolished the binding affinity of CIRBP for G4s, while mutation of arginine 108 or 110 in the RGG motif resulted in much weaker binding affinity. Both the deletion and the mutations indicated that the RGG motif-G4 interaction is necessary for CIRBP recognition of G4 DNA. In particular, arginines 108 and 110 in CIRBP might be the key residues for its binding to DNA G4s, and they are also key amino acids for the binding of peptide 12 alone. This close agreement in key amino acids clearly suggests that the RGG motif is the core mediator of the CIRBP-G4 interaction.
In conclusion, our study provides new insight into how RGG peptides bind to G4s. The internal arrangement of RGG repeats and gap amino acids played a more crucial role in the G4 binding mechanism than a critical number of RGG repeats. Arginines and phenylalanines at specific positions of the RGG motif might enable additional hydrogen bonding and π stacking interactions with nucleobases and strengthen the binding to G4s. Notably, proceeding from the G4-binding RGG peptide 12, we identified CIRBP as a new G4 DNA-binding protein both in vitro and in cells. This is the first time that a G4-binding protein has been identified from the exploration of a G4-binding RGG motif. In addition, we found that the RGG motif acted as a key mediator in the G4 binding of CIRBP. The binding ability of this key peptide segment to G4 could represent the binding ability of the full-length protein. A recent analysis also found the RGG sequence to be a shared motif among 77 G4-binding proteins.20 In future studies, based on deeper insight into the role of the G4-binding RGG motif, it might be possible to rapidly identify G4-binding proteins by analyzing the internal arrangement of RGG repeats. This approach might be a novel strategy for discovering new G4-binding proteins by exploring key peptide segments.

Supporting Information. Characterization of the peptides and proteins, supplemental spectra and graphs.

*(S.-B. Chen) E-mail: [email protected]; *(Z.-S. Huang) E-mail: [email protected].

All authors have given approval to the final version of the manuscript. / ‡These authors contributed equally.

The authors declare no competing financial interests.

This work was supported by the National Natural Science Foundation of China (81330077, 21708053, 81872732 and 21672265), the Natural Science Foundation of Guangdong Province (2017A030308003 and 2017A030313040), the Ministry of Education of China (No. IRT-17R111), the Fundamental Research Funds for the Central Universities (17ykpy18), and the Guangdong Provincial Key Laboratory of Construction Foundation (2017B030314030).

(1) Bochman, M. L.; Paeschke, K.; Zakian, V. A., DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 2012, 13 (11), 770-80.
(2) Chambers, V. S.; Marsico, G.; Boutell, J. M.; Di Antonio, M.; Smith, G. P.; Balasubramanian, S., High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 2015, 33 (8), 877-81.
(3) Kwok, C. K.; Marsico, G.; Sahakyan, A. B.; Chambers, V. S.; Balasubramanian, S., rG4-seq reveals widespread formation of G-quadruplex structures in the human transcriptome. Nat. Methods 2016, 13 (10), 841-4.
(4) Chen, X.-C.; Chen, S.-B.; Dai, J.; Yuan, J.-H.; Ou, T.-M.; Huang, Z.-S.; Tan, J.-H., Tracking the Dynamic Folding and Unfolding of RNA G-Quadruplexes in Live Cells. Angew. Chem. Int. Ed. Engl. 2018, 57 (17), 4702-6.
(5) Kuryavyi, V.; Phan, A. T.; Patel, D. J., Solution structures of all parallel-stranded monomeric and dimeric G-quadruplex scaffolds of the human c-kit2 promoter. Nucleic Acids Res. 2010, 38 (19), 6757-73.
(6) Dhakal, S.; Cui, Y.; Koirala, D.; Ghimire, C.; Kushwaha, S.; Yu, Z.; Yangyuoru, P. M.; Mao, H., Structural and mechanical properties of individual human telomeric G-quadruplexes in molecularly crowded solutions. Nucleic Acids Res. 2013, 41 (6), 3915-23.
(7) Tian, T.; Chen, Y.-Q.; Wang, S.-R.; Zhou, X., G-Quadruplex: A Regulator of Gene Expression and Its Chemical Targeting. Chem. 2018, 4 (6), 1314-44.


(8) Takahashi, S.; Brazier, J. A.; Sugimoto, N., Topological impact of noncanonical DNA structures on Klenow fragment of DNA polymerase. Proc. Natl. Acad. Sci. U.S.A. 2017, 114 (36), 9605-10.
(9) Sutherland, C.; Cui, Y.; Mao, H.; Hurley, L. H., A Mechanosensor Mechanism Controls the G-Quadruplex/i-Motif Molecular Switch in the MYC Promoter NHE III1. J. Am. Chem. Soc. 2016, 138 (42), 14138-51.
(10) Reyes-Reyes, E. M.; Teng, Y.; Bates, P. J., A new paradigm for aptamer therapeutic AS1411 action: uptake by macropinocytosis and its stimulation by a nucleolin-dependent mechanism. Cancer Res. 2010, 70 (21), 8617-29.
(11) Blice-Baum, A. C.; Mihailescu, M. R., Biophysical characterization of G-quadruplex forming FMR1 mRNA and of its interactions with different fragile X mental retardation protein isoforms. RNA 2014, 20 (1), 103-14.
(12) Zhang, Y.; Gaetano, C. M.; Williams, K. R.; Bassell, G. J.; Mihailescu, M. R., FMRP interacts with G-quadruplex structures in the 3'-UTR of its dendritic target Shank1 mRNA. RNA Biol. 2014, 11 (11), 1364-74.
(13) Neidle, S.; Parkinson, G., Telomere maintenance as a target for anticancer drug discovery. Nat. Rev. Drug Discovery 2002, 1 (5), 383-93.
(14) Ray, S.; Bandaria, J. N.; Qureshi, M. H.; Yildiz, A.; Balci, H., G-quadruplex formation in telomeres enhances POT1/TPP1 protection against RPA binding. Proc. Natl. Acad. Sci. U.S.A. 2014, 111 (8), 2990-5.
(15) Takahama, K.; Takada, A.; Tada, S.; Shimizu, M.; Sayama, K.; Kurokawa, R.; Oyoshi, T., Regulation of telomere length by G-quadruplex telomere DNA- and TERRA-binding protein TLS/FUS. Chem. Biol. 2013, 20 (3), 341-50.
(16) Williams, P.; Li, L.; Dong, X.; Wang, Y., Identification of SLIRP as a G Quadruplex-Binding Protein. J. Am. Chem. Soc. 2017, 139 (36), 12426-9.
(17) Vaughn, J. P.; Creacy, S. D.; Routh, E. D.; Joyner-Butt, C.; Jenkins, G. S.; Pauli, S.; Nagamine, Y.; Akman, S. A., The DEXH protein product of the DHX36 gene is the major source of tetramolecular quadruplex G4-DNA resolving activity in HeLa cell lysates. J. Biol. Chem. 2005, 280 (46), 38117-20.
(18) Chen, M. C.; Tippana, R.; Demeshkina, N. A.; Murat, P.; Balasubramanian, S.; Myong, S.; Ferre-D'Amare, A. R., Structural basis of G-quadruplex unfolding by the DEAH/RHA helicase DHX36. Nature 2018, 558 (7710), 465-9.
(19) Brazda, V.; Haronikova, L.; Liao, J. C.; Fojta, M., DNA and RNA quadruplex-binding proteins. Int. J. Mol. Sci. 2014, 15 (10), 17493-517.
(20) Brazda, V.; Cerven, J.; Bartas, M.; Mikyskova, N.; Coufal, J.; Pecinka, P., The Amino Acid Composition of Quadruplex Binding Proteins Reveals a Shared Motif and Predicts New Potential Quadruplex Interactors. Molecules 2018, 23 (9), 2341-56.
(21) Thandapani, P.; O'Connor, T. R.; Bailey, T. L.; Richard, S., Defining the RGG/RG motif. Mol. Cell 2013, 50 (5), 613-23.
(22) Gonzalez, V.; Hurley, L. H., The C-terminus of nucleolin promotes the formation of the c-MYC G-quadruplex and inhibits c-MYC promoter activity. Biochemistry 2010, 49 (45), 9706-14.
(23) Takahama, K.; Kino, K.; Arai, S.; Kurokawa, R.; Oyoshi, T., Identification of Ewing's sarcoma protein as a G-quadruplex DNA- and RNA-binding protein. FEBS J. 2011, 278 (6), 988-98.
(24) Takahama, K.; Oyoshi, T., Specific binding of modified RGG domain in TLS/FUS to G-quadruplex RNA: tyrosines in RGG domain recognize 2'-OH of the riboses of loops in G-quadruplex. J. Am. Chem. Soc. 2013, 135 (48), 18016-9.
(25) Ozdilek, B. A.; Thompson, V. F.; Ahmed, N. S.; White, C. I.; Batey, R. T.; Schwartz, J. C., Intrinsically disordered RGG/RG domains mediate degenerate specificity in RNA binding. Nucleic Acids Res. 2017, 45 (13), 7984-96.
(26) Yagi, R.; Miyazaki, T.; Oyoshi, T., G-quadruplex binding ability of TLS/FUS depends on the beta-spiral structure of the RGG domain. Nucleic Acids Res. 2018, 46 (12), 5894-5901.

(27) Takahama, K.; Sugimoto, C.; Arai, S.; Kurokawa, R.; Oyoshi, T., Loop lengths of G-quadruplex structures affect the G-quadruplex DNA binding selectivity of the RGG motif in Ewing's sarcoma. Biochemistry 2011, 50 (23), 5369-78.
(28) Vasilyev, N.; Polonskaia, A.; Darnell, J. C.; Darnell, R. B.; Patel, D. J.; Serganov, A., Crystal structure reveals specific recognition of a G-quadruplex RNA by a beta-turn in the RGG motif of FMRP. Proc. Natl. Acad. Sci. U.S.A. 2015, 112 (39), E5391-400.
(29) Phan, A. T.; Kuryavyi, V.; Darnell, J. C.; Serganov, A.; Majumdar, A.; Ilin, S.; Raslin, T.; Polonskaia, A.; Chen, C.; Clain, D.; Darnell, R. B.; Patel, D. J., Structure-function studies of FMRP RGG peptide recognition of an RNA duplex-quadruplex junction. Nat. Struct. Mol. Biol. 2011, 18 (7), 796-804.
(30) Biffi, G.; Tannahill, D.; McCafferty, J.; Balasubramanian, S., Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 2013, 5 (3), 182-6.
(31) Liu, H.-Y.; Zhao, Q.; Zhang, T.-P.; Wu, Y.; Xiong, Y.-X.; Wang, S.-K.; Ge, Y.-L.; He, J.-H.; Lv, P.; Ou, T.-M.; Tan, J.-H.; Li, D.; Gu, L.-Q.; Ren, J.; Zhao, Y.; Huang, Z.-S., Conformation Selective Antibody Enables Genome Profiling and Leads to Discovery of Parallel G-Quadruplex in Human Telomeres. Cell Chem. Biol. 2016, 23 (10), 1261-70.
(32) Nagatoishi, S.; Isono, N.; Tsumoto, K.; Sugimoto, N., Hydration is required in DNA G-quadruplex-protein binding. Chembiochem 2011, 12 (12), 1822-6.
(33) Corley, S. M.; Gready, J. E., Identification of the RGG box motif in Shadoo: RNA-binding and signaling roles? Bioinform. Biol. Insights 2008, 2, 383-400.
(34) Kondo, K.; Mashima, T.; Oyoshi, T.; Yagi, R.; Kurokawa, R.; Kobayashi, N.; Nagata, T.; Katahira, M., Plastic roles of phenylalanine and tyrosine residues of TLS/FUS in complex formation with the G-quadruplexes of telomeric DNA and TERRA. Sci. Rep. 2018, 8 (1), 2864-75.
(35) Sathyapriya, R.; Vishveshwara, S., Interaction of DNA with clusters of amino acids in proteins. Nucleic Acids Res. 2004, 32 (14), 4109-18. (36) Monchaud, D.; Teulade-Fichou, M. P., G4-FID: a fluorescent DNA probe displacement assay for rapid evaluation of quadruplex ligands. Methods Mol. Biol. 2010, 608, 257-71. (37) Dai, J.; Carver, M.; Hurley, L. H.; Yang, D., Solution structure of a 2:1 quindoline-c-MYC G-quadruplex: insights into Gquadruplex-interactive small molecule drug design. J. Am. Chem. Soc. 2011, 133 (44), 17673-80. (38) Luu, K. N.; Phan, A. T.; Kuryavyi, V.; Lacroix, L.; Patel, D. J., Structure of the Human Telomere in K+ Solution:  An Intramolecular (3 + 1) G-Quadruplex Scaffold. J. Am. Chem. Soc. 2006, 128 (30), 9963-70. (39) Cravens, H., A scientific project locked in time. The Terman Genetic Studies of Genius, 1920s-1950s. Am. Psychol. 1992, 47 (2), 183-9. (40) Heddi, B.; Cheong, V. V.; Martadinata, H.; Phan, A. T., Insights into G-quadruplex specific recognition by the DEAH-box helicase RHAU: Solution structure of a peptide-quadruplex complex. Proc. Natl. Acad. Sci. U.S.A. 2015, 112 (31), 9608-13. (41) Mashima, T.; Matsugami, A.; Nishikawa, F.; Nishikawa, S.; Katahira, M., Unique quadruplex structure and interaction of an RNA aptamer against bovine prion protein. Nucleic Acids Res. 2009, 37 (18), 6249-58. (42) Yang, R.; Weber, D. J.; Carrier, F., Post-transcriptional regulation of thioredoxin by the stress inducible heterogenous ribonucleoprotein A18. Nucleic Acids Res. 2006, 34 (4), 1224-36. (43) Chen, J. K.; Lin, W. L.; Chen, Z.; Liu, H. W., PARP-1dependent recruitment of cold-inducible RNA-binding protein promotes double-strand break repair and genome stability. Proc. Natl. Acad. Sci. U.S.A. 2018, 115 (8), E1759-68. (44) Zhang, Y.; Wu, Y.; Mao, P.; Li, F.; Han, X.; Zhang, Y.; Jiang, S.; Chen, Y.; Huang, J.; Liu, D.; Zhao, Y.; Ma, W.; Songyang, Z., Cold-inducible RNA-binding protein CIRP/hnRNP A18 regulates

ACS Paragon Plus Environment

Page 11 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of the American Chemical Society

telomerase activity in a temperature-dependent manner. Nucleic Acids Res. 2016, 44 (2), 761-75. (45) Parkinson, G. N.; Lee, M. P.; Neidle, S., Crystal structure of parallel quadruplexes from human telomeric DNA. Nature 2002, 417 (6891), 876-80. (46) Dai, J.; Punchihewa, C.; Ambrus, A.; Chen, D.; Jones, R. A.; Yang, D., Structure of the intramolecular human telomeric Gquadruplex in potassium solution: a novel adenine triple formation. Nucleic Acids Res. 2007, 35 (7), 2440-50. (47) Hounsou, C.; Guittat, L.; Monchaud, D.; Jourdan, M.; Saettel, N.; Mergny, J. L.; Teulade-Fichou, M. P., G-quadruplex recognition by quinacridines: a SAR, NMR, and biological study. ChemMedChem 2007, 2 (5), 655-66.

(48) Chen, S.-B.; Hu, M.-H.; Liu, G.-C.; Wang, J.; Ou, T.-M.; Gu, L.-Q.; Huang, Z.-S.; Tan, J.-H., Visualization of NRAS RNA G-Quadruplex Structures in Cells with an Engineered Fluorogenic Hybridization Probe. J. Am. Chem. Soc. 2016, 138 (33), 10382-5.
(49) Rodriguez, R.; Muller, S.; Yeoman, J. A.; Trentesaux, C.; Riou, J. F.; Balasubramanian, S., A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J. Am. Chem. Soc. 2008, 130 (47), 15758-9.

TOC

Figure 1. Filter-binding assay of peptide 12 with different DNA structures. (A) Filter binding dots of peptide 12 with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA. (B) Quantification of DNA on filter binding dots.

Figure 2. Chemical shift perturbation analysis of G-quadruplexes with peptide 12. (A) The imino proton regions of the 1D 1H-NMR spectra of Htg25 either alone (bottom) or with peptide 12 at a ratio of 1:2 (top). (B) The imino proton regions of the 1D 1H-NMR spectra of Pu22 either alone (bottom) or with peptide 12 at a ratio of 1:2 (top). The assays were performed in 25 mM KH2PO4 buffer (70 mM KCl, 10% D2O, pH 7.4) using 400 MHz Bruker spectrometers at 25 °C.

Figure 3. Fluorescence titrations of 1 μM 2-Ap-labeled G-quadruplexes with stepwise addition of peptide 12 in 10 mM Tris-HCl buffer, 100 mM KCl, pH 7.4. (A) Plot of normalized fluorescence intensity at 375 nm of Htg22 individually labeled with 2-Ap versus the binding ratio of [peptide 12]/[Htg22]. (B) Plot of normalized fluorescence intensity at 375 nm of Pu22 individually labeled with 2-Ap versus the binding ratio of [peptide 12]/[Pu22]. The 2-Ap-labeled sites in the G-quadruplexes are shown in the chart.

Figure 4. CIRBP binds to the G-quadruplex. (A) Filter binding dots of CIRBP with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA. (B) Quantification of DNA on filter binding dots. (C) EMSA binding bands of CIRBP with G-quadruplex (Htg22), single-stranded (mHtg22) and hairpin (Hp18) DNA.

Figure 5. (A) Alterations in the RGG motif of CIRBP. (B) Filter-binding assays of RGG motif deletion (ΔRGG) or mutation (mRGG1 and mRGG2) constructs with telomeric G-quadruplex Htg22 DNA. (C) Quantification of DNA on filter binding dots.

Figure 6. (A) BLI competition assay of 50 nM CIRBP binding to Htg22 in the presence of various DNAs. (B) Plot of normalized fluorescence intensity at 375 nm of Htg22 individually labeled with 2-Ap versus the binding ratio of [CIRBP]/[Htg22].

Figure 7. Immunofluorescence image of GFP-tagged CIRBP (green) and G-quadruplex antibody BG4 or D1 (red) in HeLa cells after RNase A treatment. The nucleus was stained with DAPI (blue). The colocalized foci are indicated by white arrows and shown enlarged in the yellow box.
