
Biophysical Chemistry, Biomolecules, and Biomaterials; Surfactants and Membranes

Dynamic Nature of CTCF Tandem 11 Zinc Fingers in Multivalent Recognition of DNA as Revealed by NMR Spectroscopy

Difei Xu, Rongsheng Ma, Jiahai Zhang, Zhijun Liu, Bo Wu, Junhui Peng, Yanan Zhai, Qingguo Gong, Yunyu Shi, Jihui Wu, Qiang Wu, Zhiyong Zhang, and Ke Ruan

J. Phys. Chem. Lett., Just Accepted Manuscript • DOI: 10.1021/acs.jpclett.8b01440 • Publication Date (Web): 02 Jul 2018

Downloaded from http://pubs.acs.org on July 3, 2018



The Journal of Physical Chemistry Letters

Dynamic Nature of CTCF Tandem 11 Zinc Fingers in Multivalent Recognition of DNA as Revealed by NMR Spectroscopy

Difei Xu,† Rongsheng Ma,† Jiahai Zhang,† Zhijun Liu,‡ Bo Wu,# Junhui Peng,† Yanan Zhai,§ Qingguo Gong,† Yunyu Shi,† Jihui Wu,† Qiang Wu,§ Zhiyong Zhang,*,† Ke Ruan*,†

† Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China.

‡ National Facility for Protein Science in Shanghai, ZhangJiang Lab, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, P. R. China.

§ Center for Comparative Biomedicine, MOE Key Laboratory of Systems Biomedicine, Institute of Systems Biomedicine, Collaborative Innovative Center of Systems Biomedicine, SCSB, State Key Laboratory of Oncogenes and Related Genes, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, P. R. China.

# High Magnetic Field Laboratory, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, P. R. China.

Corresponding Author

*Email: [email protected] (Z.Z.). *Email: [email protected] (K.R.)


ABSTRACT

The 11 zinc fingers (ZFs) of the transcription factor CTCF play a versatile role in the regulation of gene expression. CTCF binds to numerous genomic sites to form chromatin loops and topologically associated domains, and thus mediates the 3D architecture of chromatin. Although CTCF inter-ZF plasticity is essential for the recognition of multiple genomic sites, the dynamic nature of its 11 ZFs remains unknown. Here, we assigned the chemical shifts of the CTCF ZFs 1-11 and solved the solution structure of each ZF. NMR backbone dynamics, residual dipolar couplings and small-angle X-ray scattering experiments suggest a high inter-ZF plasticity of the free-form ZFs 1-11. As exemplified by two different protocadherin DNA sequences, titration of DNAs to 15N-labeled CTCF ZFs 1-11 enabled systematic mapping of the binding of CTCF ZFs to various chromatin sites. Our work paves the way for illustrating the molecular basis of the versatile DNA recognition by CTCF and has interesting implications for its conformational transition during DNA binding.

TOC GRAPHICS

KEYWORDS: CTCF, DNA recognition, NMR spectroscopy, Zinc finger structure, Protein dynamics.


Transcriptional repressor CCCTC-binding factor (CTCF) is widely involved in many cellular processes, e.g., insulator-binding activity, regulation of chromatin architecture and transcription [1-4]. CTCF was initially found to suppress the transcription of the c-myc gene [5]. CTCF regulates gene expression through the higher-order three-dimensional chromatin structure, where chromatin loops form upon the binding of CTCF to distant double-stranded DNA (dsDNA) sites [6-8]. CTCF binds to insulators to block the interaction between enhancers and promoters [4]. In mammals, CTCF binds to more than 80,000 chromatin sites via diverse combinations of its 11 zinc fingers (ZFs) as a “multivalent protein” [9]. Although multifunctional and often contradictory roles have been assigned to CTCF, a unifying model is now emerging, in which CTCF anchors the sliding cohesin complex in chromosome fibers to facilitate long-distance chromatin interactions between various transcription regulatory elements [10-14]. That is to say, CTCF and cohesin organize mammalian genomes into chromatin loops and topologically associated domains [13, 15-18]. Thus, CTCF enables the organization of the three-dimensional genome architecture by binding specific sequences for gene regulation and recombination [19-23].

To further our understanding of the versatility of CTCF ZFs in recognizing such a huge number of chromatin sites, structural studies of these ZFs were initiated a dozen years ago. The free-form structures of CTCF ZFs 6-7 and 10-11 were solved by NMR spectroscopy (PDB codes: 1X6H and 2CT1) in 2005. Likely due to the intrinsic inter-ZF flexibility of these ZFs, crystallization remained a challenge until the recent breakthrough crystal structures of the CTCF fragments ZF2-7 and ZF4-9 in complex with their cognate DNA sequences [24-25]. These studies provide a structural basis for the diversity and flexibility of CTCF binding to cognate sites. ZFs 3-7 of CTCF bind to the major groove of a 15-bp core sequence with high affinity while tolerating high sequence variability. The crystal structures of CTCF ZFs in complex with the CTCF-binding sites (CBSs) of the protocadherin (Pcdh) gene clusters reveal that ZF3, ZFs 4-7 and ZFs 9-11 directionally insert into the major groove along CBS modules, while ZF8 works as a spacer between distinct CBS modules [25]. Despite the excellent agreement within each individual CTCF ZF structure, the inconsistent inter-ZF spatial arrangement between these studies raises an interesting question regarding the intrinsic dynamic nature of these 11 ZFs, which can be probed effectively using NMR spectroscopy.

Here, we assigned the chemical shifts of the CTCF ZFs 1-11 (residues 266-589), which poses an unparalleled challenge for NMR analysis because it is a large, multi-domain protein. NMR studies of poly-ZF proteins to date have covered at most four ZFs. This full-length CTCF ZFs 1-11 construct was used throughout the work to investigate the structure and dynamics of these ZFs as a whole. We first applied various non-uniformly sampled (NUS) 4D experiments to lift the chemical shift degeneracy due to the large molecular weight and repetitive sequence motifs of CTCF ZFs. The solution structure of each individual ZF was thus solved, in good agreement with the corresponding crystal structures. We further examined the inter-ZF dynamics using NMR backbone dynamics, residual dipolar couplings (RDCs) and small-angle X-ray scattering (SAXS) experiments. As exemplified by the human Pcdh CBSs of the promoter CSE and Pb1 (similar to human enhancer HS5-1b) [24, 26], our solution NMR studies provide a platform for the systematic mapping of CTCF ZF binding sites to various DNA sequences, which sheds light on the molecular mechanisms of the multivalent recognition of DNA by CTCF ZFs 1-11.

Backbone chemical shift assignment is the first, critical step towards studies of protein structure and dynamics. Given the high sequence similarity and thus high chemical shift degeneracy among the 11 CTCF ZFs (Fig. 1A), e.g., ZF4 shares 58% sequence identity and 83% similarity with ZF5, we acquired a series of 3D NMR experiments on perdeuterated [13C, 15N]-labeled protein to obtain the inter- and intra-residue chemical shifts with respect to the amide HN and N. As exemplified by residues YASV in ZF4 and YASR in ZF5, the sequential connectivities between Cα, Cβ and amide HN, N were retrieved from the transverse relaxation-optimized spectroscopy (TROSY) [27-28] versions of the HN(CO)CBCA, HNCBCA, HN(CO)CA, HNCA, and HNCB [29] spectra (Fig. S1A). To lift the ambiguity in chemical shift assignment, we acquired the TROSY versions of the HNCO and HN(CA)CO spectra (Fig. S1B) to retrieve the CO chemical shifts, and the NUS versions [30] of 3D HA(CACO)NH and HA(CA)NH to retrieve the Hα chemical shifts (Fig. S1C). The intra-residue chemical shifts of Cα and Hα bridge the side-chain spectra of the NUS versions of 3D HCcH-TOCSY and CCH-TOCSY (Fig. S1D) and of the 4D HC(CCO)NH (Fig. 1B) and HCCH-TOCSY spectra. For example, these side-chain spectra identified the corresponding spin systems of V361 from the two Cγ1 and Cγ2 signals, which were distinguishable in the 4D side-chain spectrum only. These multidimensional heteronuclear spectra facilitated the mapping of the chemical shift trains to the 1D amino acid sequence. Even for such a challenging large, multi-domain protein, we finally assigned 298 out of 329 residues (including 12 prolines) at a completion rate of 94% (Fig. 1C).
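The quoted completion rate can be reproduced with a short calculation. The sketch below assumes, which the text does not state explicitly, that the 12 prolines are excluded from the denominator because they lack a backbone amide proton and therefore give no 1H-15N HSQC peak. The function name is illustrative only.

```python
def assignment_completion(assigned, total_residues, prolines):
    """Fraction of assignable backbone amides that were assigned.

    Assumption (not stated in the text): prolines are removed from the
    denominator because they have no backbone amide proton.
    """
    assignable = total_residues - prolines
    if assignable <= 0:
        raise ValueError("no assignable residues")
    return assigned / assignable


# Numbers taken from the paragraph above.
rate = assignment_completion(assigned=298, total_residues=329, prolines=12)
print(f"{rate:.1%}")  # 298 / 317 -> 94.0%
```

Under this assumption the ratio is 298/317, which rounds to the 94% reported; counting prolines in the denominator would instead give about 90.6%.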


Figure 1. Chemical shift assignment of multi-domain CTCF ZFs 1-11. (A) Sequence alignment of the 11 tandem ZFs of CTCF. Residues chelating zinc ions are highlighted in red. Secondary structural motifs are depicted on top of the sequences. (B) Projection plane of V361 in the NUS version of the 4D HC(CCO)NH spectrum. (C) The 2D 1H-15N HSQC spectrum of CTCF ZFs 1-11 with almost complete assignment of backbone chemical shifts.


It remains difficult to assign the NOE (nuclear Overhauser effect) cross peaks due to the high 1H chemical shift degeneracy. Computer-assisted NOE identification was of little help for this complex system. We hence applied a "divide-and-conquer" strategy to resolve the intra-domain NOE signals. The NOE cross peaks were pooled into 11 subsets corresponding to the 11 CTCF ZFs, and these peaks were assigned and adjusted either manually or with the assistance of the Sparky [31] and CARA [32] programs until the structure of each individual ZF converged (Table S1). The chemical shifts of N, HN, Cα, Cβ, CO and Hα were used to generate the dihedral angle restraints for the structure determination of each ZF. The solution structures of the ten C2H2-type ZFs and the 11th atypical CCHC ZF reveal the canonical “β-β-α” conformation [33] that sandwiches the Zn ion between the two-stranded antiparallel β-sheet and the α-helix (Fig. 2). The Zn ion chelates two Sγ atoms (Cys(m) and Cys(m+3)) and two Nε atoms (His(n) and His(n+4)) to form a tetrahedral structure.
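The ligand spacing described above, Cys(m)/Cys(m+3) and His(n)/His(n+4), can be expressed as a simple sequence pattern. The following is a minimal sketch, not the procedure used in this work: the 12-residue spacer between the second Cys and the first His is the commonly quoted C2H2 consensus and is an assumption here, the toy sequence is a generic consensus-like finger rather than a CTCF sequence, and the pattern is not expected to match the atypical CCHC finger.

```python
import re

# Cys-X2-Cys ... His-X3-His, i.e. Cys(m)/Cys(m+3) and His(n)/His(n+4).
# The 12-residue spacer between the second Cys and the first His is the
# common C2H2 consensus and is an assumption, not taken from the text.
C2H2_PATTERN = re.compile(r"C.{2}C.{12}H.{3}H")


def find_c2h2_motifs(sequence):
    """Return (start, end, ligand_positions) per match, 0-based indices."""
    hits = []
    for match in C2H2_PATTERN.finditer(sequence):
        start = match.start()
        # Offsets of the four zinc ligands within the matched motif.
        ligands = (start, start + 3, start + 16, start + 20)
        hits.append((start, match.end(), ligands))
    return hits


# Hypothetical consensus-like finger, used only to exercise the pattern.
toy_finger = "YKCPECGKSFSQKSNLQKHQRTHTG"
for start, end, ligands in find_c2h2_motifs(toy_finger):
    print(start, end, [toy_finger[i] for i in ligands])
```

A fixed-spacing pattern is deliberately strict; real zinc-finger families tolerate some variation in the spacers, so a search over actual sequences would normally relax the repeat counts.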


Figure 2. Solution structure of each individual ZF of CTCF. Depicted are the cartoon view of the lowest-energy conformation (left) and the superimposed 20 lowest-energy conformations (right). The four conserved amino acids chelating the Zn2+ ions (gray spheres) are shown as sticks.

During our NMR studies of the CTCF ZFs 1-11, two research groups published the high-resolution crystal structures of three to seven tandem ZFs of CTCF in complex with dsDNAs [24-25]. The superimposed structures reveal that our solution structures for each individual ZF agree with those retrieved from the complex structures (Fig. S2). The backbone RMSDs between the ZFs solved by NMR spectroscopy and crystallography varied from 0.52 to 2.01 Å (Table S2), indicating a limited magnitude of intra-domain dynamics.

We further interrogated the backbone dynamics of the 11 CTCF ZFs using 15N T1, T2 and 1H-15N heteronuclear steady-state NOE (Fig. S3A). In general, the averaged T1 value of 0.75 s and T2 value of 0.15 s correspond to a correlation time τc of 6 ns under the assumption of a globular structure, far below the value estimated from the apparent molecular weight of 38.4 kDa. This suggests that the 11 ZFs as a whole are quite flexible, which is consistent with the lack of inter-ZF NOE restraints. In addition, the T1/T2/NOE data differ somewhat between individual ZFs. For example, ZF6 shows a higher T2 (0.18 ± 0.08 s) than ZF7 (0.10 ± 0.04 s) and ZF8 (0.10 ± 0.03 s). Low values of the 1H-15N heteronuclear NOEs were observed for residues in the inter-ZF linkers, e.g., residues 292, 320, 376, 405, 434, 462, 465, and 521, which indicates a high flexibility of the inter-ZF linkers. Spectral density function analysis also suggests that intermediate motions vary among the 11 ZFs (Fig. S3B). These data indicate that some ZFs may transiently interact with each other to modulate the inter-ZF dynamics.

To better characterize the inter-ZF spatial arrangement of the 11 CTCF ZFs, we determined the 1H-15N RDCs that describe the time-averaged orientation of each 1H-15N internuclear vector relative to a fixed molecular frame. RDCs have gained extensive application in the delineation of domain-domain orientations of multi-domain proteins [34-35]. RDCs are the differences in coupling constants measured in anisotropic and isotropic media. Due to the rapid transverse relaxation of the anti-TROSY component, the traditional IPAP-HSQC is practically limited to proteins up to around 20 kDa. We hence acquired the HSQC and TROSY-HSQC spectra of the 15N-labeled CTCF ZFs 1-11 in both the 3.5% C12E5 alignment medium [36] (Fig. S4A) and isotropic buffer (Fig. S4B). The experimental RDCs were consistent with those back-calculated for a single ZF retrieved from either X-ray or NMR structures using the PALES program [37], as exemplified by ZF6 and ZF7 (Fig. 3A). The Q-value [38] was used here as a quantitative assessment of the structural agreement with the experimental RDCs, where a lower Q-value indicates better structural consistency. The Q-values of CTCF ZF6 were 0.27 and 0.32 when best-fitted to the complex crystal structure (PDB code 5KKQ for ZFs 3-7) and the NMR structure (PDB code 2CT1 for free-form ZFs 6-7), respectively. Similar RDC results were observed for CTCF ZF7. This analysis further supports that individual ZFs are well structured in both free and bound forms.

Interestingly, the spatial arrangement between ZFs, or even for tandem ZFs (Fig. S5), was distinct in the crystal structures in complex with different dsDNAs (PDB codes: 5KKQ and 5YEG) [24-25]. Even more remarkable inter-ZF plasticity was observed between the X-ray structures and the solution apo-form structure (PDB code: 2CT1). We hence assessed the inter-ZF plasticity of the free-form CTCF ZFs 1-11 using the RDC dataset as an indicator of time-averaged conformations. Experimental RDCs did not agree with the tandem ZFs 6-7 structures retrieved from the known structures, as demonstrated by the poor correlation (Q-values above 0.8) between experimental and back-calculated RDCs (Fig. 3B). A similar RDC analysis is illustrated for CTCF ZFs 10-11 (Fig. 3C and 3D). Our RDC results suggest that the free-form CTCF ZFs 1-11 may adopt multiple inter-ZF orientations due to the flexibility of the inter-ZF linkers; hence, CTCF is empowered to recognize numerous DNA sequences.
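The correlation time quoted in the relaxation analysis above can be checked with a widely used approximation for a rigid, isotropically tumbling molecule in the slow-motion limit, τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), where νN is the 15N resonance frequency. This is a sketch under stated assumptions rather than the authors' analysis: the spectrometer field is not given in this section, so a 600 MHz 1H frequency is assumed purely for illustration.

```python
import math

# |gamma(15N)| / gamma(1H); converts a 1H frequency to the 15N frequency.
GAMMA_RATIO_N_TO_H = 0.10136


def tau_c_from_t1_t2(t1_s, t2_s, proton_mhz):
    """Estimate the rotational correlation time (s) from 15N T1 and T2.

    Uses tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N), which assumes rigid,
    isotropic tumbling in the slow-motion limit.
    """
    ratio = t1_s / t2_s
    if 6.0 * ratio - 7.0 <= 0.0:
        raise ValueError("T1/T2 too small for this approximation")
    nu_n_hz = proton_mhz * 1e6 * GAMMA_RATIO_N_TO_H
    return math.sqrt(6.0 * ratio - 7.0) / (4.0 * math.pi * nu_n_hz)


# T1 and T2 are the averaged values from the text; 600 MHz is an assumption.
tau_c = tau_c_from_t1_t2(t1_s=0.75, t2_s=0.15, proton_mhz=600.0)
print(f"tau_c = {tau_c * 1e9:.1f} ns")
```

With the assumed field the estimate comes out near 6 ns, in line with the value quoted above; a different field strength would shift the number, since νN scales with the magnet.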


Figure 3. Correlation between experimental RDCs and those back-calculated from different CTCF structures. (A) Correlation between experimental RDCs and those back-calculated using a single domain (ZF6 or ZF7) derived from crystal (black squares) or solution (red circles) structures. Annotated are the Q-values of ZF6/ZF7. (B) RDC correlation plots of two tandem ZFs (ZFs 6-7). (C) RDC correlation plots of a single ZF10 or ZF11 retrieved from either crystal (black squares) or solution (red circles) structures. Annotated are the Q-values of ZF10/ZF11. (D) RDC correlation plots of CTCF ZFs 10-11.
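The Q-values annotated in Figure 3 compare experimental and back-calculated RDCs. One common definition is the root-mean-square deviation between the two sets divided by the root-mean-square of the experimental values; the sketch below assumes that definition, which may differ in normalization from the one used in this work, and the RDC values are hypothetical numbers chosen only to show the calculation.

```python
import math


def rdc_q_factor(observed, calculated):
    """Q = rms(D_obs - D_calc) / rms(D_obs); lower means better agreement.

    This is one common definition; the normalization used in the paper
    is not spelled out in the text and may differ.
    """
    if len(observed) != len(calculated) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    squared_dev = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    squared_obs = sum(o ** 2 for o in observed)
    if squared_obs == 0.0:
        raise ValueError("observed RDCs are all zero")
    return math.sqrt(squared_dev / squared_obs)


# Hypothetical RDCs in Hz, not data from this work.
observed_hz = [12.0, -8.5, 3.2, -15.1, 7.7]
calculated_hz = [11.0, -9.0, 4.0, -13.5, 8.5]
q_value = rdc_q_factor(observed_hz, calculated_hz)
print(f"Q = {q_value:.2f}")
```

With this definition a perfect back-calculation gives Q = 0, and a model that predicts zero for every coupling gives Q = 1, which helps put the single-ZF values (about 0.3) and the tandem-ZF values (above 0.8) into perspective.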

Small-angle X-ray scattering (SAXS) experiments were then performed to determine the overall contour of CTCF ZFs 1-11 (Fig. 4). The SAXS profiles provide low-resolution but overall structural information. The molecular weight of CTCF ZFs 1-11 estimated from the SAXS data was 38.3 kDa, while the actual molecular weight is 38.4 kDa, indicating that CTCF ZFs 1-11 exists as a monomer in solution. The Rg value of the free CTCF ZFs 1-11 was estimated to be 41.2 ± 0.4 Å and the Dmax value was 160.0 Å, whereas those of the protocadherin enhancer CBS HS5-1b core-CTCF-ZFs1-11 complex were 41.3 ± 0.2 Å and 130.6 Å (Fig. 4A). There was a plateau at a qRg of approximately 3 in the Kratky plot of the free CTCF ZFs 1-11 (Fig. 4A), which indicates a relatively extended and flexible structure. However, the Kratky plot of the complex suggests that the protein may become ordered after binding to DNA. We also built dummy residue models of the free-form CTCF ZFs 1-11 based on the SAXS data and found that the protein adopts various conformations in solution (Fig. 4B). In comparison, the models of the bound-form CTCF were much more compact and converged (Fig. 4C). Together, these results show that the entire structure of CTCF ZFs 1-11 is quite flexible in the free form and prefers certain conformations when bound to specific genomic DNA sites, which helps to elucidate the versatile roles of CTCF in the regulation of many cellular processes.
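The contrast between a plateau and a bell-shaped peak in the Kratky plot can be illustrated with two textbook limiting models in dimensionless form, (qRg)²·I(q)/I(0) versus qRg. The sketch below uses the Guinier form for a compact particle and the Debye function for a Gaussian chain; both are idealized reference curves chosen for illustration, not fits to the data in this work.

```python
import math


def kratky_guinier(q_rg):
    """Dimensionless Kratky value for a compact particle (Guinier form).

    I(q)/I(0) = exp(-(q*Rg)^2 / 3), so the curve peaks and then decays.
    """
    x = q_rg ** 2
    return x * math.exp(-x / 3.0)


def kratky_debye(q_rg):
    """Dimensionless Kratky value for a Gaussian chain (Debye function).

    I(q)/I(0) = 2*(exp(-x) + x - 1) / x^2 with x = (q*Rg)^2, so the
    Kratky-transformed curve levels off at a plateau of 2.
    """
    x = q_rg ** 2
    if x == 0.0:
        return 0.0
    return 2.0 * (math.exp(-x) + x - 1.0) / x


for q_rg in (1.0, math.sqrt(3.0), 3.0, 6.0):
    print(f"qRg={q_rg:4.2f}  compact={kratky_guinier(q_rg):6.3f}  "
          f"chain={kratky_debye(q_rg):6.3f}")
```

In these idealized curves the compact model peaks at qRg = √3 and returns towards zero, whereas the chain model rises to a plateau and stays there; a plateau in experimental data, as described above for the free protein, is therefore usually read as a sign of flexibility.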


Figure 4. Determination of the overall contour of CTCF ZFs 1-11 using SAXS. (A) Left: Experimental SAXS profiles, with data points up to q = 0.3 Å⁻¹. Middle: Pair distance distribution functions (PDDF). Right: Kratky plots. The black and red curves represent the free CTCF-ZFs1-11 and the HS5-1b-CTCF-ZFs1-11 complex, respectively. (B) Final shape model (top) of the free CTCF-ZFs1-11 with 10 dummy residue models in different colors (bottom). Images of all the SAXS models were generated by VMD [39]. (C) Final shape model (top) of the HS5-1b-CTCF-ZFs1-11 complex with 10 dummy residue models in different colors (bottom).

The availability of the chemical shift assignments of the CTCF ZFs 1-11 provides a platform to map the binding sites for various dsDNAs. The binding topology was validated first using a known oligonucleotide, Pb1 (Fig. 5A), titrated into 15N-labeled CTCF ZFs 1-11. The 18-bp Pb1 binds CTCF ZFs 5-8 (PDB code: 5K5I) with a high affinity of approximately 30 nM [24], which suggests that the interfacial residues may undergo intermediate to slow exchange on the NMR timescale upon binding of Pb1. The motion of the bound ZFs should also be highly restricted, which may lead to fast relaxation of these bound-state residues. As expected, signal attenuation was observed for residues of CTCF ZFs 5-8 (Fig. 5B).

We then interrogated the binding topology of the Pcdhγa10 promoter CSE DNA (Fig. 5A) on CTCF ZFs. The Pcdhγa genes are a subtype of the members of the Pcdhγ gene cluster, which generates various γ mRNAs and diverse γ proteins by a combination of stochastic promoter choice and alternative pre-mRNA processing to maintain single-cell diversity for neuronal identity and self-avoidance in the brain [26]. The 42-bp Pcdhγa10 promoter CSE binds to CTCF ZFs 1-11 with an affinity of 270 nM, as determined by fluorescence polarization (Fig. S6). To dissect which fragment of the Pcdhγa10 CSE binds to which ZFs of CTCF, we evaluated the intensity profile of CTCF ZFs 1-11 upon titration of the a3 fragment of the Pcdhγa10 CSE DNA (Fig. 5C). Addition of the a3 fragment significantly attenuated the intensities of residues of ZFs 1-4, which were partially recovered upon salt titration (Fig. 5C). This suggests a gain in flexibility for these residues as the high ionic strength weakens the protein-DNA interaction, even though a high salt concentration normally incurs a sensitivity penalty. Taken together, the intensity pattern demonstrates that this a3 fragment binds to ZFs 1-4. More systematic studies of the interactions between DNA and CTCF ZFs should follow to eventually elucidate the molecular mechanism underlying the versatile recognition of chromatin sites by the CTCF ZFs.
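To see why nanomolar affinities imply an essentially saturated complex under typical NMR conditions, the bound fraction for simple 1:1 binding can be computed from the exact quadratic solution of the mass-action equation. This is a sketch under stated assumptions: the 100 µM protein concentration is hypothetical (the sample concentration is not given in this section), the 1.5:1 DNA-to-protein ratio is borrowed from the Pb1 titration in Figure 5, and the 270 nM value refers to the full 42-bp CSE rather than the a3 fragment.

```python
import math


def fraction_protein_bound(protein_total, dna_total, kd):
    """Fraction of protein in a 1:1 complex; all inputs in the same units.

    Solves [PL]^2 - (P + L + Kd)*[PL] + P*L = 0 for the physical root.
    """
    b = protein_total + dna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / protein_total


PROTEIN_M = 100e-6  # hypothetical NMR sample concentration (not from the text)
DNA_M = 1.5 * PROTEIN_M  # 1.5:1 DNA-to-protein ratio, as in the Pb1 titration

for label, kd_m in (("Pb1, Kd ~30 nM", 30e-9), ("CSE, Kd 270 nM", 270e-9)):
    bound = fraction_protein_bound(PROTEIN_M, DNA_M, kd_m)
    print(f"{label}: {bound:.2%} of protein bound")
```

Under these assumptions both ligands leave well under 1% of the protein free, so the spectra report almost entirely on the bound state; the observed attenuation then reflects exchange and relaxation at the interface rather than a partially populated complex.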


Figure 5. Binding topology of different dsDNAs to CTCF ZFs 1-11. (A) DNA sequences of the well-characterized Pb1 oligonucleotide (PDB code: 5K5I) and of the Pcdhγa10 promoter CSE DNA (a3 fragment underlined), whose binding site on CTCF remains unknown. (B) Residue-by-residue intensity profile of CTCF ZFs 1-11 upon binding to Pb1 at a DNA-protein molar ratio of 1.5. Signal intensities are normalized relative to the free-form protein. (C) Intensity profiles of CTCF ZFs 1-11 upon binding to the a3 fragment of the Pcdhγa10 CSE.

In summary, we investigated the structure and dynamics of the full-length CTCF ZFs 1-11. Based on this, we speculate that CTCF may recognize multiple chromatin sites through the following molecular mechanism (Fig. 6). First, individual ZFs are well structured but connected by flexible linker regions in the free form, as demonstrated by the NMR solution structures and backbone dynamics. This work was technically challenging and required non-uniform sampling (NUS) to alleviate the signal degeneracy caused by the large molecular weight and high sequence identity. Each ZF may adopt multiple orientations relative to the neighboring ZFs, as our NMR RDC datasets show that the inter-ZF spatial arrangement in the free-form ZFs 1-11 cannot be ascribed to any single conformation in the previously solved structures. Second, our CTCF-DNA interaction analysis indicates that CTCF employs various combinations of ZFs to recognize different DNA sequences. The plasticity of the ZFs directly interacting with DNA is restricted upon DNA binding. The other ZFs remain flexible and may protrude for further DNA binding. The flexibility of the inter-ZF linkers allows CTCF to adopt various conformations depending on the DNA sequence. The assignment of the free-form CTCF ZFs 1-11 indeed provides a platform to systematically interrogate its interaction with a series of DNA sequences. Finally, facilitated by cofactors such as cohesin, CTCF ZFs form DNA loops at various chromatin loci [40-43]. This looping brings sequentially distant DNA elements close in space for the regulation of gene transcription. Therefore, the detailed structural and dynamic studies of the 11 zinc fingers of CTCF provide important biophysical insights into its versatile functions in regulating gene transcription.


Figure 6. Schematic illustration of the multivalent recognition of numerous DNAs at different chromatin loci by CTCF ZFs. Individual ZFs are well structured but connected by flexible linker regions, so they can adopt multiple orientations relative to the neighboring ZFs, which enables CTCF to recognize different DNA sequences using artful combinations of ZFs. The plasticity of the ZFs directly interacting with DNA is restricted upon DNA binding. The other ZFs remain flexible and may protrude for further DNA binding. The flexibility of the inter-ZF linkers allows CTCF to adopt various conformations depending on the DNA sequence. Together with cofactors such as cohesin, CTCF ZFs form DNA loops at various chromatin loci for the versatile regulation of gene transcription.


ASSOCIATED CONTENT

Supporting Information. The following files are available free of charge: Details regarding materials and methods and supporting tables and figures (PDF)

AUTHOR INFORMATION

Corresponding Author

*Email: [email protected] (Z.Z.) *Email: [email protected] (K.R.)

Notes

The authors declare no competing financial interests.

ACKNOWLEDGMENT

Part of our NMR work was performed at the National Facility for Protein Sciences Shanghai and the High Magnetic Field Laboratory, Chinese Academy of Sciences. This work was financially supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB08030302, XDB08010101, XDPB10, XDA12020355), the Ministry of Science and Technology of China (2016YFA0500700 and 2014CB910604), the National Natural Science Foundation of China (U1632153, 21573205, 31630039), and the Fundamental Research Funds for the Central Universities (WK2070080002, WK2060190086).


REFERENCES

(1) Holwerda, S. J.; de Laat, W., CTCF: the Protein, the Binding Partners, the Binding Sites and Their Chromatin Loops. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 2013, 368 (1620), 20120369.

(2) Tsai, Y. C.; Cooke, N. E.; Liebhaber, S. A., Tissue Specific CTCF Occupancy and Boundary Function at the Human Growth Hormone Locus. Nucleic Acids Res. 2014, 42 (8), 4906-21.

(3) Lu, Y.; Shan, G.; Xue, J.; Chen, C.; Zhang, C., Defining the Multivalent Functions of CTCF from Chromatin State and Three-Dimensional Chromatin Interactions. Nucleic Acids Res. 2016.

(4) Herold, M.; Bartkuhn, M.; Renkawitz, R., CTCF: Insights into Insulator Function during Development. Development 2012, 139 (6), 1045-57.

(5) Klenova, E. M.; Nicolas, R. H.; Paterson, H. F.; Carne, A. F.; Heath, C. M.; Goodwin, G. H.; Neiman, P. E.; Lobanenkov, V. V., CTCF, a Conserved Nuclear Factor Required for Optimal Transcriptional Activity of the Chicken C-Myc Gene, is an 11-Zn-Finger Protein Differentially Expressed in Multiple Forms. Mol. Cell. Biol. 1993, 13 (12), 7612-7624.

(6) de Wit, E.; Vos, E. S.; Holwerda, S. J.; Valdes-Quezada, C.; Verstegen, M. J.; Teunissen, H.; Splinter, E.; Wijchers, P. J.; Krijger, P. H.; de Laat, W., CTCF Binding Polarity Determines Chromatin Looping. Mol. Cell 2015, 60 (4), 676-84.

(7) Nichols, M. H.; Corces, V. G., A CTCF Code for 3D Genome Architecture. Cell 2015, 162 (4), 703-5.


(8) Sandhu, K. S.; Li, G.; Poh, H. M.; Quek, Y. L.; Sia, Y. Y.; Peh, S. Q.; Mulawadi, F. H.; Lim, J.; Sikic, M.; Menghi, F.; Thalamuthu, A.; Sung, W. K.; Ruan, X.; Fullwood, M. J.; Liu, E.; Csermely, P.; Ruan, Y., Large-Scale Functional Organization of Long-Range Chromatin Interaction Networks. Cell Rep. 2012, 2 (5), 1207-19.

(9) Schmidt, D.; Schwalie, P. C.; Wilson, M. D.; Ballester, B.; Goncalves, A.; Kutter, C.; Brown, G. D.; Marshall, A.; Flicek, P.; Odom, D. T., Waves of Retrotransposon Expansion Remodel Genome Organization and CTCF Binding in Multiple Mammalian Lineages. Cell 2012, 148 (1-2), 335-48.

(10) Dixon, J. R.; Selvaraj, S.; Yue, F.; Kim, A.; Li, Y.; Shen, Y.; Hu, M.; Liu, J. S.; Ren, B., Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions. Nature 2012, 485 (7398), 376-80.

(11) Sexton, T.; Yaffe, E.; Kenigsberg, E.; Bantignies, F.; Leblanc, B.; Hoichman, M.; Parrinello, H.; Tanay, A.; Cavalli, G., Three-Dimensional Folding and Functional Organization Principles of the Drosophila Genome. Cell 2012, 148 (3), 458-72.

(12) Guo, Y.; Xu, Q.; Canzio, D.; Shou, J.; Li, J.; Gorkin, D. U.; Jung, I.; Wu, H.; Zhai, Y.; Tang, Y.; Lu, Y.; Wu, Y.; Jia, Z.; Li, W.; Zhang, M. Q.; Ren, B.; Krainer, A. R.; Maniatis, T.; Wu, Q., CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell 2015, 162 (4), 900-910.

(13) Busslinger, G. A.; Stocsits, R. R.; van der Lelij, P.; Axelsson, E.; Tedeschi, A.; Galjart, N.; Peters, J.-M., Cohesin is Positioned in Mammalian Genomes by Transcription, CTCF and Wapl. Nature 2017, 544 (7651), 503-507.


(14) Hansen, A. S.; Pustova, I.; Cattoglio, C.; Tjian, R.; Darzacq, X., CTCF and Cohesin Regulate Chromatin Loop Stability with Distinct Dynamics. Elife 2017, 6, e25776.

(15) Barutcu, A. R.; Maass, P. G.; Lewandowski, J. P.; Weiner, C. L.; Rinn, J. L., A TAD Boundary is Preserved upon Deletion of The CTCF-Rich Firre Locus. Nat. Commun. 2018, 9 (1), 1444.

(16) Wutz, G.; Várnai, C.; Nagasaka, K.; Cisneros, D. A.; Stocsits, R. R.; Tang, W.; Schoenfelder, S.; Jessberger, G.; Muhar, M.; Hossain, M. J., Topologically Associating Domains and Chromatin Loops Depend on Cohesin and are Regulated by CTCF, WAPL, and PDS5 Proteins. EMBO J. 2017, 36 (24), 3573-3599.

(17) Tang, Z.; Luo, O. J.; Li, X.; Zheng, M.; Zhu, J. J.; Szalaj, P.; Trzaskoma, P.; Magalska, A.; Wlodarczyk, J.; Ruszczycki, B., CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 2015, 163 (7), 1611-1627.

(18) Hnisz, D.; Young, R. A., New Insights into Genome Structure: Genes of a Feather Stick Together. Mol. Cell 2017, 67 (5), 730-731.

(19) Ong, C. T.; Corces, V. G., CTCF: an Architectural Protein Bridging Genome Topology and Function. Nat. Rev. Genet. 2014, 15 (4), 234-46.

(20) Handoko, L.; Xu, H.; Li, G.; Ngan, C. Y.; Chew, E.; Schnapp, M.; Lee, C. W.; Ye, C.; Ping, J. L.; Mulawadi, F.; Wong, E.; Sheng, J.; Zhang, Y.; Poh, T.; Chan, C. S.; Kunarso, G.; Shahab, A.; Bourque, G.; Cacheux-Rataboul, V.; Sung, W. K.; Ruan, Y.; Wei, C. L., CTCF-mediated Functional Chromatin Interactome in Pluripotent Cells. Nat. Genet. 2011, 43 (7), 630-8.


(21) Nora, E. P.; Goloborodko, A.; Valton, A.-L.; Gibcus, J. H.; Uebersohn, A.; Abdennur, N.; Dekker, J.; Mirny, L. A.; Bruneau, B. G., Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization. Cell 2017, 169 (5), 930-944.

(22) Narendra, V.; Bulajić, M.; Dekker, J.; Mazzoni, E. O.; Reinberg, D., CTCF-mediated Topological Boundaries during Development Foster Appropriate Gene Regulation. Genes Dev. 2016, 30 (24), 2657-2662.

(23) Hsu, S. C.; Gilgenast, T. G.; Bartman, C. R.; Edwards, C. R.; Stonestrom, A. J.; Huang, P.; Emerson, D. J.; Evans, P.; Werner, M. T.; Keller, C. A., The BET Protein BRD2 Cooperates with CTCF to Enforce Transcriptional and Architectural Boundaries. Mol. Cell 2017, 66 (1), 102-116.

(24) Hashimoto, H.; Wang, D.; Horton, J. R.; Zhang, X.; Corces, V. G.; Cheng, X., Structural Basis for the Versatile and Methylation-Dependent Binding of CTCF to DNA. Mol. Cell 2017, 66 (5), 711-720.

(25) Yin, M.; Wang, J.; Wang, M.; Li, X.; Zhang, M.; Wu, Q.; Wang, Y., Molecular Mechanism of Directional CTCF Recognition of a Diverse Range of Genomic Sites. Cell Res. 2017, 27 (11), 1365-1377.

(26) Guo, Y.; Monahan, K.; Wu, H.; Gertz, J.; Varley, K. E.; Li, W.; Myers, R. M.; Maniatis, T.; Wu, Q., CTCF/Cohesin-mediated DNA Looping is Required for Protocadherin Alpha Promoter Choice. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (51), 21081-6.


(27) Fiaux, J.; Bertelsen, E. B.; Horwich, A. L.; Wuthrich, K., NMR Analysis of a 900K GroEL-GroES Complex. Nature 2002, 418 (6894), 207-211.

(28) Pervushin, K.; Riek, R.; Wider, G.; Wuthrich, K., Attenuated T-2 Relaxation by Mutual Cancellation of Dipole-Dipole Coupling and Chemical Shift Anisotropy Indicates an Avenue to NMR Structures of very Large Biological Macromolecules in Solution. Proc. Natl. Acad. Sci. U. S. A. 1997, 94 (23), 12366-12371.

(29) Salzmann, M.; Pervushin, K.; Wider, G.; Senn, H.; Wuthrich, K., TROSY in Triple-Resonance Experiments: New Perspectives for Sequential NMR Assignment of Large Proteins. Proc. Natl. Acad. Sci. U. S. A. 1998, 95 (23), 13585-13590.

(30) Coggins, B. E.; Werner-Allen, J. W.; Yan, A.; Zhou, P., Rapid Protein Global Fold Determination using Ultrasparse Sampling, High-Dynamic Range Artifact Suppression, and Time-Shared NOESY. J. Am. Chem. Soc. 2012, 134 (45), 18619-18630.

(31) Lee, W.; Tonelli, M.; Markley, J. L., NMRFAM-SPARKY: Enhanced Software for Biomolecular NMR Spectroscopy. Bioinformatics 2015, 31 (8), 1325-7.

(32) Keller, R., The Computer Aided Resonance Assignment Tutorial. Book 2004.

(33) Zhao, Y.; Zhang, G.; He, C.; Mei, Y.; Shi, Y.; Li, F., The 11th C2H2 Zinc Finger and an Adjacent C-Terminal Arm are Responsible for TZAP Recognition of Telomeric DNA. Cell Res. 2018, 28 (1), 130-134.

(34) Tolman, J. R.; Ruan, K., NMR Residual Dipolar Couplings as Probes of Biomolecular Dynamics. Chem. Rev. 2006, 106 (5), 1720-1736.


(35) Ravera, E.; Salmon, L.; Fragai, M.; Parigi, G.; Al-Hashimi, H.; Luchinat, C., Insights into Domain-Domain Motions in Proteins and RNA from Solution NMR. Acc. Chem. Res. 2014, 47 (10), 3118-3126.

(36) Ruckert, M.; Otting, G., Alignment of Biological Macromolecules in Novel Nonionic Liquid Crystalline Media for NMR Experiments. J. Am. Chem. Soc. 2000, 122 (32), 7793-7797.

(37) Zweckstetter, M., NMR: Prediction of Molecular Alignment from Structure using The PALES Software. Nat. Protoc. 2008, 3 (4), 679-90.

(38) Ruan, K.; Briggman, K. B.; Tolman, J. R., De Novo Determination of Internuclear Vector Orientations from Residual Dipolar Couplings Measured in Three Independent Alignment Media. J. Biomol. NMR 2008, 41 (2), 61-76.

(39) Humphrey, W.; Dalke, A.; Schulten, K., VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14 (1), 33-8, 27-8.

(40) Zuin, J.; Dixon, J. R.; van der Reijden, M. I.; Ye, Z.; Kolovos, P.; Brouwer, R. W.; van de Corput, M. P.; van de Werken, H. J.; Knoch, T. A.; van IJcken, W. F., Cohesin and CTCF Differentially Affect Chromatin Architecture and Gene Expression in Human Cells. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (3), 996-1001.

(41) Katainen, R.; Dave, K.; Pitkänen, E.; Palin, K.; Kivioja, T.; Välimäki, N.; Gylfe, A. E.; Ristolainen, H.; Hänninen, U. A.; Cajuso, T., CTCF/Cohesin-binding Sites are Frequently Mutated in Cancer. Nat. Genet. 2015, 47 (7), 818.

(42) Merkenschlager, M.; Odom, Duncan T., CTCF and Cohesin: Linking Gene Regulatory Elements with Their Targets. Cell 2013, 152 (6), 1285-1297.


(43) Yardımcı, G. G.; Noble, W. S., Predictive Model of 3D Domain Formation via CTCF-mediated Extrusion. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (47), 14404-14405.
