
Article

A Scalable Epitope Tagging Approach for High Throughput ChIP-Seq Analysis
Xiong Xiong, Yanxiao Zhang, Jian Yan, Surbhi Jain, Sora Chee, Bing Ren, and Huimin Zhao
ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.6b00358 • Publication Date (Web): 19 Feb 2017
Downloaded from http://pubs.acs.org on February 22, 2017


ACS Synthetic Biology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Submitted to ACS Synthetic Biology

A Scalable Epitope Tagging Approach for High Throughput ChIP-Seq Analysis

Xiong Xiong¹, Yanxiao Zhang², Jian Yan²,⁶, Surbhi Jain³, Sora Chee⁴, Bing Ren²,⁴,* and Huimin Zhao¹,⁵,*

¹ Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801
² Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, CA 92093
³ Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801
⁴ Department of Cellular and Molecular Medicine, Institute of Genome Medicine, Moores Cancer Center, University of California San Diego, School of Medicine, San Diego, CA 92093
⁵ Departments of Chemistry and Bioengineering, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
⁶ Department of Biosciences and Nutrition, Karolinska Institutet, 141 83 Huddinge, Sweden


ABSTRACT

Eukaryotic transcription factors (TFs) typically recognize short genomic sequences, alone or together with other proteins, to modulate gene expression. Mapping TF-DNA interactions across the genome is crucial for understanding the gene regulatory programs in cells. While chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is commonly used for this purpose, its application is severely limited by the availability of suitable antibodies against TFs. To overcome this limitation, we developed an efficient and scalable strategy named cmChIP-Seq that combines the clustered regularly interspaced short palindromic repeats (CRISPR) technology with microhomology mediated end joining (MMEJ) to genetically engineer a TF with an epitope tag. We demonstrated the utility of this tool by applying it to four TFs in a human colorectal cancer cell line. The highly scalable procedure makes this strategy well suited to ChIP-Seq analysis of TFs in diverse species and cell types.

KEYWORDS ChIP-Seq, Microhomology mediated end joining, CRISPR/Cas9, genome engineering, FLAG tagging


INTRODUCTION

Genome-wide profiling of TF-DNA interactions is crucial for dissecting the transcriptional regulation networks that govern the spatiotemporal expression of genes in an organism [1-3]. Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) is the most common method for this purpose [4-6]. However, this technique is limited by the availability of ChIP-grade antibodies against transcription factors [7, 8]. To address this bottleneck, bacterial artificial chromosome (BAC) vectors have been employed to introduce epitope tags into transcription factors, so that ChIP-Seq experiments can be performed with highly specific antibodies against the epitopes [9]. However, genetic engineering of BACs can be time-consuming and laborious [10]. Alternatively, recombinant adeno-associated virus (rAAV) has been employed as a delivery vehicle for a knock-in (KI) vector to realize the epitope tag insertion and overcome low recombination efficiency [11, 12]. More recently, a method was reported that directly modifies an endogenous locus via CRISPR/Cas9-mediated homologous recombination [13]. All these approaches suffer from low scalability because of the laborious, costly and time-consuming procedures for assembling BAC vectors or constructing long homologous recombination (HR) arms.

To circumvent these difficulties, we have developed a highly scalable approach for epitope tagging of transcription factors in cultured cells. Specifically, we combined the clustered regularly interspaced short palindromic repeats (CRISPR) technology [14, 15] with microhomology mediated end joining (MMEJ) [16] to insert a 3×FLAG tag with screening markers at the C-terminus of a target TF. Recently, microhomology has been used for predicting nuclease target sites that allow efficient gene disruption [17]. Suzuki and coworkers developed an MMEJ-assisted KI method that has been applied to a variety of organisms, ranging from cell lines such as HEK293T, HeLa and CHO-K1 to silkworm, zebrafish and frog [18]. The advantages of this method include the ease of vector construction and a good efficiency of precise gene editing, which could reach 85% in certain organisms. In addition, compared with the HR-mediated integration method, MMEJ-assisted KI was accompanied by improved colony-forming efficiency [19].

Our CRISPR-MMEJ mediated tagging approach addresses two major bottlenecks in the current KI strategies. One bottleneck is the low efficiency of gene targeting, which necessitates laborious downstream genotyping of individual clones. Our method alleviates this problem by using drug selection or fluorescence-based screening of cell populations. The other bottleneck is the low throughput of the procedure, which is limited by the laborious construction of homology arms; our method requires only short microhomology arms. Compared with non-homologous end joining (NHEJ), MMEJ provides an alternative cellular repair mechanism with more precise integration [17, 18]. Using this CRISPR-MMEJ mediated tagging approach, we tagged the TFs SP1, MYC, TCF7L2 and CTCF and used the resulting cells for successful ChIP-Seq experiments.

RESULTS AND DISCUSSION

Design of the CRISPR-MMEJ mediated ChIP-Seq (cmChIP-Seq) method

We assembled an all-in-one expression vector, CRISPRexp, containing multiple guide RNA cassettes and a Cas9 nuclease [20]. We also constructed donor plasmids for the target transcription factors. The Cas9 nuclease gene is driven by the CBh promoter (the chicken β-actin short promoter), while each of the three sgRNAs is driven by its own hU6 promoter (Supplemental Table 1). The CRISPR/Cas9 nuclease generates a double-strand break a few base pairs upstream of the stop codon in the last coding exon. We chose CRISPR target sites close to the stop codon in order to maintain the integrity of the coding sequence. The engineered donor plasmid, MicroDonor, contains the epitope tag followed by a P2A sequence, mNeonGreen, a T2A sequence and a puromycin resistance gene; the cassette is flanked by microhomology arms of only 8 to 10 bp that match the sequences upstream and downstream of the cleavage site in the genome (Fig. 1). The MicroDonor was constructed using Gibson assembly [21] and linearized by CRISPR cleavage in vivo. After up to 7 days of selection in puromycin, drug-resistant clones were isolated and the integration events were verified through genotyping analyses.
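The choice of a cleavage site near the stop codon and the derivation of the flanking microhomology arms can be illustrated with a short sketch. This is not the authors' design tool: the function names, the 30 bp search window, the forward-strand-only scan and the toy sequence are assumptions made for illustration. The sketch also assumes the commonly used SpCas9, which recognizes an NGG PAM and cuts 3 bp upstream of it; only the arm length (8 to 10 bp) is taken from the text.

```python
import re

# 20-nt protospacer followed by an NGG PAM; the lookahead allows overlapping sites.
PAM_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")

def find_cut_sites(seq, stop_index, window=30):
    """Return (cut_index, protospacer) for forward-strand sites whose cut lies
    within `window` bp upstream of the stop codon starting at `stop_index`.
    A cut index c means the break falls between seq[c-1] and seq[c]."""
    seq = seq.upper()
    sites = []
    for m in PAM_SITE.finditer(seq):
        cut = m.start() + 17  # assumed SpCas9 behaviour: cut 3 bp upstream of the PAM
        if 0 <= stop_index - cut <= window:
            sites.append((cut, m.group(1)))
    return sites

def microhomology_arms(seq, cut, arm_len=8):
    """Return the (upstream, downstream) arms of length `arm_len` flanking the cut."""
    seq = seq.upper()
    if cut < arm_len or cut + arm_len > len(seq):
        raise ValueError("cut site too close to the end of the sequence")
    return seq[cut - arm_len:cut], seq[cut:cut + arm_len]

# Toy sequence for illustration only (not a real locus).
toy = "TTTTTTTTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "CCATAA" + "CCCCCC"
stop = toy.index("TAA")
for cut, protospacer in find_cut_sites(toy, stop):
    print(cut, protospacer, microhomology_arms(toy, cut))
```

A real design would also scan the reverse strand and check that the tag remains in frame with the coding sequence after integration.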

Analyses for SP1, MYC and TCF7L2 binding sites in HCT116 cells

As a proof of principle, we first applied the above strategy to the transcription factor SP1 (specificity protein 1). SP1 is a well-characterized TF that is implicated in cell growth, differentiation, apoptosis and carcinogenesis [22]. It activates the transcription of a variety of cellular genes by binding putative GC-rich sites in their promoters. The designed CRISPR/Cas9 nuclease creates a double-strand break (DSB) 10 bp upstream of the stop codon in SP1’s last coding exon and linearizes the tagging cassette in the donor vector. The well-studied colorectal carcinoma (CRC) cell line HCT116 was chosen for this experiment because many TF ChIP-Seq datasets are available for this cell line [23]. Initially, single-cell clones were isolated and verified through genotyping. To validate the MMEJ-mediated integration, PCR and Sanger sequencing were performed to ensure that the epitope tag was integrated in-frame (Fig. 2A and 2B). By examining the 5’ junction at the DSB, we found that 10 clones showed in-frame integration, 70% of which matched the exact expected integration sequence. Sample No. 8 showed bi-allelic insertion, as no wild-type product was detected (Fig. 2B). When mutations were detected, they were unlikely to affect transcription because they were located far from the C-terminus of the gene sequence of interest [19].


We selected four clones with precise integration at the 5’ junction for ChIP-Seq analyses. The experiments were successful for all four clones despite variations in their 3’ junctions. Consistent with the published ChIP-Seq data for this factor in the CRC cell line HCT116 (Fig. S1A) [23], ChIP-Seq analysis of the tagged SP1 protein using a monoclonal antibody against the FLAG tag clearly showed ChIP signal enrichment at the previously identified SP1 binding sites in these cells (Fig. S1B). SP1 has been reported to play an essential role in activating the human Ek1 promoter and in constitutively regulating the AXL promoter [24, 25]. Examples of our enriched peaks at the promoter regions of ETNK1 and AXL are given in Figure 3A and Figure S2A, and the read enrichment tracks are highly similar among the four monoclonal samples. Moreover, when we selected the top 500 peaks in each ChIP-Seq dataset for de novo motif discovery, the known SP1 binding motif sequence (GGGCGG) was recovered as the top hit (Fig. 3B). This result demonstrated that our method can be used to map TF binding sites as effectively as with antibodies against the protein itself.

To further demonstrate the generality of our cmChIP-Seq method, we tagged the transcription factors TCF7L2 and MYC with 3×FLAG epitope tags and checked the junctions (Figs. S3 and S4). For each TF, we picked three clones for ChIP-Seq analysis using anti-FLAG monoclonal antibodies. In all cases, the ChIP-Seq enrichment signals were consistent with the results of previous ChIP-Seq studies that used antibodies against the TFs themselves [26, 27] (Fig. S5A, S5B and S5C). Since no MYC dataset had been deposited for HCT116, we compared our results with those from the BL14 cell line [27]. Examples of genome enrichment tracks revealed that TCF7L2 occupied the promoter regions of UAP1L1 and CCND1 (Fig. 4A, Fig. S2B), consistent with maps of the TCF7L2 binding sites in a prior study [28]. In the MYC tracks, we observed enrichment at the KAT2A and HIF1A sites, suggesting that MYC controls the expression of the corresponding genes (Fig. 4A, Fig. S2C) [29, 30]. The canonical binding motifs for TCF7L2 and MYC were also enriched at the top 500 binding sites (Fig. 4B and 4C): TCF7L2 binds the regulatory element sequence ACATCAAAGGGA, and MYC binds the E-box sequence CACGTG. Taken together, these results demonstrated that the tagging process was successful and that the C-terminal tandem peptide add-on allowed effective epitope recognition and the downstream chromatin precipitation.
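As a simple illustration of what enrichment of a known motif at peak sequences means, the sketch below counts exact occurrences of a consensus sequence on either strand and reports the fraction of sequences that contain it. It is a deliberately simplified stand-in and not the de novo discovery performed with MEME in this work: exact string matching, the function names and the toy sequences are all assumptions for illustration. The motif strings (CACGTG for the MYC E-box, GGGCGG for SP1) come from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def count_motif(seq, motif):
    """Count exact motif occurrences on either strand (overlaps allowed).
    A palindromic motif such as CACGTG is counted once per position."""
    seq = seq.upper()
    targets = {motif.upper(), revcomp(motif.upper())}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)

def fraction_with_motif(seqs, motif):
    """Fraction of sequences that contain the motif at least once."""
    if not seqs:
        return 0.0
    return sum(1 for s in seqs if count_motif(s, motif) > 0) / len(seqs)

# Toy sequences for illustration only.
peaks = ["AACACGTGTT", "AAAAAAAAAA"]
print(fraction_with_motif(peaks, "CACGTG"))
```

Real binding sites tolerate mismatches, which is why position weight matrices rather than exact strings are normally used for this kind of scan.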

ChIP-Seq analyses of TF binding sites in pooled cell populations

In the above genetic engineering experiments, the rate-limiting step was clonal expansion and genotyping. Encouraged by the relatively high tagging efficiency, we asked whether this step could be eliminated so that the pooled cell population, instead of genotyped cell clones, could be used directly for ChIP-Seq analysis. As a proof of principle, we transfected cells with the tagging vectors for SP1 and collected three biological replicates of the transfected cell population for ChIP-Seq analyses. A portion of each replicate was also characterized by fluorescence-activated cell sorting (FACS) and genotyping to determine the success rate of epitope tagging (Fig. 5A). A substantial fraction of each pooled cell sample was fluorescence-positive (Fig. S6). For instance, 58.1±3.4% of cells were GFP-positive for SP1 after 7 days of antibiotic selection. ChIP-Seq was performed on the remaining cell pools either before or after FACS. The ChIP-Seq experiments for sorted samples showed binding patterns at the ETNK1 and AXL regions that are comparable to those of the monoclonal samples described above (Fig. 5B, Fig. S2D). Additionally, the aggregated read enrichment at published SP1 peaks indicated that the pooled cell samples performed as well as the monoclonal samples (Fig. S1C and S1D).


Similarly, for each dataset, the top 500 peaks were selected and MEME analysis was performed for motif discovery. Notably, core SP1 motif enrichment was observed in all pooled cell samples (Fig. 5C and 5D). We also compared the signal profiles of all SP1 epitope-tagged samples (both monoclonal and pooled cell samples) over 3000 bp regions surrounding the published Encyclopedia of DNA Elements (ENCODE) SP1 peaks [23]. The heatmap not only showed that our samples recapitulated the ChIP-Seq enrichment at ENCODE peaks, but also demonstrated high consistency among all our samples (Fig. 5E). In particular, the pooled cell samples are almost identical to the monoclonal samples, except for the difference in read depth. The fact that ChIP-Seq data from pooled cells are of comparable quality to those from single clones suggests that pooled cell samples give adequate enrichment of specific over non-specific signal and that the single-clone isolation step can simply be skipped.
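The idea behind comparing signal profiles around published peaks can be sketched as an aggregate profile: the read coverage in a fixed window around each peak center is binned and averaged over all peaks. The sketch below is an illustrative stand-in, not the HOMER-based analysis used in this work. The function name, the 100 bp bin size, the per-base coverage list and the interpretation of the 3000 bp region as 1500 bp on each side of the center are assumptions.

```python
def aggregate_profile(coverage, centers, flank=1500, bin_size=100):
    """Average binned coverage in [center - flank, center + flank) over peaks.

    coverage : per-base read depth for one chromosome (list of numbers)
    centers  : peak-center positions (0-based) on that chromosome
    Peaks whose window runs off either end of the chromosome are skipped.
    """
    n_bins = (2 * flank) // bin_size
    totals = [0.0] * n_bins
    used = 0
    for center in centers:
        start = center - flank
        if start < 0 or center + flank > len(coverage):
            continue
        for b in range(n_bins):
            lo = start + b * bin_size
            totals[b] += sum(coverage[lo:lo + bin_size]) / bin_size
        used += 1
    return [t / used for t in totals] if used else totals

# Toy coverage for illustration: a 200 bp block of depth 1 centered at 2500.
coverage = [0] * 5000
for i in range(2400, 2600):
    coverage[i] = 1
profile = aggregate_profile(coverage, [2500, 100])  # second peak is skipped
print(profile[13:17])
```

Plotting one such profile per sample, or stacking the per-peak rows instead of averaging them, gives the kind of heatmap comparison described above.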

We also assessed the feasibility of our approach for additional TFs, namely MYC and CTCF. Without FACS, biological replicates were used directly for ChIP-Seq analyses after puromycin treatment. Genotypes were confirmed through PCR (Fig. S7A and S7B). The DNA binding patterns of the pooled MYC samples were consistent with those of the MYC monoclonal cell populations described above (Fig. 6A, Fig. S2E). CTCF showed enrichment at the BBC3 region, where CTCF has been shown to bind as part of the CTCF-cohesin complex [31]. Motif analyses recovered the E-box sequence for MYC binding with high concordance (Fig. 6B). The top motif discovered for CTCF was the sequence CCACCAGGGGGCGC in all three replicates (Fig. 6D). The CTCF consensus binding sequence is considered to contain CpG sites and can be subject to DNA methylation. When we extracted the ChIP-Seq enrichment signal within the previously published peak lists, we again found highly enriched signals towards all peak centers for MYC and CTCF (Fig. 6C and 6E).

The ENCODE consortium has carried out ChIP-Seq studies to investigate transcription factor binding in mammalian cells [23]. However, only a small fraction of the known human TFs has been characterized so far. Here we demonstrated that our TF tagging approach is a quick and simple solution for high-throughput ChIP-Seq analysis. The method takes advantage of the recently developed CRISPR/Cas9 system and combines it with the microhomology-mediated end joining mechanism. It enables the modification of an endogenous TF within weeks, without the tedious single-colony selection and individual genotyping that traditional TF tagging methods require and that may take a few months for one TF. One of the key advantages of this CRISPR-MMEJ based KI strategy is that the donor vector can be constructed within 3 days. We designed very short homology arms into the primers used to amplify the insert cassette, which can then be easily cloned into the backbone via Gibson assembly. This makes donor vector assembly more feasible and scalable than any HR approach, which requires the addition of long homology arms flanking the insert cassette.

The tagged TFs could be pulled down precisely and efficiently with the anti-FLAG monoclonal antibody. The data generated by the new method displayed higher quality than most TF ChIP-Seq data obtained using polyclonal antibodies against the TF protein. When we compared our enrichment tracks with ENCODE and other prior work, our datasets showed high consistency. In the SP1 examples, signals of the sorted pooled cells were more enriched than those of the non-sorted cells, which could be due to less interference from wild-type cells. Despite this, all non-sorted samples are adequate for chromatin precipitation experiments, and we confirmed this by testing tagged MYC and CTCF. As all ChIP-Seq experiments could be performed with the same antibody, this also makes possible the quantitative comparison of ChIP-Seq data among different TFs. This strategy could be applied to more TFs, especially those without commercial ChIP-grade antibodies.

Although successful tagging events and improved sequencing signal enrichments were observed, we noticed that this strategy is still limited for certain TFs. To assess the integration efficiency, we estimated the number of colonies formed after drug selection (data not shown). We observed more surviving colonies for MYC than for the other three TFs. Because MYC is expressed at a higher level in HCT116 cells, its stronger native promoter could confer better resistance at the same drug concentration. Successful insertion depends on both the efficiency of CRISPR-Cas9 cutting and that of MMEJ repair. We chose insertion sites close to the stop codon to maintain genome integrity. The Cas9 system requires a specific protospacer adjacent motif (PAM) for target recognition, which constrains the choice of sites at which the DSBs can be introduced. The gRNA sequence affects Cas9 binding efficiency and consequently results in different efficiencies of DSB generation [32]. As MMEJ is an alternative repair pathway, it shares enzymes with the classic DNA repair pathways [33]. The mechanism of MMEJ is under extensive study but is still incompletely understood [34, 35]. Therefore, the insertion efficiency is expected to vary because of variation in the CRISPR-Cas9 cleavage efficiency and in the rate of recombination between short homologies. We also tested potential CRISPR off-target sites, identified using online prediction software (http://crispr.mit.edu/) [36], for the SP1 samples. For monoclone #8 and the pooled cell samples, no significant mutations were detected (Supplementary Tables 2 and 3). In our approach, the CRISPR-Cas system relies on one gRNA to direct recognition of the genomic target and two gRNAs for releasing the donor cassette. Yamamoto and coworkers reported an upgraded donor design in which only one generic gRNA is required to fragment the donor vector [18]. By adopting that design, we could further improve the scalability of the method through simpler CRISPR plasmid construction. The overall efficiency is also expected to be higher because, in that design, the absence of trimming of the extra bases outside the microhomologies leads to distal MMEJ, which is considered to occur at a higher frequency than the proximal MMEJ on which our integration system is based [18].
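The off-target assessment mentioned above starts from a list of genomic sites that resemble the gRNA target. As a rough illustration of how such candidates can be ranked, the sketch below counts mismatches between a 20-nt guide and candidate sites of the same length. This is a simplification and not the scoring used by the prediction software cited in the text, which is not reproduced here; the function names, the four-mismatch cutoff and the toy sequences are assumptions.

```python
def mismatches(guide, site):
    """Number of mismatched positions between a guide and an equal-length site."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))

def rank_candidates(guide, sites, max_mismatches=4):
    """Return (mismatch_count, site) pairs, closest first, excluding perfect
    matches (the intended target) and sites above the mismatch cutoff."""
    scored = [(mismatches(guide, s), s) for s in sites]
    return sorted((m, s) for m, s in scored if 0 < m <= max_mismatches)

# Toy sequences for illustration only.
guide = "ACGTACGTACGTACGTACGT"
candidates = [
    guide,                    # on-target site, excluded from the ranking
    "ACGTACGTACGTACGTACGA",   # 1 mismatch
    "TTTTACGTACGTACGTACGT",   # 3 mismatches
    "TTTTTTTTTTTTTTTTTTTT",   # 15 mismatches, above the cutoff
]
print(rank_candidates(guide, candidates))
```

The top-ranked candidates are the loci one would then amplify and sequence to look for unintended mutations.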

In summary, we repurposed the CRISPR-Cas9 enabled MMEJ process to introduce an epitope tag into genes of interest and performed the downstream chromatin precipitation using a single ChIP-grade FLAG antibody. Because tagging efficiencies varied, we advanced pooled samples directly to ChIP preparation after quick genotyping confirmation. For all targets tested, similar enrichment patterns were observed between isolated monoclonal samples and pooled cell samples at binding sites discovered by previous ChIP-Seq studies. As a result, this new method reduces the time and labor needed, could enable mapping of less-characterized TFs via ChIP-Seq analysis, and could open avenues for other efficient genome modifications.

METHODS AND MATERIALS

Construction of CRISPRexp plasmids. The CRISPRexp plasmid was assembled using the Multiplex CRISPR/Cas9 Assembly System kit (Addgene; Kit 1000000055). Three gRNA-expressing cassettes were incorporated into a single plasmid using Golden Gate assembly [21]. Oligonucleotides for gRNA templates were synthesized, annealed and inserted into the corresponding intermediate vectors. The oligonucleotides used are listed in Supplementary Table 4.


Construction of donor vectors. The donor vectors were constructed using PCR and the Gibson Assembly Cloning kit (New England Biolabs, Ipswich, MA). Short homology arms were designed and included in the primers. The original donor backbone was a gift from Dr. Ken-ichi T. Suzuki of Hiroshima University, Hiroshima, Japan.

Cell culture and transfection. HCT116 cells were routinely maintained in McCoy’s 5A medium (ATCC, Manassas, VA) supplemented with 10% fetal bovine serum (FBS; Hyclone, Logan, UT). Cells were seeded in 100-mm dishes at a density of 1×10⁶ cells per dish. After 24 h, cells were transfected with 6.66 µg of CRISPRexp plasmid and 3.33 µg of donor vector using FuGene HD transfection reagent (Promega, Madison, WI) under the conditions specified by the manufacturer. After transfection, cells were cultured with the transfection reagent for 24 h and then in the growth medium described above for an additional 2 days. Puromycin selection (0.5-1.0 µg/ml) was applied for 7 days, and single clones were isolated using cloning cylinders (Sigma-Aldrich, St. Louis, MO). Only the SP1 pooled cells were used for FACS (Supplementary Methods and Materials).

Genomic PCR and DNA sequencing. Genomic DNA was extracted from cell pellets using QuickExtract DNA solution (Epicentre, Chicago, IL). Genomic PCR was performed using Herculase II Fusion DNA polymerase (Agilent Technologies, Santa Clara, CA) or Q5 High-Fidelity DNA polymerase (New England BioLabs, Ipswich, MA) with the primers listed in Supplementary Table 4. The PCR products were submitted for direct DNA sequencing (ACGT, Inc., Wheeling, IL, and GENEWIZ, Cambridge, MA).


ChIP-Seq analysis. The ChIP-Seq analysis was carried out following well-established guidelines [37]. Briefly, cells were crosslinked with 1% formaldehyde for 10 min at room temperature. Chromatin was sheared using a Covaris M220 Focused-ultrasonicator (Covaris, Woburn, MA) to obtain DNA fragments of about 400-600 bp. Five micrograms of monoclonal antibody M2 (Sigma, Cat. No. F1804) was used to pull down the tagged TF. The chromatin was then de-crosslinked at 65°C overnight with proteinase K (New England Biolabs, Ipswich, MA). DNA was purified using the MinElute PCR purification kit (Qiagen) and prepared as a library for sequencing on an Illumina HiSeq 2500 sequencer (Illumina, San Diego, CA). Single-end 50 bp reads were mapped to the human genome (hg19) with the BWA alignment software [38]. Duplicated reads at the same genomic loci were removed, and peak calling was performed using MACS2 [39]. The peak numbers for all the monoclone and pooled cell samples are listed in Supplementary Table S5. Supplementary Figure S8 shows the Pearson correlations among the previously reported analyses, the monoclone samples and the pooled cell samples for each TF. All data were deposited in the public database GEO under accession number GSE78064. For reviewing purposes, please use the following link (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=cjslkcsstbqrxmf&acc=GSE78064). The MEME suite was used to perform the de novo motif discovery [40]. For each experiment, the top 500 peaks were selected, and genomic sequences within 200 bp centered on each peak summit were used as input for MEME. The de novo MEME run used the options -revcomp -dna -nmotifs 3 -minw 5 -maxw 20, with all other parameters left at their defaults. The ChIP signal heatmaps for peak lists (Fig. 5E and others) were generated with the HOMER annotatePeaks.pl script based on the 3000 bp surrounding peak centers [41].
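The selection of input sequences for motif discovery described above (top 500 peaks, 200 bp centered on each summit) can be sketched as follows. This is a minimal illustration rather than the authors' script: the tuple layout, the function name and the use of the peak score as the ranking criterion are assumptions, since the text does not state how the top peaks were ranked.

```python
def top_peak_windows(peaks, n_top=500, width=200):
    """Return (chrom, start, end) windows of `width` bp centered on the summits
    of the `n_top` highest-scoring peaks (0-based, half-open coordinates).

    peaks : iterable of (chrom, summit_position, score) tuples
    Windows are clipped at position 0 near the start of a chromosome.
    """
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)[:n_top]
    half = width // 2
    return [(chrom, max(0, summit - half), summit + half)
            for chrom, summit, _score in ranked]

# Toy peaks for illustration only.
peaks = [("chr1", 1000, 5.0), ("chr2", 50, 9.0), ("chr1", 3000, 7.0)]
print(top_peak_windows(peaks, n_top=2))
```

The resulting intervals would then be converted to FASTA sequences from the reference genome before being passed to the motif discovery tool.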


ASSOCIATED CONTENT

Supporting Information Available
Supplementary methods, additional figures and tables as described in the text. This material is available free of charge via the Internet at http://pubs.acs.org.

ABBREVIATIONS

CRISPR-Cas: Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins; DSB: double-strand break; gRNA: guide RNA; HR: homologous recombination; MMEJ: microhomology mediated end joining

AUTHOR INFORMATION

Corresponding Author
* To whom correspondence should be addressed. Bing Ren, Phone: 1(858)822-5766; Email: [email protected]; Huimin Zhao, Phone: 1(217)333-2631; Email: [email protected]

Author Contributions
X.X., Z.Y., Y.J., R.B. and Z.H. designed the experiments; X.X. and Z.Y. performed all the experiments with the help of Y.J., J.S. and C.S.; X.X., Z.Y., R.B. and Z.H. wrote the manuscript.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGEMENTS

This work was supported by the Carl R. Woese Institute for Genomic Biology at the University of Illinois at Urbana-Champaign (H.Z.), the National Institutes of Health (1U54DK107965) (H.Z.), the Ludwig Institute for Cancer Research (B.R.), the National Institutes of Health (P50 GM08576404, 1U54DK107977-01) (B.R.), and an International Postdoctoral fellowship from the Swedish Vetenskapsrådet (537-2014-6796) (J.Y.). The authors thank Dr. Barbara Pilas (Flow Cytometry Facility, Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA) for helpful suggestions.


References
[1] Rivera, C. M., and Ren, B. (2013) Mapping human epigenomes, Cell 155, 39-55.
[2] Hawkins, R. D., and Ren, B. (2006) Genome-wide location analysis: insights on transcriptional regulation, Human molecular genetics 15 Spec No 1, R1-7.
[3] Tam, W. L., and Lim, B. (2008) Genome-wide transcription factor localization and function in stem cells, In StemBook, Cambridge (MA).
[4] Rodriguez, R., and Miller, K. M. (2014) Unravelling the genomic targets of small molecules using high-throughput sequencing, Nat Rev Genet 15, 783-796.
[5] Jothi, R., Cuddapah, S., Barski, A., Cui, K., and Zhao, K. (2008) Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data, Nucleic acids research 36, 5221-5231.
[6] Furey, T. S. (2012) ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions, Nat Rev Genet 13, 840-852.
[7] Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y. W., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q. H., Liu, T., Liu, X. S., Ma, L. J., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., and Snyder, M. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia, Genome Res 22, 1813-1831.
[8] Egelhofer, T. A., Minoda, A., Klugman, S., Lee, K., Kolasinska-Zwierz, P., Alekseyenko, A. A., Cheung, M. S., Day, D. S., Gadel, S., Gorchakov, A. A., Gu, T. T., Kharchenko, P. V., Kuan, S., Latorre, I., Linder-Basso, D., Luu, Y., Ngo, Q., Perry, M., Rechtsteiner, A., Riddle, N. C., Schwartz, Y. B., Shanower, G. A., Vielle, A., Ahringer, J., Elgin, S. C. R., Kuroda, M. I., Pirrotta, V., Ren, B., Strome, S., Park, P. J., Karpen, G. H., Hawkins, R. D., and Lieb, J. D. (2011) An assessment of histone-modification antibody quality, Nat Struct Mol Biol 18, 91-93.
[9] Pilon, A. M., Ajay, S. S., Kumar, S. A., Steiner, L. A., Cherukuri, P. F., Wincovitch, S., Anderson, S. M., Center, N. C. S., Mullikin, J. C., Gallagher, P. G., Hardison, R. C., Margulies, E. H., and Bodine, D. M. (2011) Genome-wide ChIP-Seq reveals a dramatic shift in the binding of the transcription factor erythroid Kruppel-like factor during erythrocyte differentiation, Blood 118, e139-148.
[10] Liu, M. G., S.; Battle, M.; and Stiles, J.K. (2011) Gene Functional Studies Using Bacterial Artificial Chromosome (BACs), Bacterial Artificial Chromosomes.
[11] Wang, Z. (2009) Epitope tagging of endogenous proteins for genome-wide chromatin immunoprecipitation analysis, Methods in molecular biology 567, 87-98.
[12] Zhang, X., Guo, C., Chen, Y., Shulha, H. P., Schnetz, M. P., LaFramboise, T., Bartels, C. F., Markowitz, S., Weng, Z., Scacheri, P. C., and Wang, Z. (2008) Epitope tagging of endogenous proteins for genome-wide ChIP-chip studies, Nature methods 5, 163-165.
[13] Savic, D., Partridge, E. C., Newberry, K. M., Smith, S. B., Meadows, S. K., Roberts, B. S., Mackiewicz, M., Mendenhall, E. M., and Myers, R. M. (2015) CETCh-seq: CRISPR epitope tagging ChIP-seq of DNA-binding proteins, Genome Res 25, 1581-1589.

[14] Ran, F. A., Hsu, P. D., Wright, J., Agarwala, V., Scott, D. A., and Zhang, F. (2013) Genome engineering using the CRISPR-Cas9 system, Nature protocols 8, 2281-2308.
[15] Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A., and Zhang, F. (2013) Multiplex genome engineering using CRISPR/Cas systems, Science 339, 819-823.
[16] McVey, M., and Lee, S. E. (2008) MMEJ repair of double-strand breaks (director's cut): deleted sequences and alternative endings, Trends in genetics : TIG 24, 529-538.
[17] Bae, S., Kweon, J., Kim, H. S., and Kim, J. S. (2014) Microhomology-based choice of Cas9 nuclease target sites, Nature methods 11, 705-706.
[18] Sakuma, T., Nakade, S., Sakane, Y., Suzuki, K. T., and Yamamoto, T. (2016) MMEJ-assisted gene knock-in using TALENs and CRISPR-Cas9 with the PITCh systems, Nature protocols 11, 118-133.
[19] Nakade, S., Tsubota, T., Sakane, Y., Kume, S., Sakamoto, N., Obara, M., Daimon, T., Sezutsu, H., Yamamoto, T., Sakuma, T., and Suzuki, K. T. (2014) Microhomology-mediated end-joining-dependent integration of donor DNA in cells and animals using TALENs and CRISPR/Cas9, Nat Commun 5.
[20] Sakuma, T., Nishikawa, A., Kume, S., Chayama, K., and Yamamoto, T. (2014) Multiplex genome engineering in human cells using all-in-one CRISPR/Cas9 vector system, Sci Rep-Uk 4.
[21] Gibson, D. G., Young, L., Chuang, R. Y., Venter, J. C., Hutchison, C. A., 3rd, and Smith, H. O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases, Nature methods 6, 343-345.
[22] Vizcaino, C., Mansilla, S., and Portugal, J. (2015) Sp1 transcription factor: A long-standing target in cancer chemotherapy, Pharmacology & therapeutics 152, 111-124.
[23] Consortium, E. P. (2012) An integrated encyclopedia of DNA elements in the human genome, Nature 489, 57-74.
[24] Kuan, C. S., See Too, W. C., and Few, L. L. (2016) Sp1 and Sp3 Are the Transcription Activators of Human ek1 Promoter in TSA-Treated Human Colon Carcinoma Cells, PloS one 11, e0147886.
[25] Mudduluru, G., and Allgayer, H. (2008) The human receptor tyrosine kinase Axl gene-promoter characterization and regulation of constitutive expression by Sp1, Sp3 and CpG methylation, Bioscience reports 28, 161-176.
[26] Frietze, S., Wang, R., Yao, L., Tak, Y. G., Ye, Z., Gaddis, M., Witt, H., Farnham, P. J., and Jin, V. X. (2012) Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3, Genome biology 13, R52.
[27] Seitz, V., Butzhammer, P., Hirsch, B., Hecht, J., Gutgemann, I., Ehlers, A., Lenze, D., Oker, E., Sommerfeld, A., von der Wall, E., Konig, C., Zinser, C., Spang, R., and Hummel, M. (2011) Deep sequencing of MYC DNA-binding sites in Burkitt lymphoma, PloS one 6, e26837.
[28] Zhao, J., Schug, J., Li, M., Kaestner, K. H., and Grant, S. F. (2010) Disease-associated loci are significantly over-represented among genes bound by transcription factor 7-like 2 (TCF7L2) in vivo, Diabetologia 53, 2340-2346.
[29] Yin, Y. W., Jin, H. J., Zhao, W., Gao, B., Fang, J., Wei, J., Zhang, D. D., Zhang, J., and Fang, D. (2015) The Histone Acetyltransferase GCN5 Expression Is Elevated and Regulated by c-Myc and E2F1 Transcription Factors in Human Colon Cancer, Gene expression 16, 187-196.

[30] Chen, C., Cai, S., Wang, G., Cao, X., Yang, X., Luo, X., Feng, Y., and Hu, J. (2013) c-Myc enhances colon cancer cell-mediated angiogenesis through the regulation of HIF-1alpha, Biochemical and biophysical research communications 430, 505-511.
[31] Gomes, N. P., and Espinosa, J. M. (2010) Gene-specific repression of the p53 target gene PUMA via intragenic CTCF-Cohesin binding, Genes & development 24, 1022-1034.
[32] Wang, T., Wei, J. J., Sabatini, D. M., and Lander, E. S. (2014) Genetic screens in human cells using the CRISPR-Cas9 system, Science 343, 80-84.
[33] Truong, L. N., Li, Y., Shi, L. Z., Hwang, P. Y., He, J., Wang, H., Razavian, N., Berns, M. W., and Wu, X. (2013) Microhomology-mediated End Joining and Homologous Recombination share the initial end resection step to repair DNA double-strand breaks in mammalian cells, Proceedings of the National Academy of Sciences of the United States of America 110, 7720-7725.
[34] Kent, T., Chandramouly, G., McDevitt, S. M., Ozdemir, A. Y., and Pomerantz, R. T. (2015) Mechanism of microhomology-mediated end-joining promoted by human DNA polymerase theta, Nat Struct Mol Biol 22, 230-237.
[35] Crespan, E., Czabany, T., Maga, G., and Hubscher, U. (2012) Microhomology-mediated DNA strand annealing and elongation by human DNA polymerases lambda and beta on normal and repetitive DNA sequences, Nucleic acids research 40, 5577-5590.
[36] Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., Li, Y., Fine, E. J., Wu, X., Shalem, O., Cradick, T. J., Marraffini, L. A., Bao, G., and Zhang, F. (2013) DNA targeting specificity of RNA-guided Cas9 nucleases, Nature biotechnology 31, 827-832.
[37] Yan, J., Enge, M., Whitington, T., Dave, K., Liu, J., Sur, I., Schmierer, B., Jolma, A., Kivioja, T., Taipale, M., and Taipale, J. (2013) Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites, Cell 154, 801-813.
[38] Li, H., and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics 25, 1754-1760.
[39] Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Nusbaum, C., Myers, R. M., Brown, M., Li, W., and Liu, X. S. (2008) Model-based analysis of ChIP-Seq (MACS), Genome biology 9, R137.
[40] Bailey, T. L., and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology 2, 28-36.
[41] Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., Cheng, J. X., Murre, C., Singh, H., and Glass, C. K. (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities, Molecular cell 38, 576-589.

FIGURE LEGENDS

Figure 1. Schematic depiction of the MMEJ-mediated TF tagging strategy named cmChIP-seq for high throughput ChIP-Seq analysis. Cells are transfected with plasmids containing the Cas9 nuclease, gRNAs, and epitope tag donor constructs, leading to the integration of the FLAG tag, 2A linker sequence, mNeonGreen and puromycin resistance gene at the 3′ end of the target transcription factor.

Figure 2. Genotyping analyses for SP1 monoclonal samples. A) Junction sequencing results. The intended knocked-in sequence is shown at the top. Blue: microhomologous arm; Yellow: initial sequence of 3×FLAG; Green: inserted nucleotides. B) PCR check for selected samples no. 1, 2, 3, and 8. The 3′ junction check used a forward primer targeting the puromycin resistance gene and a reverse primer targeting the genomic region after the double-strand break. The genome check used two primers that both target the genome. Triangles mark amplicons of the expected size and an asterisk marks the wild-type product.

Figure 3. A) Representative DNA-binding protein read enrichment tracks on the Integrative Genomics Viewer (IGV) for SP1 monoclonal samples. B) Motif analyses on the top 500 peaks of each SP1 monoclonal sample identified the SP1 binding motif. De novo motif discovery with MEME was run on the top 500 peaks, using 100 bp surrounding each peak summit, and recovered the SP1 motif in all samples.
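To illustrate the window selection described in Figure 3B, the following minimal Python sketch ranks peaks by score and extracts a fixed-width window around each summit. It is not the authors' pipeline. The `Peak` record, the function name `top_summit_windows`, and the reading of "100 bp surrounding the peak summit" as a 100 bp window centered on the summit (50 bp on each side) are assumptions made for this example.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    """Hypothetical peak record: chromosome, 0-based summit position, score."""
    chrom: str
    summit: int
    score: float


def top_summit_windows(
    peaks: List[Peak], n_top: int = 500, width: int = 100
) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) windows of `width` bp centered on the
    summits of the `n_top` highest-scoring peaks (half-open intervals)."""
    half = width // 2
    # Rank peaks from highest to lowest score and keep the first n_top.
    ranked = sorted(peaks, key=lambda p: p.score, reverse=True)[:n_top]
    windows = []
    for p in ranked:
        # Clamp at the chromosome start so coordinates never go negative.
        start = max(0, p.summit - half)
        windows.append((p.chrom, start, start + width))
    return windows


if __name__ == "__main__":
    demo = [Peak("chr1", 1000, 5.0), Peak("chr2", 20, 9.0), Peak("chr3", 500, 1.0)]
    for window in top_summit_windows(demo, n_top=2):
        print(window)
```

The resulting intervals could then be converted to sequences and passed to a motif discovery tool; that step is omitted here because it depends on the genome build and tooling in use.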

Figure 4. A) Representative DNA-binding protein read enrichment tracks on the IGV for MYC and TCF7L2 monoclonal samples. B) The top motif identified for the TCF7L2 monoclonal samples matched the known TCF7L2 motif. C) Motif discovery for the MYC monoclonal samples identified the MYC motif as the top hit.

Figure 5. A) Genotyping analyses for SP1 pooled cell samples. Whole cassette analysis used two primers annealing to genomic DNA, whereas the junction check used one primer targeting the genome and one recognizing the insert. Triangles mark the expected amplicon sizes with insertions and asterisks mark wild-type bands. B) Representative DNA-binding protein read enrichment tracks on the IGV for SP1 pooled cell samples. C) Motif discovery for SP1 pooled cell samples before fluorescence-activated cell sorting (FACS). D) Motif discovery for SP1 pooled cell samples after FACS. E) Read enrichment heatmap of the 3000 bp regions centered on the SP1 ChIP-Seq peaks generated by ENCODE. Each row is a peak and the x-axis spans the full 3000 bp genomic region. Each column is a TF-tagged SP1 sample.
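The heatmap layout in Figure 5E (one row per reference peak, x-axis covering a 3000 bp region centered on the peak) can be sketched as binned read counts. The Python example below is only an illustration under stated assumptions: the function names, the 100 bp bin size, and the representation of reads as a sorted list of single positions on one chromosome are all hypothetical and are not taken from the manuscript.

```python
from bisect import bisect_left
from typing import List


def enrichment_row(
    read_positions: List[int], center: int, width: int = 3000, bin_size: int = 100
) -> List[int]:
    """Count reads per bin across a `width` bp window centered on `center`.

    `read_positions` must be sorted in ascending order. Each bin is the
    half-open interval [lo, hi).
    """
    start = center - width // 2
    n_bins = width // bin_size
    row = []
    for i in range(n_bins):
        lo = start + i * bin_size
        hi = lo + bin_size
        # Binary search gives the number of reads with lo <= position < hi.
        row.append(bisect_left(read_positions, hi) - bisect_left(read_positions, lo))
    return row


def enrichment_matrix(
    read_positions: List[int], centers: List[int], width: int = 3000, bin_size: int = 100
) -> List[List[int]]:
    """Stack one row per peak center, giving the matrix behind a heatmap."""
    return [enrichment_row(read_positions, c, width, bin_size) for c in centers]


if __name__ == "__main__":
    reads = sorted([1000, 1450, 1500, 1501, 1549, 2999, 3000])
    print(enrichment_row(reads, center=1500, bin_size=1000))
```

In practice the counts would usually be normalized for sequencing depth before plotting, and one such matrix would be computed per sample to form the heatmap columns.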

Figure 6. A) Representative DNA-binding protein read enrichment tracks on the IGV for MYC and CTCF pooled cell samples. B) Motif discovery for MYC pooled cell samples before FACS. C) Read enrichment heatmap of the 3000 bp regions centered on the MYC peaks generated by Seitz and coworkers [27]. Each row is a peak and the x-axis spans the full 3000 bp genomic region. Each column is a TF-tagged MYC sample. D) Motif discovery for CTCF pooled cell samples before FACS. E) Read enrichment heatmap of the 3000 bp regions centered on the SP1 ChIP-Seq peaks generated by ENCODE. Each row is a peak and the x-axis spans the full 3000 bp genomic region. Each column is a TF-tagged CTCF sample.

Fig 1

Fig 2

Fig 3

Fig 4

Fig 5

Fig 6
