Perspective Cite This: ACS Chem. Biol. XXXX, XXX, XXX−XXX
Discovering the Genome-Wide Activity of CRISPR-Cas Nucleases
Shengdar Q. Tsai*
Department of Hematology, St. Jude Children's Research Hospital, Memphis, Tennessee, United States
ABSTRACT: Originally discovered as part of an adaptive bacterial defense system against the invasion of foreign phages, programmable CRISPR-Cas nucleases have emerged as remarkable enzymes with transformative potential for both biological research and clinical application. CRISPR-Cas nucleases likely evolved in their natural context to tolerate imperfect specificity in order to recognize mutant bacteriophages. However, in biological research and clinical applications, high specificity is generally preferred. For therapeutic applications in particular, it is important to carefully and empirically define the genome-wide activity of engineered nucleases, as hundreds of millions to billions of cells may be modified in a single therapeutic dose. Over the past several years, a number of sensitive and unbiased genome-scale methods, both cell-based and in vitro, have been developed to define CRISPR-Cas nuclease specificity. These methods will play important complementary roles in better understanding global specificity profiles and in identifying optimal nucleases for applications that demand high-precision editing. Important areas for future research and development include improving the sensitivity of mutation detection by next-generation sequencing, developing assays to define the functional consequences of unintended off-target nuclease activity, and understanding how individual human genetic variation affects gene editing activity.
Special Issue: Chemical Biology of CRISPR
Received: September 27, 2017
Accepted: December 18, 2017
ACS Chemical Biology
DOI: 10.1021/acschembio.7b00847 ACS Chem. Biol. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society
Originally discovered as part of an adaptive bacterial defense system against the invasion of foreign phages,1,2 programmable CRISPR-Cas nucleases have emerged as remarkable enzymes with transformative potential for both biological research and clinical application. Broad and rapid adoption of CRISPR-Cas nucleases has been catalyzed by the simplicity of the engineered system, which requires only two components: a nuclease protein and an associated single guide RNA (sgRNA).3 DNA target recognition requires the presence of a protospacer adjacent motif (PAM), whose sequence specificity is determined by protein:DNA interactions. However, the bulk of the specificity is governed by Watson–Crick base pairing between a sequence specified in the gRNA (the protospacer) and a complementary DNA target. For example, the first CRISPR-Cas nuclease to be widely used, S. pyogenes Cas9, recognizes 20 nt adjacent to an NGG PAM (N₂₀NGG).
CRISPR-Cas nucleases likely evolved in their natural context to tolerate imperfect specificity. In the arms race between bacteria and invading bacteriophages, an ideal CRISPR-Cas nuclease, from the bacterial perspective, would balance the specificity of DNA target recognition and cleavage to distinguish the bacterial genome from the phage genome, while tolerating enough imprecision to also recognize mutant phages. This stands in sharp contrast to genome editing applications of CRISPR-Cas nucleases, where recognition of sites other than the intended target site is usually undesirable.
CRISPR-Cas nuclease specificity is not a simple function of binding affinity. Unlike zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), CRISPR-Cas nucleases do not passively bind double-stranded DNA in the major groove. Instead, they actively unwind double-stranded DNA and recognize their targets through a combination of protein:DNA and RNA:DNA interactions. A sophisticated proofreading mechanism governs cleavage. It is mediated by a conformational change that occurs upon recognition of a sufficiently matched RNA:DNA duplex,4 so cleavage specificity exceeds the intrinsic binding specificity of Cas9. This explains why most Cas9 binding sites identified by ChIP-seq were not found to be cleaved.5
For most applications high specificity is preferred, but understanding the full range of activities of a gene editing nuclease requires effort. Two distinct questions arise. First, in a research context, is the biological effect observed after a gene editing experiment the result of an on-target or an off-target effect? Second, what is the genome-wide activity of the specific CRISPR-Cas nuclease used? The first question can often be addressed simply by using multiple nonoverlapping gRNA targets or by functional rescue experiments. The second requires more sophisticated, sensitive, and unbiased genomic methods. In this Perspective, we discuss current genomic methods, and the challenges that remain, for defining the genome-wide activity of CRISPR-Cas nucleases as completely as possible. Further background and details on methods for both defining and improving the genome-wide specificity of CRISPR-Cas nucleases were reviewed by Tsai and Joung.6
For therapeutic applications, it is important to carefully and empirically define the genome-wide activity of engineered nucleases, because large numbers of cells (hundreds of millions to billions) may be modified in a single therapeutic dose. Of particular concern is the possibility that rare, unintended nuclease-induced mutagenesis may activate proto-oncogenes or inactivate tumor suppressors, leading to uncontrolled cell proliferation. Unlike gene therapy with integrating viral vectors, gene editing does not by default leave behind defined genetic sequences that can be used to assess genomic integration positions and to quantify clonal dominance.
Fortunately, a number of approaches have been developed over the past several years to address this challenge empirically. They can be roughly divided into two categories: cell-based and in vitro methods (see Table 1). The primary difference between them is that cell-based methods can be influenced by chromatin organization, epigenetic modifications, and the binding of cellular transcription factors. In vitro methods can potentially achieve higher sensitivity than cell-based methods by using higher protein:DNA ratios. The caveat is that activity at biochemical cleavage sites may differ from activity in cells because of interactions with cellular factors and chromatin organization.
Table 1. Summary of Current Methods for Defining the Genome-Wide Activity of Genome Editing Nucleases
method | type | principle | references
BLESS, BLISS | hybrid | in situ ligation of adapters to transient nuclease-induced DSBs in fixed cells | Crosetto et al.12; Yan et al.13
GUIDE-seq | cell-based | capture of a short end-protected dsODN tag followed by tag-specific amplification | Tsai et al.11
HTGTS | cell-based | detection of translocations between "bait" and "prey" DSBs | Frock et al.10
IDLV capture | cell-based | capture of IDLVs into nuclease-induced DSBs followed by IDLV-anchored, ligation-adapter-mediated PCR | Gabriel et al.8
CIRCLE-seq | in vitro | genomic DNA circularization and enzymatic purification to minimize free ends; in vitro nuclease cleavage and selective sequencing of cleaved fragments | Tsai et al.15
Digenome-seq | in vitro | in vitro cleavage of genomic DNA, whole-genome sequencing, and bioinformatic identification of sites with uniform ends | Kim et al.14
SITE-seq | in vitro | isolation of high-molecular-weight genomic DNA to minimize free ends; in vitro nuclease cleavage and selective sequencing of cleaved fragments | Cameron et al.16
One approach often considered first for defining the genome-wide mutational activity of CRISPR-Cas nucleases is whole-genome sequencing (WGS), but this approach has significant limitations in sensitivity. WGS is typically performed at 30−50× coverage, and sequencing to much greater depth remains impractical with current high-throughput sequencing technologies because of cost. At 30−50× coverage, a mutation with an allele frequency of 5% is expected to be supported by only about 1.5−2.5 reads at a given site. Standard WGS experiments are therefore often sensitive enough to reliably detect only mutations present at 5% or greater. WGS has nevertheless been used to infer the presence or absence of off-target mutagenesis in specific clonal cell populations7 or in the F1 progeny of genome-edited animal models. It is not well suited, however, to comprehensively discovering unintended genome-wide activity or determining its frequency.
The first cell-based genome-scale method for evaluating engineered nuclease specificity was the IDLV capture method, used with ZFNs, a nuclease architecture based on the fusion of zinc finger DNA-binding domains to the dimerization-dependent, nonspecific FokI nuclease domain.8 This approach was more recently adapted for use with CRISPR-Cas nucleases.9 The principle of the method is the capture of integrase-defective lentiviral vectors (IDLVs) into nuclease-induced double-strand breaks (DSBs). This process is not very efficient and typically relies on marker selection for enrichment.
High-throughput genome-wide translocation sequencing (HTGTS) detects DSBs repaired by translocation to a "bait" DSB.10 It requires two nuclease-induced DSBs and a translocation between them to occur in the same cell, a relatively rare combination of events. Translocations occur more frequently between DSBs that lie closer together in the nucleus, so translocation frequency is not directly correlated with the frequency of nuclease-induced mutagenesis. An advantage of HTGTS is that it does not require delivery of an additional DNA marker. A limitation is that results may vary depending on which nuclease is used to generate the "bait" DSB.
Genome-wide unbiased identification of DSBs enabled by sequencing (GUIDE-seq) is based on the optimized, efficient integration of a short end-protected double-stranded oligodeoxynucleotide (dsODN) tag into nuclease-induced DSBs, followed by tag-specific amplification and high-throughput sequencing.11 It directly detects DSBs that have been repaired with incorporation of the dsODN tag, and tag integration is well correlated with the frequency of nuclease-induced mutagenesis. A limitation of GUIDE-seq is that cells must be efficiently transfected with the dsODN tag. GUIDE-seq can sensitively detect off-target activity of CRISPR-Cas nucleases down to ∼0.1%. Increasing sensitivity beyond this threshold requires parallel increases in both the number of input genomes and sequencing depth.
Breaks labeling, enrichment on streptavidin, and next-generation sequencing (BLESS) and a subsequent improvement, breaks labeling in situ and sequencing (BLISS), are methods for the direct labeling and detection of DSBs.12,13 They have the advantage of not requiring delivery and incorporation of exogenous DNA for detection. However, because they are designed to detect unrepaired DSBs directly, they do not detect DSBs that have already been repaired by cellular DNA damage response pathways.
Digenome-seq was the first fully in vitro method for detecting CRISPR-Cas nuclease activity on bulk human genomic DNA.14 In this method, genomic DNA is isolated, treated with Cas9, and ligated to adapters for whole-genome sequencing. Genomic DNA fragments are mapped to a reference genome, and positions with an enrichment of uniformly mapping ends are identified as sites of likely nuclease-induced cleavage. An advantage is that the protocol is simple and does not require PCR. One limitation of Digenome-seq is the large number of reads required for each whole-genome sequencing experiment, because most of the sequencing capacity is spent on regions of the genome that are unaffected by the nuclease.
CIRCLE-seq is a method for selective sequencing of CRISPR-Cas nuclease-cleaved genomic DNA. It works by generating an enzymatically purified population of genomic DNA circles that can subsequently be linearized by nuclease treatment.15 Only linear DNA has free ends available for adapter ligation and amplification for high-throughput sequencing, so reads are concentrated on nuclease-cleaved sites. For most targets, CIRCLE-seq identifies all sites detected by previously described methods such as GUIDE-seq, as well as many additional sites. Because CIRCLE-seq captures both ends of the cleaved genomic DNA, off-target cleavage sites can be identified even in the absence of a reference genome.
SITE-seq is another method for selective sequencing of CRISPR-Cas nuclease-cleaved genomic DNA. It is based on isolating high-molecular-weight genomic DNA to minimize available DNA ends, followed by sequential adapter ligation and biotinylated sequence capture.16 Recovered fragments are mapped to the reference genome, and sites with characteristic bidirectional mapping ends are identified.
Both cell-based and in vitro methods for discovering the genome-wide activity of CRISPR-Cas nucleases will clearly continue to play important and complementary roles. In general, in vitro methods may be more comprehensive and sensitive, but they still require follow-up targeted validation in nuclease-treated cells. When feasible, cell-based methods can rapidly identify sites of nuclease activity down to frequencies of approximately 1 in 1000 to 1 in 10,000. Together, they can form part of a rigorous evaluation process for defining CRISPR-Cas nuclease specificity.
For many research uses of CRISPR-Cas nucleases, potential off-target activity can be controlled for in simple ways. One is to test whether genome modification at nonoverlapping target sites achieves the same biological effect. Another is functional rescue by genetic complementation of a disrupted gene. In these cases, such simple control experiments are the preferred way to determine whether an effect is likely caused by the intended genetic modification.
One important use of genome-scale methods is to characterize strategies for improving the specificity of CRISPR-Cas nucleases. These methods can confirm that reductions in activity at known off-target sites are not accompanied by new off-target sites elsewhere in the genome. There is often a trade-off between specificity and activity, and ideal strategies maintain equivalent on-target activity while reducing genome-wide off-target activity. Three major groups of methods to improve the specificity of CRISPR-Cas nucleases, most thoroughly explored with Cas9, have been previously reviewed in detail:6 truncated17 or extended gRNAs; paired nicking18,19 or dimerization (RNA-guided FokI-dCas9 nucleases20,21); and high-fidelity engineered variants. The most extensively characterized of these are truncated gRNAs and high-fidelity engineered variants, which showed significant reductions in undesired genome-wide activity. Paired nickases and RNA-guided FokI-dCas9 nucleases remain less well characterized by end-capture methods such as GUIDE-seq. This is because tag incorporation into the DSBs generated by these strategies is lower.
Several major challenges remain to fully understanding the genome-wide behavior of engineered nucleases. First, current high-throughput sequencing workflows practically limit the detection sensitivity for nuclease-induced indel mutations to approximately 0.1%. Further improvements in sequencing sensitivity will be needed to survey the nuclease-induced mutational landscape at levels that fully sample the large numbers of cells required for many therapeutic strategies. For example, an indel present at 0.01% in a dose of 100 million edited cells would still be carried by about 10,000 cells. Second, these genomic methods share the limitations of current high-throughput sequencing technologies. In particular, relatively short read lengths prevent sequencing and unique mapping of highly repetitive regions. Third, although the location and frequency of CRISPR-Cas nuclease activity can now be determined, it remains challenging to classify how mutations at these sites affect cellular function. Assays for clonal dominance, which determine whether cells with specific unintended edits gain a proliferative advantage, will be an important part of the global safety assessment of CRISPR-Cas nucleases intended for clinical applications. Finally, it remains unclear to what extent individual genetic variation influences the genome-wide activity of gene editing nucleases in cells. Genetic differences at potential off-target sites can either increase or decrease the risk of nuclease activity. This risk can be estimated in silico if an individual's whole-genome sequence is known. Defining it empirically, however, will require larger data sets characterizing the genome-wide activity of CRISPR-Cas nucleases in the same cell type obtained from different individuals.
Over the past several years, technology for defining the genome-wide activity of CRISPR-Cas nucleases has advanced in lockstep with improvements in the specificity and versatility of these remarkable enzymes. We are fortunate to have acquired many diverse tools for defining the scope and frequency with which CRISPR-Cas nucleases alter the genomes of cells in both intended and unintended ways. The data from all of these methods make clear that, for a particular intended target site, these enzymes may have a range of biochemical preferences for cleaving other, unintended sites. Whether designing research experiments or clinical gene editing strategies, the genome-wide activity of these powerful tools should always be carefully considered.
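Several sensitivity figures discussed above follow from simple sampling arithmetic. These are the ~5% practical floor for 30−50× WGS, the ~0.1% floor for sequencing-based assays such as GUIDE-seq, and the need to scale input genomes together with sequencing depth. The Python sketch below illustrates that arithmetic under loudly stated simplifying assumptions. Reads and input genomes are treated as independent random draws (binomial and Poisson models), and sequencing error, PCR duplicates, and mapping bias are ignored. The threshold of three supporting reads is an assumed calling rule. The ~6.6 pg of DNA per diploid human genome, used to convert genome counts into nanograms, is an approximate figure and not a value from this Perspective. All function names are hypothetical and are not part of any published pipeline.

```python
"""Toy sampling model for sequencing-based off-target detection limits.

Assumptions (illustrative only): reads and input genomes are independent
random draws; sequencing error, PCR duplicates and mapping bias are ignored;
~6.6 pg of DNA per diploid human genome is an approximate conversion factor.
"""
from math import comb, exp, factorial

PG_PER_DIPLOID_GENOME = 6.6  # assumed approximate value, not from the article


def binomial_tail(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def wgs_detection_probability(depth: int, allele_freq: float, min_reads: int) -> float:
    """Chance that a site covered by `depth` reads shows at least `min_reads`
    mutant reads when the mutant allele frequency is `allele_freq`."""
    return binomial_tail(depth, allele_freq, min_reads)


def poisson_at_least(mean: float, k: int) -> float:
    """Return P(Y >= k) for Y ~ Poisson(mean)."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))


def expected_events(n_genomes: int, frequency: float) -> float:
    """Expected number of modified genomes among `n_genomes` sampled."""
    return n_genomes * frequency


def input_dna_ng(n_genomes: int, pg_per_genome: float = PG_PER_DIPLOID_GENOME) -> float:
    """Approximate nanograms of genomic DNA containing `n_genomes` genomes."""
    return n_genomes * pg_per_genome / 1000.0


def cells_affected(dose_cells: int, frequency: float) -> float:
    """Expected number of cells carrying an edit present at `frequency`."""
    return dose_cells * frequency


if __name__ == "__main__":
    # WGS: assume a 5% allele needs >= 3 supporting reads to be called.
    for depth in (30, 50):
        p = wgs_detection_probability(depth, 0.05, min_reads=3)
        print(f"{depth}x coverage: P(call 5% allele) = {p:.2f}")

    # Genome sampling: how many input genomes give ~10 expected events at 0.1%?
    n = 10_000
    mean = expected_events(n, 0.001)
    print(f"{n} genomes (~{input_dna_ng(n):.0f} ng): mean events = {mean:.1f}, "
          f"P(>=1 event) = {poisson_at_least(mean, 1):.5f}")

    # Therapeutic scale: an edit at 0.01% in a 100-million-cell dose.
    print(f"cells affected: {cells_affected(100_000_000, 0.0001):.0f}")
```

Under this toy model, requiring three supporting reads gives roughly a 19% chance of calling a 5% allele at 30× and roughly a 46% chance at 50×. This is consistent with WGS being reliable only at higher frequencies. Likewise, expecting even 10 events at 0.1% requires on the order of 10,000 genomes (about 66 ng under the assumed conversion factor). Real assays add error and bias on top of this sampling floor, so the model should be read as a lower bound on difficulty.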
■
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
ORCID
Shengdar Q. Tsai: 0000-0001-9161-3993
Notes
The author declares the following competing financial interest(s): S.Q.T. is a scientific co-founder of Monitor Biotechnologies Corporation.
■
REFERENCES
(1) Mojica, Díez-Villaseñor, García-Martínez, and Soria (2005) Intervening Sequences of Regularly Spaced Prokaryotic Repeats Derive from Foreign Genetic Elements. J. Mol. Evol. 60, 174−182.
(2) Barrangou, Fremaux, Deveau, Richards, Boyaval, Moineau, Romero, and Horvath (2007) CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes. Science 315, 1709−1712.
(3) Jinek, Chylinski, Fonfara, Hauer, Doudna, and Charpentier (2012) A Programmable Dual-RNA−Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 337, 816−821.
(4) Sternberg, LaFrance, Kaplan, and Doudna (2015) Conformational control of DNA target cleavage by CRISPR−Cas9. Nature 527, 110−113.
(5) Wu, Scott, Kriz, Chiu, Hsu, Dadon, Cheng, Trevino, Konermann, Chen, Jaenisch, Zhang, and Sharp (2014) Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat. Biotechnol. 32, 670−676.
(6) Tsai and Joung (2016) Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17, 300.
(7) Veres, Gosis, Ding, Collins, Ragavendran, Brand, Erdin, Cowan, Talkowski, and Musunuru (2014) Low Incidence of Off-Target Mutations in Individual CRISPR-Cas9 and TALEN Targeted Human Stem Cell Clones Detected by Whole-Genome Sequencing. Cell Stem Cell 15, 27−30.
(8) Gabriel, Lombardo, Arens, Miller, Genovese, Kaeppel, Nowrouzi, Bartholomae, Wang, Friedman, Holmes, Gregory, Glimm, Schmidt, Naldini, and von Kalle (2011) An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat. Biotechnol. 29, 816−823.
(9) Wang, Wang, Wu, Wang, Wang, Qiu, Chang, Huang, Lin, and Yee (2015) Unbiased detection of off-target cleavage by CRISPR-Cas9 and TALENs using integrase-defective lentiviral vectors. Nat. Biotechnol. 33, 175−178.
(10) Frock, Hu, Meyers, Ho, Kii, and Alt (2014) Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 33, 179−186.
(11) Tsai, Zheng, Nguyen, Liebers, Topkar, Thapar, Wyvekens, Khayter, Iafrate, Le, Aryee, and Joung (2014) GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187−197.
(12) Crosetto, Mitra, Silva, Bienko, Dojer, Wang, Karaca, Chiarle, Skrzypczak, Ginalski, Pasero, Rowicka, and Dikic (2013) Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods 10, 361−365.
(13) Yan, Mirzazadeh, Garnerone, Scott, Schneider, Kallas, Custodio, Wernersson, Li, Gao, Federova, Zetsche, Zhang, Bienko, and Crosetto (2017) BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nat. Commun. 8, 15058.
(14) Kim, Bae, Park, Kim, Kim, Yu, Hwang, Kim, and Kim (2015) Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat. Methods 12, 237−243.
(15) Tsai, Nguyen, Malagon-Lopez, Topkar, Aryee, and Joung (2017) CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607−614.
(16) Cameron, Fuller, Donohoue, Jones, Thompson, Carter, Gradia, Vidal, Garner, Slorach, Lau, Banh, Lied, Edwards, Settle, Capurso, Llaca, Deschamps, Cigan, Young, and May (2017) Mapping the genomic landscape of CRISPR-Cas9 cleavage. Nat. Methods 14, 600−606.
(17) Fu, Sander, Reyon, Cascio, and Joung (2014) Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279.
(18) Mali, Aach, Stranges, Esvelt, Moosburner, Kosuri, Yang, and Church (2013) CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833.
(19) Ran, Hsu, Lin, Gootenberg, Konermann, Trevino, Scott, Inoue, Matoba, Zhang, and Zhang (2013) Double Nicking by RNA-Guided CRISPR Cas9 for Enhanced Genome Editing Specificity. Cell 154, 1380−1389.
(20) Tsai, Wyvekens, Khayter, Foden, Thapar, Reyon, Goodwin, Aryee, and Joung (2014) Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 32, 569−576.
(21) Guilinger, Thompson, and Liu (2014) Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 32, 577.