Cre Recombinase and Other Tyrosine Recombinases - Chemical

Review pubs.acs.org/CR

Cre Recombinase and Other Tyrosine Recombinases

Gretchen Meinke,† Andrew Bohm,† Joachim Hauber,‡ M. Teresa Pisabarro,§ and Frank Buchholz*,∥

† Department of Developmental, Molecular & Chemical Biology, Tufts University School of Medicine, Boston, Massachusetts 02111, United States
‡ Heinrich Pette Institute, Leibniz Institute for Experimental Virology, 20251 Hamburg, Germany
§ Structural Bioinformatics, BIOTEC TU Dresden, 01307 Dresden, Germany
∥ Medical Systems Biology, UCC, Medical Faculty Carl Gustav Carus TU Dresden, 01307 Dresden, Germany

ABSTRACT: Tyrosine-type site-specific recombinases (T-SSRs) have opened new avenues for the predictable modification of genomes as they enable precise genome editing in heterologous hosts. These enzymes are ubiquitous in eubacteria, prevalent in archaea and temperate phages, present in certain yeast strains, but barely found in higher eukaryotes. As tools they find increasing use for the generation and systematic modification of genomes in a plethora of organisms. If applied in host organisms, they enable precise DNA cleavage and ligation without the gain or loss of nucleotides. Criteria directing the choice of the most appropriate T-SSR system for genetic engineering include that, whenever possible, the recombinase should act independent of cofactors and that the target sequences should be long enough to be unique in a given genome. This review is focused on recent advancements in our mechanistic understanding of simple T-SSRs and their application in developmental and synthetic biology, as well as in biomedical research.

CONTENTS
1. Introduction
2. Tyrosine Recombinases: From Discovery to Application
2.1. Cre/loxP and FLP/FRT Systems
2.2. Emerging Tyrosine Recombinase Systems
3. Tyrosine Recombinase Structures and Mechanism
3.1. Overview of the Mechanism of Site-Specific Recombination
3.2. Structural Overview of Tyrosine Recombinases
3.3. Structure of Cre Recombinase Bound to LoxP
3.4. Cre-loxP Precleavage Synaptic Structure
3.4.1. Conformational Changes in Cre-loxP along the Recombination Pathway
3.4.2. Recombinase Active Site
3.4.3. Cre Protein−protein Interfaces
3.5. Cre/loxP Holliday Junction Intermediates
3.6. Order of Strand Exchange
3.7. Structural Comparison of Flp/FRT to Cre/loxP
3.8. Structural Summary and Outlook
4. Cre DNA Binding Specificity
4.1. Complexity of DNA Recognition by Cre and Basis for Its Selectivity
4.1.1. Relaxed versus High Target Specificity
4.1.2. Role Played by Noncontacting Positions in the Cre/loxP Complex
4.1.3. Cre/loxP Physicochemical Nature: Flexibility and Hydration
4.1.4. Cre/loxP Quaternary Structure
4.2. Engineering of Custom Cre Recombinases with Altered Specificities
4.2.1. Manipulating Protein−protein Interaction Properties to Tackle DNA Target Specificity
4.2.2. Exploiting “Interdependent Contacts” to Rationally Engineer Customized Cre-DNA Specificity
5. Applied Use of Tyrosine Recombinases
5.1. Lineage Tracing and Brainbow
5.2. Recombinase Delivery Strategies
5.3. Modified Recombinase Systems
5.4. Recombinase Mediated Cassette Exchange (RMCE)
5.5. Synthetic Biology
5.6. Combinations of Different Recombinases
5.7. Tyrosine Recombinases in Combination with RNAi
5.8. Tyrosine Recombinases in Conjunction with Other Genome Editing Tools
6. Directed Evolution of Tyrosine Recombinases
7. Tyrosine Recombinases for Clinical Applications
8. Perspective

© 2016 American Chemical Society

Special Issue: Genome Modifying Mechanisms

Received: February 1, 2016 Published: May 10, 2016

DOI: 10.1021/acs.chemrev.6b00077 Chem. Rev. 2016, 116, 12785−12820

Chemical Reviews

Review

Figure 1. Schematic overview of the stepwise T-SSR recombination mechanism. Step 1. Synapsis. Four molecules of recombinase interact with two DNA substrates to form a tetrameric complex. This complex results in a large bend in the DNA, shown schematically. Each strand of the duplex DNA is represented by a line, black or blue, and labeled 5′ or 3′ to indicate the DNA directionality. The recombinase is shown as an asymmetric shape bound to each half-site in either the “cleaving competent” conformation (green) or “noncleaving” conformation (pink). In addition, in the synaptic tetramer, the protomers are arranged head-to-tail and have a pseudo 4-fold symmetry. Each protomer communicates its state in trans to its neighbor via C-terminal conformational changes, shown schematically as a rectangle interacting with a neighboring protomer. Note that opposing protomers are either both active (and poised to interact with the scissile phosphate on one strand of DNA) or both inactive. The nucleophilic tyrosine is shown only for the active conformation and indicated by a Y in a green circle. The location of the activated scissile phosphate is indicated by a red circle around the letter P. Step 2. Cleavage/ligation. The nucleophilic tyrosine attacks the scissile phosphate, forming a 3′-phosphotyrosine linkage and resulting in a free 5′-OH. Step 3. Strand exchange. Strand exchange occurs when the free 5′-OH attacks the neighboring phosphotyrosine and forms a Holliday junction intermediate. Step 4. Isomerization. The complex shifts such that the inactive conformers adopt active conformations, and the active conformers become inactive conformers. The complex is ready for the second round of strand exchange. Again, the catalytic tyrosine attacks the scissile phosphate to form the phosphotyrosine intermediate. Step 5. The free 5′-OH group attacks the phosphotyrosine. Step 6. Resolution of the recombined target.
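The alternation of active and inactive protomers described in this caption can be traced with a small bookkeeping model. The following is a minimal sketch, not a structural or kinetic model: the class `Protomer` and the functions `cleave`, `exchange`, `isomerize`, and `run_pathway` are hypothetical names introduced here for illustration, and the only assumption is the one stated in the caption, namely that opposing protomers in the tetramer share the same state.

```python
from dataclasses import dataclass


@dataclass
class Protomer:
    active: bool            # "cleaving competent" (True) or "noncleaving" (False)
    covalent: bool = False  # True while a 3'-phosphotyrosine linkage exists


def make_tetramer():
    # Opposing protomers share a state, so activity alternates around the ring.
    return [Protomer(active=(i % 2 == 0)) for i in range(4)]


def cleave(tetramer):
    # Only active protomers attack their scissile phosphate.
    for p in tetramer:
        if p.active:
            p.covalent = True


def exchange(tetramer):
    # Attack by the free 5'-OH releases every phosphotyrosine linkage.
    for p in tetramer:
        p.covalent = False


def isomerize(tetramer):
    # Active and inactive conformers swap roles.
    for p in tetramer:
        p.active = not p.active


def run_pathway():
    t = make_tetramer()
    log = []

    def snap(step):
        log.append((step, [p.active for p in t], [p.covalent for p in t]))

    snap("1 synapsis")
    cleave(t)
    snap("2 cleavage")
    exchange(t)
    snap("3 strand exchange (Holliday junction)")
    isomerize(t)
    snap("4 isomerization")
    cleave(t)
    snap("4 second cleavage")
    exchange(t)
    snap("5 second strand exchange")
    snap("6 resolution")
    return log


if __name__ == "__main__":
    for step, active, covalent in run_pathway():
        print(f"{step:40s} active={active} covalent={covalent}")
```

Running the script prints one line per step; the point of the sketch is only that each round of cleavage involves a different pair of protomers.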

Author Information Corresponding Author Notes Biographies Acknowledgments References

1. INTRODUCTION

Tyrosine site-specific recombinases (T-SSRs) are DNA modifying enzymes that bind, cleave, strand exchange, and rejoin DNA at their respective, typically palindromic, recognition target sites. As the name implies, the T-SSR protein family is defined by a catalytic tyrosine that forms a covalent phospho-protein linkage, which is followed by DNA topological rearrangement and DNA ligation coupled with protein regeneration (Figure 1). The reaction mechanism mirrors the biochemical activity of type IB DNA topoisomerases, which form 3′-phosphotyrosine intermediates that are subsequently resolved by attack of the free 5′ DNA ends relative to the uncut strands.1−3 Recombination sites typically range from 30 to 200 nucleotides in length and consist of two motifs with a partial inverted-repeat symmetry, to which the recombinase binds, and which flank a central crossover sequence, often referred to as the spacer sequence, at which the recombination takes place. The spacer sequence itself is not strictly conserved and allows some flexibility, but recombination requires that the two spacers have matching sequences. Most T-SSRs are found in prokaryotes and bacteriophages, where they perform a plethora of functions, including DNA integration and excision, plasmid copy number control, regulation of gene expression, chromosome segregation at cell division, and separation of catenated circular DNAs.4 In addition, T-SSRs are frequently encoded on yeast plasmids, where they complement the partitioning system to maintain high copy numbers of the plasmids.5 Furthermore, recent work has uncovered T-SSRs in higher eukaryotes, classified as Cryptons in DIRS and PAT family transposons.6,7 To date, more than 1300 T-SSRs have been identified and classified into different subfamilies,8 providing a rich ground to study these enzymes. Many T-SSRs work in concert with other host proteins to regulate their activity, directionality, or processivity (reviewed in Jayaram et al.9). Other T-SSRs are simpler and do not require host factors to carry out the full site-specific recombination reaction. In particular, these simple T-SSRs have been studied extensively because of their ease of use and their utility for sophisticated genome engineering to delete, integrate, invert, or translocate genomic DNA (Figure 2). The precise details of the site-specific biochemical reaction are still under investigation. In-depth structural work in combination with computer-aided modeling has helped shed light on the molecular mechanisms governing T-SSR function.10 In recent years, ever-increasing sophistication of genome engineering has been achieved by combining different T-SSR systems. Clever approaches have been developed that allow complex genome rearrangements in living organisms, and there is an


review, we provide an overarching assessment of recent advancements in our mechanistic understanding and applied use of simple T-SSRs that carry out the full recombination reaction without the help of any additional cofactors. In particular, we focus our review on the Cre/loxP system as it is currently the most used and best understood T-SSR system, but we extend our analysis to other systems where appropriate.

2. TYROSINE RECOMBINASES: FROM DISCOVERY TO APPLICATION

As with many other important molecular biology discoveries, the bacteriophage lambda served as the starting point for research on T-SSRs. To integrate its genome into, and excise it from, the bacterial chromosome, the phage utilizes a protein called lambda-integrase (Int).11,12 Int became the founding family member of T-SSRs. It recombines attP and attB recognition target sites with the help of additional host cofactors.13,14 About 10 years after the discovery of Int, the T-SSRs Cre and Flp were first described in biochemical assays.15,16 Unlike Int and many other site-specific recombinases identified to date, Cre and Flp do not require accessory proteins to organize the synaptic complex that initiates the recombination reaction. This allows them to work in heterologous host organisms when the target sites are engineered into the genome. As a result, Cre and Flp are now used as tools for a large variety of applications and model organisms, and while the more complex T-SSRs are also used for applied purposes (reviewed in Landy,14 Midonet et al.17), simple T-SSRs have found more widespread use in a wide range of biotechnology and biomedical disciplines.

Figure 2. Possible T-SSR recombination reactions. Triangles mark recombinase target sites on DNA symbolized as lines in different colors where appropriate. The brown triangle depicts a recombinase target site that is different in sequence from the other target site for the cassette exchange reaction. Arrows indicate the reversibility of the reactions with the larger size of the arrow indicating the kinetically favored excision reaction.
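Two of the reactions in Figure 2 can be illustrated on plain sequence strings. The sketch below is a toy model under stated assumptions: it uses the commonly cited wild-type loxP sequence (not taken from Table 1 here), it applies the standard rule that two sites in the same orientation on one molecule give excision while two sites in opposite orientation give inversion, and it ignores integration, translocation, and cassette exchange. The function names `find_sites` and `recombine` are hypothetical.

```python
# Commonly cited wild-type loxP: 13 bp repeat, 8 bp spacer, 13 bp repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = LOXP):
    """Return (position, orientation) for every copy of the site in seq."""
    hits = []
    rc = revcomp(site)
    n = len(site)
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def recombine(seq: str, site: str = LOXP):
    """Toy outcome for a linear molecule carrying exactly two sites."""
    hits = find_sites(seq, site)
    if len(hits) != 2:
        raise ValueError("this sketch handles exactly two sites")
    (i, first), (j, second) = hits
    n = len(site)
    if first == second:
        # Same orientation: the intervening DNA leaves as a circle that
        # carries one site; the other site stays on the parent molecule.
        retained = seq[:i + n] + seq[j + n:]
        circle = seq[i + n:j + n]
        return "excision", retained, circle
    # Opposite orientation: the segment between the sites is inverted.
    inverted = seq[:i + n] + revcomp(seq[i + n:j]) + seq[j:]
    return "inversion", inverted, None


if __name__ == "__main__":
    direct = "TTTT" + LOXP + "GATTACA" + LOXP + "CCCC"
    opposed = "TTTT" + LOXP + "GATTACA" + revcomp(LOXP) + "CCCC"
    print(recombine(direct)[0])
    print(recombine(opposed)[0])
```

Because the two sites are identical, the hybrid sites formed by crossover within the spacer have the same sequence as the parental sites, which is why the products can be assembled by simple slicing.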

increasing demand for additional T-SSRs that can be used in combination with the established players. Even more recently, the combination of T-SSR-mediated genome alterations with other genome manipulation tools, such as TALENs and CRISPR/Cas9, further broadens the applicability of T-SSRs in advanced genome engineering. Finally, genome editing in clinical settings is in sight with T-SSRs, providing an instrument for future precision medicine via genome surgery. With this

2.1. Cre/loxP and FLP/FRT Systems

Simple T-SSRs were discovered in bacteriophages and yeast plasmids more than 30 years ago.18,19 In the bacteriophage P1 the enzyme Cre (Causes recombination) plays important roles in the life cycle of the phage, including cyclization of the linear

Table 1. Recognition Target Sites of Bacterial and Yeast T-SSR Systems

a Cleavage positions in loxP and FRT sites.


harbors an 8 bp spacer. It is currently unknown what the molecular basis for this difference is, but this establishes that T-SSRs from the same family can operate on target sites containing spacers of different length. Of note, Flp recombinase can actually recombine target sites with different spacer lengths (see section 3.7). New Cre-like recombinases have also been discovered in various bacteria and their phages. Inspection of the P1-related phage D6 uncovered the recombinase Dre,31 which recognizes the 32 bp target site rox (Table 1). While the rox site shows some similarity to the loxP sequence, no cross-reactivity of these two systems has been observed,31 even when the two systems were tested in mice.32 Consequently, the two recombinase systems have now been successfully used in combination in several experimental settings.33−41 Using a bioinformatics approach to identify Cre-like proteins, the T-SSRs VCre and SCre encoded on the phage-plasmids p0908 of Vibrio sp. and p1 of Shewanella sp., respectively, were recently identified.42 Their respective recognition target sites, VloxP and SloxP, are 34 bp long, where the 13 bp inverted repeats are separated by an 8 bp spacer (Table 1). No cross-reactivity was observed between Cre, VCre, and SCre, indicating that these recombinase systems can be used in combination. The most recent addition to Cre/loxP-like recombination systems is the Vika/vox system.43 Vika originates from the Gram-negative bacterium V. coralliilyticus that carries a degenerate prophage. Its binding site, vox, consists of 13 bp inverted repeats separated by an 8 bp spacer (Table 1).
Comparative studies showed that Vika, Cre, Dre, and VCre recognize their respective targets in a highly specific manner and do not cross-react with non-native sites.43 VCre, SCre, and Vika have also demonstrated their utility in mammalian cells,42−44 and Vika has been demonstrated to function in yeast.45 However, their utility in multicellular organisms, such as mice, awaits experimental validation. Nevertheless, with four additional Cre/loxP-like and five additional Flp/FRT-like recombination systems at hand that can be used in combination without cross-reactivity, highly sophisticated genome engineering experiments are now in sight. Furthermore, detailed comparative investigations of these related recombination systems, making use of available structural and functional information, should help to decipher the molecular basis of the site-specific recombination reaction of these fascinating enzymes and to guide efforts to further develop them to shape and reshape the genome of model organisms.
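The target-site architecture described in this section (two inverted repeats flanking a spacer, with repeat and spacer lengths that differ between systems) lends itself to a simple consistency check. The following is a minimal sketch, assuming the commonly cited wild-type loxP sequence rather than the entries of Table 1; `split_site`, `repeat_mismatches`, and `spacers_match` are hypothetical helper names, and the variant spacer is invented purely to show a non-matching pair.

```python
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


class Site(NamedTuple):
    left: str    # left recombinase-binding repeat
    spacer: str  # central crossover region
    right: str   # right recombinase-binding repeat


def split_site(seq: str, repeat_len: int, spacer_len: int) -> Site:
    """Split a target site into repeat / spacer / repeat."""
    if len(seq) != 2 * repeat_len + spacer_len:
        raise ValueError("sequence length does not match the given architecture")
    return Site(
        seq[:repeat_len],
        seq[repeat_len:repeat_len + spacer_len],
        seq[repeat_len + spacer_len:],
    )


def repeat_mismatches(site: Site) -> int:
    """Count positions where the right repeat deviates from a perfect inverted repeat."""
    expected = reverse_complement(site.left)
    return sum(a != b for a, b in zip(expected, site.right))


def spacers_match(a: Site, b: Site) -> bool:
    """Recombination between two sites requires matching spacer sequences."""
    return a.spacer == b.spacer


if __name__ == "__main__":
    # Commonly cited wild-type loxP (13 bp repeats, 8 bp spacer).
    loxp = split_site("ATAACTTCGTATAGCATACATTATACGAAGTTAT", 13, 8)
    # Hypothetical variant with an altered spacer, for illustration only.
    variant = split_site("ATAACTTCGTATAGCATAGATTATACGAAGTTAT", 13, 8)
    print(loxp)
    print("repeat mismatches:", repeat_mismatches(loxp))
    print("spacers compatible:", spacers_match(loxp, variant))
```

The same helpers could be called with other repeat and spacer lengths (for example 12 and 7 for the R recombinase site), provided the corresponding sequence is supplied.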

genome and resolution of multimerized chromosomes that form after DNA replication.20 Cre binds to a 34 bp long sequence denoted loxP (locus of crossing (x) over of P1). The loxP sequence consists of two 13 bp palindromic sequences, which flank an 8 bp spacer region (Table 1). At about the same time the Cre/loxP system was discovered, another T-SSR, Flp, was revealed to be encoded on the Saccharomyces cerevisiae 2 μ plasmid.19 Flp also recombines a 34 bp long sequence called Flp recognition target (FRT) (Table 1). Both enzymes carry out a virtually identical recombination reaction on their respective target sites, and the enzymes were initially studied in their natural hosts. It took about another decade before it was appreciated that these T-SSRs are highly specific, fast, and efficient, even when faced with complex eukaryotic genomes.21−23 These characteristics explain why today these enzymes play a vital role in shaping and reshaping the genome of model organisms, in particular when trying to understand biological function and mechanism and to model human diseases.24 T-SSRs have found widespread use to address complex biological questions in development, normal tissue homeostasis, and cancer. The Cre/loxP system was initially used most frequently in mammalian systems,25,26 whereas the Flp/FRT system had found widespread use in flies.27 However, in recent years both systems have found prevalent use in a large variety of organisms.

2.2. Emerging Tyrosine Recombinase Systems

Both the Cre/loxP and the Flp/FRT systems are extensively used in sophisticated genome engineering experiments. However, strategies allowing even more complex genomic manipulations are ever increasing, driving the demand for additional tools, including access to additional simple T-SSRs with novel DNA binding specificities. A number of recent studies have therefore focused on the discovery of additional T-SSR systems and their respective recognition target sites (Table 1). Because Flp recombinase is encoded by the yeast 2 μ plasmid, several studies investigated plasmids present in other yeast strains and discovered novel Flp-related recombinases. The first additional yeast plasmid encoded T-SSR was called R and was found to be encoded on the Zygosaccharomyces rouxii plasmid pSR1, where it recognizes 12 bp inverted repeats interrupted by a 7 bp spacer sequence28 (Table 1). Likewise, the 2 μ-like plasmid pKWS1 from the yeast Kluyveromyces waltii encodes the Flp-like recombinase Kw.29 The binding site of Kw consists of slightly longer inverted repeats (18 bp), separated by a 7 bp spacer (Table 1). More recently, the spectrum of yeast plasmid encoded T-SSRs was increased by three additional family members, called KD, B2, and B3 recombinase,30 which are encoded by 2 μ-like plasmids from Kluyveromyces drosophilarum, Zygosaccharomyces bailii, and Zygosaccharomyces bisporus, respectively. Their recognition target sites also contain a 7 bp spacer, but their inverted repeat length varies from 12 to 27 bp (Table 1).
Whether this length difference has any impact on recombination efficiency and/or specificity is currently unknown, but all of these recombinases have been demonstrated to function in heterologous hosts, with KD, B3, and Kw also showing activity in mammalian systems.29,30 The recognition target sites for B2 and R recombinases are relatively similar in sequence, resulting in considerable cross-reactivity of these two enzymes in vitro and in vivo,30 therefore limiting their simultaneous use. An interesting difference of these newer yeast T-SSRs is that they all recombine a target site containing a 7 bp spacer sequence, whereas the Flp recognition target FRT

3. TYROSINE RECOMBINASE STRUCTURES AND MECHANISM

The goal of this section is to summarize current knowledge regarding the three-dimensional (3D) structure of tyrosine recombinases and the structural changes that occur during recombination. Crystal structures of Cre, Flp, and Int have been solved, and they have been the subject of excellent reviews (see Grindley et al.,4 Van Duyne et al.10,46). This section is designed to provide a streamlined introduction to T-SSR structures and will focus primarily on the Cre/loxP recombinase system, mentioning other members where applicable. Together, the currently available structures provide us with structural “snapshots” of the Cre recombinase along the recombination pathway and allow a 3D understanding of the different steps in the recombinase mechanism with atomic level detail.


Figure 3. Structural homology of tyrosine recombinases. Comparison of the tetrameric complexes of tyrosine recombinases with their DNA targets: Flp/FRT (left), Cre/loxP (center), and λ integrase/attP (right). The C-terminal (catalytic) domain is shown on top. Each protein monomer is colored to represent the active (pale green) and inactive (magenta) conformations. The catalytic tyrosine residues are shown as red surfaces. Despite the low amino acid sequence homology, these X-ray structures highlight the overall similarity among the different tyrosine recombinases. The PDB IDs used in this figure are 1FLO (Flp), 1KBU (Cre), and 1Z19 (Int). The molecular graphics program PyMOL was used (The PyMOL Molecular Graphics System, Version 1.7.4, Schrödinger, LLC).

3.1. Overview of the Mechanism of Site-Specific Recombination

quaternary structure (Figure 3) and possess structural similarities in their catalytic domains. Of all T-SSRs, Cre remains the best studied structurally. Currently, there are several X-ray structures available of Cre bound to loxP in a variety of states in the RCSB Protein Data Bank49 (PDB; www.rcsb.org) (summarized in Table 2). There are far fewer available structures for Flp and Int. Furthermore, structures are also available of other interesting, distantly related members of this family. These include crystal structures of hairpin telomere resolvases (also known as protelomerases) in complex with DNA50−53 and a bacterial integron integrase/DNA complex.50,54 Although these recombinases possess unique properties that may be exploited in future bioengineering applications, a detailed description of these enzymes is outside the scope of this review. Because Cre has been crystallized with numerous different DNA substrates, the structures provide “snapshots” along the reaction pathway (precleavage complex, synaptic complex, Holliday junction complex, etc.). The first crystal structure of a Cre recombinase/DNA complex was reported in 199755 (PDB ID 1CRX) using a symmetric “suicide” loxP DNA target, a sequence that traps Cre in a covalent intermediate conformation wherein the catalytic residue (Y324) forms a phosphotyrosine linkage with the scissile phosphate. This crystal structure delivered many insights to the field and provided the initial structural basis of recombination by Cre. Since that time, structures of Cre complexes containing modifications of the DNA (e.g., replacement of the scissile phosphate with thiol groups, or use of modified bases) or modifications of the protein (i.e., catalytically inactive point mutations) have also been solved (see Table 2). Together they have allowed researchers to piece together the structural requirements of recombination.
The earliest structures of the Cre/DNA complexes crystallized represented Cre dimers in the asymmetric unit, and the full tetramer was constructed by applying the crystallographic symmetry operations. Later, structures containing the complete tetrameric complex in the asymmetric unit became available, confirming the orientation of the Cre/loxP dimers. In all the structures, the Cre/loxP dimers are related by a 2-fold axis of rotation (crystallographic in the case of dimers in the asymmetric unit). This means that the loxP sites are oriented in an antiparallel fashion in the tetrameric complexes (as depicted in Figure 1).
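To put the stability of the synaptic complex in energetic terms, the Kd of 10 nM quoted in section 3.1 can be converted into a standard binding free energy with ΔG° = RT ln(Kd / 1 M). The sketch below is a back-of-the-envelope illustration only: it assumes a simple two-state association, a 1 M standard state, and a temperature of 298.15 K, none of which is specified in the text, and `delta_g_from_kd` is a hypothetical helper name.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); a tighter complex (smaller Kd)
    gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


if __name__ == "__main__":
    kd = 10e-9  # 10 nM, the value quoted for the synaptic complex
    print(f"dG = {delta_g_from_kd(kd):.1f} kcal/mol")
```

Under these assumptions the result is roughly −11 kcal/mol; a different temperature or standard state would shift the number slightly.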

Recombination is a multistep process in which four Cre molecules carry out stepwise, ordered DNA cleavage and ligation at four distinct sites (Figure 1). The first step of recombination involves “synapsis”, that is, formation of a tetrameric protein/DNA complex from previously bound Cre/loxP sites. The synaptic complex is very stable, having a Kd of 10 nM,47 and consists of four copies of Cre and two loxP sites (step 1 of Figure 1). At this point, two opposing Cre molecules are in an active conformation and poised for cleavage of one DNA strand of each loxP duplex. Cleavage occurs when the conserved, nucleophilic tyrosine at position 324 in Cre (Y324) attacks the scissile phosphate in the spacer region and forms a 3′-phosphotyrosine intermediate, thereby creating a free 5′-hydroxyl group (step 2). Next, strand exchange occurs when the free 5′-hydroxyl attacks the neighboring 3′-phosphotyrosine linkage to form an almost planar Holliday junction (HJ) intermediate (step 3). The HJ then isomerizes, and the protomers alter their structure such that the pair that was initially active is now inactive and vice versa (step 4). Then, the same cleavage and exchange steps are repeated with the second set of strands, resulting in recombined DNA products (step 6). As noted in Figure 1, each of these steps is reversible. Thus, the same enzyme complex that inserts a DNA segment can also remove it. Though the recombination reaction itself is equally favorable in the forward and reverse directions, external factors can drive the equilibrium in one direction or another. Phages that utilize T-SSRs often require host cofactors and phage-encoded directionality factors to counteract the reversibility and drive the recombination forward (reviewed in Rutherford et al.48). In the absence of directionality factors, entropy arguments suggest that removal of a segment of DNA flanked by loxP sites is more favorable than reinsertion of such a fragment.
The former reaction involves a single DNA substrate, while the latter involves two DNA molecules.

3.2. Structural Overview of Tyrosine Recombinases

While the cartoon view depicted in Figure 3 is useful, such depictions are no substitute for understanding the molecular details of the recombination process. Available crystal structures provide a more detailed view of how the synaptic complex progresses through the recombination pathway. Although the tyrosine recombinases Cre, Flp, and Int share little sequence homology and present differences in their respective 3D domain structures, they do exhibit an overall conserved


are not implicated in loxP binding. In fact, Cre protein shows recombination activity in vitro when as many as 20 N-terminal residues are removed.68,69 Despite the symmetry of the loxP sequence, the Cre/loxP complex is asymmetric. There are two Cre molecules per loxP site, and each Cre makes substantial contacts with the loxP target. In most of the structures, for each loxP target, one Cre molecule adopts a “cleavage competent” conformation in which the monomer is closer to the DNA and has the ability to react, while the second Cre monomer is more distal to the loxP site, too far away for the nucleophilic Y324 to attack. Binding of Cre induces a large (∼100°) bend in the central region of the DNA (Figure 5A). The N- and C-terminal domains form a “C-clamp” shape as one Cre molecule wraps around each half-site, making extensive contacts with the DNA (Figure 5B). Cre utilizes both charge and shape complementarity to interact with the DNA, as clearly demonstrated upon examination of the electrostatic potential of Cre mapped onto its surface (Figure 5B). The N-terminal domain engages one side of the half-site in the major groove near the central spacer region. The C-terminal domain contacts the other side of the DNA, across the half-site in both the major and minor grooves. Superposition of all known Cre tetrameric structures reveals that, overall, the Cre/DNA complexes are in very similar conformations (Figure 5C). The N-terminal domains are generally more similar to each other than the C-terminal domains, where certain regions move depending on where Cre is along the recombination pathway. These data suggest that no large structural rearrangements are required for the reaction to proceed once the tetrameric synaptic complex structure is formed. Rather, the tetrameric structure provides a scaffold for the small conformational changes that are critical for catalysis. These structural changes are detailed below.

Table 2. Summary of T-SSR Crystal Structures

PDB ID (resolution, Å): 1NZB (3.10), 4CRX (2.20), 5CRX (2.70), 3C28 (2.60), 3C29 (2.20), 1Q3U (2.90), 2HOF (2.40), 2HOI (2.60), 1Q3V (2.91), 1OUQ (3.20), 1CRX (2.40), 3MGV (2.29), 1XNS (2.80), 1XO0 (2.00), 2CRX (2.50), 3CRX (2.50), 1KBU (2.20), 1PVP (2.35), 1PVQ (2.75), 1PVR (2.65), 1MA7 (2.30), 1DRG (2.55), 1F44 (2.05), 1FLO (2.65), 1M6X (2.80), 1P4E (2.70), 1Z1B (3.80), 1Z1G (4.40), 1Z19 (2.80)

Reaction state: Cre-loxP synaptic complex; Cre-loxP precleavage intermediate; Cre-loxP covalent intermediate; Cre-loxP transition state intermediate; Cre-loxP Holliday junction; Cre-loxP mutant/mutant-loxP; Cre-three way Y junction; Flp/FRT Holliday junction; Lambda-Integrase synaptic complex; Lambda-Integrase Holliday junction; Lambda-Integrase postexchange complex

ref: Ghosh et al.;47 Guo et al.;55 Ennifar et al.;56 Guo et al.;57 Ghosh et al.;58 Gopaul et al.;59 Martin et al.;60 Baldwin et al.;61 Martin et al.;62 Woods et al.;63 Chen et al.;64 Conway et al.;65 Chen et al.;66 Biswas et al.67

3.4. Cre-loxP Precleavage Synaptic Structure

To understand the difference between “cleavage-competent/active” and “cleavage-incompetent/inactive” Cre conformers before DNA cleavage, analysis of a Cre/loxP structure trapped in a covalent intermediate state was performed (PDB ID 1CRX). The largest differences in the Cre protein structure occur in the loop region (residues 196−210) and the C-terminal helices M (which contains the nucleophilic Y324) and N (discussed in greater detail below) (Figure 5D). Because Cre binds each loxP half-site in either a “cleavage competent” or “incompetent” conformation, the resultant protein−DNA interactions are not identical across a given loxP site. Binding of two Cre molecules to one loxP buries over 5000 Å2 of solvent-accessible surface area (Figure 6). There are no available structures of the Cre recombinase in the absence of DNA, presumably because the inherent flexibility of the domains and the linker region prevents crystallization. A schematic of the protein−DNA interactions is shown in Figure 7. It is clear from this schematic that Cre makes many base and backbone contacts with the half-sites and very few within the spacer region (shown in yellow). The binding of Cre induces a bend or a kink in the spacer sequence, but the exact location of the kink depends on the spacer sequence.56 For one precleavage Cre-loxP complex structure (PDB ID 1Q3U), this kink occurs adjacent to the active scissile phosphate and results in unstacking of bases and widening of the minor groove in the crossover region. It has been proposed that this kink activates the nearby scissile phosphate for cleavage and lowers the

Inspection of all the available Cre tetrameric complexes shows that the Cre protomers are arranged in a head-to-tail fashion and have pseudo 4-fold symmetry (this is highlighted in the cartoon in Figure 1 by the asymmetric shape of the Cre protomer). This aspect of the structures may be confusing at first, given the 2-fold symmetric nature of the DNA targets (e.g., the loxP contains two repeats), which are oriented in a head-to-head fashion. This conundrum is explained by the helical nature of DNA and the significant bend (∼100°) in the spacer region of the DNA. Together, these properties result in a cyclic arrangement of Cre molecules rotated ∼90° from each other in a head-to-tail orientation.

3.3. Structure of Cre Recombinase Bound to LoxP

Cre is a primarily helical protein of 343 amino acids consisting of two domains connected by a short linker. Cre is monomeric in the absence of its DNA target. The N-terminal domain (residues 20 to 129) contains helices A through E, and the C-terminal domain that harbors the active site (residues 132 to 341) contains helices F through N (Figure 4). Residues 1 to 10 are disordered and have not been seen crystallographically. Residues 11 to 19 are also disordered or have poor density in all reported structures, indicating that these residues


Figure 4. Modular architecture and sequence of Cre. Cartoon representing the N-terminal (NTD) and C-terminal (CTD) domains of a Cre molecule (monomer) with α-helices marked from A-N bound to its DNA target loxP (gray double helix ribbon). Cre sequence and secondary structure elements with color correspondence to the upper panel are shown. Some residues discussed in the text are highlighted. Residues that are directly involved in catalysis are underlined. The catalytic tyrosine is indicated by an asterisk.

negatively charged phosphate along the reaction pathway, whereas W315 plays a structural role in the active site (reviewed in Van Duyne10). Each Cre monomer contains a complete active site (Figure 4). This is not the case for Flp, where the active site is composed of residues from two adjacent protomers64,70 (Figure 8; see also section 3.7).

3.4.3. Cre Protein−protein Interfaces. The Cre/loxP tetramer contains two different protein−protein interfaces: one within a single loxP site and one across two different loxP sites. These interfaces are very similar, although not identical. The Cre−Cre interface within the loxP site buries approximately 2000 Å², while the interface across loxP sites buries only ∼1600 Å². The interface within a single loxP site includes interactions between three parts of Cre: (1) the N-terminal domains (helix A from one monomer interacts with helices E and C from the neighboring monomer), (2) the flexible loop that contains the conserved residue K201, which interacts with the region between helices E and F, and (3) extensive C-terminal domain interactions from one molecule with the M helix (containing the catalytic Y324), the intervening loop, and the N helix of the neighboring molecule (see Figure 9). As discussed below, Flp differs significantly, and it appears that the Flp tetramer is more flexible, perhaps in part because of Flp’s larger crossover region relative to Cre4 (see section 3.7).
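The buried areas quoted for the two Cre−Cre interfaces follow from a simple bookkeeping of solvent-accessible surface areas (SASA). The sketch below illustrates that arithmetic only. The function name and the input values are hypothetical placeholders chosen to land on the ∼2000 Å² scale mentioned above; they are not measurements from any Cre structure, and some programs report half of this sum (the area buried per partner) instead.

```python
def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total surface area (in square angstroms) buried when A and B form a complex.

    Computed as SASA(A alone) + SASA(B alone) - SASA(A:B complex).
    """
    return sasa_a + sasa_b - sasa_complex


# Hypothetical SASA values (not taken from a real structure), for illustration only.
example = buried_surface_area(sasa_a=18000.0, sasa_b=18000.0, sasa_complex=34000.0)
print(f"buried area: {example:.0f} square angstroms")
```

Comparing the result for two interfaces computed with the same convention is what makes a statement such as "2000 versus ∼1600 Å²" meaningful.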

barriers for melting of the cleaved strand in preparation for strand exchange.56 Despite the ability to detail these many protein−DNA interactions for Cre, it would be an oversimplification to describe them with a simple set of “interaction rules”. For example, structures of Cre-loxP and Flp-FRT complexes contain a number of water-mediated protein−DNA base contacts, known as “indirect readout” of the DNA 3D conformation, which is an intrinsic property of the DNA sequence. These characteristic molecular properties present challenges when trying to fully understand and predict specificity4 (see section 4).

3.4.1. Conformational Changes in Cre-loxP along the Recombination Pathway. During the recombination reaction, the largest conformational changes occur in the C-terminal domain of Cre. For example, the C-terminal helices M and N and a loop region (residues 198 to 208) act as conformational switches between the active and inactive forms56 (Figure 5D). The loop region contains the critical residue K201 and adopts a different conformation in each form. This loop is more flexible in the noncleaving conformation (sometimes totally disordered) relative to the cleaving conformer. Also, the nucleophile Y324, located within helix M, is poised for catalysis at the scissile phosphate in the cleaving conformation and is moved away in the noncleaving form. Finally, helix N interacts “in trans” by burying its hydrophobic surface in a pocket of the neighboring molecule. These movements both alter the active site and allosterically communicate the reaction state of one Cre molecule to its neighbor.

3.4.2. Recombinase Active Site. Sequence analysis of the T-SSR family revealed the presence of conserved residues necessary for catalysis (Figure 4). These residues occur predominantly in the C-terminal domains. The Cre active site includes the conserved residues R173, H289, R292, K201, W315, and the nucleophile Y324. K201 acts as the general acid and stabilizes the leaving O5′. 
The positively charged R173 and R292 residues (and to a lesser degree H289) stabilize the

3.5. Cre/loxP Holliday Junction Intermediates

Several Cre/loxP structures exist as Holliday junction (HJ) intermediates. At this step along the recombination pathway, the 5′ OH has religated with a neighboring strand of DNA. In these structures, the DNA adopts a pseudoplanar conformation (Figure 10). The DNA in the spacer region has become unstacked, and there is a kink in the “crossover” strand of the DNA, whereas the continuous strand has no sharp kink. The next step in the pathway is isomerization of the HJ. These structures helped establish the isomerization model of strand exchange.59 This model does not require large motions of the recombinase, and only three bases adjacent to the cleavage


Figure 5. Structural overview of Cre/loxP. (A) Cre dimer bound to loxP. Two Cre molecules bound to a single loxP site (colored as in Figure 4). The DNA has a ∼107 deg bend in the central asymmetric spacer region. The scissile phosphates are shown as magenta spheres. The second view is rotated ∼90 deg for a clearer view of the C-terminal domains. (B) Electrostatic potential mapped onto the surface of a monomer of Cre. DNA is shown as an orange ribbon. Blue is positively charged, red is negatively charged and scaled with a range of ±5 kT. The electrostatic potential was calculated using the program APBS.357 (C) Superposition of Cre recombinase tetramers. This calculation was performed using available crystal structures from the RCSB, excluding the Cre-three way junction structures (Table 2). For each Cre structure, the tetramer was generated, if necessary, using crystallographic symmetry operators. The 21 tetramer structures (two 3-way trimeric structures were omitted) were superimposed using the alignment program in PyMOL. The root-mean-square displacements (RMSDs) of the tetramers range from ∼0.5 to 2.5 Å. The DNA was not included in the superposition calculation. The Cre subunits are displayed as a cartoon diagram and colored pink and green to highlight the alternating “active” and “inactive” conformers. (D) Conformational switch in Cre. Active versus inactive protomer obtained by superposition of the loxP half-sites in the structure PDB ID 1CRX. The superposition of active (green, blue) and inactive (pink, magenta) Cre conformers highlights the position of the largest changes that occur in Cre. The largest conformational changes occur in a loop region (AA 196−210) and the C-terminal helices M and N (AA 318−341). Note that these regions also contain components of the active site: the catalytic nucleophile, Tyr 324 (helix M), and Lys 201 (loop 196−210). 
The right panel shows in stick representation the relative positions of the “active” and “inactive” catalytic tyrosines, colored green and pink, respectively. For simplicity, only the DNA from the “active” half-site is shown. The scissile phosphate is labeled and colored magenta. The phospho-tyrosyl linkage is shown as a red dashed line. The hydroxyl group of the catalytic tyrosine is displaced 3.5 Å in the inactive conformer and no longer interacts with the scissile phosphate but with an adjacent phosphate (indicated by the yellow dashed line; PDB ID 1CRX). The nucleotides are numbered.
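The RMSD values quoted in panel C summarize how far equivalent atoms lie from one another after superposition. As a minimal sketch of that final step only, the function below computes the RMSD between two coordinate sets that are assumed to be already superimposed and listed in matching atom order. The function name and the toy coordinates are hypothetical, and the optimal-fit step performed by an alignment program is deliberately left out.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square displacement between two pre-superimposed coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in angstroms, with atoms
    listed in the same order in both sets.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: two atoms, one of which is displaced by 2 angstroms along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(reference, displaced), 3))
```

Because the squared displacements are averaged over all atoms, a few large local changes (such as those in helices M and N) can coexist with a small overall RMSD.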

3.6. Order of Strand Exchange

need to be melted in the spacer region to form the HJ after cleavage has occurred. In this model, isomerization occurs by subtle alterations in the protein−protein interfaces, which effectively swap the continuous and crossing strand conformations and, hence, the roles of the DNA strands.55,71,72 This then prepares Cre for the second round of cleavage and strand exchange.

It is believed that Cre has a preferred order of strand selection for recombination, in contrast to Flp, which has little or no preference.73 The asymmetry of the spacer region in the DNA implies directionality for the cleavage. Nonetheless, there is conflicting evidence and hence disagreement in the field as to which strand in Cre gets cleaved first.56,60,74−76 The available structural data support both models. For example, the structure


Figure 6. Protein−DNA interaction area in a Cre/loxP complex. (A) The DNA is shown as a surface representation and colored white except where it interacts with Cre (omitted for clarity). Regions of the DNA that are contacted by Cre are colored blue (for interactions in the grooves) or red (for interactions with the phosphate backbone). The central spacer region has far fewer protein−DNA interactions. This figure was created using the PyMOL plugin PDIviz.358 (B) DNA-contacting Cre residues. Cre (pale green) is shown as a surface representation with DNA-interacting areas colored in magenta. For clarity, only one Cre molecule is shown, and the loxP is depicted as a ribbon.

DNA to inhibit cleavage on the second arm. These studies also emphasized the importance of the sequence and flexibility of the bases flanking the central region, which suggests that there is a preference for the bottom strand to be exchanged first.79 Future studies will be necessary to more fully resolve this issue.

3.7. Structural Comparison of Flp/FRT to Cre/loxP

Currently, there are three available crystal structures (wt, a thermostable mutant, and a W330F mutant) of the eukaryotic T-SSR Flp tetramer in complex with its target DNA, FRT, as Holliday junction intermediates.64,66 These structures provide insight into the shared properties of the T-SSR family, as well as the properties that are unique to Flp. Many similarities between Cre and Flp exist. These include formation of a “cyclic” synaptic tetramer when bound to the recombination binding sites and proceeding via a planar HJ intermediate. The first crystal structure of Flp revealed that it contains two domains, an N- and a C-terminal domain.64 The N-terminal domain and a C-terminal segment (residues 332 to 423) appear to be unique to Flp.64 The structural homology of Cre and Flp only occurs within the catalytic C-terminal domain (AA 155−331), and the homologous regions share only 13% sequence identity. Superposition of these regions in Flp and Cre results in an RMSD of 3.3 Å (see Figure 8D). Perhaps the most striking difference between Flp and Cre is that Flp performs its recombination “in trans”. That is, although the active site residues are conserved, the Flp active site is formed by two adjacent Flp molecules. The “cleaving-competent” monomer contains the active site residues (R191, H305, R308, and K223) positioned proximal to the scissile phosphate. The adjacent, noncleaving conformer correctly orients helix M such that the nucleophilic tyrosine (Y343) is poised to attack the scissile phosphate (Figure 8). Helix M is largely disordered in the noncleaving active site. Like Cre, Flp wraps around the double-stranded DNA, making extensive contacts. The nature of these interfaces is distinct for Flp, as are the protein−protein contacts between adjacent monomers. Some of these differences arise from Flp’s unique N-terminal domain. 
Although both Cre and Flp recognition sites have a central 8 bp spacer, the spacing between the two cleavage sites differs: 8 bp for FRT and 6 bp for loxP (see Table 1). In addition, Cre only recombines substrates with an 8 bp spacer, whereas Flp can recombine DNA substrates having spacer regions either one base shorter or longer than the wt FRT substrate.80 The Flp/FRT crystal structure is an HJ intermediate having 7 bp between cleavage sites (PDB ID 1FLO). This extra base pair (relative to loxP) positions the Flp monomers farther from the center of the

Figure 7. Schematic of protein−DNA interactions of Cre and loxP. Scheme of direct Cre-DNA interactions from a “precleavage intermediate” crystal structure (PDB ID 1Q3U). The interactions were calculated using the program PDBSUM.359 The asymmetric spacer is shown in yellow. The scissile phosphates are shown as solid red circles. The Cre residue numbers are shown in blue (noncleaving monomer) and black (cleaving monomer). The bold lines and dark residue numbers indicate base interactions; the fainter lines and residue numbers indicate phosphate backbone interactions.

(PDB ID 1Q3U) supports top strand cleavage, whereas the structure of a precleavage intermediate (PDB ID 2HOI) containing a K201A mutation supports bottom strand cleavage. Biochemical and biophysical studies of mutant Cre demonstrated that the order preference (bottom strand) was linked to the preference for a particular DNA bend orientation (from among several possibilities) during the formation of the synaptic complex;77 this model is further supported by single-molecule FRET studies.78 Recent computational studies of the Cre/loxP system identified critical Cre residues that clamp the


Figure 8. Active site of Cre and Flp. (A) Crystal structure of Cre trapped in a transition state intermediate using the phosphate analogue vanadate (VO4). Shown are the active site residues of Cre (carbon green, nitrogen blue, and oxygen red) and relevant nucleotides (carbon yellow, phosphate orange, oxygen red, and nitrogen blue). Vanadate is colored magenta. The stabilizing pentavalent coordination between the vanadate ion and nucleophilic tyrosine (Y 324) and the 3′ and the 5′ hydroxyl are indicated by magenta dashed lines. Other hydrogen bonds are shown as yellow dashed lines (PDB ID 3MGV). (B) Crystal structure of Cre trapped as covalent intermediate (PDB ID 1Q3V). Cre active site residues (colored as in A) showing the 3′-phospho-tyrosine DNA linkage. The scissile phosphate is colored magenta, and the DNA is colored as in A. Relevant hydrogen bonds are shown as dashed lines. (C) Crystal structure of the Flp/FRT active site (PDB ID 1FLO). Unlike Cre, the active site in Flp is comprised of two monomers (colored pink and green). The DNA is colored as in A. The catalytic tyrosine (Y 343) from an adjacent Flp monomer (green) inserts into the active site of the neighboring monomer (pink). (D) Superposition of Flp and Cre monomers. Superposition of Flp onto the structurally homologous region of Cre (AA 135−341) yielded an RMSD of 3.3 Å over 727 atoms. In this ribbon representation, Cre is shown in gray, and Flp is colored blue (for the nonstructurally homologous regions) or magenta (the structurally homologous region, AA 155−331). The regions of Cre and Flp not involved in the superposition are displayed as transparent ribbons. Certain Cre alpha helices are labeled to help orient the viewer, in particular αM (which contains the nucleophilic tyrosine Y 324) and αN (which interacts with an adjacent Cre molecule). The Flp nucleophilic tyrosine (Y343) is shown in red as stick representation. 
Also shown (in yellow) is the adjacent Flp monomer that completes the Flp active site in trans. The PDB IDs used for this analysis were 1FLO (Flp) and 1CRX (Cre).

HJ.64 Flp also requires larger motions to resolve the Holliday junction. The Flp synaptic complex appears more flexible than that of Cre, and its N-terminal domains are linked together via flexible loops. It is possible that this greater flexibility allows Flp to recombine DNA containing varying spacer lengths.64 Detailed biochemical analyses of Flp and Cre have demonstrated clear differences in their kinetic pathways and the efficiency with which these enzymes catalyze recombination. Whereas the HJ intermediate of Cre can proceed in either the forward or reverse direction and (in vitro) often results in nonproductive synaptic complexes, the Flp reaction almost always proceeds to completion.81,82,78 These functional differences are consistent with the magnitude of the structural changes between active and inactive subunits when Flp and Cre are compared. As the changes in protein−DNA interactions between active and inactive monomers in Cre are small relative to those in Flp, it makes sense that the directionality of the Cre reaction is less committed to the forward (as opposed to the reverse) reaction than Flp.65

3.8. Structural Summary and Outlook

The Cre/loxP system and its related cousins have proven fertile ground for structural investigation of the tyrosine recombinase pathway. Through X-ray crystallography, along with the clever design of DNA substrates (reviewed in Verdine et al.83), Cre has been captured in a variety of reaction states, including synaptic and precleavage complexes, among others. As seen earlier (Figure 5C), all these structures can be superimposed fairly well, indicating that once the synaptic complex has formed, no large protein−protein or protein−DNA rearrangements are required to effect recombination. Analysis of these structures highlights the asymmetry and the complexity of the interactions that occur, the many small changes that occur as Cre transitions from one state to another, and how these are communicated across the synapse. In addition, while the active site is conserved within the T-SSR family, a different series of protein−protein interactions enables recombination to occur with differing specificities. Nonetheless, certain questions remain (e.g., concerning the order of strand exchange). In


Figure 9. Protein−protein interfaces of Cre tetrameric complex. (A) Overview of protein−protein interactions in a covalent intermediate structure (PDB ID 1Q3V) highlighting the C-terminal portion that acts as a conformational switch during recombination. The active and inactive conformers are colored green and magenta, respectively. The residues involved in the protein−protein interfaces are shown as blue and magenta ribbons, respectively. Two subtly different protein−protein interfaces are present: the “crossover interface” (type I), between monomers on the same loxP site, and the “synaptic interface” (type II), between monomers on opposing loxP sites.59 The black box indicates the interface region depicted in panels B and C. (B) A close-up view of the interaction involving residues from the C-terminal domain. This illustrates how helix N sits in a pocket on the adjacent monomer, making numerous contacts. The surface area of one monomer is colored magenta, and its interaction surface is highlighted in dark magenta. The adjacent monomer (colored green) is shown as a ribbon. Only side chains of interacting residues are displayed as sticks and colored blue. (C) Three close-up views of the type I interface. The left panel focuses on the interaction between the N-terminal domains (helices A and E). The central and right panels highlight the C-terminal domain interactions, showing the interacting residues from one monomer (blue) with the other (magenta). Interacting helices are labeled.

addition, it is very difficult to predict what changes in the protein will alter specificity (see section 4). Structures of evolved recombinases (see section 4.2) and novel naturally occurring T-SSRs (see section 2.2) will certainly extend our mechanistic understanding of how these enzymes recognize DNA and site-specifically recombine their targets.

high resolution (see section 3 and Table 2) have revealed the complexity of this macromolecular complex, but at the same time, they have paved an extraordinarily information-rich path toward understanding the basis of its molecular recognition, which constitutes the first step toward the design of customized specificity. This section provides an overview of how the complexity of Cre-DNA binding recognition controls selectivity, and it summarizes advances in the engineering of Cre recombinase specificity.

4. CRE DNA BINDING SPECIFICITY

Whereas the overall recombination mechanism in the Cre/loxP system is currently well-understood (as described in section 3), there are still open questions regarding its target selectivity. Designing customized target-site specificity in T-SSRs requires a good understanding of the structure−function relationships (i.e., functional mechanisms) in these enzymes. Numerous structural, biological, biochemical, and computational studies have focused on understanding sequence recognition in the Cre/loxP system. Structures solved by X-ray crystallography at

4.1. Complexity of DNA Recognition by Cre and Basis for Its Selectivity

As seen in section 3.3, Cre utilizes both its N- and C-terminal domains to recognize DNA (Figure 4). This 3D architecture is well conserved in the T-SSR family despite the low sequence similarity that its members share outside the active site. Cre performs DNA recombination at specific sites by forming a C-shaped clamp on the DNA (Figure 5B) and establishes contacts


Figure 10. Holliday junction isomerization of Cre. (A) Ribbon diagram of the DNA in a Holliday junction (HJ) intermediate (PDB ID 1XNS). The Cre monomers are omitted for clarity. The top and bottom strands are colored cyan and orange, respectively. Scissile phosphates (4) are marked in red. In (A), the cyan strand is termed the “crossing strand”, and the orange is the continuous strand. A rotated view highlighting the planarity of the structure is shown below. (B) Isomerization. The continuous strand now becomes the crossing strand and vice versa via subtle changes in protein−protein interactions. This results in the reorientation of the scissile phosphates such that the alternate Cre molecules are now in a cleaving conformation. At this point, the second round of strand exchange can occur.

Part of the difficulty in understanding how to expand targeting may be due to the fact that, in evolution, the change to a new specificity profile proceeds through a prior stage of relaxed specificity.84−86 Thus, whereas some mutations may appear in order to recognize the new target, they maintain the ability to recognize the original target and are then responsible for residual specificity. For instance, the in vitro evolution studies used to evolve Cre into the new recombinase, Fre, recombining loxH (with a different spacer than loxP and 4 mutations in the half-sites (see Table 3)) showed that the initially obtained Fre variants exhibited relaxed specificity and recombined both loxP and loxH.84 The most specific Fre/loxH recombinases obtained in this work contained the mutations K86N, Q94L, R101Q, S108G, N317T/H, and I320S (Figure 4). The respective residues in Cre had been previously inferred to be involved in positioning the loxP spacer for cleavage,57 indicating that their modifications in Fre most probably allow recombination of the new target site by bringing the altered spacer sequence into the correct conformation for subsequent cleavage. Other amino acid positions close to the recognition site mutated in the Fre variants include 30, 85, 129, 262, and 319. Interestingly, residue E262, which is close to the modified DNA bases in the inverted repeat (Figure 11), was mutated from the initial evolution cycles onward (first to Ala/Gly and then conserved as Gln). This residue was thus postulated at that time to be key in enhancing binding to loxH. E262 was indeed later confirmed to be a “guardian for loxP selectivity” (vide infra).87 We will see in the next sections how this residue has appeared in more recent studies to be a key player in specificity and related to the process of target specificity relaxation.

4.1.2. Role Played by Noncontacting Positions in the Cre/loxP Complex. 
The high specificity exhibited by Cre toward loxP cannot be fully explained by the relatively few direct contacts established between Cre residues and loxP bases in the major groove (Figure 11). Indeed, several studies have concluded that the specificity of proteins for DNA sequences cannot be simply explained by one-to-one direct

with the minor and major DNA grooves (Figures 6 and 7), which are distributed in two main regions: (i) the αJ in the C-terminal domain that binds in the major groove near the center of the loxP half-site and (ii) the αB and αD from the N-terminal domain that span both sides of the major groove near the start of the spacer region (Figure 11).

Figure 11. DNA recognition by Cre. Zoomed view into the Cre/loxP interface of PDB ID 1Q3U. Ribbon cartoon of a protein monomer showing side chains of protein residues (labeled in black) involved in interactions at the DNA major groove (bases involved labeled in gray).

In this section, we will see that it is not only the sequence variation of the catalytic core that grants Cre specificity toward its DNA substrate but a much more intricate system. The complexity of the Cre-loxP interface in terms of understanding its selectivity resides in four main features of this macromolecular system.

4.1.1. Relaxed versus High Target Specificity. The important role of some amino acids in Cre and some bases in its DNA target site loxP has been elucidated through studies carried out over the last two decades. However, identification of these residues is not sufficient to understand or redirect the high target-site specificity of the system.


Table 3. DNA Target Sequences for Cre and Cre Variantsᵃ

ᵃLeft half-sites are in dark blue and spacers in red; right half-sites are in light blue and bases deviating from loxP in green.
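The layout that Table 3 color-codes (left half-site, spacer, right half-site, and bases deviating from loxP) can be sketched in a few lines. The example below is illustrative only: the helper names are hypothetical, the 34 bp sequence is the commonly cited canonical loxP site supplied here as an assumption rather than copied from Table 3, and the 13 + 8 + 13 split reflects the half-site and spacer lengths of that canonical site.

```python
# Canonical 34 bp loxP site (assumed here; not transcribed from Table 3).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_lox_site(site: str, arm: int = 13, spacer: int = 8):
    """Split a lox-like site into (left half-site, spacer, right half-site)."""
    if len(site) != 2 * arm + spacer:
        raise ValueError("unexpected site length")
    return site[:arm], site[arm:arm + spacer], site[arm + spacer:]


def count_deviations(site: str, reference: str = LOXP):
    """Count mismatches to the reference in (left half-site, spacer, right half-site)."""
    pairs = zip(split_lox_site(site), split_lox_site(reference))
    return tuple(sum(a != b for a, b in zip(x, y)) for x, y in pairs)


left, spacer, right = split_lox_site(LOXP)
# The half-sites are inverted repeats of one another, whereas the spacer is asymmetric.
arms_are_inverted_repeats = reverse_complement(left) == right
spacer_is_palindromic = reverse_complement(spacer) == spacer
print(left, spacer, right, arms_are_inverted_repeats, spacer_is_palindromic)
```

Applying `count_deviations` to a variant site gives the number of changed positions per region, which is the same information the green highlighting conveys in the table.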

and establishing hydrogen-bonding networks. Furthermore, interfacial solvent has been shown to be more than simply space-filling or linking atoms but also key in assisting the conservation of protein interactions despite evolutive sequence.92 The ample water-mediated interactions network established between Cre and loxP acts indeed as a molecular bridge between both binding partners (Figure 12), and in addition, it has been shown to have important consequences for specificity. Crystallography studies revealed how the macromolecular plasticity of protein and DNA together with hydration plays a key role in remodeling the Cre-DNA interface and establishes target specificity.61 Two Cre variants CreA174‑L258‑S259‑H262‑G266

correspondences between protein amino acids and DNA bases (reviewed in Grindley et al.4). Illustrating the complexity of the macromolecular Cre/loxP system is the fact that non-DNA-contacting residues have been found to impose site selectivity. For instance, the mutation E262G was shown to increase the recombinase activity by 10⁴-fold against loxK2 (Table 3), a target site that is mutated at positions that do not contact the protein.87 This means that a single amino acid in Cre (i.e., E262) may act as a “custodian” for loxP selectivity and that its modification together with changes in noncontacting bases in the DNA may allow DNA target site discrimination. This work supported the establishment of site specificity relaxation as a possible pre-evolution step toward a new specificity as previously proposed,84 and it highlighted the important role played by noncontacting functionalities.

4.1.3. Cre/loxP Physicochemical Nature: Flexibility and Hydration. The intrinsic flexibility of the binding partners involved in the Cre/loxP macromolecular complex, the conformational plasticity of DNA (i.e., sequence-based local flexibility), and the solvent as an active player (i.e., interfacial water molecules establishing hydrogen bond networks) have been shown to be essential for the establishment of specific molecular recognition. Hydration has been related to hot-spots for protein interactions with DNA since the late 1990s and is also considered to play a role in furnishing the specificity of protein−DNA recognition.88−90 A series of studies have analyzed in detail the available high-resolution crystal structures of protein−DNA complexes in order to decipher the rules governing amino acid−base recognition (reviewed in Jayaram et al.91). 
It has been well-established that interfacial water molecules have an important “linking” role by providing an extension to protein side-chains (i.e., they can occupy positions and play physicochemical roles analogous to amino acid functionalities)

Figure 12. Detail of Cre/loxP interface at the interaction between αJ and the major groove. Cre is shown in brown and loxP in white. Interacting protein amino acids and DNA bases are displayed as sticks and colored by atom type. Water molecules are shown as red spheres. Water-mediated interactions between Cre amino acids and loxP bases are highlighted with red dashed lines (PDB ID 1KBU).


Figure 13. Hydration in the Cre-DNA interface and sequence-based small DNA structural shifts rule target specificity. (A) Superposition of the crystal structures of complexes Cre/loxP (PDB ID 1KBU, gray), CreLNSGG/loxP (PDB ID 1PVR, yellow), CreLNSGG/loxM7 (PDB ID 1PVQ, orange), and CreALSHG/loxM7 (PDB ID 1PVP, blue) showing crystallographic water molecules as spheres. (B) Detail of hydration at Cre/loxP interface. Water molecules mediating interactions between Cre amino acids and loxP bases are highlighted as bigger spheres and colored in red. (C) Detail of hydration at Cre/loxP interface in comparison with CreLNSGG/loxP. (D) Detail of hydration at CreLNSGG/loxP in comparison with CreLNSGG/loxM7. Water molecules mediating interactions between CreLNSGG amino acids and loxM7 bases are highlighted as bigger spheres. (E) Detail of hydration at CreALSHG/loxM7 in comparison with Cre/loxP. Water molecules mediating interactions between CreALSHG amino acids and loxM7 bases are highlighted as bigger spheres in blue and between Cre amino acids and loxP bases in red. In panels B, C, D, and E, the same orientation as Figure 12 is used for reference. DNA is represented by ribbons, and bases at positions 7, 8, and 9 and complementary 26, 27, and 28 are displayed as sticks and colored by atom type.

and CreL174‑N258‑S259‑G262‑G266 (ref 93) were shown to discriminate between loxP and loxM7, an engineered substrate inactive with wild-type Cre and containing 3 changes in each half-site with respect to loxP (Table 3), thanks to the interplay of water-mediated interactions and a small DNA structural shift (Figure 13). The crystal structure of CreALSHG (variant preferentially recombining loxM7) in complex with loxM7 (PDB ID 1PVP) and the complex structures of CreLNSGG (variant with relaxed specificity and recognizing both loxP and loxM7) with loxM7 and loxP (PDB ID 1PVQ and 1PVR, respectively) showed that CreALSHG contacts the modified DNA bases through a hydrated network of novel protein−DNA contacts, whereas CreLNSGG utilizes the same DNA backbone interactions but different base contacts, all facilitated by an unexpected DNA shift. This structure-based work illustrated how water contributes to the adaptation necessary for the establishment of new DNA binding specificities: few protein mutations can originate novel interactions with the DNA because of the assistance of interfacial hydration networks, and whereas flexibility may cause promiscuity, water-bridged interactions may increase selectivity. Water-assisted specificity in evolution may be seen as a crafty way that nature deals with a limited repertoire of available

amino acids and bases. This, in turn, presents certain challenges for predictive protein−DNA binding specificity schemes. On the basis of this information, it could be envisaged that, in principle, specificity in the Cre/loxP recombinase system could be remodeled by introducing small sequence variations that affect indirect readout (i.e., DNA’s sequence-dependent conformational parameters). This idea can be extended by considering and evaluating at the atomic level the interaction arbitration and specificity-leading role played by hydration networks forming at the protein−DNA interface.

4.1.4. Cre/loxP Quaternary Structure. The pseudo 4-fold symmetry in the Cre/loxP system (i.e., symmetric arrangement of protein subunits in space) introduced earlier (section 3.3) restricts target recognition to achieve high selectivity and imposes important challenges to the design of custom target site specificities. Indeed, recent engineering efforts to redesign specificity focused on disrupting the symmetry of the Cre tetramer assembly, work that will be reviewed in section 4.2.

4.2. Engineering of Custom Cre Recombinases with Altered Specificities

With advances in technology and the availability of new sequence and structural data, attention in recent years has been

DOI: 10.1021/acs.chemrev.6b00077 Chem. Rev. 2016, 116, 12785−12820

Chemical Reviews

Review

R337E, and E123L. The E123L mutation appears to be key for penalizing formation of homotetrameric complexes by creating favorable van der Waals interactions between mutant monomers but unfavorable contacts with wild type. Three iterative rounds of structure-based computer-aided design were needed to redesign the monomer interfaces so that they remain functional but are incompatible with their wild-type counterparts, thereby controlling the assembly of the functional complex to achieve recombination at asymmetric sites while avoiding off-target recombination events. Of note, some naturally occurring T-SSRs, such as the XerCD system, indeed form heterospecific recombination complexes,17 possibly providing a helpful model of how to engineer heterospecificity into homotetrameric complexes. Another recently applied approach has focused not on reducing the number of possible targets but on increasing the energy difference between binding to the target and to the off-target sites.100 In this study, a theoretical model of DNA binding accuracy was established by reasoning that the formation of the Cre dimer of dimers is not site-specific as it does not involve new DNA binding events and, therefore, concluding that the precision of dimer formation determines the accuracy of recombination. In accordance with this theory, accuracy toward loxP could be improved by decreasing the cooperative binding involved in formation of the tetrameric complex. Therefore, when introducing mutations, the focus was directed toward the dimerization interface of Cre instead of the DNA−Cre interface. In particular, the authors identified residue R32 (in helix αA, closest to the N-terminus, Figure 4) as a key amino acid to target, as it is heavily involved in protein−protein interactions between Cre subunits (i.e., R32 establishes intersubunit stacking interactions, forms salt bridges with three acidic side chains, and is tightly associated with water molecules).
The Cre variants containing the selected mutations, CreR32V and CreR32M, indeed presented decreased off-target activity, albeit with reduced recombination efficiency compared to the Cre wild-type enzyme. The authors concluded that destabilizing binding cooperativity could be a general strategy for improving accuracy of these DNA binding proteins. However, it must be emphasized that the gain in accuracy occurs at the expense of loss in efficiency since the same network of intersubunit interactions being tackled in this approach is also responsible for stabilizing the synaptic complex. This brings into a prime position in the “design wish list” the generation of protein variants affecting specific binding and cooperativity but without having any negative impact on the synapsis. Along these lines, the last two strategies discussed above, “obligate heterodimer” and “cooperative destabilization”, could be combined (i.e., modulating binding cooperativity of heterodimers) and thus move toward the above-stated engineering goals for T-SSRs. 4.2.2. Exploiting “Interdependent Contacts” to Rationally Engineer Customized Cre-DNA Specificity. Directed molecular evolution approaches to alter Cre DNA binding specificity have indicated that most variants emerging during the evolution process have relaxed specificity.61,84,93,101 These, together with studies bringing evolution-based mutations into a structural three-dimensional context,61 have revealed that mutations leading to relaxed specificity can give rise to rearrangement of contacts and therefore multiple interdependent interactions. Exploiting “interdependent contacts” in a rational manner represents a promising opportunity to design high DNA recombination specificity.

directed toward the potential use of T-SSRs for therapeutic applications. As will be discussed in section 6, although directed evolution strategies have been successful in delivering efficient and specific T-SSRs, the altered substrate specificities of these enzymes have typically been accompanied by residual activity toward their natural targets. This remaining target promiscuity has so far represented a hurdle for the therapeutic use of T-SSRs. Directed evolution approaches have been combined with rational methods in recent years to best harness their potential for engineering Cre recombinase variants with altered specificity. This section summarizes the strategies followed and the successful results obtained. 4.2.1. Manipulating Protein−protein Interaction Properties to Tackle DNA Target Specificity. Several structure-based computer-aided methods have been applied in combination with experimental work in order to tackle Cre specificity. These studies have targeted not only Cre−DNA interactions but also interfacial residues that form multiple Cre−Cre interactions and are not directly involved in contacting DNA. Initial studies exploited Cre protein−protein interactions to redesign DNA site recognition by engineering Cre recombinase pairs that targeted hybrid asymmetric sites.94 Wild-type Cre and the variant CreALSHG with altered specificity toward loxM7 (vide supra, Table 3) were used to generate a “dual specificity” heterotetrameric complex, which recombined an asymmetric site containing half-sites from loxP and loxM7. A considerable problem with this strategy was that free subunit association in mixtures of Cre variants could lead to off-target effects. With this in mind, subsequent approaches focused on defining unique spatial arrangements of heterotetramer subunits to avoid nontarget cleavage.
Motivated by previous work on zinc finger nucleases,95,96 an obligate heterodimer strategy was adopted in which oligomerization was controlled by designing a “hydrophobic size switch” among residues at the Cre dimer interface.97 Here, the molecular focus was the domain swap in which the N-helix at the C-terminus of one monomer subunit inserts into a pocket in the C-terminal domain of the neighboring monomer, involving the hydrophobic residues M299, A302, V304, A334, M335, and L338 (Figure 4). Using an elegant and simple size switch of hydrophobic small-to-big and big-to-small mutations, the authors redesigned the Cre and CreALSHG interface to achieve a favorable interaction that only forms when the subunits are bound to their corresponding half-sites in a loxP and loxM7 DNA hybrid. This approach results in a recombinase consisting of a constrained heterotetramer, where the subunit arrangement fits the desired hybrid target. More recent studies have exploited in a similar manner the electrostatics at the Cre interface instead of hydrophobic steric effects in order to engineer an obligate heterotetrameric complex of mutant pairs that recognizes asymmetric target sites.98 Here, a controlled assembly was presented, which is governed by designed incompatibility with wild type and by selective destabilization of the homotetrameric complex. In this “negative engineering” strategy Rosetta399 was used for predicting side chain modifications of residues at positions 25, 29, 32, 33, and 35 (in Cre chain A) and 69, 72, 76, 119, and 123 (in Cre chain B), as well as the salt bridge formed by E308 and R337. Two Cre mutants were engineered: CreB2 containing the mutations E69D, R72K, L76E, and E308R, and the CreA3 variant with the mutations K25R, D29R, R32E, D33L, Q35R,


Indeed, recent work has combined experimental evolution approaches with computational methods making use of available structural data on the Cre/loxP system in a strategy to rationally design a new highly specific recombinase (sTre), which efficiently recombines a sequence present in the long terminal repeat of an HIV-1 isolate (loxLTR, Table 3).102 As a first step in this work, substrate-linked directed molecular evolution was used to extend the evolution process on the previously generated Cre-based Tre recombinase103 in order to enrich for conserved mutations that efficiently recombine the loxLTR substrate. All mutations repeatedly appearing in the obtained pool of clones at a frequency higher than 85% were then selected in order to construct a consensus recombinase, Tre14 (containing 14 highly conserved mutations: V7I, P12S, P15L, Y77H, G93C, Q94R, S108G, A175S, N245Y, R259Y, E262Q, G263R, N317T, and I320S). Tre14 was found to be a highly active enzyme but still exhibited considerable activity on loxP, as substrate-linked directed molecular evolution enforces little selection pressure on the specificity of the enzyme. Molecular modeling and dynamics simulations were then used to build 3D atomic-detailed models of the interaction of Tre14 with loxP and loxLTR, and the mutations were investigated in comparison to the Cre/loxP complex in order to address the residual activity exhibited by Tre14 on loxP. In particular, the theoretical models were used to identify evolution-derived mutations that could take over the role of residues whose mutation might shift the specificity toward a preference for loxLTR. By focusing on residues interacting in a dissimilar manner in the Tre14/loxP and Tre14/loxLTR 3D models and also by comparing to the Cre/loxP crystal structures, positions 43, 86, and 94 emerged as important players to balance the specificity toward a preference for loxLTR. The following rationale (Figure 14) was used to

engineer sTre, a highly active and specific loxLTR recombinase which does not recombine loxP: in Cre, residues K43 and K86 interact with the major groove in loxP (see Figure 11), and the introduction of mutations K43E and K86E would impair loxP recognition by sTre, whereas they would not affect loxLTR recombination as E43 can interact with the changed bases, and the role of K86 in sTre would be taken over by the Q94R mutation (appearing in the evolutionary process), which is close in space and could therefore grant specificity for loxLTR. This work showed that rationalizing substrate-linked directed molecular evolution data in the context of 3D atomic models using available structural data and simulation is useful for discerning the role in activity and selectivity of selected mutations. The work also provided insights into the functional role of “indispensable” residues and on how to make them “dispensable” in order to achieve the desired target site selectivity. A significant conclusion from this interdisciplinary work is the fact that, when targeting selectivity, efforts would benefit from focusing not only on concrete residues that may be considered indispensable for specificity but also on neighboring residues that could potentially take this role. Very recently, molecular evolution approaches have produced yet another Cre-based recombinase, Brec1, which is highly specific for loxBTR (Table 3).101 In this work, the structure-based analysis of the frequently obtained mutations in the context of knowledge acquired from other previously evolved Cre-based recombinases has identified four mutational hot spots clustering at the protein−DNA recognition interface. These regions have evolved differently and are likely to be critical for target site specificity as they cluster at the protein− DNA recognition interface (Figure 15). 
For instance, the region of DNA contacted by residues 259 and 262 is different for loxLTR and loxBTR, and precisely the mutations that evolved for these residues offer a different profile of hydrogen bond donors and acceptors pointing into the major groove.
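The consensus step used above to derive Tre14 (keeping only mutations that recur in more than 85% of the evolved clones) can be illustrated with a short sketch. This is not the authors' pipeline: the function name, the representation of a clone as a set of substitution labels, and the toy pool (including the placeholder labels `rare_1` and `rare_2`) are assumptions made purely for illustration.

```python
from collections import Counter

def consensus_mutations(clones, threshold=0.85):
    """Return the mutations found in more than `threshold` of the clones.

    `clones` is a list of sets; each set holds the substitution labels
    (e.g. "Q94R") carried by one sequenced clone. Hypothetical helper.
    """
    counts = Counter(m for clone in clones for m in set(clone))
    n = len(clones)
    return sorted(m for m, c in counts.items() if c / n > threshold)

# Invented toy pool of 10 clones (not real sequencing data).
core = {"Q94R", "R259Y", "E262Q"}
pool = [set(core) for _ in range(8)]
pool.append(core | {"rare_1"})                # all three core mutations plus a rare one
pool.append({"Q94R", "R259Y", "rare_2"})      # this clone lacks E262Q

# E262Q occurs in 9/10 clones (0.9), the rare labels in 1/10 each.
print(consensus_mutations(pool))
```

As the text notes, a frequency filter of this kind enriches for activity-associated mutations but says nothing about specificity, which is why the structural modeling step was needed afterwards.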

Figure 14. Rationale for the design of the highly specific loxLTR recombinase sTre. Protein residues at positions 43, 86, and 94 are key for modulating loxP vs loxLTR specificity. In Cre, residues K43 and K86 interact with the DNA major groove in loxP (see Figure 11 for atomic detail). Mutations K43E and K86E in sTre impair loxP recombination, whereas they do not affect loxLTR recognition as E43 can interact with the changed bases, and the role of K86 in sTre is taken by residue R94.

Figure 15. Mutational hotspots in evolved Cre-based recombinases with altered DNA binding specificity. The structure of Cre in complex with loxP (PDB ID 1Q3U) is shown as a cartoon. Frequently mutated DNA-contacting regions are highlighted in orange, green, purple, and blue. The residues corresponding to these regions in Cre, Brec, and Tre libraries are aligned and conserved mutations are shown in orange, green, purple, and blue, respectively. Residue frequencies are represented by the size of the amino acid letters. Note the permutations of certain residues in evolved Brec and Tre recombinases.


mouse T cells,124 and during mouse embryonic development.125 Furthermore, Cre expression systems that regulate the activity of the recombinase, such as Cre-estrogen receptor ligand binding domain fusions (see section 5.3), can be toxic to cells.126−129 Similar Flp-mediated off-target effects in heterologous organisms have not been reported so far. An important aspect to reduce inadvertent Cre-mediated effects is to limit the expression level and to keep the time of exposure of the recombinase minimal.119,130 To evaluate generated Cre driver lines, reporter alleles that inform on differences in Cre expression have been generated.131 On the basis of these findings, guidelines to perform Cre-mediated conditional mutagenesis have been proposed.132 Independent of these cautious notes, the Cre/loxP system has been instrumental to advance genetic studies and a major utility for the applied use of T-SSRs remains conditional mutagenesis. Numerous animals expressing the recombinase from constitutive or tissue specific promoters have been reported (http://www.creline.org/other_cre_db_resources). Furthermore, many additional animals have been generated where a gene of interest has been flanked by recombinase recognition targets to conditionally ablate the gene when crossed to a recombinase strain. T-SSR-mediated conditional mutagenesis is now the gold standard to investigate gene function in mice, but the technology is nowadays also extensively used in many other model organisms.133−140 Summarizing the recent work in this area would go beyond the scope of this review, and we refer the reader to specialized publications for this topic.141−147 In this Review, we will highlight more specialized applications of T-SSRs and their use in combination with other genome engineering tools, which promise to substantially improve shaping and reshaping genomes of model organisms.

Interestingly, compensatory sets of mutations occur, presumably to accommodate changes by maintaining specific interactions (for instance, a switch in charges in Brec1 with mutations R259D and E262R versus R259Y and E262Q in Tre). The evolution sequence data obtained in these studies represents notable ground to deepen our understanding of DNA binding specificity of T-SSRs. It will be interesting to investigate in detail sequence variations in the context of available structural data and to apply molecular modeling and dynamics simulations as performed in previous studies102 to unravel the molecular mechanisms involved in protein−DNA recognition and establish a rationale for specificity. In conclusion, recent studies have brought remarkable achievements in the design of Cre-based recombinases with improved specificity. Computer-aided atomistic insights into the functional role of evolution-selected mutations may become fundamental to decipher the complexity of DNA recognition by Cre-like recombinases. The priceless information from currently available and still to come high-resolution structures will surely ease the path to rapidly and efficiently generate customized or à la carte specificity designs and will open new possibilities to apply custom designed T-SSRs in sophisticated genome engineering exercises. While to date most work on engineering T-SSR DNA binding specificity has been conducted with Cre recombinase, other T-SSR systems have also been employed and extensive work has revealed that Flp specificity can be altered employing similar technologies.85,104−107 These studies indicate that different T-SSR systems are amenable to change their DNA binding specificity, substantially broadening their use in applied genome engineering.

5. APPLIED USE OF TYROSINE RECOMBINASES

While solving the molecular basis for function and catalysis of Cre and other tyrosine recombinases is a fascinating scientific topic and important per se, their applied use in biotechnology and biomedicine has considerable impact. This is perhaps best documented by the more than 4800 hits returned in a PubMed search for “Cre recombinase”. Almost ten times fewer hits were retrieved for the search term “Flp recombinase”, reflecting the predominant use of the Cre/loxP system in applied site-specific recombination. One reason for the preferred use of Cre over Flp in certain experimental settings is that wild-type Flp operates efficiently at 30 °C but not at higher temperatures relevant for many organisms.108 Hence, the Cre/loxP system is advantageous for applications in homeothermic species. Nevertheless, a thermostable version of Flp (Flpe)109 has been generated and further improved (Flpo)110 in recent years. These improvements have drastically enhanced the utility of the Flp/FRT system for use in homeothermic systems,111−117 which has also enhanced the combined use of these two recombinase systems (see section 5.6). The utility of T-SSRs for conditional mutagenesis rests on their efficiency and specificity for recombining only their respective target sites. DNA rearrangements elsewhere in the genome could be deleterious. Indeed, several studies have reported unwanted side effects when Cre recombinase was applied in particular experiments. loxP-independent Cre-mediated genomic alterations were first reported in cultured cells118,119 and in mouse spermatids.120 More recent work has validated that expression of Cre can be toxic in the mouse lung,121 mouse retinal pigment epithelium,122 mouse heart,123

5.1. Lineage Tracing and Brainbow

The combination of T-SSRs with recombination reporter constructs and live-cell imaging provides the means for heritable labeling of a particular cell type, and it has enabled investigations of cell fate in vivo. Lineage tracing is a powerful tool to track cells in vivo and provides enhanced spatial, temporal, and kinetic resolution of the mechanisms that underlie tissue renewal and repair. T-SSR-mediated cell labeling now represents the gold standard for defining cell fate, and the technology has substantially advanced our understanding of cell fates in living organisms. In T-SSR-mediated lineage tracing a recombinase expressed in a cell- or tissue-specific manner is employed to activate the expression of a conditional reporter gene (typically a fluorescent reporter), thus permanently labeling all progeny of the marked cell (Figure 16A). This strategy has now been widely applied to many model organisms and tissues to trace the origin of stem cells and their progeny during morphogenesis. For an in-depth assessment of lineage tracing employing T-SSRs, we refer the reader to excellent recent reviews on this topic.148−156 A major limitation in classical lineage tracing studies is that cells belonging to one cell type are typically labeled by the same color. Because labeled cells are frequently in close proximity to one another, it is often difficult to resolve detailed morphology or movement of individual cells. In complex tissues such as the brain, it is therefore difficult (if not impossible) to reliably track discrete cells. To solve this problem, the brainbow multicolor labeling approach was designed and implemented.157 By combining three or more distinctly colored fluorescent


patterns,157,164 genetic regulation of single cells in vivo,160,165 and dynamics of stem cells and organ regeneration.166,167 Importantly, this T-SSR-mediated labeling technique works in a wide range of species, including flies,168−171 fish,172,173 and plants,174 making it a universal tool to track cells in complex environments.

5.2. Recombinase Delivery Strategies

An important aspect for the applied use of T-SSRs is how the recombinase is delivered into the cells of interest. In the case of conditional mutagenesis in whole organisms, a recombinase-transgenic animal is typically generated and crossed to another animal carrying the target sites for the T-SSR. The offspring can then be investigated for possible phenotypes. However, novel and innovative strategies to deliver the recombinase have been implemented in recent years, which have accelerated experiments and allowed greater flexibility. The first alternative T-SSR delivery strategies included the use of lentivirus175 or adenovirus176 delivery of Cre. These systems are now frequently used to induce site-specific recombination in infected cells and have also been adapted for recombinase delivery in vivo.177−179 Other innovative T-SSR delivery strategies include the delivery of Cre mRNA by Ms2-chimeric retrovirus-like particles,180 viral nanoparticle-encapsidated protein and DNA delivery,181 baculovirus infection-mediated recombinase delivery,182 extracellular vesicle-mediated Cre delivery,183 and delivery of Cre recombinase via a bacterial type III secretion system.184 Nucleic acids are typically transfected into tissue culture cells with cationic lipids; however, proteins are normally not encapsulated well enough to allow delivery into cells. Interestingly, fusion of a negatively supercharged protein epitope can drastically enhance the delivery of proteins transfected with cationic lipids, including Cre recombinase.185 Another elegant way to deliver a recombinase is through direct delivery of purified proteins fused to a cell penetrating peptide (CPP).
Initial experiments documented that this method to deliver the recombinase is effective in cell culture for both Cre186−188 and Flp.189,190 More recent work has shown that Cre-CPP is also effective in vivo,191 and it has recently been shown that Cre protein can be delivered with up to 100% efficiency to the inner ear of live mice by combining codelivery of a supercharged protein192 with a peptide that enhances endosomal escape.193 The direct delivery of T-SSRs as proteins into whole organisms seems particularly interesting for future therapeutic applications194 (see also section 7).

Figure 16. T-SSR-based strategies to trace cells in vivo. (A) Scheme for classical lineage tracing. A cell- or tissue-type (C/T) specific promoter drives the expression of the T-SSR. In cells that express the T-SSR the STOP cassette is removed by site-specific recombination, thereby activating the constitutive (const.) expression of a reporter gene (e.g., green fluorescent protein, GFP). A schematic illustration of the approach during embryonic development is shown. (B) In vivo labeling of individual cells utilizing the brainbow technology. A scheme of the brainbow technology is shown. Genes encoding different fluorescent proteins are shown in green, red, and blue, respectively. Heterospecific recombinase target sites are shown as black and brown triangles, with the colored numbers indicating the color of the cells after recombination of the denoted target sites. The picture below shows an example of a Confetti166 transgenic mouse E14.5 limb bud converted using a Prrx1-Cre line (image kindly provided by Josh Currie and Elly Tanaka, CRTD Dresden).
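The logic of classical lineage tracing in panel A (recombinase expression restricted to one cell type, irreversible removal of a STOP cassette, and inheritance of the activated reporter by all progeny) can be captured in a minimal sketch. The `Cell` class, its method names, and the cell-type labels below are hypothetical and chosen only for illustration; real systems also depend on recombination efficiency and promoter leakiness, which are ignored here.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    cell_type: str
    stop_cassette: bool = True  # reporter stays silent while STOP is present

    @property
    def reporter_on(self) -> bool:
        return not self.stop_cassette

    def express_recombinase(self, driver_type: str) -> None:
        # The cell/tissue-specific promoter is active only in `driver_type`
        # cells; excision of the STOP cassette is a permanent genomic change.
        if self.cell_type == driver_type:
            self.stop_cassette = False

    def divide(self, daughter_type=None) -> "Cell":
        # Daughters inherit the genomic state, even if they differentiate
        # into a cell type in which the driver promoter is no longer active.
        return Cell(daughter_type or self.cell_type, self.stop_cassette)

progenitor = Cell("progenitor")
bystander = Cell("fibroblast")
for cell in (progenitor, bystander):
    cell.express_recombinase(driver_type="progenitor")

descendant = progenitor.divide("neuron").divide()
print(descendant.cell_type, descendant.reporter_on)   # labeled lineage
print(bystander.divide().reporter_on)                 # unlabeled lineage
```

The key design point is that the label is stored in the genome rather than in the current expression state, which is what makes the marking heritable.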

5.3. Modified Recombinase Systems

proteins, which are activated through T-SSR-mediated site-specific recombination, the system generates a multitude of color combinations. This way, cells carry unique color-codes serving as cellular identification tags, which can be visualized by fluorescence microscopy (Figure 16B). Recent work on Cre/loxP-mediated stochastic expression of fluorescent proteins has further improved the usefulness of the approach, extending the color range and subcellular localization of the fluorescently marked cells.158−162 Another improvement of the technology has been to exchange the fluorescent protein with a synthetic fluorophore, thereby allowing for both the manipulation of behavior and monitoring of cellular fluorescence from the same reporter.163 While the brainbow technology was initially praised for its esthetic and artful images, it is now widely recognized as a powerful technology to decipher neuronal connectivity

Sequencing entire genomes has increased the demand for sophisticated DNA processing enzymes. This need is being addressed by engineering such molecules,195 and T-SSRs are no exception in this regard. To further increase the applied utility of T-SSRs, a number of modifications have been implemented into the systems. In order to regulate recombinase activity in a spatial and temporal manner, the first modifications to both Cre and Flp were recombinase fusions to ligand binding domains (LBD) of nuclear receptors.196−198 Fusing the recombinase to LBDs retains the enzyme in the cytoplasm, thus preventing recombination. Addition of the ligand causes the fusion protein to localize to the nucleus, where it can then recombine its targets (Figure 17A). LBDs from several nuclear receptors have been demonstrated to work in this fashion, with a modified version of the estrogen receptor LBD (ERT2)199 in combination


pression of both fragments results in reconstitution of an active recombinase enzyme.201 To achieve less invasive and more controllable Cre recombination in mammalian cells, one of the fragments has been delivered as a protein fragment,202 and in a follow up study, the fragments were fused to heterodimeric leucine zipper domains, increasing recombination efficiency of the original system.203 The dimerization of two separately expressed recombinase fragments can also be induced by fusing the two protein moieties with FKBP12 (FK506-binding protein) and FRB (binding domain of the FKBP12-rapamycin-associated protein), respectively (Figure 17B). In this case, addition of rapamycin leads to heterodimerization and activation of the recombinase.204,205 The system has demonstrated its utility in mice by knock-in of the components into the Rosa26 locus.206 A similar study utilized the self-associating coiled-coil domains from yeast GCN4 to dimerize the two Cre fragments.207 This system was also demonstrated to work in the context of the ERT2 approach to further increase the spatial control of DNA recombination.208 An interesting twist to the Cre alpha-complementation approach has recently been described to make use of the investigation of protein−protein interactions. Here, recombination is only observed if two protein domains that are fused to the two Cre fragments interact with each other, making this approach useful to dissect protein−protein interactions by phenotypic readout.209 Other useful T-SSR engineering approaches include the generation of a system in which Cre recombinase activity is dependent on green fluorescent protein (GFP)210 and design of a destabilized Cre whose activity is controlled by the antibiotic trimethoprim (TMP).211 In the former system, GFP acts as a scaffold, bringing together two Cre fragments fused to GFP-binding proteins. Hence, site-specific recombination is restricted to cells that express GFP (Figure 17B). 
On the basis of the multitude of transgenic GFP mouse lines that have been characterized for labeling specific cell populations, this system holds promise to expedite functional investigation of these lines.211 For the latter, the fusion of Cre to a dihydrofolate reductase mutant causes the protein to be rapidly degraded by the proteasome,211 thereby preventing recombination. However, the decay of the fusion protein can be blocked with the ligand TMP,212 stabilizing the protein and making it available for recombination (Figure 17C). The approach is particularly suitable for experiments in vivo because TMP can easily be delivered to laboratory animals, where it exhibits a high rate of diffusion in peripheral tissues and the nervous system. Furthermore, TMP does not have known endogenous targets in mammals.213 Spatial and temporal control of Cre activity has also been accomplished in different ways by the generation of light-activatable recombination systems.214−217 In one case, a photocaged estrogen receptor (ER) antagonist was used in combination with a Cre-ER fusion protein. The photocaged ligand does not bind to the estrogen receptor, rendering Cre inactive for recombination. Exposure to light changes the ligand conformation, which is now able to bind to Cre-ER.214−216 Consequently, the fusion protein translocates to the nucleus and recombines its target (Figure 17D). In another case, Cre itself was rendered inactive by incorporation of the unnatural amino acid o-nitrobenzyl tyrosine (ONBY) in place of the catalytic tyrosine Y324. Light exposure can remove the caging group and convert the enzyme into an active state, which now recombines loxP sites (Figure 17E). Impressively, this system

Figure 17. Methods to regulate Cre activity through protein engineering. (A) Regulation of recombinase activity by protein fusion to ligand binding domains (LBD) of nuclear receptors. Cre-LBD is inactive without the addition of ligand (brown triangle) because the fusion protein is kept in the cytoplasm through an interaction with the heat shock protein 90 (HSP90) complex. (B) Schematic presentation of the Split-Cre approach. The Cre N-terminal domain (CreN) is fused to a protein domain that promotes its dimerization with the Cre C-terminal domain after addition of a dimerizer (brown circle). Without the addition of the dimerizer, the two fragments do not associate and are inactive. (C) Illustration of Cre regulation by fusion to dihydrofolate reductase (DHFR). Cre-DHFR is unstable and is rapidly degraded through the proteasomal pathway, but the decay can be blocked with the ligand trimethoprim (TMP, brown square). (D) Scheme of optogenetic activation of Cre with a light-activatable version of a nuclear receptor ligand. The red circle illustrates the inactive form of the ligand, which is converted to the active form (green) by light. (E) Scheme of optogenetic activation of Cre with a photocaged version of Cre. Addition of an o-nitrobenzyl caging group at the catalytic tyrosine (ONBY) of Cre renders it inactive (red circle). Exposure to light removes the caging group, restoring recombinase activity (green circle). (F) Scheme of optogenetic activation of split-Cre fused to cryptochrome 2 (CRY2) and cryptochrome-interacting basic-helix−loop−helix protein 1 (CIBN). In the dark, the proteins do not interact with each other (red). Upon light exposure the two domains dimerize (green), reconstituting an active form of Cre recombinase.

with the synthetic ER antagonists 4-hydroxytamoxifen now being widely used.141 Another way to achieve temporal regulation of recombinase activity is to split the enzyme into two separately expressed polypeptides. Following the model of alpha complementation in the beta-galactosidase enzyme,200 individually the two polypeptides have no detectable activity. However, coex12803


Figure 18. Advanced genome engineering approaches combining different T-SSR systems. (A) Illustration of the combined use of Cre and Flp recombinase for conditional mutagenesis. Exons are denoted ex1−3 with the initiating ATG codon shown in exon 2. loxP and FRT sites are shown as black and brown triangles, respectively. Expression of Flp removes the neomycin selectable marker (neo), which might otherwise interfere with the expression of the gene of interest. Expression of Cre removes the critical exon encoding the ATG start codon. (B) Scheme of Flp and Cre-based consecutive conditional mutagenesis, where Flp is used to delete the first gene (gene A) and the “stop-floxed” allele that prevents expression of Cre. As a second event, expression of Cre subsequently leads to deletion of the second gene (gene B). Important steps are described with p1 and p2 marking promoter 1 and 2, respectively. Black and brown triangles denote loxP and FRT sites, respectively.
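The two-step logic of panel A can be sketched by treating an allele as an ordered list of elements and modeling excision as deletion of everything between two copies of the same target site, leaving one copy behind. The element order in the example allele is a hypothetical layout, not a transcription of the figure, and the sketch assumes that the paired sites are directly repeated (site orientation is not modeled).

```python
def excise(allele, site):
    """Delete the segment between the first and last copy of `site`.

    One copy of the site remains after recombination. Assumes the copies
    are in direct orientation; returns an unchanged copy of the allele
    if fewer than two copies of `site` are present.
    """
    positions = [i for i, element in enumerate(allele) if element == site]
    if len(positions) < 2:
        return list(allele)
    first, last = positions[0], positions[-1]
    return allele[: first + 1] + allele[last + 1 :]

# Hypothetical targeted allele: FRT-flanked neo marker, loxP-flanked exon 2.
targeted = ["ex1", "FRT", "neo", "FRT", "loxP", "ex2", "loxP", "ex3"]

conditional = excise(targeted, "FRT")    # Flp removes the selection marker
knockout = excise(conditional, "loxP")   # Cre removes the critical exon

print(conditional)
print(knockout)
```

Because Flp acts only on FRT sites and Cre only on loxP sites, the two steps are independent and can be applied at different times or in different tissues.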

showed utility in cultured mammalian cells, where spatial control of Cre activity was demonstrated by restricting the exposure to specific regions of the cultured cells.217 An alternative way to allow light-induced activation of Cre is based on a split-Cre version fused to Arabidopsis thaliana cryptochrome 2 (CRY2) and cryptochrome-interacting basic-helix−loop−helix protein 1 (CIBN), which dimerize on blue-light exposure (Figure 17F).218 This system has even demonstrated utility in vivo, allowing photoactivatable Cre-mediated recombination to stably modify gene expression in the mouse brain.219 Furthermore, Cre has also been instrumental in other optogenetic experiments for conditional transgenic optogenetic gene silencing in vivo, through the activation of light sensitive fluorescent proteins.220 Finally, the intracellular pathogen Toxoplasma was engineered such that it injects active Cre recombinase into host cells upon infection.221 In this case, efficient secretion of Cre into cells was achieved by fusing the recombinase to toxofilin, a rhoptry protein that is introduced into cells during invasion.222 Combining this system with a recombination reporter mouse line allowed the authors to monitor host-cell infection in vivo.221 These examples demonstrate the flexibility of T-SSRs for protein engineering and indicate that we can expect many additional useful recombinase fusion proteins to be designed in the future.

5.4. Recombinase Mediated Cassette Exchange (RMCE)

As discussed earlier, T-SSRs are extensively used for conditional mutagenesis, where a gene of interest is excised upon expression of the recombinase. The reverse reaction of DNA integration is also possible (see Figure 2) and useful to predictably insert transgenes at a predefined genomic locus. However, the excision reaction is kinetically favored,223 making T-SSR-mediated delivery of genetic material inefficient with wild-type recombinase recognition target sites. This limitation was overcome by the development of recombinase-mediated cassette exchange (RMCE).224,225 RMCE employs two heterospecific recombinase target sites, typically differing in the composition of the spacer sequence, that do not recombine with each other but recombine efficiently with target sites of the same sequence (Figure 2).
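The pairing rule described above can be illustrated with a toy model. The sketch below is not from the review. It assumes that two target sites recombine only when their spacer sequences are identical, and it treats cassette exchange as an all-or-nothing swap. The spacer strings are arbitrary placeholders rather than real lox or FRT spacers, and the names `Site`, `can_recombine`, and `rmce` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A recombinase target site, reduced to its spacer sequence (toy model)."""
    spacer: str


def can_recombine(a: Site, b: Site) -> bool:
    # Assumption: identical spacers recombine, different spacers do not.
    return a.spacer == b.spacer


def rmce(target, donor):
    """Exchange the cassette between two heterospecific sites.

    `target` and `donor` are (site1, cassette, site2) tuples. The exchange
    succeeds only if each flanking site matches its partner on the donor;
    otherwise the target is returned unchanged.
    """
    t1, _, t2 = target
    d1, d_cassette, d2 = donor
    if can_recombine(t1, t2):
        raise ValueError("flanking sites must be heterospecific")
    if can_recombine(t1, d1) and can_recombine(t2, d2):
        return (t1, d_cassette, t2)
    return target


site_a = Site("ACGTACGT")  # placeholder spacer, not a real site
site_b = Site("TTGCAAGG")  # placeholder spacer, not a real site

genomic = (site_a, "selection-marker", site_b)
plasmid = (site_a, "gene-of-interest", site_b)
exchanged = rmce(genomic, plasmid)
print(exchanged[1])
```

Because the two flanking sites cannot recombine with each other, the model never excises the cassette; it can only be replaced by a donor carrying the same pair of sites in the same order.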


Hence, when recombinase-expressing cells that contain a DNA cassette flanked by two heterospecific recombinase target sites are transfected with a plasmid harboring a different DNA sequence flanked by the same heterospecific target sites, the cassette is efficiently exchanged (reviewed in Turan et al.226). Molar excess of the transfected donor over the single-copy target will drive the exchange reaction toward replacement with the DNA of interest encoded on the plasmid. Interestingly, Flp recombinase seems to be more efficient than Cre in RMCE,227 in particular when the extended 48 bp form of the FRT, as naturally occurring in the 2 μ circle plasmid, is employed.226 These extended sites comprise an additional 13 bp Flp binding repeat serving as an “FRT entry-path”. The extra repeat apparently enhances the site’s specificity and counteracts inadvertent side-reactions due to the application of an excess of the recombinase proteins (which is required to drive “Flp-in” and RMCE-pathways) or illegitimate endonucleolytic actions. Nevertheless, both the Cre/loxP and Flp/FRT systems have been successfully used for cassette exchange in a variety of applications,226 and the system has lately been improved for Cre-mediated RMCE.228 More recently identified T-SSRs have also been employed in RMCE experiments,44 but an efficiency comparison to the more established systems has not yet been performed.

RMCE has gained significant importance in various research areas in recent years. In the stem cell field, for instance, RMCE has improved the generation of induced pluripotent stem (iPS) cells and their subsequent use to study cellular differentiation.229 For the production of therapeutic monoclonal antibodies, RMCE has streamlined the production of cell lines that allow rapid, efficient, and reliable expression of the antibody.230,231 RMCE has also proven useful for manipulating pest insects for research and field release232 and for exchanging large genomic regions in cells233 and in mice, where a large endogenous genomic region was replaced precisely with the syntenic human sequence.234 Furthermore, RMCE was used to rapidly generate an allelic series of knock-in mice.235 Other recent improvements of RMCE are its implementation in lentiviral vectors for potential gene therapy applications236,237 and its combination with other genome-editing technologies, such as TALENs238 and CRISPR/Cas239 (see also section 5.8).

5.5. Synthetic Biology

The artificial design and engineering of biological systems and living organisms for purposes of improving applications for industry or biological research is rapidly gaining importance, and the construction of biological devices and engineering metabolic pathways promises to transform biology and medicine. T-SSRs are important components of many synthetic biology approaches, and they are implemented in many experimental settings. One challenge of synthetic biology is to assemble genetic circuits efficiently so that they can carry out complex logic processing functions.240 T-SSRs are well positioned to accomplish this goal, and impressive genetic circuits that process complex logic,241 including the incorporation of cellular memory,242 have been established. Most synthetic biology approaches have so far focused on the engineering of bacteria and yeast in order to allow straightforward and efficient introduction of unmarked mutations into gene clusters243 or to modulate and improve metabolic pathways.244−248 Furthermore, pairwise orthogonal T-SSRs have been used in yeast for synthetic chromosome rearrangements,45 offering a system to restructure the order of genetic information on chromosomes. Impressive T-SSR-mediated gene switches have also been built in mice that can conditionally inactivate, report, and allow inducible restoration of targeted genes in vivo.249,250 Lastly, current efforts suggest that T-SSRs could find utility in building devices for in vitro and in vivo diagnostics.251

5.6. Combinations of Different Recombinases

Initially, single T-SSR systems were employed to perform the recombinase-mediated approaches. However, it was rapidly realized that the combination of two or more recombinase systems could be beneficial and allow the design of more sophisticated experiments. For instance, it has been known for some time that the selectable marker gene included in constructs to conditionally target a gene can influence the expression of the targeted allele in ES cells and in mice.252 To circumvent this problem, conditional gene targeting is now typically done with two T-SSR systems (Figure 18A), where one T-SSR system is reserved for conditional mutagenesis, while the other is employed to remove the selectable marker gene after successful gene targeting has been confirmed.253 Furthermore, two T-SSRs are now frequently employed to generate multipurpose conditional knockout alleles.254 More recent experiments have revealed that the combination of two orthogonal T-SSR systems can also be useful in different settings. For instance, the dual recombinase technology36,255 allows stepwise manipulation of the genome to sequentially ablate sequences so that the second T-SSR is only expressed in cells that previously expressed the first T-SSR (Figure 18B). This approach permits the exploration of more complex biological questions in development, tissue homeostasis, and cancer.256,257 Use of several T-SSRs has also proven advantageous for other applications, including lineage tracing41,258 and dual RMCE.259,260 Another major step forward in precise shaping and reshaping of genomes has been the combination of T-SSRs with serine recombinases.261 Here, the serine recombinase is typically employed in a first step for controlled genomic insertion of a DNA cassette at a “safe harbor” region in the genome,262 whereas the T-SSR is used in a second step to delete unwanted DNA. This approach has been successfully established to generate transgene-free mouse ES cells,263 human induced pluripotent stem cells,264,265 bovine fetal fibroblasts,37 ex vivo corrected β-thalassemia cells,266 insects,232 and cattle free of antibiotic selectable markers.267 Furthermore, the technology has been employed for optimized bispecific antibodies230 and to maximize the simultaneous or sequential integration of multiple gene-loading vectors into human artificial chromosomes.268,269 Moreover, the combination of the two systems has been used to allow the conversion of different overexpression alleles, combining the advantages of transgenic targeting with tunable transgene expression.270 These examples highlight the diverse applications that are possible by combining orthogonal recombinases in biology and medicine. Considering the recent discoveries of additional orthogonal T-SSRs (see section 2.2), it can be expected that even more sophisticated T-SSR-based genome engineering strategies will emerge in the near future.

5.7. Tyrosine Recombinases in Combination with RNAi

RNAi-mediated gene knockdown is a powerful method to analyze loss-of-function phenotypes in a large number of


species. Today, RNAi is a widespread instrument to study gene function both in cell culture and in vivo, and it is used to investigate the role of an individual gene in a specific setting or in genome-wide screens to identify novel genes with a role in a particular pathway.271 Combining RNAi with the power of T-SSRs has led to a considerable improvement of the technology. One problem for RNAi from expressed constructs in cells is that the short hairpins are typically expressed from constitutive Pol III promoters. T-SSRs have helped to make the method conditional and reversible to allow inducible hairpin expression in cells,272,273 mice,274,275 and plants.276,277 The system has also been adapted for use with lentiviral vectors,278 providing improved delivery into difficult-to-transfect cells, and in a therapeutic setting to combat hepatocellular carcinoma in a mouse model.279 Furthermore, the system has been extended to turn alleles on or off using Cre recombinase to rapidly address questions of tissue-specific and cell autonomy of gene function in development.280 T-SSRs have also been instrumental to rapidly retrofit existing fly fosmid clones for cross-species RNAi rescue experiments in order to validate RNAi phenotypes in Drosophila melanogaster.281 Moreover, clever application of the enzymatic activities of Cre and Flp resulted in the generation of long hairpin RNA libraries for RNAi screens in a silkworm cell line282 and in the construction of fully randomized shRNA libraries for phenotypic screens in mammalian cells.283

5.8. Tyrosine Recombinases in Conjunction with Other Genome Editing Tools

After the major technological improvements in reading the DNA code of whole organisms,284 a next logical step is to improve the fluent writing of genomes. A major advance in this direction has been the discovery and development of programmable nucleases for advanced genome editing.285 The application of zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and, most recently, clustered regularly interspaced short palindromic repeats (CRISPR)-based nucleases is revolutionizing the field of genome engineering.286 Given the advances in this field, it is not surprising that T-SSRs are combined with programmable nucleases to shape and reshape the genome in cells and organisms. T-SSRs have played an important role in advanced genome engineering in the past. However, one limitation for their use is that the recognition target site for the recombinase has to be introduced into the genome first, typically using homologous recombination (HR)-based techniques. Gene targeting by HR is relatively inefficient in most cells, and many colonies have to be examined to find a clone that integrated the exogenously provided DNA in the correct place. However, HR is greatly stimulated when a double strand break is present at the place homologous to the targeting construct. The use of programmable nucleases has therefore dramatically improved the generation of conditional knockout alleles in several species.287−291 Importantly, the combination of the two technologies has enabled streamlined generation of conditionally targeted alleles in human embryonic stem (ES) and human induced pluripotent stem (iPS) cells, providing a valuable method for spatial and temporal investigation of gene function in primary human cells.292 In C. elegans, the combined use of a programmable nuclease and Cre recombinase was applied to streamline the generation of fluorescent protein-tagged knock-in lines where Cre is used to self-excise a drug selection cassette.293 In plants, the order of events was reversed so that the T-SSR was used for gene delivery and the programmable nuclease was used for marker excision, thereby improving trait development by gene stacking.294 The combination of T-SSRs with programmable nucleases has also proven useful in making human iPS cells safer for their potential therapeutic use. iPS cells are generated from somatic cells by the forced expression of reprogramming factors, bringing them back to the pluripotent state.295 Because of their ability to be extensively propagated in vitro and their potential to differentiate into any cell of the human body, iPS cells hold great promise for regenerative medicine.296 However, the coding sequences for the reprogramming factors are typically integrated into the genome of the cell, rendering them unsuitable for therapeutic applications. To allow the removal of the reprogramming cassette, Chandrasegaran and colleagues have flanked the cassette with loxP sites to allow seamless removal of the reprogramming factors by transient Cre delivery after successful establishment of iPS cells.297,298 In another setting, a similar approach was used to improve production of marker-free vaccinia virus vectors.299 T-SSRs and programmable nucleases have also been used in conjunction to target transgenes into the AAVS1 locus implementing an RMCE approach in human iPS cells, thus allowing enhanced flexibility in transgene exchange.300,301 Furthermore, it was proposed that the CRISPR/Cas9 system could be combined with T-SSRs to fine-map genes to help explain phenotypes caused by natural variation302 and to accelerate vaccine development.303

Finally, the combination of both systems has helped generate better mouse models of human diseases. Jacks and colleagues demonstrated that Cre-dependent somatic activation of oncogenic KRAS, complemented with CRISPR/Cas9-mediated genome editing of different tumor suppressor genes, resulted in lung adenocarcinomas with distinct histopathological and molecular features.304 Two other groups independently generated Cre-regulated Cas9 transgenic mice to establish new pancreatic305 and adenocarcinoma306 models, respectively. In a different setting, a mouse model for a human cancer was generated that involves a chromosomal translocation, which constitutes an oncogenic fusion gene.307 Here, building a reliable mouse model was complicated by the fact that in the mouse the genes are in opposite orientation on their respective chromosomes, precluding formation of a functional fusion gene. The authors solved this problem by first inverting the orientation of a 4.9 Mb syntenic fragment encompassing one of the fusion genes by using Cre-mediated recombination. In a second step, the translocation was induced by simultaneous CRISPR/Cas9-induced DNA double strand breaks in the respective genes. These impressive examples only hint at the many additional possibilities that the combination of the different genome engineering technologies offers. We can certainly expect that further improvements in the distinct fields will substantially enhance our abilities to flawlessly write in genomes.

6. DIRECTED EVOLUTION OF TYROSINE RECOMBINASES

T-SSRs have demonstrated their utility in numerous applications. However, the naturally occurring enzymes are not always optimal for particular applications. Directed molecular evolution is a powerful technique for improving or


altering the activity of diverse classes of proteins for industrial, research, and therapeutic applications.308 It is therefore not surprising that T-SSRs have been subjected to different directed evolution strategies to extend their usefulness. The first directed evolution strategy employed error-prone PCR coupled with DNA shuffling309 to improve the thermostability of Flp recombinase.109 Eight rounds of molecular evolution were required to obtain the Flp-variant Flpe, which harbors four mutations that improve recombination efficacy in vitro and in vivo at elevated temperatures.109 To enhance the speed and sensitivity of the molecular evolution procedure, substrate-linked directed protein evolution (SLiPE) was developed and applied to evolve Cre mutants with altered DNA binding specificity.84 In SLiPE, randomly mutated recombinase variants are assembled in a plasmid that also contains the two new substrate target sites, oriented as an excision substrate for a recombination reaction catalyzed by the expressed enzyme. Mutant recombinases with activity can easily be retrieved after propagation in E. coli via a simple PCR (Figure 19). This procedure simplifies the evolution process and allows generation times of 2 to 3 days.84 Modified selection schemes to obtain T-SSRs with altered DNA binding properties have subsequently been applied for Flp109 and for Cre via an altered pattern of fluorescent protein expression93 and through reconstitution of β-lactamase that confers ampicillin resistance to the bacterial host.310 More recently, other members of the T-SSR family have also been subjected to directed evolution to alter their DNA binding specificity,311−313 providing a broader spectrum of how these enzymes can be evolved to target alternate sequences.

All the above-mentioned directed evolution strategies were performed in bacterial cells, providing a reasonable generation time. Even faster generation times can be achieved through in vitro compartmentalization (IVC), which utilizes a cell-free platform via oil-in-water emulsion for directed molecular evolution.314 Indeed, lambda integrases with altered recombination specificity have been recently generated with this approach,311 potentially reducing the time necessary to obtain T-SSRs with desired properties. A possible drawback of directed evolution of T-SSRs purely in vitro is that their specificity might be strongly relaxed. During the directed evolution process in bacteria, selection for enzymes that are compatible with bacterial growth is ensured, and vastly promiscuous enzymes that recombine bacterial DNA would be eliminated. In contrast, such enzymes might survive under IVC conditions. Another successful method to reduce the time for obtaining Cre variants with altered DNA binding specificity has been the combination of directed molecular evolution with a structure-guided rational approach.102 Here, the number of cycles needed to obtain the desired properties is reduced without compromising the specificity of the enzyme (see also section 4.2.2).

While the initial attempts to alter the DNA binding specificity of T-SSRs were rather academic, more recent efforts have focused on potential medical applications. These endeavors typically start with the selection of an appropriate pre-existing target site in a genome. This task is not trivial and requires bioinformatics support to identify target sequences that are remotely related to the original T-SSR target sites. Two software tools have been developed to search for T-SSR-like sequences in genomic data315,316 that help to nominate native genomic sequences as potential recombination target sites. One very useful medical application for T-SSRs would be to use a naturally occurring “safe harbor” sequence to site-specifically integrate exogenous DNA. Indeed, Voziyanov and colleagues have evolved a Flp-based recombinase that recombines a sequence upstream of the human interleukin-10 gene.85 The authors have more recently improved this system and demonstrated that it is effective in delivering DNA to Chinese hamster ovary and human embryonic kidney 293 cells.106 Similarly, directed evolution has recently been used to engineer a lambda integrase variant that displays improved recombination on a noncognate substrate present in the human genome.312 The most extensive directed evolution with a potential medical application is likely the development of Cre-based recombinases recognizing sequences in the long terminal repeat of HIV-1 (see also section 7). Here, SLiPE was used for more than 100 generation cycles to evolve the recombinase Tre.103,317,318 Furthermore, a second generation of anti-HIV-1 recombinases (Brec1) was recently developed that targets a highly conserved region present in the majority of HIV-1 primary isolates.101 In this case, a total of 145 evolution cycles were required to obtain mutants with the desired properties, indicating that the evolution process can take considerable time

Figure 19. Directed molecular evolution of T-SSRs. In SLiPE (substrate linked protein evolution), randomly mutated recombinase variants are assembled in a plasmid that also contains the two new substrate target sites, oriented as an excision substrate for a recombination reaction catalyzed by the expressed enzyme. Mutant recombinases with activity can easily be retrieved after propagation in E. coli via a simple PCR. Arrows illustrate important steps in the generation cycle. Isolated DNA is digested with a unique restriction enzyme (R1) that cleaves plasmids harboring inactive recombinases. A PCR employing primers 1 and 2 (P1 + P2) only amplifies the coding sequences of active recombinases.
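The selection step of one SLiPE cycle can be sketched as a simple filter. This is a minimal model under stated assumptions, not the published protocol: each plasmid carries one recombinase variant that is either active or inactive on the new target sites, the R1 restriction site is assumed to lie on the segment that recombination removes, and any plasmid still carrying R1 is assumed to be cut and therefore not amplified. The names `Plasmid`, `propagate`, `digest_and_pcr`, and `slipe_cycle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plasmid:
    variant: str         # label of the encoded recombinase variant
    active: bool         # does the variant recombine the new target sites?
    has_r1: bool = True  # assumption: R1 sits between the two target sites


def propagate(library):
    """Expression in E. coli: active variants excise the R1-containing segment."""
    return [Plasmid(p.variant, p.active, has_r1=not p.active) for p in library]


def digest_and_pcr(library):
    """R1 digestion cuts unrecombined plasmids; only uncut ones amplify with P1 + P2."""
    return [p.variant for p in library if not p.has_r1]


def slipe_cycle(library):
    """One generation cycle: propagate, digest, and recover active variants."""
    return digest_and_pcr(propagate(library))


library = [Plasmid("v1", True), Plasmid("v2", False), Plasmid("v3", True)]
recovered = slipe_cycle(library)
print(recovered)
```

In practice activity is graded rather than binary, and the recovered coding sequences are mutagenized again before the next cycle; the sketch omits both.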


Figure 20. HIV-1 provirus excision by recombinases. Scheme of the mode of action of Tre/Brec1 recombinase to excise the integrated provirus. After CD4+ host cells are infected by HIV, the proviral DNA is stably integrated into the host cell chromosome. Expression of the viral TAT transactivator protein results in activation of the HIV proviral DNA as well as activation of the therapeutic tre/brec1-vector (2TAR: Tat-responsive promoter regulating Tre/Brec1 recombinase expression). Tre/Brec1 recombinase recognizes and recombines a specific target sequence located in the HIV long terminal repeats (LTRs), thereby excising the HIV genome. The excised proviral DNA is degraded in the host cell. A single LTR remains at the former integration site. Tre/Brec1 expression is turned off in absence of TAT.
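The outcome shown in the figure, excision of the provirus with a single LTR left behind, follows from recombination between two directly repeated target sites. The sketch below illustrates this with plain strings. It is a simplified illustration, not a model of the enzyme: the whole LTR is treated as the target site, the labels are placeholders rather than sequences, and the function name `excise` is hypothetical.

```python
def excise(genome: str, target: str):
    """Recombine the first two direct-repeat copies of `target`.

    Returns (chromosome, excised_product). One copy of the target remains
    in the chromosome and the other leaves with the excised product.
    """
    first = genome.find(target)
    if first == -1:
        return genome, ""
    second = genome.find(target, first + len(target))
    if second == -1:
        return genome, ""  # fewer than two sites: nothing to recombine
    chromosome = genome[:first] + genome[second:]
    excised_product = genome[first:second]
    return chromosome, excised_product


# Placeholder labels stand in for host DNA, LTRs, and the viral genes.
integrated = "hostA" + "[LTR]" + "gag-pol-env" + "[LTR]" + "hostB"
chromosome, excised_product = excise(integrated, "[LTR]")
print(chromosome)
print(excised_product)
```

Applying `excise` a second time to the resulting chromosome changes nothing, because only one target site remains.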

when the novel recognition target sequence differs substantially from the original target site. Nevertheless, the successful generation of these recombinases also documents that directed evolution is effective in generating T-SSRs that recombine only targets remotely related to the original target sequences. The presented examples document the utility of directed molecular evolution to produce variant T-SSRs with desirable properties. Improved methods for directed molecular evolution,319 such as in vivo continuous directed evolution,320 will likely accelerate the evolution process and yield useful enzymes in less time.

7. TYROSINE RECOMBINASES FOR CLINICAL APPLICATIONS

As outlined above, during the last three decades, the structure and mode of action of Cre and related recombinases have been investigated and elucidated in detail. Furthermore, their applied utility in model organisms has been documented in numerous studies. The fact that Cre and other T-SSRs do not require cellular pathways and recombine target sites with high efficiency, error-free, and with absolute fidelity also makes them, in theory, prime tools for potential clinical application. On the other hand, the requirement for two identical target sites for recombination (e.g., loxP) has so far restricted in vivo applications in humans to only a few diseases. Nevertheless, in a broader sense of clinical research and comparable to their wide use in the generation of transgenic or knockout animals, T-SSRs are frequently employed for disease modeling by conditionally ablating (floxed) disease-relevant genes (for recent examples, see Nalbandian et al.321 and Yu et al.322) or to genetically modify specific cell types prior to their adoptive transfer into a living organism. Other prominent examples for T-SSR technology used in potentially clinically relevant applications frequently focus on the modification of various stem cell populations or on the manipulation of human artificial chromosomes for correction of genetic deficiencies in human cells.323 For example, recent allotransplantation of adipose-derived stem cells (ASCs) resulted in healing of femoral bone defects in minipigs.324 Importantly, to improve their osteogenesis potential prior to transplantation, ASCs were transduced with baculovirus vectors expressing bone morphogenetic protein 2 (BMP2) or vascular


LTR regions (Table 3). Proof of principle studies in cell culture and in HIV-infected humanized mice demonstrated that Tre recombinase efficiently suppresses viral replication.103,337 The recent engineering of broad range recombinase 1 (Brec1), a Tre derivative displaying pronounced antiviral activity against the vast majority of HIV-1 clinical isolates,101 provides the scientific basis for clinical phase I virus eradication studies in the near future. In particular, their error-free mode of action and their high target-site specificity distinguish T-SSRs, such as Tre and Brec1 recombinase, for human application. Efforts to cure HIV/AIDS aim at delivering antiviral gene vectors (e.g., vectors encoding Tre or Brec1 recombinase) into the patient’s peripheral CD4+ T cells or CD34+ hematopoietic stem cells.338,101 Since LTR-specific T-SSRs are only required in cells harboring proviral DNA, their expression may be ideally regulated by a promoter element that is responsive to the HIV-1 TAT trans-activator protein (depicted in Figure 20). Expression of the early retroviral TAT protein activates not only the HIV proviral DNA but also the therapeutic Tre/Brec1-vector. In turn, integrated HIV is excised and, owing to the resulting absence of TAT, the Tre/Brec1-vector is inactivated. Clearly, the exploitation of such a molecular switch provides an additional biosafety margin with respect to the application of genome editing enzymes, such as T-SSRs, in humans. As mentioned before, the requirement of two native and identical target sequences currently restricts broad clinical application of engineered T-SSRs. Nevertheless, and in analogy to HIV-1, it is conceivable that infection with human T-lymphotropic virus type I (HTLV-I), a retrovirus causing adult T cell leukemia/lymphoma (ATL) and HTLV-I associated myelopathy/tropical spastic paraparesis (HAM/TSP),339 may also be cleared from infected subjects by use of an LTR-specific recombinase.
Other human conditions that appear to be highly suitable for T-SSR-mediated genome editing are diseases caused by specific chromosomal inversions, such as hemophilia A and possibly other human diseases.340−347 Hemophilia A is a hereditary bleeding disorder that can be caused by specific chromosomal inversions affecting the F8 gene encoding blood coagulation factor VIII.348 It is expected that the tailoring of an appropriate T-SSR (e.g., by molecular evolution) to the respective homology regions in the F8 gene will provide the essential tool to invert and thereby repair the disease-causing mutations. The high fidelity of T-SSRs favors this type of site-specific genome editing enzyme for application in humans.

endothelial growth factor (VEGF). Unfortunately, however, baculovirus vectors do not integrate into mammalian chromosomes and therefore mediate only short-term transgene expression (