Letter pubs.acs.org/acscombsci
Combinatorial Library Based on Restriction Enzyme-mediated Modular Assembly
Cuiping Ma,† Chao Liang,† Yifan Wang,† Mei Pan,† Qianqian Jiang,† and Chao Shi*,‡
† Key Laboratory of Sensor Analysis of Tumor Marker, Ministry of Education, College of Chemistry and Molecular Engineering, Qingdao University of Science and Technology, Qingdao 266042, P.R. China
‡ College of Life Sciences, Qingdao University, Qingdao 266071, P.R. China
ABSTRACT: Combinatorial approaches in directed evolution have proven efficient for exploring sequence space and generating new protein functions. Here, we present the modular assembly of secondary structures (MASS) for constructing a combinatorial library. In this approach, secondary structure elements were extracted from naturally occurring proteins and flanked with common linkers. The flanked elements were then digested with Hinf I restriction endonuclease, which was used here for the first time in the construction of a combinatorial library. The digested DNA fragments were randomly ligated in the sense orientation, amplified by PCR, and transformed. The approach showed that different DNA fragments without homologous sequences could be randomly assembled to create a large sequence space. Combined with structural analysis of the recombinants, it should benefit rational design, including de novo protein design, and the evolution of any genetic part or circuit.
KEYWORDS: combinatorial library, modular assembly, secondary structures, directed evolution, Hinf I
Nearly 200 years ago, Charles Darwin presented the evolutionary ideas on which the modern biological theory of evolution is founded. Evolution that took billions of years in nature has since been successfully simulated in the laboratory, and the rise of molecular evolution as a science coincided with this development. Designing proteins de novo has now become an ultimate goal of the field.1−4 Rational design based on structural analysis and site-directed mutagenesis has yielded many enzymes with improved properties.5−8 Unlike rational design, directed evolution can engineer enzymes relatively rapidly without requiring advanced knowledge of protein structure−function relationships.9−12 Various methods for designing protein libraries by directed evolution have been reported. In 1994, DNA shuffling13 was introduced for in vitro recombination. DNA shuffling is a powerful tool that brought molecular directed evolution to a new stage. However, DNA shuffling works only on sequences of high homology, whereas many proteins with similar three-dimensional structures have very low sequence similarity. For this reason, alternative methods have been developed to construct hybrid gene libraries independently of sequence homology, such as sequence homology-independent protein recombination (SHIPREC)14 and nonhomologous random recombination (NRR).15 Both approaches started with two parental sequences that were randomly digested with DNase I, which resulted in a large proportion of nonviable sequences due to uncontrolled cleavage sites. By contrast, in vitro combinatorial assembly of distinct protein subunits (e.g., exons, subdomains, etc.) seems more important for creating libraries of structural and functional diversity.16−21 Recombination of these protein subunits is usually achieved by overlap PCR22 or by ligation of generated gene fragments.23 Each method of combinatorial assembly has its limitations. Recombination based on overlap PCR suffers from amplification bias and from the introduction of scar sequences, because the primers contain long additional sequences.22,24 Ligation of protein subunits, as in block shuffling, often yields relatively small structural diversity because the number of protein subunits is limited.25 One promising alternative involves the use of nonpalindromic overhangs to selectively assemble combinatorial libraries in a one-pot ligation reaction. Graziano et al. used this approach to recombine secondary structural elements, forming a targeted library of de novo sequences.26 In this work, we extend this approach to the assembly of a fully randomized combinatorial library. Here, we report a simple method that uses only one restriction endonuclease to achieve in vitro nonhomologous random recombination by the modular assembly of secondary structures (MASS) for exploring nucleic acid sequence space.
Received: September 21, 2016 Revised: April 9, 2017 Published: April 24, 2017
DOI: 10.1021/acscombsci.6b00145 ACS Comb. Sci. XXXX, XXX, XXX−XXX
Letter
ACS Combinatorial Science
We used aminoglycoside 3′-phosphotransferase as an example and decomposed it into 22 modules comprising α-helices and β-strands. These modules were flanked with Hinf I restriction sites and ligated in a one-pot reaction to generate a fully random combinatorial library. Given its simplicity, MASS offers researchers an easy way to generate fully randomized combinatorial libraries of proteins and genetic circuits.

The random assembly of modules (for example, secondary structure elements, BioBricks, or parts) into a combinatorial library is illustrated schematically in Figure 1. Modules such as secondary structure elements were prepared. Short linkers incorporating a Hinf I recognition site were attached to both ends of the DNA fragment encoding each secondary structure element. The Hinf I recognition sequence used in the linkers was nonpalindromic, which ensured that all ligation products consisted entirely of secondary structures in the sense orientation and avoided inverted ligation between modules. As a result of this ligation reaction, three flexible amino acids (Gly, Ser, and Pro) were inserted between adjacent secondary structure elements as a linker sequence. To increase the probability of obtaining a functional protein, the linker residues should be chosen according to the features of the modules. The length of the ligated fragments was controlled by adding initiator and terminator sequences, each of which carried a Hinf I restriction site at only one end (the 3′ end of the initiator and the 5′ end of the terminator). The ligated fragments were amplified with defined primers and then inserted into a vector to construct a combinatorial library.

To verify the ability of this method to recombine nucleic acid sequences nonhomologously, a structural gene, Klebsiella pneumoniae aminoglycoside 3′-phosphotransferase IIa (GenBank gi|125463, uniprotkb P00552), consisting of 22 secondary structure elements of 9−45 base pairs (bp), was used as a parent sequence (Figure S1 and Table S1). Two other secondary structure elements came from Bacillus aminoglycoside 3′-phosphotransferase (uniprotkb P00553) and kanamycin nucleotidyltransferase of Staphylococcus aureus (uniprotkb P05057), respectively. Primers containing the complementary sequence of the Hinf I restriction site were used to generate double-stranded DNA (dsDNA) encoding each secondary structure of the library with Klenow fragment (exo−) DNA polymerase at 37 °C for 2 h. The resulting 24 dsDNAs were digested with Hinf I to leave 3-base sticky-end overhangs, and ligation reactions were then performed with T4 DNA ligase to favor intermolecular ligation. The length of the recombined library could be controlled by modulating the stoichiometry of initiator and terminator, because the 5′ end of the initiator and the 3′ end of the terminator lacked Hinf I restriction sites and could no longer ligate with other molecules. The average length of the 24 dsDNAs encoding the elements was 38 bases; to recombine to a target size of ∼900 bp, including the 9 bp linker sequences, the average ratio of initiator, modules, and terminator should be 1:21:1, that is, initiator and terminator should each account for around 5 mol %. This inference was verified by Figure S2. We further optimized the ligation scheme. As shown in Figure S3, a two-step ligation (i.e., the initiator and the terminator were each ligated separately with the 22 modules, and the two reactions were then mixed) gave the most efficient assembly of the correct product. Under the optimum conditions, as seen in Figure 2, the dsDNA fragments encoding each element were recombined to the target size of ∼900 bp including linker sequences. Ligated fragments of the desired length were directly amplified by PCR. PCR products were isolated by agarose gel electrophoresis and 300−

Figure 1. (A) Schematic illustration of the construction of a combinatorial library by restriction enzyme-mediated modular assembly. Secondary structure elements were identified and flanked with short common linkers to introduce Hinf I restriction sites unique to the 5′ and 3′ ends of each module. The synthetic 5′ initiator and 3′ terminator each included a Hinf I restriction site on one end only. Each module was copied to generate dsDNA, then digested and recombined with T4 DNA ligase to provide a combinatorial library. All base sequences used in this work are listed in Tables S1 and S2. (B) Base sequences of the common linkers. (C) Recognition sites of the initiator and terminator used for constructing the library.
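The directional ligation and the Gly-Ser-Pro junction can be illustrated with a short sketch. It uses the two linker sequences quoted in the Experimental Procedures and the standard Hinf I cut position (G^ANTC, leaving a 3-base 5′ overhang). The two module sequences are hypothetical placeholders chosen for illustration, not library members, and the sketch tracks the top strand only.

```python
import re

LINKER_5 = "AGGGAGTCCA"  # 5' linker quoted in the Experimental Procedures
LINKER_3 = "GGGAGTCCAT"  # 3' linker quoted in the Experimental Procedures

# Hinf I recognizes GANTC and cuts after the G (G^ANTC); the lookahead keeps
# the match to the single G so that m.end() is the top-strand cut position.
HINFI_CUT = re.compile(r"G(?=A[ACGT]TC)")

CODON_TABLE = {"GGG": "Gly", "AGT": "Ser", "CCA": "Pro"}  # only the codons needed here
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def digest_top_strand(seq):
    """Split the top strand at every Hinf I cut position."""
    cuts = [m.end() for m in HINFI_CUT.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def module_core(module):
    """Flank a module with both linkers, digest, and keep the middle fragment."""
    pieces = digest_top_strand(LINKER_5 + module + LINKER_3)
    if len(pieces) != 3:
        raise ValueError("module contains an internal Hinf I site")
    return pieces[1]


def junction_between(core_a, core_b):
    """Return the 9 bases spanning the ligation point of two digested modules."""
    ligated = core_a + core_b
    return ligated[len(core_a) - 3 : len(core_a) + 6]


def translate(dna):
    """Translate a DNA string codon by codon using the small table above."""
    return [CODON_TABLE[dna[i : i + 3]] for i in range(0, len(dna), 3)]


# Hypothetical module sequences (assumptions for illustration only).
core_a = module_core("AAACTGCTG")
core_b = module_core("GTTCTGGAA")
junction = junction_between(core_a, core_b)
overhang = core_a[:3]  # 5' overhang on the left end of every digested module

print(junction, translate(junction))          # GGGAGTCCA ['Gly', 'Ser', 'Pro']
print(overhang, reverse_complement(overhang))  # AGT ACT
```

Because the overhang AGT is not its own reverse complement, the left end of one module can pair only with the right end of another, which is why head-to-head or tail-to-tail ligation is excluded in this model.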
Figure 2. Assembly products of secondary structure elements inserted into a transformable plasmid. (A) The ligation products of modular assembly were directly amplified by PCR and then inserted into an SV vector. The plasmid products of modular assembly were transformed into competent cells. (B) Successful colony PCR of modular assembly products. M: 2000-bp DNA ladder; 1−9: positive clones picked randomly. The size of the assembled products is displayed below each lane.
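The stoichiometry argument in the text (initiator:modules:terminator = 1:21:1 for a target of ∼900 bp) can be checked with a rough calculation. The sketch below assumes an idealized model in which every chain is capped by exactly one initiator and one terminator, so the mean number of modules per chain equals the module-to-initiator ratio. Whether the reported 38-base average already includes the linker is not stated, so both readings are computed as bounds. All function names are illustrative.

```python
def cap_fractions(initiator, modules, terminator):
    """Mole fractions of initiator and terminator in the ligation mix."""
    total = initiator + modules + terminator
    return initiator / total, terminator / total


def mean_modules_per_chain(initiator, modules):
    """Idealized assumption: each chain has one initiator, so modules are shared equally."""
    return modules / initiator


def assembled_length_bp(n_modules, module_bp, linker_bp=0):
    """Length contributed by n modules, optionally adding one linker per module."""
    return n_modules * (module_bp + linker_bp)


init_frac, term_frac = cap_fractions(1, 21, 1)
n = mean_modules_per_chain(1, 21)

low = assembled_length_bp(n, 38)       # 38 bases taken to include the linker
high = assembled_length_bp(n, 38, 9)   # 38 bases taken to exclude the 9 bp linker

print(f"terminator fraction: {term_frac * 100:.1f} mol %")  # 4.3 mol %
print(f"expected length: {low:.0f}-{high:.0f} bp")           # 798-987 bp
```

Under these assumptions the 1:21:1 ratio gives about 4.3 mol % for each capping sequence, consistent with the "around 5 mol %" quoted in the text, and the ∼900 bp target falls between the two length estimates.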
Figure 3. (A) Statistical analysis of insert sizes from 500 positive clones by colony PCR and agarose gel electrophoresis. (B) Combinations of secondary structure elements in the 20 sequenced clones. (C) Statistical analysis of the assembly probability of each secondary structure in the 20 sequenced clones.
1000 bp DNA fragments were inserted into an SV vector for screening and sequencing. Approximately 10⁵ clones could be obtained from 0.4 μg of the assembled combinatorial library by screening for ampicillin resistance, using competent cells with an efficiency of ∼10⁸ cfu/μg pBluescript II KS (+). Owing to the lab-made SV constitutive expression vector with an agarase gene, clones appearing on agar plates containing 50 μg/mL ampicillin should almost always be positive. This assumption was verified by analyzing 500 clones picked randomly from agar plates. Library integrity was verified by colony PCR and agarose gel electrophoresis of these 500 positive clones, and the statistical analysis of their insert sizes is shown in Figure 3A. Insert sizes ranged from 100 to 550 bp, and 81% (405 of 500) of the clones contained assembled products in the size range from 250 to 500 bp. Twenty of these clones were fully sequenced; all were error-free, randomly assembled, and in the sense orientation. As seen in Figure 3B, all 20 clones were the product of the desired ligation reaction, with the 5′-initiator and 3′-terminator at the ends of the library members, and all modules were ligated in the sense orientation. Additionally, the number of modules per clone ranged from 4 to 16, with an average of 9. The permutation of secondary structure elements in these assembly products was entirely different from that of the wild-type protein. These modules were arranged into different orders to successfully construct a combinatorial library. Moreover, we investigated the assembly probability of each secondary structure in the 20 sequenced clones (Figure 3C). The assembly frequencies of the different secondary structures showed no bias when different secondary structures were ligated in the 20 clones. We further examined the correlation between the length of each module and the number of times it appeared, using a two-tailed bivariate correlation with Pearson′s coefficient (IBM SPSS Statistics 22). The estimated correlation was 0.039 (<0.3), with a significance level of 0.864 (>0.01) (Tables 1 and S3). Therefore, there was no assembly bias related to the length of the assembled fragments. According to the
Table 1. Pearson Correlations for Length and Time of Module

                                    length (bp)   times
length (bp)   Pearson correlation   1             0.039
              Sig. (2-tailed)                     0.864
              N                     22            22
times         Pearson correlation   0.039         1
              Sig. (2-tailed)       0.864
              N                     22            22
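The entries of Table 1 can be cross-checked without SPSS. The sketch below is a plain-Python illustration, not the authors' procedure. It computes Pearson's r for paired data and converts a given r and sample size into a two-tailed p-value through the t statistic t = r·sqrt((n − 2)/(1 − r²)), integrating the Student t density numerically with Simpson's rule. The function names and the integration step count are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for Pearson's r with n pairs (requires |r| < 1)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps  # Simpson's rule on [0, t]; steps must be even
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# Values reported in Table 1: r = 0.039 with N = 22 modules.
p = two_tailed_p(0.039, 22)
print(f"p = {p:.3f}")  # close to the reported significance of 0.864
```

With the per-module lengths and occurrence counts from Table S3, `pearson_r(lengths, counts)` would give the coefficient itself; those values are not reproduced here.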
ligase with 22 dsDNAs encoding secondary structure elements at 25 °C for 2 h, and the two reactions were then mixed and incubated for a further 2 h at 25 °C. The ligated products were directly amplified by PCR with specific primers that introduced Nde I and EcoR I recognition sites at the 5′ and 3′ ends, respectively. Fragments in the desired size range were purified by 1% (wt/vol) agarose gel electrophoresis and inserted into plasmid vectors. The constitutive expression vector was derived from pBluescript II KS (+). The promoter and SD sequence of an agarase gene were inserted into the pBluescript II KS (+) vector according to the literature,27 and the resulting construct is hereafter referred to as the SV vector. PCR-amplified products and the SV vector were digested, ligated, and transformed into competent E. coli DH5α cells. The E. coli DH5α cells carrying PCR products were grown at 37 °C in LB medium containing 50 μg/mL ampicillin.
current experimental data, ligation between modules of different lengths was completely random. In this library, every assembly product had an equal probability of appearing. Given this advantage, MASS might be very useful in synthetic biology for the combinatorial construction of networks and metabolic pathways from biopart components. The development of methods to efficiently explore global sequence space and select proteins with desired properties remains an exciting challenge in the field of directed evolution. We have developed a new ligation method that is random, simple, and directional for constructing combinatorial libraries. This method placed no constraints on the modules and enabled random assembly of many DNA parts into DNA constructs through a convenient workflow. In theory, a combinatorial library of M^N sequences (M = number of modules assembled, N = number of modules excluding initiator and terminator) should be achievable by our method. Additionally, MASS assembled secondary structure elements directly, and no point mutations or frameshifts were observed. However, despite the high fidelity of our library assembly, the vast majority of the protein variants were found to be nonfunctional. We screened almost 1000 clones, but none conferred kanamycin resistance. Although the frequency of functional proteins obtained from fully random assembly of short fragments (such as protein structural elements) is anticipated to be low,26 one possible reason for the lack of function observed in our case could be suboptimal linkage. The minimal 3 amino acid junctions formed between adjacent secondary structural elements may have constrained the conformations of the assembled proteins. In future work, the versatile MASS approach can be adapted to assemble proteins with longer and more flexible linkers.
■
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscombsci.6b00145. Experimental details and data (PDF)
■
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
ORCID
Chao Shi: 0000-0002-5129-207X
■
Notes
The authors declare no competing financial interest.
■
EXPERIMENTAL PROCEDURES
Library Preparation. The main secondary structural elements (α-helices and β-strands) were selected from Klebsiella pneumoniae aminoglycoside 3′-phosphotransferase IIa (PDB accession number 1ND4), whose structure is on file in the Protein Data Bank (PDB). The PDB files of the proteins used were submitted to the PROMOTIF program, and the primary sequences of each secondary structure element were then identified by NCBI BLAST search (BLAST, http://www.ncbi.nlm.nih.gov/BLAST/) to obtain the nucleotide sequences encoding each element (Table S1). An intact active site should be retained as far as possible. The linkers for the 22 secondary structure elements were AGGGAGTCCA and GGGAGTCCAT (each containing the Hinf I recognition site GAGTC) at the 5′ and 3′ ends, respectively. The terminator sequence contained only the 5′-end linker, whereas the initiator sequence contained only the 3′-end linker. The oligonucleotide primers for each secondary structural element, the initiator sequence, and the terminator sequence were ATGGACTCCC, CATATGGAATTCCGAT (containing the Nde I recognition site CATATG), and TGAGAATTCCGGCAT (containing the EcoR I recognition site GAATTC), respectively.
Library Construction. First, dsDNAs encoding the 22 secondary structure elements were produced by a polymerase extension reaction with Klenow fragment polymerase (exo−) at 37 °C for 2 h. The resulting dsDNA was heated to inactivate the polymerase (75 °C for 20 min) and then digested with Hinf I at 37 °C for 2 h to produce sticky ends. Hinf I was inactivated at 80 °C for 20 min. The initiator sequence and the terminator sequence were each ligated by T4 DNA
ACKNOWLEDGMENTS
The work was supported by the National Natural Science Foundation of China (31170758, 21375071, 21675094, and 31670868).
■
REFERENCES
(1) Baltzer, L.; Nilsson, H.; Nilsson, J. De Novo Design of Proteins What Are the Rules? Chem. Rev. 2001, 101, 3153−3164.
(2) Dahiyat, B. I.; Mayo, S. L. De Novo Protein Design: Fully Automated Sequence Selection. Science 1997, 278, 82−87.
(3) DeGrado, W. F.; Summa, C. M.; Pavone, V.; Nastri, F.; Lombardi, A. De Novo Design and Structural Characterization of Proteins and Metalloproteins. Annu. Rev. Biochem. 1999, 68, 779−819.
(4) Liu, T.; Fu, G.; Luo, X.; Liu, Y.; Wang, Y.; Wang, R. E.; Schultz, P. G.; Wang, F. Rational Design of Antibody Protease Inhibitors. J. Am. Chem. Soc. 2015, 137, 4042−4045.
(5) Kaplan, J.; DeGrado, W. F. De Novo Design of Catalytic Proteins. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 11566−11570.
(6) Jiang, L.; Althoff, E. A.; Clemente, F. R.; Doyle, L.; Röthlisberger, D.; Zanghellini, A.; Gallaher, J. L.; Betker, J. L.; Tanaka, F.; Barbas, C. F., III; Hilvert, D.; Houk, K. N.; Stoddard, B. L.; Baker, D. De Novo Computational Design of Retro-Aldol Enzymes. Science 2008, 319, 1387−1391.
(7) Kobayashi, N.; Yanase, K.; Sato, T.; Unzai, S.; Hecht, M. H.; Arai, R. Self-Assembling Nano-Architectures Created from A Protein Nano-Building Block Using An Intermolecularly Folded Dimeric De Novo Protein. J. Am. Chem. Soc. 2015, 137, 11285−11293.
(8) Lin, Y. W.; Nagao, S.; Zhang, M.; Shomura, Y.; Higuchi, Y.; Hirota, S. Rational Design of Heterodimeric Protein Using Domain Swapping for Myoglobin. Angew. Chem., Int. Ed. 2015, 54, 511−515.
(9) Chen, R. Enzyme Engineering: Rational Redesign Versus Directed Evolution. Trends Biotechnol. 2001, 19, 13−14.
(10) Romero, P. A.; Arnold, F. H. Exploring Protein Fitness Landscapes by Directed Evolution. Nat. Rev. Mol. Cell Biol. 2009, 10, 866−876.
(11) Bornscheuer, U. T.; Pohl, M. Improved Biocatalysts by Directed Evolution and Rational Protein Design. Curr. Opin. Chem. Biol. 2001, 5, 137−143.
(12) Chemler, J. A.; Tripathi, A.; Hansen, D. A.; O’Neil-Johnson, M.; Williams, R. B.; Starks, C.; Park, S. R.; Sherman, D. H. Evolution of Efficient Modular Polyketide Synthases by Homologous Recombination. J. Am. Chem. Soc. 2015, 137, 10603−10609.
(13) Stemmer, W. P. Rapid Evolution of A Protein in Vitro by DNA Shuffling. Nature 1994, 370, 389−391.
(14) Sieber, V.; Martinez, C. A.; Arnold, F. H. Libraries of Hybrid Proteins from Distantly Related Sequences. Nat. Biotechnol. 2001, 19, 456−460.
(15) Bittker, J. A.; Le, B. V.; Liu, D. R. Nucleic Acid Evolution and Minimization by Nonhomologous Random Recombination. Nat. Biotechnol. 2002, 20, 1024−1029.
(16) Kamtekar, S.; Schiffer, J. M.; Xiong, H.; Babik, J. M.; Hecht, M. H. Protein Design by Binary Patterning of Polar and Nonpolar Amino Acids. Science 1993, 262, 1680−1702.
(17) Tsuji, T.; Onimaru, M.; Yanagawa, H. Towards the Creation of Novel Proteins by Block Shuffling. Comb. Chem. High Throughput Screening 2006, 9, 259−269.
(18) Edwards, W. R.; Busse, K.; Allemann, R. K.; Jones, D. D. Linking the Functions of Unrelated Proteins Using A Novel Directed Evolution Domain Insertion Method. Nucleic Acids Res. 2008, 36, e78−e78.
(19) Harayama, S. Artificial Evolution by DNA Shuffling. Trends Biotechnol. 1998, 16, 76−82.
(20) Lang, D.; Thoma, R.; Henn-Sax, M.; Sterner, R.; Wilmanns, M. Structural Evidence for Evolution of the B/A Barrel Scaffold by Gene Duplication and Fusion. Science 2000, 289, 1546−1550.
(21) Nagano, N.; Orengo, C. A.; Thornton, J. M. One Fold with Many Functions: The Evolutionary Relationships Between TIM Barrel Families Based on Their Sequences, Structures and Functions. J. Mol. Biol. 2002, 321, 741−765.
(22) Kolkman, J. A.; Stemmer, W. P. Directed Evolution of Proteins by Exon Shuffling. Nat. Biotechnol. 2001, 19, 423−428.
(23) Villiers, B. R. M.; Stein, V.; Hollfelder, F. USER Friendly DNA Recombination (Userec): A Simple and Flexible Near Homology-Independent Method for Gene Library Construction. Protein Eng., Des. Sel. 2010, 23, 1−8.
(24) Heckman, K. L.; Pease, L. R. Gene Splicing and Mutagenesis by PCR-Driven Overlap Extension. Nat. Protoc. 2007, 2, 924−932.
(25) Farrow, M. F.; Arnold, F. H. Combinatorial Recombination of Gene Fragments to Construct A Library of Chimeras. Current Protocols in Protein Science 2010, 26.2.1.
(26) Graziano, J. J.; Liu, W.; Perera, R.; Geierstanger, B. H.; Lesley, S. A.; Schultz, P. G. Selecting Folded Proteins from A Library of Secondary Structural Elements. J. Am. Chem. Soc. 2008, 130, 176−185.
(27) Burden, F. R.; Winkler, D. A. The Computer Simulation of High Throughput Screening of Bioactive Molecules. Molecular Modeling and Prediction of Bioactivity; Springer, 2000; pp 175−180.