
Article

CiPerGenesis, a Mutagenesis Approach that Produces Small Libraries of Circularly Permuted Proteins Randomly Opened at a Focused Region: Testing on the Green Fluorescent Protein. Paul Gaytán, Abigail Roldán-Salgado, Jorge Arturo Yáñez, Sandra Morales-Arrieta, and Víctor Rivelino Juárez-González ACS Comb. Sci., Just Accepted Manuscript • DOI: 10.1021/acscombsci.7b00152 • Publication Date (Web): 29 May 2018 Downloaded from http://pubs.acs.org on June 9, 2018


ACS Combinatorial Science is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Combinatorial Science

CiPerGenesis, a Mutagenesis Approach that Produces Small Libraries of Circularly Permuted Proteins Randomly Opened at a Focused Region: Testing on the Green Fluorescent Protein. Paul Gaytán,¥* Abigail Roldán-Salgado,¥ Jorge A. Yáñez,¥ Sandra Morales-Arrieta≠ and Víctor R. Juárez-González¥

¥ Instituto de Biotecnología, Universidad Nacional Autónoma de México, Av. Universidad 2001, Cuernavaca, Morelos, 62210, México. ≠ Departamento de Ingeniería en Biotecnología, Universidad Politécnica del Estado de Morelos, Boulevard Cuauhnáhuac No. 566, Col. Lomas del Texcal, Jiutepec, Morelos, 62550, México.
Keywords: Protein engineering, circular permutations, libraries, oligonucleotides, GFP, mutagenesis
Abstract Circularly permuted proteins (cpPs) are a distinct type of mutant protein.
- Their original termini are covalently linked through a peptide connector.
- The polypeptide backbone is opened at another place to create new ends.
cpPs are finding wide application in biotechnology because their properties may differ considerably from those of the parental protein. However, the main challenge in creating successful cpPs is to identify the peptide bonds that can be broken to create new termini while preserving a functional, well-folded protein. Herein, we describe CiPerGenesis, a combinatorial mutagenesis approach that uses two oligonucleotide libraries to amplify a circularized gene by PCR, starting and ending within a focused target region. This approach creates small libraries of circularly permuted genes. Two different restriction sites encoded in the oligonucleotides make these genes easy to clone in the correct orientation and reading frame. Once expressed, the protein libraries exhibit a unique sequence diversity:
- cpPs with ordinary breakpoints between adjacent amino acids of the target region;
- cpPs whose new termini contain user-defined truncations of some amino acids;
- cpPs whose new termini contain user-defined repeats of some amino acids.
CiPerGenesis was tested at the lid region G134-H148 of GFP. The most fluorescent variants were those starting at Leu141 and ending at amino acids Tyr145, Tyr143, Glu142, Leu141, Lys140 and His139.
Purification and biochemical characterization of some variants pointed to differences in expression, solubility and extent of maturation among the mutant proteins. These differences are the likely cause of the variability in fluorescence intensity observed in colonies.


ACS Paragon Plus Environment


INTRODUCTION Proteins can be modified in the laboratory by amino acid substitutions, deletions or insertions. These changes can be introduced in random, combinatorial or site-specific modes, depending on what is known about the target protein and on the goal of the research project. Substitutions are usually well tolerated from a structural point of view. Deletions and insertions, however, generally affect folding: shortening or lengthening the polypeptide backbone alters the interactions between the side chains of neighboring amino acids near the modification site, which affects the thermodynamic stability of the protein.(1,2) Misfolded proteins may be rapidly degraded by cellular proteases or may aggregate as inclusion bodies. In both cases, the amount of soluble protein falls.(3) The same misfolding drawback frequently occurs in circularly permuted proteins (cpPs), a less explored type of mutant in the protein engineering toolkit.(4) cpPs are rearranged proteins whose original N and C termini are joined, directly by a peptide bond or through a peptide linker of appropriate length and composition,(4,5) and whose chain is opened at another location. In simple terms, cpPs start and end at a different place in the polypeptide chain from the parental protein. Amino acid substitutions therefore maintain the topology of the parental protein. Insertions, deletions and circular permutations, by contrast, rearrange the amino acid sequence order of the target protein and its secondary structural elements.(6) X-ray crystallographic studies have revealed few changes in the overall structures of cpPs compared with the parental proteins. As expected, the more pronounced variations are found in the linker and new terminus regions.(4)
However, these subtle perturbations can generate significant conformational changes. They can alter protein properties such as thermal stability,(1) pH sensitivity,(7) resistance to proteolysis,(8) substrate specificity(9) or enzyme activity.(10,11) Linking the original termini is easily solved by testing linker options reported in the literature.(4,5) The real challenge with cpPs is therefore to find the best location to break an existing peptide bond, generating new termini that preserve protein folding and function. To address this issue, Graf and Schachman developed a multi-enzymatic method to explore possible new termini along the complete target protein.(12) This method uses DNase I to open a circularized gene at random and then clones the resulting linearized genes as blunt-ended products. As a result, most circularly permuted genes are cloned in the incorrect orientation or in the incorrect frame, and the proportion of correct circularly permuted genes is very low. The approach has three main problems:
- it is difficult to control the highly active DNase I so that it makes one cut per circularized gene on average;
- blunt-ended DNA fragments frequently ligate poorly;
- the many intermediate purification steps progressively reduce the amount of the final insert.(1)
These difficulties likely limit the method's ability to generate all possible breakpoints along the complete target protein. This limitation became apparent when Baird et al. applied the method to GFP.(13) They found new functional breakpoints only in the second half of the protein. In contrast, independent studies by Topell et al. and Waldo et al. used site-directed mutagenesis to show that new ends were also tolerated at different locations in the first half of the protein.(14,15)
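A simple model makes the difficulty of tuning DNase I to one cut per circular gene concrete. The sketch below assumes that cuts fall independently, so the number of cuts per circle follows a Poisson distribution with mean λ. This is a simplification, not a model taken from the cited studies, and the function names are hypothetical. Under this assumption, the fraction of circles opened exactly once, λe^-λ, peaks at λ = 1, where it is only e^-1, about 37%. Any deviation from this optimum shifts material toward uncut or fragmented molecules.

```python
import math

def poisson_pmf(k: int, mean_cuts: float) -> float:
    """P(exactly k cuts) if cuts are independent events with the given mean (assumption)."""
    return math.exp(-mean_cuts) * mean_cuts ** k / math.factorial(k)

def cut_distribution(mean_cuts: float) -> dict[str, float]:
    """Split a population of circular genes into uncut, singly cut
    (the only full-length linear products) and multiply cut molecules."""
    uncut = poisson_pmf(0, mean_cuts)
    single = poisson_pmf(1, mean_cuts)
    return {"uncut": uncut, "single_cut": single, "multiple_cuts": 1.0 - uncut - single}

for mean in (0.5, 1.0, 2.0):
    shares = cut_distribution(mean)
    print(f"mean cuts {mean}: " + ", ".join(f"{k} {v:.2f}" for k, v in shares.items()))
```

Even at the optimum, almost two-thirds of the circles are either left intact or cut more than once under this model.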
A second approach relies on the hypothesis that sites that tolerate insertion of a pentapeptide should also tolerate new termini in circularly permuted proteins.(16) This strategy has two parts:
1. A library of genes containing 15-bp insertions at random locations is created by transposon-based mutagenesis, and the functional mutants are selected.
2. An individual circularly permuted protein is created by site-directed mutagenesis for each functional mutant that tolerated an insertion.
Unfortunately, the original report showed that proteins tolerate peptide insertions more readily than new termini. Six fluorescent variants with insertions at different places were isolated, but only two of them gave rise to fluorescent circularly permuted proteins. Recently, a clever and simpler approach for creating libraries of circularly permuted proteins was published.(17,18) This third method, nicknamed PERMUTE, uses MuA transposase to insert an engineered minitransposon at random into a circularized gene. The transposon contains all the genetic information needed to function as a protein expression vector: it translates any DNA sequence located downstream of a promoter designed at one end of the transposon. However, transposons lack insertion directionality and may be introduced at any position within a codon. Consequently, only 1/6 of the inserted transposons generate in-frame translatable products. The design of the engineered transposon also modifies both new termini of every resulting circularly permuted protein:
- the new N-terminus carries the foreign sequence MGFRIYRETLSRFSCAAQ, encoded by one of the MuA recognition sites;
- the new C-terminus carries two random amino acids, whose identity depends on where the transposon was inserted.
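The 1/6 figure follows from two independent constraints: two possible orientations and three possible positions relative to the codon boundary. The sketch below enumerates insertion sites in a hypothetical circularized open reading frame whose length is a multiple of 3. It makes a simplifying assumption: translation is productive only when the promoter faces the gene and the insertion falls exactly at a codon boundary. The function name and the 720-nt length are illustrative.

```python
from fractions import Fraction

def in_frame_fraction(gene_length_nt: int) -> Fraction:
    """Fraction of (insertion site, orientation) pairs that place the cassette's
    promoter and start codon in frame with the circularized gene.

    Hypothetical model: insertion sites are uniformly distributed, and translation
    is productive only in the sense orientation at a codon boundary."""
    if gene_length_nt % 3:
        raise ValueError("a circularized ORF should be a multiple of 3 nt long")
    productive = total = 0
    for site in range(gene_length_nt):             # insertion before nucleotide `site`
        for orientation in ("sense", "antisense"):
            total += 1
            if orientation == "sense" and site % 3 == 0:
                productive += 1
    return Fraction(productive, total)

print(in_frame_fraction(720))
```

The result is independent of gene length: one favorable orientation out of two, times one favorable offset out of three.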
These modifications at the new termini can make the results ambiguous. It is not clear whether opening at a given position is allowed or forbidden because of the extra sequences. Two independent works modified the MuA recognition sites to shorten the appended peptide to two amino acids at the new N-terminus and to three or one amino acid at the new C-terminus.(19,20) However, the modified transposons were 7-fold less efficient in the integration process.(20) Transposon-based methods may seem the best choice for generating cpP libraries. Their main disadvantage is perhaps the intrinsic preference of these genetic elements for integration at certain nucleotide sequences, demonstrated early on, which introduces a significant distribution bias.(21) Because the enzymatic methods create constrained sequence diversity, many researchers prefer a more conservative approach: sequentially opening consecutive peptide bonds in a target region of the protein.(11,22,23) Each single opening requires a specific pair of oligonucleotides to amplify a circularized gene(14) or a tandem fusion template,(24), and each resulting permuted gene must be cloned independently. This site-directed approach is very effective for exploring all possible breakpoints. However, it requires considerable cloning work and is costly, because each independent cloning process needs its own DNA primers and reagents. Fischereder et al. reduced the experimental work by cloning seven site-specific circular permutations together, which they defined as a “small focused library”.(25) The conservative approach may suit proteins whose functions are difficult to select or screen, where each permutant must be assayed individually. It is not suitable for proteins with fluorescence, antibiotic resistance or another easily selectable property.(6)
For these types of proteins, we report a novel oligonucleotide-based combinatorial approach. It generates small libraries of circularly permuted proteins with random new termini between the amino acids of a focused protein region. These termini include clean cleavages between each pair of adjacent amino acids, as well as cleavages combined with repeats or truncations of some amino acids at the new ends. We named the method CiPerGenesis, meaning “generator of circular permutations”. Drawing on our experience in combinatorial chemistry and oligonucleotide synthesis, we realized the following: amplifying a circularized gene by PCR (Figure 1) with two libraries of oligonucleotides truncated at the codon level would generate a small library of circularly permuted genes. All of these genes would be ready to clone in the correct orientation and the correct open reading frame. Each library of forward and reverse truncated oligonucleotides was synthesized by the robust resin-splitting approach.(26) A portion of resin was withdrawn every three nucleotides until the focused target sequence was complete. The method was thoroughly tested in the region Gly134-His148 of GFP. This test revealed six strongly, nine moderately and ten weakly fluorescent cpPs, as well as several non-fluorescent variants. Purification and biochemical characterization of some variants pointed to differences in expression, solubility and extent of maturation among the mutant proteins as the likely causes of the variability in fluorescence intensity observed in colonies. Library construction by CiPerGenesis relies on standard molecular biology techniques and can use pools of commercial oligonucleotides or home-made oligonucleotide libraries. The method is therefore a practical rather than a fundamental or conceptual advance, and it enables systematic exploration of circular permutations in a focused region of the target protein.
CiPerGenesis provides the missing mutagenesis technique recently pointed out by Higgins and Savage: a way to create libraries of topological mutations that are as tunable as oligonucleotide-based substitution methods.(6) RESULTS AND DISCUSSION Rationale for CiPerGenesis creation of circularly permuted proteins. As shown in Figure 1, the method relies on PCR amplification of a circularized gene. The PCR uses mixtures of forward and reverse oligonucleotides designed to open contiguous peptide bonds within a target protein region. Combining the different forward and reverse oligonucleotides during PCR should create two kinds of circular permutants:
- permutants with every possible breakpoint between the contiguous amino acids of that region;
- permutants whose new termini contain short repeats or truncations of amino acids from the focused region.
The forward and reverse oligonucleotides carry different sticky-end-forming restriction sites. These sites should allow all circularly permuted variants to be cloned in a single experiment, in the correct open reading frame and in the correct orientation. Design and synthesis of the oligonucleotide libraries. As a proof of principle for CiPerGenesis, we focused on the stretch G134-H148 of GFP. We selected this region for three reasons:
- it is part of the longest lid of GFP, which crosses the complete β-barrel at the bottom of the structure, as shown in Figure 2;
- our research group had previously subjected it to single amino acid deletions and substitutions;
- the first reported functional permutation, Y145M-N144, was located in this region,(13) and the same new termini have been used in other fluorescent proteins.(5,7,27)
In the nomenclature used in this report, a cpP begins at an amino acid “X” of the stretch N135-H148, which is replaced with methionine for translation initiation. It ends at one of the residues G134-S147, as shown in panels c and d of Figure 2. The end residue defines three kinds of cpP:
- standard cpPs begin at amino acid X and end at the preceding amino acid “X-1”;
- circular permutations containing terminal deletions end at amino acid “X-2”, “X-3”, “X-4” and so on;
- variants containing terminal repeats end at amino acid X, “X+1”, “X+2” and so on.
This nomenclature assumes that the original termini are linked through an arbitrary peptide, so that the polypeptide can be translated continuously. The easiest way to prepare the forward and reverse oligonucleotide libraries would have been to synthesize each oligonucleotide individually and then mix them in equimolar concentrations. However, we wanted to limit the environmental contamination from the chemicals needed to synthesize many individual oligonucleotides, and to reduce experimental costs. We realized that each library could instead be synthesized in a single process using the resin-splitting approach,(28) provided that all members of the pool shared two features: the same 3’ priming site and the same 5’ cloning site.
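The nomenclature above can be written as a small classification rule. In this sketch (helper names are hypothetical), a variant is identified by its new N-terminal residue X, replaced with Met, and its new C-terminal residue E. Enumerating X over N135-H148 and E over G134-S147 reproduces the 14 × 14 = 196-member sequence space of the two 14-species primer pools. `permuted_order` adds an illustrative assumption: a 238-residue parent whose residues 1-238 are all retained and whose termini are joined by the GGSGGS linker used in this work.

```python
from collections import Counter

def classify(start: int, end: int) -> tuple[str, int]:
    """Classify a cpP that starts at residue `start` (X, replaced by Met)
    and ends at residue `end` (E), following the X-1 / X-n / X+n rules."""
    if end == start - 1:
        return ("standard", 0)                 # clean break between X-1 and X
    if end >= start:
        return ("repeat", end - start + 1)     # residues X..E occur at both ends
    return ("truncation", start - 1 - end)     # residues E+1..X-1 are lost

def permuted_order(start: int, end: int, length: int = 238,
                   linker: str = "GGSGGS") -> list[str]:
    """Hypothetical residue order of a cpP: Met at position X, residues X+1 to the
    original C-terminus, the linker joining the original termini, then residues 1..E."""
    return (["M"] + [str(i) for i in range(start + 1, length + 1)]
            + list(linker) + [str(i) for i in range(1, end + 1)])

STARTS = range(135, 149)   # N135..H148: 14 forward primer species
ENDS = range(134, 148)     # G134..S147: 14 reverse primer species

space = Counter(classify(x, e)[0] for x in STARTS for e in ENDS)
print(len(STARTS) * len(ENDS), dict(space))
# 196 {'standard': 14, 'repeat': 91, 'truncation': 91}
print(classify(141, 145), classify(141, 139))
# ('repeat', 5) ('truncation', 1)
```

For example, `permuted_order(141, 140)` has 244 entries (238 residues plus the six-residue linker), as expected for a standard permutation. Repeats lengthen the chain and truncations shorten it.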
In contrast to the monomer-based truncation method,(29) this synthetic approach truncates oligonucleotides at the codon level instead of the nucleotide level. After PCR amplification, such libraries therefore yield mutant genes with the correct open reading frame. The longest oligonucleotide, which covers the whole focused region, is taken as the target sequence. The technical process then involves four steps:
1. assembly of the 3’ flanking region;
2. removal, every three nucleotides, of a small fraction of the resin carrying the growing oligonucleotide;
3. storage of the removed fractions in a second synthesis column;
4. completion of all oligonucleotide variants with the 5’ restriction site in the second column.
To assemble the forward oligonucleotide library, we programmed the synthesizer with the sequence 5’-TTTCATTTCCCATATG↓atc↓ctg↓gga↓cac↓aaa↓ctg↓gaa↓tac↓aac↓tat↓aac↓tca↓cac↓AATGTATACATCATGGCAGAC-3’ and started the synthesis from the 3’ end, as usual. After the first 21 nucleotides were incorporated, we interrupted the synthesis, opened the column, and moved a small fraction of the oligonucleotide-containing resin to a second column, as shown in Figure 3. We then closed the column and resumed the synthesis. This cycle of synthesis and resin removal was repeated every three nucleotides, at the places marked by inverted arrows, until the target region was complete. Finally, the second column, containing all the truncated oligonucleotide species, was installed on the synthesizer. The synthesis was resumed to add the 5’ restriction cloning site to all species, together with some additional nucleotides to allow efficient enzymatic cleavage. In our case, we always used the NdeI restriction site (catatg) as the 5’ cloning site because it also encodes the starting methionine of the heterologous proteins.
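The codon-level truncation produced by resin splitting can be checked in silico. The sketch below takes the programmed forward sequence quoted above and treats every inverted arrow as a resin-withdrawal point. It rebuilds the 14 species, each carrying the same 5’ NdeI segment and 3’ priming site. Translating from the ATG of the NdeI site shows that every species stays in frame and starts with Met at a different residue, from N135 to H148. The helper names and the partial codon table, which covers only the codons in this primer, are illustrative.

```python
PROGRAMMED = ("TTTCATTTCCCATATG↓atc↓ctg↓gga↓cac↓aaa↓ctg↓gaa↓tac↓aac↓tat↓aac↓tca↓cac↓"
              "AATGTATACATCATGGCAGAC")

# Partial codon table: only the codons that occur in this primer (illustrative subset).
CODONS = {"ATG": "M", "ATC": "I", "CTG": "L", "GGA": "G", "CAC": "H", "AAA": "K",
          "GAA": "E", "TAC": "Y", "AAC": "N", "TAT": "Y", "TCA": "S", "AAT": "N",
          "GTA": "V", "GCA": "A", "GAC": "D"}

def forward_species(programmed: str) -> list[str]:
    """Rebuild the species on each withdrawn resin fraction after the common
    5' segment is added: species k lacks the first k target codons."""
    parts = programmed.upper().split("↓")
    five_prime, codons, three_prime = parts[0], parts[1:-1], parts[-1]
    return [five_prime + "".join(codons[k:]) + three_prime
            for k in range(len(codons) + 1)]

def translate_from_ndei(oligo: str) -> str:
    """Translate in frame from the ATG embedded in the NdeI site (CATATG)."""
    orf = oligo[oligo.index("CATATG") + 3:]
    return "".join(CODONS[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))

species = forward_species(PROGRAMMED)
for k, oligo in enumerate(species):
    print(f"start {135 + k}M  {len(oligo):2d} nt  {translate_from_ndei(oligo)}")
```

Each species is three nucleotides shorter than the previous one, matching the three-nucleotide ladder of the oligonucleotide pools on the denaturing gel.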
Oligonucleotide yield decreases stepwise as the chain grows. To compensate and obtain balanced concentrations of the truncated species, we withdrew increasing amounts of resin after assembling each codon, as described in detail in Experimental Procedures. The reverse oligonucleotide library was programmed as 5’-AACGACGGCTCGAGTTA↓tga↓gtt↓ata↓gtt↓gta↓ttc↓cag↓ttt↓gtg↓tcc↓cag↓gat↓gtt↓gccGTCTTCCTTGAAGTCGAT-3’. It was synthesized in the same way as the forward library, with resin removed at the places marked by inverted arrows. The oligonucleotides in this set contained the 3’ cloning site XhoI (ctcgag) and the sequence TTA, the reverse complement of the TAA stop codon. Both pools underwent standard deprotection and purification on reverse-phase cartridges and were then analyzed by denaturing polyacrylamide gel electrophoresis. As expected, each appeared as a ladder of oligonucleotides of different lengths in roughly even concentrations, with consecutive species differing by three nucleotides. This result indicated that both libraries were synthesized successfully, as shown in the inset in Figure 3. Assembly of the library containing the circularly permuted protein variants. The parental GFP used to test CiPerGenesis contained the mutations F64L, S65C, F99S, I167T, L231H and K238N relative to the original GFP. It was generated from the red-shifted GFP (rsGFP) gene by adding the mutation F99S. This mutant, hereafter called GFP for simplicity, has a single excitation peak at 474 nm and an emission peak at 505 nm. We selected it as the model protein because, under our cloning conditions, its expression in E. coli produces green-fluorescent colonies that are brighter than those expressing the well-known enhanced GFP (EGFP).(30)
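A simple yield model illustrates why increasing amounts of resin are withdrawn. The sketch below is an illustration built on assumptions, not the protocol in Experimental Procedures. It assumes a constant, hypothetical coupling efficiency c per nucleotide, so the full-length fraction on a resin portion withdrawn after k codons scales as c^(3k). From this it computes the resin portions that would make all 14 species equimolar. With c < 1, the required portions grow from step to step; with c = 1, the plan reduces to equal portions.

```python
def withdrawal_plan(n_species: int, coupling_efficiency: float,
                    nt_per_step: int = 3) -> list[tuple[float, float]]:
    """Resin to withdraw at each step so that all truncated species are equimolar.

    Hypothetical model: the full-length fraction on a portion withdrawn after
    k codons is coupling_efficiency ** (nt_per_step * k). Coupling steps shared
    by all species (3' flank, 5' NdeI segment) cancel out and are ignored.
    Returns, per step, (portion of the initial resin, portion of the resin
    still in the column at that moment)."""
    weights = [coupling_efficiency ** (-nt_per_step * k) for k in range(n_species)]
    total = sum(weights)
    portions = [w / total for w in weights]
    plan, remaining = [], 1.0
    for portion in portions:
        plan.append((portion, portion / remaining))
        remaining -= portion
    return plan

C = 0.99  # hypothetical average coupling efficiency per nucleotide
for k, (portion, of_remaining) in enumerate(withdrawal_plan(14, C)):
    final_amount = portion * C ** (3 * k)   # relative yield of species k
    print(f"step {k:2d}: withdraw {portion:.3f} of total "
          f"({of_remaining:.2f} of remaining), relative amount {final_amount:.4f}")
```

The last step withdraws all of the remaining resin. In practice, the real coupling yields and any additional losses would have to be measured.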


To construct the protein library, we amplified the GFP gene with two primers. Both primers added the same restriction site at either end of the gene. They also added the coding sequence needed to join the two ends of the encoded protein through the peptide linker GGSGGS. In this linker, one Gly-Ser dipeptide is encoded by the BamHI restriction site ggatcc (GGA for Gly and TCC for Ser). The product was digested with BamHI, purified and circularized by intramolecular ligation with T4 DNA ligase. Exonuclease III then removed the likely linear tandem-repeat byproduct. The circularized gene served as the PCR template for the library of circularly permuted genes, with both libraries of truncated oligonucleotides as primers. Only one PCR product of the expected size was obtained, as detailed in Experimental Procedures. We removed the linear tandem repeats enzymatically to avoid possible extension byproducts during PCR. This step was perhaps unnecessary, however, because other works have successfully used tandem repeats as templates for creating circularly permuted genes.(24) The PCR product was cloned into a constitutive, multicopy expression vector using the NdeI and XhoI restriction sites. Because two different restriction sites flank the insert, it can only be cloned in the correct orientation. E. coli cells transformed with the ligation reaction were grown on LB plates at 37°C for 20 h. Under blue light with an orange filter, the resulting colonies appeared non-fluorescent. However, some of them became clearly fluorescent after two days of storage at 4°C, others became slightly fluorescent over the following days, and many other colonies never became fluorescent.
To improve screening, approximately 10,000 transformed cells were plated on LB, incubated at 25°C for 40 h and then stored at 4°C for several days. Functional and non-functional variants were then selected visually. Fluorescent colonies of different intensities made up 26.6% of the population; the rest were non-fluorescent. These results show that the region 134-148 tolerates backbone opening much better than amino acid deletions: in earlier work, only 0.05% of all possible deletion variants were fluorescent.(15) Analysis of library completeness. Each oligonucleotide library contained 14 species, so the expected number of protein variants in the library was only 196 (14 × 14), as shown in the sequence space in Table 1. The number of clones N that must be screened to reach 99% library completeness is therefore N = ln(1 - 0.99)/ln(1 - 1/196) ≈ 900.(31) This calculation assumes equimolar concentrations of the oligonucleotides in both pools and no differential annealing during PCR. In other words, after 900 transformants are screened, each of the 196 variants has a 99% probability of being sampled, so on average 99% of the library is covered. We obtained approximately 10,000 transformed cells. The library was therefore amply oversampled, which compensated for the experimental bias generated during the PCR step, as described below. Mapping and nature of viable and non-viable new termini. On LB plates, E. coli cells expressing the library of circularly permuted GFP variants (cpGFPs) gave colonies with four phenotypes: strongly fluorescent, moderately fluorescent, weakly fluorescent and non-fluorescent. These phenotypes were assigned by visual inspection of colony brightness after storage at 4°C for 1, 3, 7 and 14 days, respectively.
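The completeness calculation takes only a few lines to reproduce. The sketch below implements the formula quoted above under the same assumptions: 196 equiprobable variants and no PCR bias. The function names are hypothetical. It also evaluates the expected coverage for the approximately 10,000 transformants obtained, under the same idealized assumptions.

```python
import math

def clones_needed(n_variants: int, completeness: float = 0.99) -> float:
    """N = ln(1 - F) / ln(1 - 1/V): clones to screen for an expected coverage F
    of V equiprobable variants."""
    return math.log(1 - completeness) / math.log(1 - 1 / n_variants)

def expected_coverage(n_variants: int, n_clones: int) -> float:
    """Expected fraction of the V variants present among n_clones random clones."""
    return 1 - (1 - 1 / n_variants) ** n_clones

V = 14 * 14                                        # forward species x reverse species
n99 = clones_needed(V)
print(f"clones for 99% completeness: {n99:.0f}")   # about 900
print(f"coverage with 900 clones:    {expected_coverage(V, 900):.4f}")
print(f"coverage with 10,000 clones: {expected_coverage(V, 10_000):.6f}")
```

Note that F is an expected coverage. The probability that every single one of the 196 variants is present among 900 clones is considerably lower than 99%.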
We sequenced plasmids isolated from randomly selected colonies to determine the new termini of functional and non-functional cpGFPs. All sequenced variants are compiled in Table 1. Some of the most fluorescent variants were re-streaked on LB agar for a qualitative follow-up of their maturation in E. coli, as shown in Figure 4. Because this library is very small, conventional Sanger sequencing allowed us to sample 32% of the whole sequence space. For larger libraries, however, assessing the quality of the naïve and selected libraries may require Next-Generation Sequencing, as recently discussed.(6) Three main points emerge from these results.

First, the library was enriched in variants generated with the longer oligonucleotides. These variants are cpPs with terminal repeats at the new ends, located above the diagonal in Table 1. In contrast, variants expected to carry large amino acid deletions, generated with the shortest oligonucleotides and located at the lower left of Table 1, were underrepresented. This result reflects an annealing bias of the oligonucleotides during PCR. The bias could be overcome by lowering the annealing stringency or by increasing the molar ratio of the shortest oligonucleotides, for instance, by withdrawing the same amount of resin at each step of the resin-splitting process.

Second, the protein is highly tolerant to opening in the region 134-148. Most amino acids in this region (L137, G138, H139, K140, L141, Y143, N144, Y145, N146 and H148) could be replaced with methionine to form the new N-terminus. Most of these starts gave moderately or weakly fluorescent variants. The exception was L141M: permutations starting there were the most fluorescent, as shown in Figure 4. Several features of L141 may explain this preference:
- In the crystal structure of GFP (Figure 2), L141 sits at one end of the lid formed by residues D128-L141. Its hydrophobic isobutyl side chain is buried in the protein core, near one edge of the β-barrel and far from the chromophore and from the other side chains clustered in the center of the structure.
- Of the 14 residues N135-H148 that could be replaced with methionine to form the new N-terminus, L141 is the least structurally compromised, because it has the fewest atomic contacts with other residues or structural water molecules.
- The substitution at position 141 is conservative: the hydrophobic isobutyl side chain of leucine is replaced by the hydrophobic S-methylthioether side chain of methionine.
Consistent with this, single amino acid substitutions in the parental GFP show a clear pattern. The conservative replacements L141I, L141V and L141W gave cells as fluorescent as those expressing the parental protein. Replacing L141 with glycine or with the hydrophilic residues asparagine, serine and aspartate gave non-fluorescent colonies.(15)

Third, no fluorescent cpGFPs started at N135, I136, E142 or S147, although we sequenced 26 strongly fluorescent, 15 moderately fluorescent and 13 weakly fluorescent variants. The analyzed sample may have been too small to identify all possible functional variants. We stopped screening once some cpGFPs began to be found repeatedly, as indicated in Table 1. For instance, L141M-K140 was found eight times, L141M-H139 five times, and L141M-L141 and L141M-E142 four times each.
These repeated hits do not reflect a strong bias in the diversity of the naïve library. Instead, they result from the screening procedure, because we initially concentrated on the most fluorescent colonies in the library. In fact, the initial screen had also suggested that cpGFPs starting at H139 and Y143 were non-fluorescent. We therefore built several variants starting at N135, I136, H139, E142, Y143 and S147 by site-directed mutagenesis to confirm their phenotypes:
- H139M-E142 and Y143M-Y145 were moderately fluorescent;
- N135M-G134, N135M-L137, I136M-N135, I136M-G138, E142M-N144 and S147M-S147 were non-fluorescent.
These results suggest that residues N135, I136, E142 and S147 play important structural roles in forming fluorescent molecules. The most fluorescent variants in Table 1 also show the essence of CiPerGenesis. It generates standard cpPs, in which the peptide bond between contiguous amino acids is broken, as in L141M-K140. It also generates variants with terminal repeats or truncations. For instance, the variants L141M-Y145, L141M-Y143, L141M-E142 and L141M-L141 carry the C-terminal repeats L141 to Y145, L141 to Y143, L141 to E142 and L141, respectively. The variant L141M-H139, by contrast, lacks K140. Terminal repeats frequently stabilize the new termini of circularly permuted proteins.(16,22) Four of the moderately and weakly fluorescent variants in Figure 5 show this clearly:
- G138M-K140, with a three-residue repeat (G138 to K140), is the most fluorescent of the four;
- G138M-H139 and G138M-G138, with two- and one-residue repeats, are less fluorescent;
- the standard permutation G138M-L137, with no terminal repeat, is non-fluorescent.
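Variant names such as G138M-K140 encode both new termini, so repeat and truncation lengths can be read off mechanically and checked against the sequence. The hypothetical parser below validates each name against residues G134-H148 as encoded by the primer libraries described above (GNILGHKLEYNYNSH). It returns delta = E - X + 1. A positive delta counts repeated residues, zero marks a standard permutation, and a negative delta counts missing residues. For the G138M series, it gives 3, 2, 1 and 0 for G138M-K140, G138M-H139, G138M-G138 and G138M-L137. This matches the trend described in the text: the longest repeat gives the brightest variant, and the standard permutation is non-fluorescent.

```python
import re

# GFP residues G134-H148 as encoded by the forward and reverse primer libraries.
LID = dict(zip(range(134, 149), "GNILGHKLEYNYNSH"))
NAME = re.compile(r"^([A-Z])(\d+)M-([A-Z])(\d+)$")

def parse_cp_name(name: str) -> tuple[int, int, int]:
    """Parse '<aa><X>M-<aa><E>' and return (X, E, delta), with delta = E - X + 1.

    delta > 0: residues X..E are repeated; delta == 0: standard permutation;
    delta < 0: residues E+1..X-1 are missing. Raises ValueError if a residue
    letter does not match the lid sequence."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    aa_x, x = match.group(1), int(match.group(2))
    aa_e, e = match.group(3), int(match.group(4))
    for aa, pos in ((aa_x, x), (aa_e, e)):
        if LID.get(pos) != aa:
            raise ValueError(f"{name}: residue {pos} is {LID.get(pos)}, not {aa}")
    return x, e, e - x + 1

for name in ("G138M-K140", "G138M-H139", "G138M-G138", "G138M-L137", "L141M-H139"):
    print(name, parse_cp_name(name)[2])
```

The parser rejects a mistyped name such as L141M-E140, because residue 140 of GFP is lysine.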
In contrast to the parental GFP, which tolerates truncation of up to six amino acids at the N-terminus and nine amino acids at the C-terminus,[32] the sequencing results showed that truncating multiple amino acids at the new termini of cpGFPs produced non-fluorescent variants (Table 1), in line with results reported for truncations of the circularly permuted red fluorescent protein mKate.[23] Mutants with a truncation of three or more amino acids at the C-terminus failed to develop fluorescence; only truncations of one or two residues were tolerated, as in the fluorescent variants K140M-G138, L141M-H139, L141M-G138, Y145M-Y143 and H148M-N146. This result is interesting considering that the twelve mutants of the native GFP carrying single amino acid internal deletions in the region F130-L141 were non-fluorescent owing to poor expression and solubility, as previously determined by western blot analysis.[15] However, cpGFPs with terminal truncations, exemplified by the variants L141M-H139 and Y145M-Y143, which lack residues K140 and N144, respectively, were well expressed in soluble form, as shown below. Therefore, circular permutation may serve to release tensional stress accumulated in certain regions of the protein, enhancing the folding of the polypeptide.

In vivo fluorescence intensity. The variability in fluorescence intensity observed in cells expressing the different cpGFP variants (Figure 4) might have arisen because permutation affected the intrinsic fluorescent properties of the proteins, their maturation or their expression level.[19,33] Because three clearly different
phenotypes of fluorescent cpGFPs were generated by CiPerGenesis, we performed a systematic search for the factors responsible for these differences.

Effect of incubation temperature. As mentioned above and shown in Figure 4, cells expressing functional cpGFPs are more fluorescent when grown at 25°C than at 37°C, and in both cases they require storage at 4°C to complete the maturation process. Lowering the temperature is necessary for proper folding of each polypeptide and for post-translational formation of the chromophore. Consistent with the “vectorial folding hypothesis”, which assigns an important role in the folding pathway to the amino terminus of the growing polypeptide chain,[34] our results suggest that cpGFPs with new termini in the region 134-148 follow a different folding pathway from that of the parental engineered GFP, whose folding and maturation are optimized at high temperatures.[35] Since translation of our fluorescent cpGFP variants begins immediately before β-strand 7, the folding process must start at this secondary structure element to produce molten globules that are more stable at 25°C than at 37°C. Subsequent cooling at 4°C should reduce the movement of the unstructured polypeptide chains, favor folding and enable the maturation step. SDS-PAGE analysis of soluble protein from normalized cultures grown at 37°C and 25°C supported this idea (Figure 6), revealing on average an approximately 10-fold higher protein level at 25°C than at 37°C. Therefore, cpGFPs with new termini in the region 134-148 may follow a thermodynamic folding process, since they are produced better at lower temperatures, in contrast to the kinetic folding of the parental GFP, whose protein level increases as the incubation temperature rises.

Expression and solubility.
SDS-PAGE analysis of the total protein extract expressed at 25°C (Figure 6, panel b), followed by densitometric analysis (Figure S1), showed that most cpGFPs were expressed at levels similar to or higher than those of the parental GFP. Such differential expression may result from different translation initiation rates: the permuted genes present different genetic contexts adjacent to the ribosome binding site (RBS), generating folded mRNAs with very different stabilities that slow or favor translation and give rise to different protein levels.[30] Similarly, libraries of synonymous genes of the enhanced green fluorescent protein (EGFP), carrying nucleotide mutations but encoding the same polypeptide, produced EGFP levels that varied 250-fold across the library.[33] A similar expression result has also been reported recently for randomly generated permutations of the enzyme adenylate kinase.[19] Another difference between the parental GFP and the cpGFPs is their solubility, which depends on the folding capacity of the protein.[3,36] Because native GFP folds efficiently, most of the expressed protein remains in the soluble fraction (lane 2, Figure 6c), with only traces in the insoluble fraction (lane 2, Figure 6d). cpGFPs fold inefficiently and are differentially distributed between the soluble and insoluble fractions. In this regard, the variant N146M-N146 (lane 6, Figure 6c) produced the highest amount of soluble protein, as observed by SDS-PAGE, whereas the variants L141M-Y145 and Y145M-K140 (lanes 12 and 13, respectively) produced lower amounts of soluble protein.
Since variant L141M-Y145 produced strongly fluorescent colonies (Figure 4), N146M-N146 moderately fluorescent colonies and Y145M-K140 non-fluorescent colonies, colony brightness clearly cannot be explained by the amount of soluble protein in the cells alone. Therefore, spectroscopic or structural factors must also influence the diversity of fluorescent phenotypes. In contrast to native GFP and the cpGFPs, the negative control, a non-fluorescent GFP mutant carrying the chromophore-destroying mutations L64A/C65M/Y66G/G67V, was detected only as traces in the total extract and insoluble fraction (lane 1 in Figure 6b and 6d, respectively) and was absent from the soluble fraction (lane 1, Figure 6c), suggesting that this protein suffered serious folding failures and was proteolytically degraded.[3] Because many cellular factors may affect protein expression and folding rate, conclusive results regarding expression and solubility would require cell-free expression systems and in vitro refolding of the insoluble cpGFPs, as suggested by one of the reviewers.

Spectroscopic properties. As shown in supplementary Figure S2, the excitation and emission spectra of the soluble fraction of fluorescent cpGFPs resembled those of the parental GFP. Most variants displayed maximum absorbance at 474 nm and maximum emission at 505 nm, suggesting that all of them contained a GFP-type chromophore in the ionized state. Only the cpGFP N146M-N146 exhibited a different excitation spectrum, with two maxima centered at 473 nm and 395 nm in a 1.00:0.75 ratio, corresponding to the ionized and neutral chromophore, respectively, resembling the excitation spectrum of the original GFP isolated from Aequorea victoria. Given these two excitation peaks, the mutant N146M-N146 may be very sensitive to its surrounding environment and may be a good candidate for molecular sensing. However, no
additional differences that could account for the different colony phenotypes were found among the other cpGFP variants, apart from fluorescence intensity.

In vitro characterization. To address the differential colony brightness, we purified and characterized five cpGFP variants, G138M-K140, L141M-K140, L141M-Y143, L141M-Y145 and N146M-N146, together with the previously reported cpGFP variant Y145M-N144 and the parental GFP. For purification, each protein was appended with a poly-histidine tail (-GTGGSHHHHHH) at its carboxyl terminus and double-purified by immobilized metal affinity chromatography (IMAC) using Ni-NTA as the stationary phase, first in batch and then by HPLC with an imidazole gradient. The seven proteins were expressed at 22°C for 48 h and matured for 7 days at 4°C. The yields, in milligrams per liter, are reported in Table 2. Unexpectedly, N146M-N146 was recovered with the lowest yield, even though it was the most abundant variant in the soluble fraction, as mentioned above. This inconsistency between the amount observed by electrophoresis and the final amount of purified protein can be explained by the formation of a soluble, non-fluorescent dark green species, likely a molecular aggregate, which eluted immediately before the fluorescent fraction during HPLC purification. Some of the other variants also produced such byproducts, but in lower proportions relative to the fluorescent fraction. The variant with the highest yield was L141M-Y143, whose yield was even higher than that of the native protein, whereas the others were obtained in lower yields, confirming their differential expression and therefore its partial contribution to colony brightness.
However, when we determined the extinction coefficient (ε) of each variant from the linear relationship between protein concentration and absorbance, we found lower values for all the cpGFPs than for the native protein, suggesting incomplete maturation of the chromophore (Table 2). For instance, if we assume that the parental protein (ε = 25,668 M⁻¹ cm⁻¹) fully matured its chromophore, then the cpGFP variant G138M-K140 (ε = 7,700 M⁻¹ cm⁻¹) reached only 30% maturation. A similar estimate of maturation (Table 2) was obtained by comparing the ratio between the absorbance of the chromophore peak and that of the 280 nm peak, which corresponds to the aromatic residues of the protein, with absorbance spectra recorded at pH 11. This high pH was necessary to ensure complete ionization of the chromophores, especially for variants Y145M-N144 and N146M-N146, which at physiological pH presented a high proportion of neutral chromophore, with significant absorbance at 400 nm (supplementary Figure S3). These results therefore provide clear evidence that the purified cpGFP proteins were actually mixtures of a mature GFP, with a brightness very similar to that of the parental fluorescent protein, and a non-fluorescent alloprotein. This non-fluorescent fraction may be a misfolded conformer that is incapable of the posttranslational modifications required for chromophore formation, as previously suggested.[16] These results are also consistent with the observation that most cpGFP variants exhibited quantum yields (φ) similar to that of the native protein, because this parameter, obtained by correlating the chromophore absorbance with its fluorescence, reflects only the mature protein fraction and leaves out the immature proteins.
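The maturation estimates in Table 2 are ratios to the parental protein. The sketch below assumes, as the text does, that the parental chromophore is fully mature, and uses values copied from Table 2. It is restricted to single-peak variants because Y145M-N144 and N146M-N146 also absorb at 400 nm. The function name `matured_percent` is a hypothetical helper.

```python
# Matured fraction estimated as a variant's value divided by the parental value,
# assuming the parental (rsGFP-F99S) chromophore is 100% mature.
PARENT_EPSILON = 25668.0   # M^-1 cm^-1 at 474 nm (Table 2)
PARENT_A474_A280 = 2.059   # absorbance ratio measured at pH 11 (Table 2)

# Single-peak variants only; values copied from Table 2.
epsilon = {
    "L141M-K140": 20483.0,
    "L141M-Y143": 15118.0,
    "L141M-Y145": 13584.0,
    "G138M-K140": 7700.0,
}
a474_a280 = {
    "L141M-Y143": 1.234,
    "L141M-Y145": 1.136,
    "G138M-K140": 0.597,
}


def matured_percent(value: float, parent_value: float) -> float:
    """Percentage of mature chromophore relative to the parental protein."""
    return 100.0 * value / parent_value


for name, eps in epsilon.items():
    print(f"{name}: {matured_percent(eps, PARENT_EPSILON):.1f}% (from epsilon)")
for name, ratio in a474_a280.items():
    print(f"{name}: {matured_percent(ratio, PARENT_A474_A280):.1f}% (from A474/A280)")
```

For G138M-K140, both routes give about 30%, as stated in the text.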
Variants N146M-N146 and Y145M-N144 were the exception, displaying the lowest quantum yields of the series with excitation at 400 nm and 474 nm and emission at 505 nm. However, in contrast to the absorbance spectra of most cpGFPs, which exhibited a single peak centered at approximately 474 nm, variants N146M-N146 and Y145M-N144 displayed very broad peaks centered at 465 nm and 474 nm, respectively, with shoulders at approximately 400 nm. Although similar, most cpGFP variants displayed a slightly lower φ than the native protein, suggesting that their chromophores were hosted in more relaxed structures and were therefore more exposed to the solvent, favoring nonradiative decay. A more relaxed structure was supported by the Tm values of the variants (Table 2), which were 12°C to 22°C lower than that of the parental protein, and by their more acid-titratable chromophores (supplementary Figure S3). Therefore, these in vitro experiments suggest that differences in expression, solubility and extent of maturation of the mutant proteins may cause the different fluorescence phenotypes observed in the library. However, conclusive results will require removing the purification tag from the cpGFPs, because any tag appended at either the N- or C-terminus may introduce unpredictable changes in protein properties,[37] as supported by our own findings on the importance of repeats at the new C-terminus of cpPs.

Testing of CiPerGenesis in other regions of GFP. Once the GFP region G134-H148 had been thoroughly scrutinized by CiPerGenesis for new viable termini, the method was applied to scan three other regions of the protein.
Preliminary results revealed that none of the 14 peptide bonds in the V120-G134 region produced functional fluorescent cpGFP variants when randomly opened, suggesting that this region is intolerant to new termini[38] and is important for maintaining the structural integrity of the protein. However, the same analysis
of the H148-K162 and K162-V176 regions produced some cpGFP variants that were more fluorescent than those found in the region G134-H148 and displayed faster maturation at 37°C and 25°C (supplementary Figure S4). In agreement with previous reports,[13] some of the new N-termini fell within β-strands 7 and 8, near the chromophore, as in the permutants N149M-K156, N149M-N159 and V150M-D155, or at the end of the same β-strands, as in I152M-K158, I152M-K156, R168M-E172 and R168M-G174. Only two of them, N170M-E171 and I171M-E172, fell within the loop joining β-strands 8 and 9. These results indicate that if the new N-terminus falls within a β-strand, a long terminal repeat is required at the C-terminus to encode the complete strand, displacing the initial truncated strand. This requirement does not apply if the breakpoint falls within a loop, as exemplified by variants N170M-E171 and I171M-E172, each of which contains only two repeated residues. Among these libraries, the circularly permuted protein R168M-G174 deserves attention: it was brighter than the native protein expressed in E. coli at both 25°C and 37°C, a property that makes it attractive for use in organisms that grow at low temperatures. The engineered native GFP was highly expressed at 37°C but poorly expressed at 25°C (supplementary Figure S4).

In summary, we have demonstrated that CiPerGenesis is a reliable and straightforward oligonucleotide-based method that creates small libraries of circularly permuted genes that are conventionally cloned in the correct orientation and frame, ensuring high-quality libraries of circularly permuted proteins. The members of these libraries may display standard breakpoints between contiguous amino acids of a target protein region, as well as new termini containing terminal repeats or truncations of one or more amino acids within the target region.
C-terminal repeats were more beneficial for maintaining the folding and function of the circularly permuted proteins than ordinary breakpoints between contiguous amino acids or deletions of some residues. L141M was the preferred new N-terminus for the creation of the most fluorescent cpGFPs among the 14 possible new N-termini from N135 to H148. Although CiPerGenesis was tested with groups of 14 peptide bonds, the search could be extended to randomly open 40-50 peptide bonds using more efficient methods for the chemical synthesis of DNA,[39] provided that the assembly of libraries of truncated oligonucleotides can be automated. Automation may be feasible by using two sets of nucleoside phosphoramidites protected at their 5’ hydroxyl group with orthogonal protecting groups, as in a previous report.[40] One set may be protected with the 4,4’-dimethoxytrityl group (DMTr) and the other with the 9-fluorenylmethoxycarbonyl group (Fmoc). DMTr is labile to acid and resistant to alkali, whereas Fmoc behaves oppositely. A fraction of the growing oligonucleotide chains could therefore be arrested every three nucleotides by coupling a mixture of Fmoc-protected and DMTr-protected nucleoside phosphoramidites. Since the 5’ protecting group is removed with acid in each cycle of conventional oligonucleotide synthesis, the Fmoc-protected chains would accumulate during synthesis until the Fmoc group is removed with a mild base and the 5’ cloning site is incorporated into all arrested species. Alternatively, mixing several forward and reverse primers designed to open contiguous peptide bonds in a target protein region may be a viable commercial option for preparing oligonucleotide libraries. Finally, libraries of truncated oligonucleotides can also be applied to truncation studies of any gene to experimentally determine the minimum size of the encoded protein.
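A toy model can make the proposed Fmoc/DMTr arrest scheme concrete. It assumes that, at each codon boundary k, a fraction f[k] of the chains still growing is arrested and the rest continue, and it ignores coupling losses. Under these assumptions, all truncation lengths are equally represented only if the arrest fraction rises along the synthesis to 1/(n - k). The function names are hypothetical, and this is an illustration of the idea, not an established protocol.

```python
# Toy model of the proposed orthogonal-protection scheme: at codon boundary k a
# fraction f[k] of the still-growing chains receives an Fmoc-protected monomer
# and is arrested; the remainder keeps a DMTr group and continues.
# Coupling losses are ignored.


def arrested_shares(arrest_fractions: list[float]) -> list[float]:
    """Fraction of all chains that ends up arrested at each codon boundary."""
    growing = 1.0
    shares = []
    for f in arrest_fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("arrest fractions must lie in [0, 1]")
        shares.append(growing * f)
        growing *= 1.0 - f
    return shares


def uniform_schedule(n: int) -> list[float]:
    """Arrest fractions that give each of n truncation lengths the same share."""
    # At boundary k, n - k lengths remain to be populated, so 1/(n - k) of the
    # chains still growing must be arrested there (the last step arrests all).
    return [1.0 / (n - k) for k in range(n)]


n_bonds = 14
constant = arrested_shares([0.1] * n_bonds)
uniform = arrested_shares(uniform_schedule(n_bonds))
print([round(s, 4) for s in constant[:4]])
print([round(s, 4) for s in uniform[:4]])
```

With a constant arrest fraction, short truncations dominate the library; the rising schedule spreads the chains evenly across all 14 lengths.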

EXPERIMENTAL PROCEDURES

GFP circularization. In a standard PCR reaction (1 cycle: 94°C – 3 min; 25 cycles: 94°C – 1 min, 58°C – 1 min, 72°C – 1 min; 1 cycle: 72°C – 5 min), the GFP gene was amplified with Vent DNA polymerase using the forward primer ttgatcctgattGGATCCagcaaaggagaagaactcttcac and the reverse primer gcggcttaagaaGGATCCaccgctaccaccgttgtacagttcatccatgccatg. Both primers contained a BamHI restriction site (uppercase) for circularization at the DNA level, whereas at the protein level they were designed to replace amino acids Met1 and Ala1b with the peptide linker GGSGGS, which covalently joins the original termini of the encoded protein. Because the PCR yielded only one product of the expected size, the remaining dNTPs, salts and primers were removed with a spin column (EZ-10 PCR purification kit, Bio Basic Inc.). The DNA fragment was eluted with water and immediately digested with BamHI as recommended by the supplier (New England BioLabs). After overnight incubation at 37°C, the digested product was purified on a 1% agarose gel using the EZ-10 Spin Column DNA Gel Extraction Kit, also from Bio Basic Inc. The pure product was subjected to intramolecular
enzymatic ligation with T4 DNA ligase for 16 h at 16°C, and the reaction was then treated with exonuclease III (ExoIII) for 2 h at 37°C to eliminate potential concatemers and unreacted starting material. ExoIII was inactivated at 70°C for 20 min, and the circularized gene was used without further treatment as the template for the permutation experiments.

Chemical synthesis of oligonucleotides truncated at the codon level. Two libraries of truncated oligonucleotides were synthesized by conventional solid-phase chemistry using the resin-splitting approach,[28,41] a process facilitated by Twist columns from Glen Research and the 391, 392 or 394 automated DNA synthesizers from Applied Biosystems. Each library was designed to contain 14 primers of different lengths, although this number may be adjusted to the number of peptide bonds in the protein region to be explored. As shown in Figure 3, synthesis of the forward library started with the assembly of the 3’ flanking region (5’ aatgtatacatcatggcagac 3’) in column 1, in trityl-on mode, on 15 mg of 2000 Å controlled-pore glass support loaded with dC (Chemgenes). This support was selected because standard chemical synthesis proceeds in the 3’→5’ direction and dC is the 3’-terminal nucleoside of the target sequence. When the synthesis ended, the column was manually opened, the support was resuspended in 200 µL of the viscous solvent ethylene glycol, and 7 µL was transferred to and preserved in a second, empty synthesis column (column 2). Column 1 was closed, and the synthesis was resumed to add the codon cac to the support-bound sequence 5’ aatgtatacatcatggcagac 3’. When the synthesis ended, column 1 was opened again, the support was resuspended in 193 µL of ethylene glycol, and 8 µL was transferred to column 2, together with the first removed sample.
Column 1 was closed again, and the synthesis was resumed to add the next codon, tca, to the growing oligonucleotide 5’ cacaatgtatacatcatggcagac 3’. When the synthesis ended, column 1 was opened, the support was resuspended in 185 µL of ethylene glycol, and 9 µL was transferred to column 2. Column 1 was closed, and the synthesis was resumed to add the next codon, aac, to the growing sequence 5’ tcacacaatgtatacatcatggcagac 3’. This process of support removal and codon addition was repeated until the target DNA stretch was completed; column 2 was then installed on the DNA synthesizer, and the synthesis of all truncated primers was completed by adding the sequence 5’ tttcatttcccatatg 3’. An outline of the synthesis process and the complete sequences of the oligonucleotides in this library are shown in Figure 3. Notably, after each cycle of codon addition, the removed fraction was increased to compensate for the stepwise loss of material caused by the incomplete yield (98.5-99.5%) commonly observed in each nucleotide coupling. The reverse library was assembled similarly and contained the sequences shown in Figure 3. Each library was synthesized in dimethoxytritylated (trityl-on) mode and purified on Poly-Pak cartridges as recommended by the supplier (Glen Research Inc.).

Cloning. Both oligonucleotide libraries were used to amplify the circular GFP gene with Vent DNA polymerase under the following PCR conditions: 1 cycle: 94°C – 3 min; 25 cycles: 94°C – 1 min, 58°C – 1 min, 72°C – 1 min; 1 cycle: 72°C – 5 min. Since the PCR generated a pool of products of similar size, as visualized on an agarose gel, this DNA pool was cleaned up with the EZ-10 PCR purification spin column as previously described.[30]
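The primer logic described in this section can be checked in a few lines. This sketch is not part of the published method. It does two things, using the standard genetic code and only sequences quoted in the text:

1. It reads the coding strand across the BamHI site shared by the two circularization primers and translates the junction.
2. It rebuilds the first members of the forward library from the 3’ flank, the three codons named above (cac, tca, aac) and the 5’ tail, then translates each primer from the ATG inside its NdeI site.

The helper names are hypothetical, and the remaining codons of the library (Figure 3) are not reproduced.

```python
# Sequence bookkeeping for the CiPerGenesis primers (illustrative sketch).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {  # standard genetic code
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is dropped."""
    seq = seq.lower()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# 1) Circularization junction read through the shared BamHI site.
BAMHI = "ggatcc"
forward = "ttgatcctgattggatccagcaaaggagaagaactcttcac"
reverse = "gcggcttaagaaggatccaccgctaccaccgttgtacagttcatccatgccatg"
upstream = revcomp(reverse)                  # coding strand: C-terminal end + linker
cut = upstream.find(BAMHI) + len(BAMHI)
downstream = forward[forward.find(BAMHI) + len(BAMHI):]
junction = upstream[:cut] + downstream
print(translate(junction))

# 2) Forward library: codons are added at the 5' end (synthesis runs 3' -> 5'),
# an aliquot is set aside after each step, and the 5' tail is added to all.
FLANK_3 = "aatgtatacatcatggcagac"
TAIL_5 = "tttcatttcccatatg"
codons_added = ["cac", "tca", "aac"]         # first three codons named in the text

growing = FLANK_3
primers = [TAIL_5 + FLANK_3]
for codon in codons_added:
    growing = codon + growing
    primers.append(TAIL_5 + growing)

for primer in primers:
    orf = primer[primer.find("atg"):]        # first ATG lies in the NdeI site (catatg)
    print(primer, translate(orf))
```

The junction translates to ...ELYNGGSGGSSKGEELF, placing the GGSGGS linker between the original C- and N-terminal sequences. Each successive library primer adds one residue between the initiating Met and the sequence encoded by the 3’ flank.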
The purified product and the pJOQ cloning vector were independently double-digested with NdeI and XhoI, purified by agarose gel electrophoresis and ligated with T4 DNA ligase. One tenth of the ligation reaction was used to transform E. coli MC1061 by electroporation, and the transformants were plated on solid LB containing 25 µg/mL kanamycin. The plates were incubated at 37°C for 20 h, yielding more than 10,000 colonies.

Selection of functional and non-functional variants. To select fluorescent and non-fluorescent mutants, another fraction of the ligation reaction was electroporated as described above, but this time the transformants were distributed over 10 plates, incubated for 40 h at 25°C and kept at 4°C for at least 15 days. Inspecting the plates on a blue-light transilluminator (Dark Reader, Clare Chemical Research, Dolores, CO) over the following days allowed selection of the most fluorescent mutants after one day of refrigeration, moderately fluorescent mutants after 3 days, weakly fluorescent mutants after 7 days and non-fluorescent mutants after 15 days. Plasmid DNA from each mutant was purified and sequenced to locate the site and nature of the new termini.

Site-directed circularly permuted variants. The cpGFP variants N135M-G134, N135M-L137, I136M-N135, I136M-G138, H139M-E142, E142M-N144, Y143M-Y145 and S147M-S147 were constructed by site-directed mutagenesis using specific oligonucleotides as PCR primers and the circular GFP gene as template. All inserts were cloned in the same way as the library described above.

Protein analysis by SDS-PAGE. Plasmids encoding each cpGFP of interest, the parental GFP and a non-fluorescent GFP were retransformed into E. coli MC1061 cells, and two of the resulting
colonies per variant were used to inoculate 5 mL of LB/Km in duplicate. These cultures were grown at 25°C for 40 h and stored at 4°C for four days. A further set of cultures was grown at 37°C for 20 h and likewise stored at 4°C for four days. The cultures were then normalized by optical density at 600 nm, and 100 µL of each was pelleted, dissolved in 10 µL of denaturing loading buffer and boiled for 5 min to constitute the total protein extract. Separately, 4 mL of each normalized culture was centrifuged at 13,000 rpm for 10 min; the pellet was washed with 1 mL of PBS and resuspended in 250 µL of B-PER lysis solution (Thermo Scientific) for 60 min at room temperature. The samples were centrifuged at 13,000 rpm for 10 min, and the supernatants were transferred to clean Eppendorf tubes to constitute the soluble fraction. The pellet was washed with 1 mL of PBS to eliminate any trace of the soluble fraction and resuspended in 250 µL of inclusion-body solubilization reagent (Thermo Scientific), constituting the insoluble fraction. Total extracts and soluble and insoluble fractions were analyzed by SDS-PAGE under denaturing conditions. The target protein bands were identified by western blot analysis using an alkaline phosphatase-conjugated anti-GFP antibody (Genway), developed with BCIP/NBT-Blue liquid substrate (Sigma) to generate dark-blue bands, and quantified by densitometric analysis using ImageJ software.

Protein purification. For purification, the genes encoding the parental GFP and the cpGFPs L141M-Y145, L141M-Y143, L141M-K140, N146M-N146 and G138M-K140 were genetically modified to produce proteins carrying the poly-histidine tail -GTGGSHHHHHH at their C-terminus. The mutant Y145M-N144, originally reported by Baird et al., was constructed by site-directed mutagenesis using the L141M-Y145 gene as template and carried the same poly-histidine tail. All proteins were expressed in E. coli MC1061 in 3 L of LB/Km cultures grown at 22°C for 48 h with agitation at 200 rpm. After this incubation, the cultures were stored at 4°C for one week before purification. The cells were recovered by centrifugation and resuspended in 50 mL of B-PER reagent (Thermo Scientific) for 1 h at room temperature for lysis. To ensure complete lysis, 30 mL of PBS was added, and the resuspended cells were subjected to 3 pulses of sonication (2 min each, with intermediate cooling in ice water). After centrifugation at 13,000 rpm, the clear supernatant was loaded onto 5 mL of Ni-NTA resin, equilibrated with 10 mM imidazole in 100 mM phosphate buffer containing 0.5 M NaCl and packed into a disposable plastic column. After the column was washed with 15 mL of the same buffer, the enriched protein was eluted with 10 mL of 300 mM imidazole and 0.5 M NaCl in phosphate buffer. The eluate was concentrated in 10 kDa Centricon concentrators, washed with PBS and loaded onto two HisTrap HP 1 mL columns (GE Healthcare) connected in tandem. The proteins were purified by HPLC using an imidazole gradient from 30 mM to 300 mM over 30 min, with a flow rate of 0.75 mL/min and detection at 280 nm. Imidazole solutions were prepared in 100 mM phosphate buffer (pH 7.2) containing 0.5 M NaCl. Fractions containing the pure protein were combined, concentrated in centrifugal filters (Amicon Ultra-4, 10 kDa cutoff, Millipore) and washed with PBS (2 × 4 mL) to remove residual imidazole. The pure proteins were diluted in 1.5 mL of PBS, and their concentrations were determined with a commercial BCA kit following the supplier’s instructions (Sigma-Aldrich).

Spectral measurements. Excitation and emission spectra of the crude and pure mutant proteins were recorded on a Perkin Elmer LS55 luminescence spectrometer.
The extinction coefficient (ε) of the pure proteins at their absorbance maximum was determined spectrophotometrically by measuring the absorbance of six dilutions in duplicate. Plotting absorbance against concentration (M) produced a linear graph whose slope is ε in M⁻¹ cm⁻¹. Each dilution was further diluted 25-fold, and fluorescence was measured at the emission maximum with excitation at the absorbance maximum. Plotting fluorescence against absorbance produced a linear graph for each protein. As a standard for determining the quantum yield (φ) of the mutants, we determined the relationship between absorbance (474 nm) and fluorescence (505 nm) for the pure parental GFP, whose reported φ is 0.76.[42] Dividing the slope of the sample by that of GFP and multiplying by 0.76 gives the quantum yield of the sample.

Melting temperature (Tm). Two milliliters of each purified protein (1 µM in PBS) was thermally denatured directly in the fluorometer by increasing the temperature from 25°C to 90°C over 5 min, under continuous magnetic stirring and continuous measurement of the fluorescence emitted at 505 nm with excitation at 474 nm. The experiments were performed in duplicate, and Tm values were determined from plots of fluorescence versus temperature, with Tm defined as the temperature at which half of the initial fluorescence had disappeared.
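The three calculations above reduce to a linear fit, a ratio and an interpolation. The sketch below makes three assumptions: ordinary least squares with an intercept (the text does not say whether the fits were forced through the origin), a 1 cm path length, and linear interpolation between readings for Tm. The input numbers are synthetic placeholders, not measured data, and the function names are hypothetical.

```python
# Sketch of the epsilon, relative quantum yield and Tm calculations.
# All input values below are synthetic, chosen only to exercise the code.


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (fit with intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def relative_quantum_yield(slope_sample: float, slope_ref: float,
                           phi_ref: float = 0.76) -> float:
    """phi = phi_ref * (fluorescence/absorbance slope of sample) / (slope of reference)."""
    return phi_ref * slope_sample / slope_ref


def melting_temperature(temps: list[float], fluorescence: list[float]) -> float:
    """First temperature at which fluorescence falls to half its initial value,
    linearly interpolated between the two bracketing readings."""
    half = fluorescence[0] / 2.0
    for i in range(1, len(temps)):
        f_prev, f_curr = fluorescence[i - 1], fluorescence[i]
        if f_curr <= half:
            frac = (f_prev - half) / (f_prev - f_curr)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("fluorescence never fell to half of its initial value")


# Absorbance vs concentration (M); with a 1 cm path the slope is epsilon.
conc = [1e-6, 2e-6, 4e-6, 6e-6, 8e-6, 10e-6]
absorbance = [15000.0 * c for c in conc]
epsilon = ols_slope(conc, absorbance)                         # M^-1 cm^-1

# Fluorescence vs absorbance for the reference (parental GFP) and a sample.
ref_slope = ols_slope([0.01, 0.02, 0.03], [100.0, 200.0, 300.0])
sample_slope = ols_slope([0.01, 0.02, 0.03], [90.0, 180.0, 270.0])
phi = relative_quantum_yield(sample_slope, ref_slope)

tm = melting_temperature([25, 40, 55, 70, 85, 90], [100, 90, 70, 40, 20, 5])
print(round(epsilon), round(phi, 3), round(tm, 1))
```

Orienting the fluorescence fit as fluorescence against absorbance is what makes "sample slope divided by GFP slope, times 0.76" yield φ directly.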

TABLES AND FIGURES

Table 1. Sequenced circularly permuted GFP variants generated by CiPerGenesis in the region 134-148 of the parental protein rsGFP-F99S, selected by differential phenotypes.

[Table 1 grid: rows, new N-terminus of cpGFPs (N135M, I136M, L137M, G138M, H139M, K140M, L141M, E142M, Y143M, N144M, Y145M, N146M, S147M, H148M); columns, new C-terminus of cpGFPs (G134, N135, I136, L137, G138, H139, K140, L141, E142, Y143, N144, Y145, N146, S147); cells give the number of times each variant was found, or SD for site-directed mutants.]

Each cell represents a circular permutation starting at any amino acid between N135 and H148 and ending at any amino acid between G134 and S147, so the sequence space comprises only 196 (14 × 14) variants. The original termini were linked with the hexapeptide GGSGGS. Cells are color-coded as strongly fluorescent, moderately fluorescent, weakly fluorescent or non-fluorescent. Standard cpPs, which start at one amino acid and end at the preceding residue, are crossed by the black line. cpPs below the line correspond to variants with truncations at the new C-terminus, whereas those above the line correspond to variants with terminal repeats. The number in each cell is the number of times that variant was found during screening of the library created with CiPerGenesis. SD denotes a mutant created by site-directed mutagenesis.

Table 2. Fluorescent properties of some purified cpGFPs compared with those of the parental protein rsGFP-F99S.

Sample | Yield (mg/L)ᵃ | ε (M⁻¹ cm⁻¹)ᵇ | φᶜ | Matured fraction by ε (%) | A474/A280ᵈ | Matured fraction by A474/A280 (%) | Tm (°C)
rsGFP-F99S | 5.951 | 25668 ± 24 | 0.760 ± 0.010 | 100 | 2.059 | 100 | 72.5 ± 0.5
L141M-K140 | 3.717 | 20483 ± 165 | 0.788 ± 0.021 | 79.8 | 1.791 | 86.9 | 59.6 ± 0.5
L141M-Y143 | 6.276 | 15118 ± 156 | 0.759 ± 0.003 | 58.9 | 1.234 | 59.9 | 52.8 ± 0.7
L141M-Y145 | 2.363 | 13584 ± 162 | 0.700 ± 0.002 | 52.9 | 1.136 | 55.2 | 50.5 ± 0.4
Y145M-N144 | 2.333 | 11836 ± 157; 3822 ± 317ᵉ | 0.187 ± 0.004; 0.128 ± 0.008ᵉ | 61.0 | 1.245 | 60.4 | 60.5 ± 1.0
N146M-N146 | 1.870 | 10762 ± 62; 5273 ± 15ᵉ | 0.339 ± 0.003; 0.165 ± 0.005ᵉ | 62.5 | 1.451 | 70.4 | 56.1 ± 0.5
G138M-K140 | 3.766 | 7700 ± 23 | 0.732 ± 0.030 | 30.0 | 0.597 | 29.0 | 57.5 ± 0.7

ᵃ The protein concentration was measured by the BCA method after purification by HPLC.
ᵇ Calculated from calibration curves using six dilutions of each purified protein and recording the absorbance at 474 nm.
ᶜ Experiments were performed in duplicate, taking 0.76, the quantum yield reported in the literature for the protein sg25,⁴² as the reference value.
ᵈ Ratio determined with samples diluted in buffer at pH 11.
ᵉ Value calculated at 400 nm.
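The two matured-fraction columns of Table 2 appear to be simple ratios to the parental protein: ε(sample)/ε(rsGFP-F99S) and (A474/A280)(sample)/(A474/A280)(rsGFP-F99S). For Y145M-N144 and N146M-N146 the ε-based value matches only when the 474-nm and 400-nm extinction coefficients are summed. This is our reading of the tabulated numbers, not a formula stated in the text; the sketch below recomputes both columns from the values transcribed from Table 2.

```python
# Values transcribed from Table 2; rsGFP-F99S is the reference protein.
EPS_474 = {
    "rsGFP-F99S": 25668, "L141M-K140": 20483, "L141M-Y143": 15118,
    "L141M-Y145": 13584, "Y145M-N144": 11836, "N146M-N146": 10762,
    "G138M-K140": 7700,
}
# Extinction coefficients of the species absorbing at 400 nm (footnote e).
EPS_400 = {"Y145M-N144": 3822, "N146M-N146": 5273}
A474_A280 = {
    "rsGFP-F99S": 2.059, "L141M-K140": 1.791, "L141M-Y143": 1.234,
    "L141M-Y145": 1.136, "Y145M-N144": 1.245, "N146M-N146": 1.451,
    "G138M-K140": 0.597,
}
REFERENCE = "rsGFP-F99S"


def matured_by_eps(name):
    """Percent of the parental epsilon, summing the 474-nm and 400-nm species."""
    total = EPS_474[name] + EPS_400.get(name, 0)
    return 100.0 * total / EPS_474[REFERENCE]


def matured_by_ratio(name):
    """Percent of the parental A474/A280 ratio."""
    return 100.0 * A474_A280[name] / A474_A280[REFERENCE]


for name in EPS_474:
    print(f"{name:12s} {matured_by_eps(name):6.1f} {matured_by_ratio(name):6.1f}")
```

The ε-based values reproduce the first column after rounding; the ratio-based values agree with the second column to within about 0.1 percentage point, the small differences presumably reflecting unrounded absorbance ratios.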


Figure 1. Outline of the CiPerGenesis method. The wild-type gene encodes the native protein, GFP in this example, which starts and ends at its original termini. However, if the gene is circularized and used as the template for PCR amplification with primers designed to open contiguous peptide bonds within a target region of the encoded protein, a library of circularly permuted proteins displaying new random termini is generated. The oligonucleotides are represented by blue arrows. The method was thoroughly tested in the region G134-H148, located between β-strands 6 and 7 (β-strands are represented by wide arrows). The chromophore-containing α-helix is represented by a cylinder and the chromophore by a green star.
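To make concrete what opening a different peptide bond in the circularized gene produces at the protein level, the sketch below builds the sequence of a circularly permuted variant from a parental sequence. It assumes the GGSGGS linker joining the original termini (as stated for Table 1), a hypothetical 10-residue toy parent rather than GFP, and, for illustration only, an initiator Met prepended to the new N-terminus; the function name is hypothetical.

```python
def circular_permutation(parent, start, end, linker="GGSGGS", initiator="M"):
    """Amino acid sequence of a circularly permuted variant of `parent`.

    start: 1-based residue number of the new N-terminus.
    end:   1-based residue number of the new C-terminus.
      end == start - 1 -> standard cpP (every residue present once)
      end >= start     -> residues start..end repeated at the new C-terminus
      end <  start - 1 -> residues end+1..start-1 missing (C-terminal truncation)
    The linker joins the original C- and N-termini; prepending `initiator`
    is an assumption made for illustration.
    """
    n = len(parent)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("start and end must be residue numbers within parent")
    new_n_half = parent[start - 1:]  # new N-terminus through the original C-terminus
    new_c_half = parent[:end]        # original N-terminus through the new C-terminus
    return initiator + new_n_half + linker + new_c_half


toy = "ACDEFGHIKL"  # hypothetical 10-residue parent, not GFP
print(circular_permutation(toy, 4, 3))  # standard
print(circular_permutation(toy, 4, 5))  # two-residue terminal repeat (E4, F5)
print(circular_permutation(toy, 4, 2))  # one-residue truncation (D3 missing)
```

The three calls correspond to the three kinds of cell in Table 1: on the black line, above it (terminal repeats), and below it (C-terminal truncations).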


Figure 2. Amino acid regions of the protein rsGFP-F99S subjected to CiPerGenesis, using the crystallographic structure of sg25 as a model (PDB: 1emk). a) Schematic representation of the secondary structural elements of the protein. The regions subjected to circular permutation are labeled in cyan (V120-G134), green (G134-H148), blue (H148-K162) and orange (K162-V176). b) The same regions shown in the three-dimensional structure. CiPerGenesis was thoroughly tested in the region G134-H148. The original termini were linked with the hexapeptide GGSGGS. Leu141 and the chromophore are shown as sticks. c) Nucleotide sequence encoding the amino acid region I128-D154, which includes the stretch G134-H148. d) Nomenclature used to identify the cpGFP variants generated with CiPerGenesis.
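Panel d defines the variant nomenclature, which is not reproduced in the text. Reading a name such as L141M-Y143 as "new N-terminus at L141 (the M marking the initiator methionine) and new C-terminus at Y143", an interpretation consistent with the rows and columns of Table 1 but assumed here, the sketch below classifies variants as standard, terminal repeats or C-terminal truncations and counts the full start/end sequence space of the G134-H148 region. Whether the Met is added before or substituted for the named residue does not affect this classification. All function names are hypothetical.

```python
import re
from collections import Counter

# Names such as "L141M-Y143": new N-terminal residue (with "M" for the
# initiator methionine, an assumed reading) followed by the new C-terminal residue.
NAME_PATTERN = re.compile(r"^([A-Z])(\d+)M-([A-Z])(\d+)$")


def kind_of(start, end):
    """Classify a permutation by its new N-terminal (start) and C-terminal (end) residues."""
    if end == start - 1:
        return "standard", 0                  # every residue present once
    if end >= start:
        return "repeat", end - start + 1      # residues start..end appear twice
    return "truncation", start - 1 - end      # residues end+1..start-1 are missing


def classify(name):
    """Parse a cpGFP name (e.g. 'L141M-Y143') and classify it."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized variant name: {name!r}")
    return kind_of(int(match.group(2)), int(match.group(4)))


def sequence_space(starts=range(135, 149), ends=range(134, 148)):
    """Count every start/end pair in the focused region (Table 1 layout)."""
    return Counter(kind_of(s, e)[0] for s in starts for e in ends)


for variant in ("L141M-K140", "L141M-Y143", "L141M-H139", "G138M-K140"):
    print(variant, classify(variant))
print(sequence_space())
```

The count splits the 196 combinations into 14 standard permutations (the diagonal), 91 with terminal repeats and 91 with C-terminal truncations, matching the two triangles on either side of the black line in Table 1.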


Figure 3. Resin-splitting procedure for generation of the libraries of oligonucleotides truncated at the codon level. The solid-phase chemical synthesis occurs from the 3’ end toward the 5’ end, removing a small portion of resin every three nucleotides at the position of the inverted arrows. The removed fractions are stored together on a second column. Once all fractions have been removed, the synthesis continues in the second column. Inset: oligonucleotide libraries analyzed by PAGE under denaturing conditions. Lane 1: standard containing the shortest and longest oligonucleotides that can be found in the forward library. Lane 2: library of forward truncated oligonucleotides. Lane 3: library of reverse truncated oligonucleotides. Lane 4: standard containing the shortest and longest oligonucleotides that can be found in the reverse set. The gel was stained with methylene blue for visualization of the DNA bands.
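The resin-splitting scheme can be simulated in a few lines. This is a minimal sketch under stated assumptions: the annealing sequence, the number of variable codons and the common 5′ segment added on the second column are all hypothetical, and the first resin portion is assumed to be removed as soon as a fixed 3′ core is complete. Because portions are withdrawn only at codon boundaries, the pooled oligonucleotides differ in length by whole codons (multiples of three nucleotides). The function name is hypothetical.

```python
def resin_split_library(anneal, n_codons, tail):
    """Simulate codon-level resin splitting during 3'->5' solid-phase synthesis.

    anneal:   full template-annealing region, written 5'->3'.
    n_codons: number of 5'-terminal codons added after the fixed 3' core;
              a resin portion is removed once the core is complete and
              again after every further codon (three nucleotides).
    tail:     common 5' segment synthesized afterwards on the second column.
    Returns the pooled library, each oligonucleotide written 5'->3'.
    """
    core_len = len(anneal) - 3 * n_codons
    if core_len <= 0:
        raise ValueError("the fixed 3' core must contain at least one nucleotide")
    column1 = ""   # growing chain on the first column
    column2 = []   # resin portions pooled on the second column
    # Synthesis starts at the 3' end, so read the target sequence backwards.
    for added, base in enumerate(reversed(anneal), start=1):
        column1 = base + column1  # couple one nucleotide at the 5' end
        if added >= core_len and (added - core_len) % 3 == 0:
            column2.append(column1)  # remove a portion of resin
    # Synthesis then continues on the second column with the shared 5' segment.
    return [tail + chain for chain in column2]


# Hypothetical sequences, chosen only to make the codon steps visible.
for oligo in resin_split_library("AAACCCGGGTTTACGT", n_codons=3, tail="GCGC"):
    print(oligo)
```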


Figure 4. Maturation of some selected fluorescent cpGFP variants. Two Petri dishes were streaked with E. coli colonies expressing the following variants: 1) parental GFP, 2) GFPnoFl, 3) L141M-Y143, 4) G138M-G138, 5) L141M-K140, 6) N146M-N146, 7) L141M-E142, 8) L141M-L141, 9) Y145M-Y143, 10) L141M-H139, 11) G138M-K140, 12) L141M-Y145, 13) N144M-Y143, 14) Y145M-K140. One dish was incubated at 25°C for 40 h and the other at 37°C for 20 h. Both dishes were then illuminated with blue light and photographed through an orange filter. Both dishes were subsequently stored at 4°C, and maturation was documented after 1, 2, 4 and 7 days.


Figure 5. Streaks of E. coli cells expressing cpGFPs containing different terminal amino acid repeats at the new C-terminus. The Petri dish was incubated for 40 h at 25°C and then stored for 7 days at 4°C.


Figure 6. SDS-PAGE analysis of some cpGFP variants generated by CiPerGenesis. The GFP-containing bands were identified by western blotting using an anti-GFP antibody coupled to alkaline phosphatase. a) Total extract of proteins expressed at 37°C/20 h and stored for 4 days at 4°C. b) Total extract of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. c) Soluble fractions of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. d) Insoluble fractions of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. Samples: M) protein ladder (kDa), 1) GFPnoFl, 2) parental GFP, 3) L141M-Y143, 4) G138M-G138, 5) L141M-K140, 6) N146M-N146, 7) L141M-E142, 8) L141M-L141, 9) Y145M-Y143, 10) L141M-H139, 11) G138M-K140, 12) L141M-Y145, 13) Y145M-K140, 14) N144M-Y143.


Supporting Information
Figure S1. Densitometric analysis, performed with ImageJ, of the gels shown in Figure 6. a) Total extract of proteins expressed at 37°C/20 h and stored for 4 days at 4°C. b) Total extract of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. c) Soluble fractions of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. d) Insoluble fractions of proteins expressed at 25°C/40 h and stored for 4 days at 4°C. Samples: M) protein ladder (kDa), 1) GFPnoFl, 2) parental GFP, 3) L141M-Y143, 4) G138M-G138, 5) L141M-K140, 6) N146M-N146, 7) L141M-E142, 8) L141M-L141, 9) Y145M-Y143, 10) L141M-H139, 11) G138M-K140, 12) L141M-Y145, 13) Y145M-K140, 14) N144M-Y143.
Figure S2. Excitation and emission spectra of some fluorescent circularly permuted GFPs created with CiPerGenesis and analyzed in crude extracts.
Figure S3. Absorbance spectra of pure cpGFP variants at different pH values, compared with those of the parental GFP protein.
Figure S4. Maturation of some fluorescent cpGFP variants found in regions H148-K162 and K162-V176.

Corresponding Author
*Paul Gaytán. [email protected]

Author Contributions
All authors contributed significantly to the experiments, especially ARS. PG performed most of the critical experiments, conceived the idea, supervised the project and wrote the manuscript.

Funding Sources
The present research was financially supported by the Core Facility of Oligonucleotide Synthesis and DNA Sequencing of the Institute of Biotechnology, National Autonomous University of Mexico, with revenues obtained from its technical services.

ACKNOWLEDGMENTS
Technical assistance provided by Eugenio López-Bustos, Santiago Becerra-Ramírez, Ana Yanci Alarcón, Leopoldo Güereca and Humberto Flores-Soto is highly appreciated. We are especially grateful to Dr. Gloria Saab for the preliminary revision of our manuscript.
ABBREVIATIONS
GFP, green fluorescent protein; rsGFP, red-shifted GFP; EGFP, enhanced GFP; cpGFP, circularly permuted GFP; cpP, circularly permuted protein; SDS-PAGE, polyacrylamide gel electrophoresis in the presence of sodium dodecyl sulfate; LB, Luria-Bertani broth; dNTPs, deoxynucleoside triphosphates; PCR, polymerase chain reaction; B-PER, bacterial protein extraction reagent; PBS, phosphate-buffered saline; BCIP/NBT, 5-bromo-4-chloro-3-indolyl phosphate (BCIP) and nitro blue tetrazolium (NBT); HPLC, high-performance liquid chromatography.

REFERENCES

(1) Topell, S.; Glockshuber, R. Circular permutation of the green fluorescent protein. Methods Mol Biol 2002, 183, 31-48.
(2) Shortle, D.; Sondek, J. The emerging role of insertions and deletions in protein engineering. Curr Opin Biotechnol 1995, 6 (4), 387-393.
(3) Baneyx, F.; Mujacic, M. Recombinant protein folding and misfolding in Escherichia coli. Nat Biotechnol 2004, 22 (11), 1399-1408.
(4) Yu, Y.; Lutz, S. Circular permutation: a different way to engineer enzyme structure and function. Trends Biotechnol 2011, 29 (1), 18-25.
(5) Akemann, W.; Raj, C. D.; Knopfel, T. Functional characterization of permuted enhanced green fluorescent proteins comprising varying linker peptides. Photochem Photobiol 2001, 74 (2), 356-363.
(6) Higgins, S. A.; Savage, D. F. Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry. Biochemistry 2018, 57 (1), 38-46.
(7) Nagai, T.; Sawano, A.; Park, E. S.; Miyawaki, A. Circularly permuted green fluorescent proteins engineered to sense Ca2+. Proc Natl Acad Sci U S A 2001, 98 (6), 3197-3202.
(8) Whitehead, T. A.; Bergeron, L. M.; Clark, D. S. Tying up the loose ends: circular permutation decreases the proteolytic susceptibility of recombinant proteins. Protein Eng Des Sel 2009, 22 (10), 607-613.
(9) Guntas, G.; Kanwar, M.; Ostermeier, M. Circular permutation in the Omega-loop of TEM-1 beta-lactamase results in improved activity and altered substrate specificity. PLoS One 2012, 7 (4), e35998.
(10) Qian, Z.; Lutz, S. Improving the catalytic activity of Candida antarctica lipase B by circular permutation. J Am Chem Soc 2005, 127 (39), 13466-13467.
(11) Daugherty, A. B.; Govindarajan, S.; Lutz, S. Improved biocatalysts from a synthetic circular permutation library of the flavin-dependent oxidoreductase old yellow enzyme. J Am Chem Soc 2013, 135 (38), 14425-14432.
(12) Graf, R.; Schachman, H. K. Random circular permutation of genes and expressed polypeptide chains: application of the method to the catalytic chains of aspartate transcarbamoylase. Proc Natl Acad Sci U S A 1996, 93 (21), 11591-11596.
(13) Baird, G. S.; Zacharias, D. A.; Tsien, R. Y. Circular permutation and receptor insertion within green fluorescent proteins. Proc Natl Acad Sci U S A 1999, 96 (20), 11241-11246.
(14) Topell, S.; Hennecke, J.; Glockshuber, R. Circularly permuted variants of the green fluorescent protein. FEBS Lett 1999, 457 (2), 283-289.
(15) Flores-Ramirez, G.; Rivera, M.; Morales-Pablos, A.; Osuna, J.; Soberon, X.; Gaytan, P. The effect of amino acid deletions and substitutions in the longest loop of GFP. BMC Chem Biol 2007, 7, 1.
(16) Li, Y.; Sierra, A. M.; Ai, H. W.; Campbell, R. E. Identification of sites within a monomeric red fluorescent protein that tolerate peptide insertion and testing of corresponding circular permutations. Photochem Photobiol 2008, 84 (1), 111-119.
(17) Mehta, M. M.; Liu, S.; Silberg, J. J. A transposase strategy for creating libraries of circularly permuted proteins. Nucleic Acids Res 2012, 40 (9), e71.
(18) Jones, A. M.; Atkinson, J. T.; Silberg, J. J. PERMutation Using Transposase Engineering (PERMUTE): A Simple Approach for Constructing Circularly Permuted Protein Libraries. Methods Mol Biol 2017, 1498, 295-308.
(19) Jones, A. M.; Mehta, M. M.; Thomas, E. E.; Atkinson, J. T.; Segall-Shapiro, T. H.; Liu, S.; Silberg, J. J. The Structure of a Thermophilic Kinase Shapes Fitness upon Random Circular Permutation. ACS Synth Biol 2016, 5 (5), 415-425.
(20) Pierre, B.; Shah, V.; Xiao, J.; Kim, J. R. Construction of a random circular permutation library using an engineered transposon. Anal Biochem 2015, 474, 16-24.
(21) Haapa-Paananen, S.; Rita, H.; Savilahti, H. DNA transposition of bacteriophage Mu: a quantitative analysis of target site selection in vitro. J Biol Chem 2002, 277 (4), 2843-2851.
(22) Carlson, H. J.; Cotton, D. W.; Campbell, R. E. Circularly permuted monomeric red fluorescent proteins with new termini in the beta-sheet. Protein Sci 2010, 19 (8), 1490-1499.
(23) Shui, B.; Wang, Q.; Lee, F.; Byrnes, L. J.; Chudakov, D. M.; Lukyanov, S. A.; Sondermann, H.; Kotlikoff, M. I. Circular permutation of red fluorescent proteins. PLoS One 2011, 6 (5), e20505.
(24) Chiang, J. J. H.; Li, I.; Truong, K. Creation of circularly permutated yellow fluorescent proteins using fluorescence screening and a tandem fusion template. Biotechnol Lett 2006, 28 (7), 471-475.
(25) Fischereder, E.; Pressnitz, D.; Kroutil, W.; Lutz, S. Engineering strictosidine synthase: rational design of a small, focused circular permutation library of the beta-propeller fold enzyme. Bioorg Med Chem 2014, 22 (20), 5633-5637.
(26) Noda-Garcia, L.; Juarez-Vazquez, A. L.; Avila-Arcos, M. C.; Verduzco-Castro, E. A.; Montero-Moran, G.; Gaytan, P.; Carrillo-Tripp, M.; Barona-Gomez, F. Insights into the evolution of enzyme substrate promiscuity after the discovery of (βα)8 isomerase evolutionary intermediates from a diverse metagenome. BMC Evol Biol 2015, 15, 107.
(27) Zapata-Hommer, O.; Griesbeck, O. Efficiently folding and circularly permuted variants of the Sapphire mutant of GFP. BMC Biotechnol 2003, 3, 5.
(28) Glaser, S. M.; Yelton, D. E.; Huse, W. D. Antibody engineering by codon-based mutagenesis in a filamentous phage vector system. J Immunol 1992, 149 (12), 3903-3913.
(29) Pues, H.; Holz, B.; Weinhold, E. Construction of a deletion library using a mixture of 5'-truncated primers for inverse PCR (IPCR). Nucleic Acids Res 1997, 25 (6), 1303-1304.
(30) Rodriguez-Mejia, J. L.; Roldan-Salgado, A.; Osuna, J.; Merino, E.; Gaytan, P. A Codon Deletion at the Beginning of Green Fluorescent Protein Genes Enhances Protein Expression. J Mol Microbiol Biotechnol 2017, 27 (1), 1-10.
(31) Bosley, A. D.; Ostermeier, M. Mathematical expressions useful in the construction, description and evaluation of protein libraries. Biomol Eng 2005, 22 (1-3), 57-61.
(32) Li, X.; Zhang, G.; Ngo, N.; Zhao, X.; Kain, S. R.; Huang, C. C. Deletions of the Aequorea victoria green fluorescent protein define the minimal domain required for fluorescence. J Biol Chem 1997, 272 (45), 28545-28549.
(33) Kudla, G.; Murray, A. W.; Tollervey, D.; Plotkin, J. B. Coding-sequence determinants of gene expression in Escherichia coli. Science 2009, 324 (5924), 255-258.
(34) Heinemann, U.; Hahn, M. Circular permutation of polypeptide chains: implications for protein folding and stability. Prog Biophys Mol Biol 1995, 64 (2-3), 121-143.
(35) Cormack, B. P.; Valdivia, R. H.; Falkow, S. FACS-optimized mutants of the green fluorescent protein (GFP). Gene 1996, 173 (1 Spec No), 33-38.
(36) Jung, S.; Pluckthun, A. Improving in vivo folding and stability of a single-chain Fv antibody fragment by loop grafting. Protein Eng 1997, 10 (8), 959-966.
(37) Arnau, J.; Lauritzen, C.; Petersen, G. E.; Pedersen, J. Current strategies for the use of affinity tags and tag removal for the purification of recombinant proteins. Protein Expr Purif 2006, 48 (1), 1-13.
(38) Hennecke, J.; Sebbel, P.; Glockshuber, R. Random circular permutation of DsbA reveals segments that are essential for protein folding and stability. J Mol Biol 1999, 286 (4), 1197-1215.
(39) LeProust, E. M.; Peck, B. J.; Spirin, K.; McCuen, H. B.; Moore, B.; Namsaraev, E.; Caruthers, M. H. Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process. Nucleic Acids Res 2010, 38 (8), 2522-2540.
(40) Gaytan, P.; Yanez, J.; Sanchez, F.; Soberon, X. Orthogonal combinatorial mutagenesis: a codon-level combinatorial mutagenesis method useful for low multiplicity and amino acid-scanning protocols. Nucleic Acids Res 2001, 29 (3), e9.
(41) Gaytan, P.; Roldan-Salgado, A. Elimination of Redundant and Stop Codons during the Chemical Synthesis of Degenerate Oligonucleotides. Combinatorial Testing on the Chromophore Region of the Red Fluorescent Protein mKate. ACS Synth Biol 2013, 2 (8), 453-462.
(42) Palm, G. J.; Zdanov, A.; Gaitanaris, G. A.; Stauber, R.; Pavlakis, G. N.; Wlodawer, A. The structural basis for spectral variations in green fluorescent protein. Nat Struct Biol 1997, 4 (5), 361-365.
