
Research Article Cite This: ACS Synth. Biol. 2018, 7, 1730−1741

pubs.acs.org/synthbio

“Site and Mutation”-Specific Predictions Enable Minimal Directed Evolution Libraries

Jeffrey C Moore,*,† Agustina Rodriguez-Granillo,‡ Alejandro Crespo,‡,∞ Sridhar Govindarajan,§ Mark Welch,§ Kaori Hiraga,∥ Katrina Lexa,‡,# Nicholas Marshall,∥ and Matthew D. Truppo⊥

†Biocatalysis, Biochemical Engineering and Structure, ‡Modeling and Informatics, ∥Protein Engineering, Biochemical Engineering and Structure, and ⊥Biochemical Engineering and Structure, MRL, Merck & Co., Inc., P.O. Box 2000, Rahway, New Jersey 07065, United States

§ATUM, 37950 Central Court, Newark, California 94560, United States



S Supporting Information *

ABSTRACT: Directed evolution experiments designed to improve the activity of a biocatalyst have increased in sophistication from the early days of completely random mutagenesis. Sequence-based and structure-based methods have been developed to identify “hotspot” positions that when randomized provide a higher frequency of beneficial mutations that improve activity. These focused mutagenesis methods reduce library sizes and therefore reduce screening burden, accelerating the rate of finding improved enzymes. Looking for further acceleration in finding improved enzymes, we investigated whether two existing methods, one sequence-based (Protein GPS) and one structure-based (using Bioluminate and MOE), were sufficiently predictive to provide not just the hotspot position, but also the amino acid substitution that improved activity at that position. By limiting the libraries to variants that contained only specific amino acid substitutions, library sizes were kept to less than 100 variants. For an initial round of ATA-117 R-selective transaminase evolution, we found that the methods used produced libraries where 9% and 18% of the amino acid substitutions chosen were amino acids that improved reaction performance in lysates. The ability to create combinations of mutations as part of the initial design was confounded by the relatively large number of predicted mutations that were inactivating (30% and 45% for the sequence-based and structure-based methods, respectively). Despite this, combining several mutations identified within a given method produced variant lysates 7- and 9-fold more active than the wild-type lysate, highlighting the capability of mutations chosen this way to generate large advances in activity in addition to the reductions in screening.

KEYWORDS: directed evolution, in silico mutagenesis, R-specific transaminase ATA-117, Bioluminate, MOE, Protein GPS

Received: October 9, 2017
Published: May 21, 2018

DOI: 10.1021/acssynbio.7b00359 ACS Synth. Biol. 2018, 7, 1730−1741

© 2018 American Chemical Society

Directed evolution has become a foundational technology for improving a wide variety of enzyme properties, of which activity and selectivity are the two most notable for industrial biocatalysis. The technology was initiated in the early 1990s as a nature-inspired approach to improve enzyme function through random mutation of the protein’s amino acid sequence and screening for desired effect.1 Over the intervening years, directed evolution has become more sophisticated,2 such that today several strategies exist that enable practitioners to reduce the randomness of the initial approach by making educated guesses as to the location of mutations likely to be responsible for improvement of activity or selectivity; these strategies have been extensively reviewed.3−8 By limiting the random mutations to a subset of well-chosen amino acid residues (often called “hotspots”) rather than the entire protein, the pool of variant sequences decreases dramatically, reducing the number of screening samples required to identify improved enzymes, giving rise to the name “small, but smart” libraries and accelerating evolution projects often limited by screening efforts. Being able to move activity and selectivity quickly in early rounds of directed evolution experiments by increasing the magnitude of improvement, decreasing the time of each round, or both is critically important to delivering on the promise of industrial biocatalysis.9 When the method for mutation selection is well suited to the desired property improvement, such as the FRESCO method for stability, large changes in performance can be made in extremely small libraries.10

The strategies for generating small, but smart libraries for improved activity focus generally on two fundamental approaches to identify hotspots. The first uses the large collections of sequence information in enzyme sequence databases to identify and catalog natural diversity under the presumption that natural evolution has sampled a significant amount of mutation space and identified the critical residues and their key interactions with substrates and each other, and that the prediction of hotspots can be inferred from an analysis of these data.5,6,11,12 The second approach uses 3-dimensional protein structural information combined with an understanding of the substrate, transition state, or product binding within the enzyme active site to identify residues likely to control reaction progress.6,7,13 A recent advancement highlights the combination of tools from both strategies into a single software package, which should generate more accurate predictions of hotspots in the future.14

With few exceptions,7,15,16 small, but smart directed evolution experiments have relied on site-saturation mutagenesis as a “site directed, randomized substitution” approach to incorporate random diversity at the hotspot positions chosen and screened to identify the best substitution for the chosen position. A number of degenerate codon sets have also been developed to limit bias and control redundancy in the genetic code and in some cases to reduce the overall amino acids sampled to keep the screening effort to a minimum.4,11 These and other strategies for generating “small, but smart” libraries narrow search space for improved variants from many thousands to several hundreds of variants screened.7,17−20 Although the reduced degree of randomness in constructing the pool of variants limits the screening burden, it still creates theoretically unnecessary screening. Screening could be maximally accelerated if the correct amino acid substitution could also be inferred from the hotspot analysis. We were interested in understanding how close current methodologies are to correctly identifying valuable amino acid substitutions in addition to hotspot positions.

Small, but smart libraries were used in what was then (ca. 2008) the state-of-the-art directed evolution effort of a transaminase to synthesize sitagliptin, including the initial round of substrate walking that was designed to bridge the substrate scope of the (R)-selective transaminase ATA-117 with the desired sitagliptin molecule.21 This initial round screened for activity improvements against a truncated ketone analogue of sitagliptin (1-(3-(trifluoromethyl)-5,6-dihydro-[1,2,4]triazolo[4,3-a]pyrazin-7(8H)-yl)butane-1,3-dione, abbreviated THTP-BDO, Scheme 1) and examined site-saturation mutagenesis to randomize the amino acids at 12 positions, resulting in the identification of 8 positions where at least one amino acid substitution improved the reaction rate. Since this work was done, predictive tools have improved with continued development of sequence alignment analyses (3DM, ATUM (formerly DNA2.0), HotSpot Wizard) alongside the expansion of sequences in the databases, and with improved structural computational capabilities. Here we reinvestigated this initial round of sitagliptin evolution using “site directed, specific substitution” capabilities by examining two existing hotspot prediction methodologies representative of the sequence-based and structurally based strategies that also provide amino acid substitution predictions and testing whether the predicted substitutions improve the activity of the enzyme. We evaluated both total and specific activity of variant sequences to validate the predictions and to maximally move the activity of the catalyst produced. At this early stage of an evolution effort, moving the activity forward is paramount, and as a result activity improvements from a variety of mechanisms (expression, stability if it improves activity, reduced inhibition, etc.) are valuable.



RESULTS AND DISCUSSION

The sequence-based strategy used multiple sequence alignments of 1250 homologues of the ATA-117 gene to identify ∼1900 natural variations as diverse, potentially useful substitutions. Using ATUM’s Protein GPS Engineering technology as described in the Methods and Materials section, 66 unique mutations (Table 4) were prioritized for evaluation and arranged into 95 sequences. One of the advantages of this small library approach is that the relatively small number of samples allows for more detailed data collection to provide higher quality data overall. In this case, it allowed for quantitation of enzyme expression in each lysate, providing information about a significant source of variability and enabling access to higher quality specific activity data. Under the screening conditions in which the substrate concentration is much less than the KM, the specific activity data is a scaled measurement of an apparent catalytic efficiency (kcat/KM)app for this reaction.

This sequence-based Round 1 (R1) library was arranged with three mutations per sequence, allowing each mutation to be observed in five different variants (Supporting Information, Table S1), under the premise that the prescreening of these mutations in homologous sequences in Nature would provide a high degree of tolerability in the resulting variant set. Screening this sequence-based R1 library (Figure S1A) identified three lysates with more activity than the wild-type lysate, including the best variant comprising mutations I84L, F122Y, and S223P that was 4.6-fold improved in total activity, 3.9-fold improved in specific activity, and 1.2-fold improved in expression (Figure S1B, Table S2). Twelve additional variants had specific activities between 1.2-fold and 2.2-fold improved relative to wild type, although expression of these variants was sufficiently poor that total activity was reduced relative to the wild-type lysate.

The premise that the prescreening of substitutions in Nature would provide a majority of well tolerated substitutions and ensure reasonable activity in the collection did not hold for this library. This is a direct result of having every mutation seen in five variants: if a given mutation is deleterious, five inactive variants are expected. Sixty-three of the 95 variant sequences displayed activity below the threshold of detection and were deemed inactive. Of these, 39 sequences had low expression, with levels 0−10% of the wild-type transaminase as measured by gel electrophoresis (Figure S1B). When further examined, 20 substitutions (∼30%) were not present in any active sequence; these were presumed responsible for inactivating the

Scheme 1. Truncated Sitagliptin Ketone (THTP-BDO) to Amine Reaction, Shown with Pyruvate as the Amine Donor and Lactate Dehydrogenase, NAD, Glucose and Glucose Dehydrogenase To Shift Irreversibly the Equilibrium to Product Formation
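The statement that specific activity scales with an apparent catalytic efficiency when the substrate concentration is far below KM can be made explicit with a short derivation. This is a sketch that assumes simple Michaelis-Menten kinetics, which the text implies by its use of kcat and KM but does not state; the symbols v, [E], and [S] (rate, enzyme concentration, substrate concentration) are introduced here for illustration.

```latex
v = \frac{k_{\mathrm{cat}}\,[E]\,[S]}{K_M + [S]}
\;\xrightarrow{\;[S] \ll K_M\;}\;
v \approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]\,[S]
\quad\Longrightarrow\quad
\frac{v}{[E]} \approx \left(\frac{k_{\mathrm{cat}}}{K_M}\right)_{\mathrm{app}} [S]
```

At a fixed substrate concentration, the rate per unit enzyme (the specific activity) is therefore proportional to the apparent kcat/KM, while kcat and KM are not separately determined in this regime.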


Figure 1. (A) Fold improvement over wild-type activity obtained by screening the sequence-based R2 library. The wild-type variant is highlighted in white. (B) The same FIO WT activity data plotted against the associated expression data. WT variant is circled. The diagonal line represents the activity expected from the wild type across the range of expression levels seen in the screen. Variants above this line represent specific activity improvements, while variants with a FIO WT activity greater than 1.0 represent total activity improvements.
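The relationship between total activity, expression, and specific activity used in these plots can be sketched in a few lines of Python. This is a minimal illustration, assuming that specific activity (as fold improvement over wild type, FIO WT) is total activity FIO WT divided by expression FIO WT; the function name `classify_variant` is hypothetical. The inputs are values reported in the text (best sequence-based R1 variant) and in Table 1 (E42T).

```python
def classify_variant(fio_activity, fio_expression, threshold=1.0):
    """Return (specific activity FIO WT, total improved?, specific improved?).

    A variant lies above the diagonal wild-type line when its total activity
    exceeds what its expression level alone would predict, i.e. when
    fio_activity / fio_expression is greater than 1.
    """
    if fio_expression <= 0:
        raise ValueError("expression must be positive to compute specific activity")
    specific = fio_activity / fio_expression
    return specific, fio_activity > threshold, specific > threshold


# Best sequence-based R1 variant: 4.6-fold total activity, 1.2-fold expression
best_r1 = classify_variant(4.6, 1.2)
print(best_r1)

# E42T from Table 1: total activity 0.43, expression 0.26
e42t = classify_variant(0.43, 0.26)
print(e42t)
```

The first case is improved in both total and specific activity; the second is improved only in specific activity because poor expression lowers the total. The computed specific activities differ slightly from the reported 3.9 and 1.68, presumably because the reported inputs are rounded.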

Figure 2. Calculated contribution of each single-point mutation for the sequence-based R2 library using the Solver routine of Excel. Dark bars reflect total activity contributions, white bars reflect specific activity contributions. Bold line indicates WT amino acid activity.
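An additive deconvolution of this kind can be sketched as an ordinary least-squares fit. The authors used the Solver routine of Excel and give their calculation details in the SI, so the following is not their procedure: it is a minimal pure-Python sketch that assumes the model FIO_i = 1 + Σ_j x_ij·β_j, where x_ij is 1 if variant i carries mutation j. The mutation labels A, B, C and all activity values are invented toy data.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def deconvolute(design, fio):
    """Least-squares fit of FIO_i = 1 + sum_j design[i][j] * beta[j]."""
    k = len(design[0])
    y = [f - 1.0 for f in fio]  # contribution relative to wild type
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    return solve_linear(xtx, xty)  # normal equations


# Toy library (invented): rows are variants, columns are mutations A, B, C
design = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
true_contributions = [1.5, 0.8, -0.3]
fio = [1.0 + sum(x * c for x, c in zip(row, true_contributions)) for row in design]

fitted = deconvolute(design, fio)
for name, value in zip("ABC", fitted):
    print(f"mutation {name}: {value:+.2f} FIO WT")
```

Because each mutation appears in several variants with different partners, the fit can separate individual contributions, which is why rearraying 40 mutations across 95 sequences made deconvolution feasible where 32 active sequences for 66 mutations did not.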

variant sequences. Unfortunately, the small number of sequences with measurable activity (32) relative to the number of mutations introduced (66) made deconvolution of the impact of individual mutations infeasible.

To fully evaluate the original amino acid analysis, 26 mutations responsible for inactive or nonexpressing variants were dropped from the original 66, and the remaining 40 mutations were rearrayed in 95 sequences at 2−6 substitutions per sequence (sequence-based R2). No new mutations were introduced in the sequence-based R2. The resynthesis of the genes in this collection also allowed for a reweighting of mutations, such that mutations appearing in the three sequences that produced more activity than wild type in R1 were more heavily weighted in this second library, appearing in more than 10 sequences each (Table S3).22 This library did not attempt to compile any clones that combined only the best mutations from R1, but modulated the frequencies of amino acid changes seen in the library to provide an overall improved library. This R2 library performed much better than the original (Figure 1A), with only 17 sequences (vs 63 in the first case) showing nondetectable activity, and no mutations showing up only in inactive sequences. Not surprisingly, a large proportion of sequences have specific activities greater than wild type because the mutation(s) responsible for the improved variants in the first round were incorporated into more sequences. Although this round has greater scatter in protein expression (Figure 1B), this had little effect on the ranking of activities, as the top five variants in total and specific activity were identical, with only a slight rank ordering difference between them (Table S4). Two improved variants have activities above the R1 best variant: the first, at 4.7-fold improved specific activity, contains all three mutations from the R1 variant (I84L, F122Y, and S223P) and additionally A91C, A114R, and V125L; the second, at 4.5-fold improved specific activity, contains two mutations from the R1 variant (F122Y, S223P) and the two additional mutations Y60F and V235L. Both of these variants additionally benefit from improvements in expression,


providing another 1.5-fold improvement over the wild-type lysate activity. If we assume that the effects of each mutation are additive, a mathematical model can quickly determine the impact of each individual mutation (the calculation details are given in the SI and results are shown in Figure 2). Deconvolution of the sequence-based R2 total activity data (Figure 2, black bars) indicated that six mutations of the original 66 are greater than 20% improved over wild type, corresponding to 9% (1 in 11) of the original library. The mutations in order of degree of improvement are S223P, F122Y, Y60F, T282S, V69G, and A91C. Similar deconvolution of the specific activity (Figure 2, white bars) identified the same residues, and in addition highlighted three more: V227I, V278I, and C281T. Although V278I and C281T were the third and fourth best specific activity substitutions according to the deconvolution, they did not appear in any improved sequences. They also have the largest difference between low total activity and high specific activity, suggesting that these residues have significant negative impact on enzyme expression, and might need to be installed in a construct that is already highly expressed. These are mutations that would be revisited in future rounds of evolution.

The structure-based substitutions were identified in silico by making every possible single amino acid substitution in an enzyme crystal structure containing the docked PLP aldimine intermediate (Figure 3), followed by ranking based on stability and binding free energy scores using both Bioluminate and MOE computational programs as described in the Methods and Materials section. The enzyme is known to be (R)-selective, and the aldimine intermediate allowed the modeling of the correct chirality in the enzyme active site. We believed the catalytic machinery of the transaminase was sufficiently active on preferred substrates based on a previous use of ATA-117,23 and that the THTP-BDO substrate simply did not fit well in the active site and was therefore poorly accepted. Thus, optimizing the intermediate binding should be a reasonable predictor of activity improvement, justifying the use of binding free energy score calculations to predict improvement. We also reasoned new mutations should not destabilize the enzyme structure, which justified using improvements in stability scores to rank variants. All substitutions chosen were calculated to be improved in both binding free energy and in stability (Figure S2A,B). Because a large number of predicted substitutions occupied the improved binding and improved stability space (Figure S2C,D), we chose to sample two groups of mutations for improved activity: those that ranked highest in binding energy and those that ranked highest in stability. In addition, preferential treatment was given to residues within 5 Å of the substrate, making sure that at least one amino acid substitution was represented at each of these residues. The 93 mutations selected comprising structure-based R1 and the summary of their selection are shown in Table 5 (Methods and Materials section); the screening results are shown in Figure 4A.

In this single mutation library, 17 of 93 lysates demonstrated greater than 20% improvement over the wild-type lysate (18%, or 1 in 5.5); 15 of 93 variants met this criterion for improved specific activity (Table 1). H62Y and F122Y were the top two variants in both specific activity (5.7- and 1.8-fold improved, respectively) and in total lysate activity (5.3- and 2.2-fold improved, respectively). Keeping the library as single substitution sequences proved a correct choice, as 42 single mutations (45%) were below the limit of detection in the assay and were deemed inactive, although only a few variants were poorly expressed (10 variants with levels 0−10% of WT, Figure 4B).

Figure 3. PLP-aldimine of THTP-BDO, the transaminase reaction intermediate that was docked into transaminase structure for computational calculations.

Figure 4. (A) FIO WT conversion obtained by screening the structure-based R1 library. WT variants are highlighted in white; all sequences are provided in Table S5. (B) The same conversion data plotted against the FIO WT protein expression level, with WT variants circled. The diagonal line is the theoretically expected data of wild type across the entire range of enzyme concentrations. Data above this line are specific activity improvements; data over a FIO WT activity of 1.0 represent total activity improvements.
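The combinatorial structure-based R2 library built from these R1 hits is described in the text as all 192 combinations of nine substitutions at six positions. The count can be reproduced with a short enumeration. This is a sketch under one assumption not stated in the source: that each position may also retain its wild-type residue. The substitutions are the asterisked entries of Table 1; the dictionary name `options` is illustrative.

```python
from itertools import product

# Wild type plus the asterisked Table 1 substitutions at each of six positions
options = {
    "T13": ["WT", "T13H", "T13R"],
    "H62": ["WT", "H62Y"],
    "F122": ["WT", "F122Y"],
    "G136": ["WT", "G136N", "G136R", "G136Y"],
    "A169": ["WT", "A169M"],
    "F190": ["WT", "F190Y"],
}

# Each combination is the tuple of non-wild-type substitutions it carries
combos = [
    tuple(m for m in choice if m != "WT")
    for choice in product(*options.values())
]

print(len(combos))  # 3 * 2 * 2 * 4 * 2 * 2
print(("T13R", "H62Y", "F122Y") in combos)
```

Under this assumption the enumeration gives 3 × 2 × 2 × 4 × 2 × 2 = 192 sequences, matching the number in the text, and includes the T13R/H62Y/F122Y triple mutant reported as the best variant.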


Table 1. Mutation Analysis for the Structure-Based R1 Library (Variants with FIO WT > 1.2 Only)

| variant | total activity (fold improvement) | specific activity (fold improvement) | expression (fold improvement) | program | score |
|---|---|---|---|---|---|
| H62Y* | 5.33 | 5.67 | 0.94 | MOE | active site/binding |
| F122Y* | 2.21 | 1.81 | 1.22 | MOE | active site |
| Q155Y | 1.15 | 1.79 | 0.64 | MOE | active site |
| P135Q | 1.23 | 1.75 | 0.71 | Bioluminate | stability |
| E42T | 0.43 | 1.68 | 0.26 | Bioluminate | binding |
| T13H* | 1.25 | 1.53 | 0.82 | Bioluminate | binding |
| D274I | 0.60 | 1.42 | 0.42 | Bioluminate | binding |
| Q146R | 0.64 | 1.41 | 0.45 | Bioluminate | binding |
| E85R | 1.29 | 1.40 | 0.92 | Bioluminate | stability |
| A169M* | 1.55 | 1.37 | 1.13 | Bioluminate | stability |
| S49Y | 1.22 | 1.31 | 0.93 | Bioluminate | stability |
| F190Y* | 1.71 | 1.25 | 1.37 | MOE | active site |
| E92R | 1.26 | 1.24 | 1.01 | Bioluminate | binding |
| I10P | 0.95 | 1.23 | 0.77 | Bioluminate | stability |
| L61M | 0.73 | 1.22 | 0.60 | Bioluminate | binding |
| G136N* | 1.59 | 1.18 | 1.35 | Bioluminate | binding |
| T13R* | 1.34 | 1.11 | 1.21 | Bioluminate | stability |
| E256R | 1.33 | 1.06 | 1.25 | Bioluminate | stability |
| G136R* | 1.34 | 1.02 | 1.31 | Bioluminate | binding |
| G239H | 1.22 | 0.93 | 1.31 | Bioluminate | binding |
| A209H | 1.32 | 0.89 | 1.49 | Bioluminate | stability |
| I196R | 1.27 | 0.86 | 1.48 | Bioluminate | stability |
| G136Y* | 1.23 | 0.72 | 1.70 | Bioluminate/MOE | stability |

In addition to the fold improvement, Table 1 also highlights from which program and score mutation selections were derived. Across the structure-based R1 library (Figure S3), Bioluminate significantly outperforms MOE in the use of improved binding free energy and improved stability to predict improvements in overall catalytic activity. Twenty of the 60 variants (33.3%) selected by Bioluminate’s calculations were more active than wild type (9 of 34 or 26% of variants improved from binding free energy and 11 of 21 or 52% improved from stability group) compared with 2 of 29 for MOE. Additionally, 79% of MOE predicted variants in the binding energy and stability categories were completely inactive compared with 27% of Bioluminate’s. However, MOE was markedly better than Bioluminate for active site residue substitution, with 4 of 8 or 50% amino acid substitutions generating more active variants compared to 0 of 8 improved variants from Bioluminate. In addition, the MOE active site collection contained the top three mutations found overall. This underscores the importance of selecting active site mutations regardless of their relative scores with respect to the entire collection of possible mutations.

The evaluation of the computationally derived mutations was completed by confirming that the single mutations identified in the structure-based R1 library would successfully work in combination. Therefore, the top six positions containing the top seven substitutions plus two additional substitutions (improved variants at the same six positions, highlighted with an asterisk in Table 1) were combined in a PCR-based synthesis of all 192 possible combinations of mutations, followed by pooling the resulting DNA, cloning, transformation, and plating. Seven 96-well plates were picked from the transformants and screened. The 30 variants showing 1.5× or more improvement relative to the H62Y parent from each plate were collected into one plate (structure-based R2), sequenced, and rescreened in triplicate using the WT as reference (Figure S4, sequences in Table S6). The best resulting variant lysate was 9-fold improved in total activity and contained three mutations (T13R/H62Y/F122Y). Deconvolution of these data is provided in Figure S5. Interestingly, when arrayed in combination H62Y is still the most improved mutation, but F122Y and F190Y are no longer second and third. T13R is a better performer than T13H and, as expected, both are preferred to the wild-type amino acid. However, the mutations of G136 have completely inverted in order of improvement, with G136Y being the best and G136N providing a slight negative contribution to activity.

The best variants along both evolution paths were scaled up and kinetically characterized both in lysates and in purified form in order to understand the nature of the improvements. The solubility of the substrate in the reaction system was approximately 100 mM, which is well below the apparent KM of the wild-type enzyme, making accurate predictions of discrete values for both kcat and KM scientifically unsound, and as a result only the catalytic efficiency (kcat/KM) is generally reported. The lysates were prepared using the same chemical lysis method as the screening samples, and transaminase protein content was measured by densitometry of stained SDS-PAGE gels. The lysate data from the scaled-up fermentations agree generally with the fold improvement trends in the original screening data (Table 2), although the wild-type enzyme was less active, resulting in larger than expected fold improvements. The purified enzyme variants were created using a mechanical lysis method and then chromatographed to >90% purity. The catalytic efficiencies between the pairs of lysate and purified protein samples are generally in good agreement except for the purified wild-type enzyme, which is 5.4-fold more active than expected based on the lysate data. Repeating the lysate preparations and purifications reproduced this result. Mechanically lysed wild-type cultures were not similarly negatively

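Table 2. Kinetic Analysis of Lysates and Purified Enzyme Variants

| enzyme | lysate kcat (s−1) | lysate KM (M) | lysate (kcat/KM)app | lysate FIO WT | purified kcat (s−1) | purified KM (M) | purified kcat/KM | purified FIO WT |
|---|---|---|---|---|---|---|---|---|
| wt | | | 0.19 | 1.00 | | | 1.01 | 1.00 |
| Seq-R1:S1 | | | 1.26 | 6.70 | | | 1.15 | 1.14 |
| Seq-R2:S2 | | | 1.91 | 10.1 | | | 1.70 | 1.69 |
| Seq-R2:S1 | | | 1.51 | 8.03 | | | 1.10 | 1.08 |
| Struct-R1:S2 | | | 0.36 | 1.92 | | | 0.51 | 0.51 |
| Struct-R1:S1 | 0.11 | 0.043 | 2.57 | 13.6 | | | 1.48 | 1.46 |
| Struct-R2:S1 | 0.24 | 0.046 | 5.17 | 27.4 | 0.33 | 0.1 | 3.29 | 3.26 |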
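The few discrete kcat and KM entries in Table 2 can be checked against the reported catalytic efficiencies with simple arithmetic. This sketch assumes that the two lysate kcat/KM pairs belong to the structure-based variants Struct-R1:S1 and Struct-R2:S1, which is what the text indicates and what the ratios support; the function and variable names are illustrative.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """kcat/KM in s^-1 M^-1 from kcat (s^-1) and KM (M)."""
    return kcat_per_s / km_molar


WT_LYSATE_EFFICIENCY = 0.19  # apparent kcat/KM of the wild-type lysate (Table 2)

# (kcat, KM) lysate values from Table 2; row assignment is an assumption
lysate = {
    "Struct-R1:S1": (0.11, 0.043),
    "Struct-R2:S1": (0.24, 0.046),
}

results = {}
for name, (kcat, km) in lysate.items():
    eff = catalytic_efficiency(kcat, km)
    results[name] = (eff, eff / WT_LYSATE_EFFICIENCY)
    print(f"{name}: kcat/KM = {eff:.2f}, FIO WT = {results[name][1]:.1f}")
```

The recomputed values land close to, but not exactly on, the tabulated 2.57 and 5.17; the small gaps are consistent with the table having been calculated from unrounded kcat and KM values.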


affected in lysate form, suggesting that a chemical lysis component might be responsible for the wild-type enzyme inactivation. Diluting the purified wild-type enzyme in the lysis solution, as well as in solutions of each potentially inhibiting component (PMSF, EDTA, 0.1% deoxycholate, buffers, lysozyme) independently had a negligible effect on the purified enzyme activity (except for the 0.1% deoxycholate, which reduced all variant activities by ∼50%). Diluting purified enzyme in the negative control (no expressed enzyme) lysate and in the poorly active wild-type lysate also had negligible effect. Additionally, little to no change in thermal stability between the purified enzyme samples was seen. Taken together, these results suggest that the chemical lysis method indirectly inactivated or inhibited the wild-type enzyme, perhaps by liberating inhibiting components from the E. coli host that were not seen in the mechanical lysis method or by interfering with the dimerization maturation process required to develop active enzyme.

Regardless of the mechanism, the loss of activity because of the chemical lysis is improved by evolution. The sequence-based R1 library generated one variant with significantly improved activity in lysate. The mutations in this variant were partially reassorted into the R2 library, and the resulting sequences with those mutations (F122Y or S223P) comprised 30 of the 34 positive sequences (Table S4). Thus the beneficial mutations discovered in round 1 led to a heritable improvement in population performance in round 2, despite the fact that the catalytic efficiency of the purified enzyme (Seq-R1:S1, I84L_F122Y_S223P) is improved over the wild-type enzyme only slightly (1.14-fold). Additionally, when purified, a second of the sequence-based variants (Seq-R2:S1, I84L_A91C_A114R_F122Y_V125L_S223P) has similar catalytic efficiency compared to the wild type (1.08-fold), but much larger apparent catalytic efficiency than the wild type in lysate (8-fold), further illustrating specific targeting of inhibition by key mutations. As a result, the course of the evolution and the impact of the mutations are best understood from the context of the apparent catalytic efficiency in lysates, and the corresponding specific activity fold improvements in the screening data.

Addressing systemic improvements is no less important to the overall success of an evolution program than catalytic efficiency improvements. Chemical lysis methods are the primary approach to lysing even small libraries of variants in a parallel manner. Improvements that increase activity in the screening lysate allow for the necessary increase of screening pressure toward the desired reaction in subsequent rounds in exactly the same manner that kinetic improvements do. As an example, the previous sitagliptin work on this substrate and using this wild-type enzyme reported that the mutation S223P, also found in these lysate-only improvements, was responsible for a similar 12-fold improvement using chemically lysed cultures. More importantly, the next round of evolution, which involved switching substrates to the more demanding prositagliptin ketone, could not be advanced without the S223P mutation due to lack of activity.21

In addition to the variants whose effects are primarily seen in the lysate screening, one of the two top variants from the sequence-based round 2 (the Seq-R2:S2 variant) showed a 1.69-fold increase in catalytic efficiency in addition to a full recovery of the activity lost by the wild-type enzyme in the lysate. The structure-based libraries also demonstrated significantly improved catalytic efficiency, with R1 and R2 variants improving catalytic efficiency 1.46- and 3.26-fold, respectively. The greater improvements in catalytic efficiency of these variants might be a direct result of the structural focus on substrate-enzyme interaction in the mutation design. These variants improved substrate binding sufficiently that kcat and KM values could be estimated.

The mutations from both methods were mapped onto the 3D structure of the protein. Most mutations from the sequence-based method address the inhibition of activity in the lysate. Of these, four are in the active site (S223P and F122Y as the two most significant, and V69G and T282S), A91C is in the core of the protein, and Y60F is located in the dimer interface, likely improving interactions between the two monomers (Figure 5A). Of those in the active site, S223P and T282S are located in a loop and turn adjacent to the PLP cofactor and help structure the cofactor binding site; F122Y is expected to interact with the contents of the active site; and V69G creates additional space for substrates (Figure 5B). Four mutations of the 66 in the entire library were located in positions that model as interacting with the substrate (V69G, V69T, F122Y, F190L); two of these (50%) were identified as more active in lysates.

The mutations in more active variants (FIO WT > 1.2; Table 1) identified with the structure-based design are mostly located in the active site or on the surface of the protein (Figure 5C). The top active mutants (H62Y, F122Y, F190Y, G136N/R/Y, and P135Q) correspond to first shell positions within the active site, participating in direct contacts with the substrate. In the structural models, H62Y improves the contacts with the CF3 group, consistent with its improvement of KM, whereas G136N/R and P135Q donate a hydrogen bond to the N of the triazole ring (Figure 5D). Mutations of G136 are known to exert conformation control over a substrate determining loop from residues 129 to 145.24 Of six active mutations on the surface of the protein (T13R, E256R, E85R, E92R, G239H, S49Y), four are to Arg, and three of these are making a salt bridge with nearby Asp or Glu residues in the structural models. A169M improves the packing of the hydrophobic core, A209H in the second shell of the active site donates a hydrogen bond to a nearby backbone O, whereas I196R is located in the dimer interface and engages in a salt bridge with an Asp residue from the other monomer.

The ATA-117 transaminase enzyme that we chose to evolve in this evaluation is 330 amino acids long. This corresponds to 330 positions for 19 amino acid substitutions, for a total of 6270 possible single amino acid substitutions. To screen this library completely assuming a randomization step would require in excess of 10 000 samples. Reducing the positions for mutation by 10-fold through rational means and randomizing each position (33 positions × 19 possible substitutions = 627) reduces the screening burden 10-fold to on the order of 1000 samples. Here we evaluated whether two standard methods for making predictions about position and amino acid substitution were sufficiently advanced to provide significant improvement to transaminase activity in lysates against a non-natural substrate, while at the same time limiting the number of mutations evaluated and therefore limiting the screening burden of the evolution effort an additional order of magnitude to less than 100 samples for each library. Table 3 summarizes a comparison of the results. Both methods significantly reduce the number of mutations made and the corresponding screening burden. The sequence specific method was capped at about 1% of the overall possible mutations made, while the


Figure 5. (A) 3D homodimeric structure of the WT ATA-117 with the positions identified as beneficial by the sequence-based approach depicted as yellow spheres. The two intermediate-cofactor complexes are rendered as sticks. (B) Close up view of the active site showing the four positions identified as beneficial. (C) 3D homodimeric structure of the WT ATA-117 with the positions identified as beneficial by the structure-based R1 library depicted as yellow spheres. (D) Close up view of the active site showing the five positions identified as beneficial. The G136N and P135Q mutations show a hydrogen bond to the triazole ring.

computational method was designed to fill one entire 96-well plate (1.5% of the possible mutations). Despite the orthogonality of the two different methods (Figure S6), they both were successful at identifying positive variants at high frequency, making the two methods highly complementary. The sequence-based prediction method generated amino acid substitutions more active than wild type in 9% of the amino acids selected. The structure-based method provided positive amino acid substitutions in 18% of predicted mutations. Each method identified 2 amino acid substitutions that resulted in >2-fold improvement in activity under the reaction conditions (S223P and F122Y; and H62Y and F122Y for the sequence- and structure-based R1 libraries, respectively). Improvements of 2-fold are the typical benchmark seen historically per round of directed evolution experiments, although early in evolution programs such as in this experiment, larger jumps in activity per successful mutation are seen. In this effort, the best mutations in each case are 2.4-fold and 5.6-fold improved. Both libraries identified the substitution F122Y as the second of the >2-fold improved mutations, and this was the only mutation overlap between the two methods. During the first round of the sitagliptin evolution on this same substrate, eight mutations were identified.21 Four of those eight mutations were identified by the two libraries designed here, including three of the top five. Mutations at position 122, such as F122Y, were not identified in the original sitagliptin evolution as that work focused significantly on the large binding pocket. Mutations at position 122 were found as part of opening up the small binding pocket to fit the entire sitagliptin molecule. However, seven new positive activity positions were identified in these two libraries that have not been previously seen, including mutations at position T13, such as T13R that contributes 1.85-fold improvement in total activity to the most improved variant resulting from structure-based R2. The frequencies of positive mutations at 9 and 18% are sufficiently high to incorporate multiple mutations per sequence in order to accelerate evolution progress. In fact, the

Table 3. Summary of Evolution Results from the Two Methods Evaluated

                                                   sequence-based model                     structure-based model
mutation space physically evaluated                1.05%                                    1.48%
mutations identified as positive                   6/66 = 9%                                17/93 = 18%
mutations identified as inactivating               20/66 = 30%                              42/93 = 45%
number of unique positions substituted             58                                       62
amino acids not used in mutation design            H, N, Q, W                               A, D, E, G, K
mutations >2-fold improved in activity             2                                        2
highest lysate activity - single aa contribution   2.4-fold                                 5.6-fold
highest lysate activity recombined                 7-fold                                   9-fold
best variant (positive residues underlined)        I84L, A91C, A114R, F122Y, V125L, S223P   T13R, H62Y, F122Y
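The percentages in Table 3 follow directly from the counts it reports. The short Python sketch below recomputes them as a consistency check. The protein length of 330 residues is an assumption that is not stated in this section; it is used here only because it reproduces the "mutation space physically evaluated" row. The function and variable names are illustrative.

```python
# Counts taken from Table 3.
LIBRARIES = {
    "sequence-based": {"evaluated": 66, "positive": 6, "inactivating": 20},
    "structure-based": {"evaluated": 93, "positive": 17, "inactivating": 42},
}

# ASSUMPTION: a 330-residue enzyme with 19 alternative amino acids per
# position. The length is not given in the text; it is chosen because it
# reproduces the 1.05% and 1.48% values in the table.
ASSUMED_LENGTH = 330
POSSIBLE_SINGLE_MUTATIONS = ASSUMED_LENGTH * 19


def summarize(counts):
    """Return the Table 3 percentages for one library."""
    n = counts["evaluated"]
    return {
        "space_pct": 100 * n / POSSIBLE_SINGLE_MUTATIONS,
        "positive_pct": 100 * counts["positive"] / n,
        "inactivating_pct": 100 * counts["inactivating"] / n,
        # Number of predicted mutations tested per positive mutation found.
        "mutations_per_hit": n / counts["positive"],
    }


for name, counts in LIBRARIES.items():
    s = summarize(counts)
    print(
        f"{name}: space {s['space_pct']:.2f}%, "
        f"positive {s['positive_pct']:.0f}%, "
        f"inactivating {s['inactivating_pct']:.0f}%, "
        f"1 hit per {s['mutations_per_hit']:.1f} mutations"
    )
```

Under these assumptions the hit rates correspond to roughly one positive mutation per 11 (sequence-based) and per 5.5 (structure-based) predicted mutations.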


produce stabilities approaching 20−40-fold improvements within 100−200 variants screened in a single evolution experiment, exactly the rapid evolution result desired. The naturally prescreened substitutions lead to an expectation of a low frequency of inactivating mutations. This was not true in the sequence-based prediction in this case (sequence-based R1 library) and serves to highlight the detrimental impact of the inactivating mutations. A 9% beneficial mutation success frequency suggests that 25% of the sequences should contain at least one beneficial mutation (footnote c), and 2.3% should contain two (footnote a). The high inactivation frequency of 30% eliminated many of the improved sequences from consideration, leaving only three sequences improved in activity relative to wild type in the initial round: two single amino acid improvements, and the double mutation described above. This also made the deconvolution of mutations impossible to solve as the number of equations (nonzero screening data) dropped below the number of unknowns (mutations). The structure-based library’s inactivation frequency is much worse at 45%, requiring larger additional improvements in inactivating mutation elimination in order to produce these mutations in combination. Under the circumstances where the number of detrimental changes is high, creating additional constructs in R2 that eliminated detrimental mutations and screening an additional ∼100 variants provided an improved library while keeping the total number of variants screened below 200. Combining the orthogonal approaches in R2 could lead to better libraries, screening fewer variants, and providing more improved variants. Improvements in the sequence-based predictions might come from the increasing numbers of sequences filling the databases and in refinements to the analysis of selection methods.
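The effect of inactivating mutations on deconvolution can be illustrated with a rough estimate. The Python sketch below assumes that mutations act independently and that a single inactivating mutation leaves a variant with no measurable activity; both are simplifying assumptions, not statements from the paper. The function names are illustrative.

```python
def fraction_without_inactivating(p_inactivating: float, mutations_per_sequence: int) -> float:
    """Probability that a sequence carries no inactivating mutation,
    assuming each mutation is inactivating independently."""
    return (1.0 - p_inactivating) ** mutations_per_sequence


def expected_active_variants(n_variants: int, p_inactivating: float, mutations_per_sequence: int) -> float:
    """Expected number of variants with nonzero screening data."""
    return n_variants * fraction_without_inactivating(p_inactivating, mutations_per_sequence)


# Sequence-based R1 library: 95 variants, 3 substitutions each,
# 66 candidate mutations (unknowns), 30% inactivating frequency.
n_unknowns = 66
n_equations = expected_active_variants(95, 0.30, 3)
print(f"expected active variants: {n_equations:.1f} vs {n_unknowns} unknowns")

# For comparison, the same design at the structure-based 45% frequency.
print(f"fraction surviving at 45%: {fraction_without_inactivating(0.45, 3):.3f}")
```

With these assumptions only about a third of the three-mutation sequences would be expected to give usable data, fewer than the 66 mutation effects to be solved for, which is consistent with the underdetermined deconvolution described above.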
An exciting new tool from Bio-Prodict called CorNet deduced not just variable positions, but also the residues that covary in a correlated network indicated for enantioselectivity within the α/β-hydrolase family.26 Variants from within this pool were improved relative to wild type in enantioselectivity at a frequency of 17%, about twice as often as seen here for activity. Structure-based predictions would certainly be improved with bound substrate crystal structures rather than docked substrates, and possibly be improved with more computationally demanding methods described in the literature. Molecular Dynamics (MD) simulations have been used to filter out beneficial mutations from high throughput in silico screens for enzyme activity or selectivity.13,27 In these cases, MD simulations were shown to increase the chance of successfully combining beneficial mutations, and thus appear as the next logical step to incorporate into current enzyme design workflows. MD simulations followed by an MMGBSA calculation of an ensemble of structures, as has been previously employed for a related transaminase in silico mutagenesis study,28 would be able to more accurately capture the effect of mutations that induce significant conformational changes. Additionally, Free Energy Perturbation29 is another promising method to interrogate changes in stability30 and substrate binding free energy for individual and combinatorial mutations. Finally, by optimizing for substrate binding only, enzyme turnover rate (kcat) has not been considered, which will contribute to the overall efficiency of the enzyme. To this end, each transition state of the multistep reaction would need to be optimized using QM/MM hybrid methods to estimate the activation barriers for each mutant (good reviews in refs 31−33). To our knowledge, these have not been evaluated in

9% positive mutation frequency from the original library was sufficient to locate the best double point mutant (F122Y, S223P) in the initial screening despite the high number of inactivating mutations. Finding a few “double point mutants” is likely in this screening scenario as the likelihood of finding any two positive mutations in each three-mutation sequence is 2.3% (footnote a), although this falls to 1.5% (footnote b) when incorporating the effects of a potential inactivating mutation as seen in the original library. Additionally, given these underlying positive mutation frequencies and assuming a library where every triple mutant could be made, triple mutants consisting of three positive mutations would be expected to occur between 1 in every 164 (1/0.183^3) and 1 in every 1330 (1/0.091^3), which is easily within reach of a more conventional screening paradigm where larger sampling is allowed. These double and triple mutant scenarios highlight the main advantage of being able to predict “site directed, specific substitutions” with greater accuracy: the ability to compile multiple mutations in single sequences in a single round of evolution with the intent to move the evolution along faster than a single positive mutation each round. To further underscore this point, the two “second round” libraries, constructed with higher frequencies of positive mutations and with inactivating mutations removed, successfully combine three positive mutations derived from their respective initial libraries to produce 7- and 9-fold more active variant lysates. To make this multiple mutation per sequence approach a strategy that moves evolution programs to rapid successful conclusions without additional recombination steps, the successful mutations need to be as prevalent as the nonsuccessful ones, so that accumulating mutations occur as often or more often than they are omitted, requiring a hit rate of 50% or greater.
This represents a 2.7-fold improvement in the structure-based predictions and a 5.5-fold improvement in the sequence-based hit rate over the predictions seen here. A suggestion of this level of improvement exists in this data set already as the active site predictions made by the sequence-based method and MOE in the structure-based method were beneficial in improving activity in 2 of 4 (50%) and 4 of 8 (50%) amino acid substitutions, respectively. Further, Bioluminate predictions using the binding free energy and stability changes were improved in activity in 20 of 55 mutations, including 11 improved actives in 21 predictions (52%) that utilized the stability scoring function. This warrants additional investigation in the future to understand whether these are general phenomena, or whether they are specific to this example. The prediction of mutations that improve activity is not yet as successful as the prediction of those that improve stability. As a comparison to this work, a study that examined predictive methods for improving the stability of an α/β-hydrolase enzyme to temperature and to urea found that the sequence-based methods employed successfully predicted 15/19 (79%) stabilizing mutations, while the computational methods were successful in 13/28 (46%) of predictions; each mutation averaged 2.6-fold improvement in stability.25 Further, a round of recombination of 26 positive mutations at 24 positions produced several variants containing between 7 and 19 substitutions, each greater than 40-fold more stable than wild type. At the 79% positive prediction frequency, screening sequences containing 10 mutations each from the outset should produce sequences in which 1 in every 10 sequences contains only beneficial mutations. These sequences should readily


Table 4. Mutations Selected on the Basis of Homologue Diversity

A44E   I84L   V125P  N189S  A220T  R236Y  T282N
R52K   E85D   S126I  F190L  S223P  A242I  T282S
I55V   L87F   S126V  G193L  F225A  P244E  A284T
F56L   N90S   Y131V  L195N  F225S  H260I  G285A
Y60F   A91C   E137L  I196V  V227I  L269R  G286E
V69G   A91L   Y148I  H203Y  V228F  V278I  F290V
V69T   A114R  Y154I  F207A  V229A  G280L  I298V
F70V   F122Y  V171T  A209E  V234K  C281T  R313A
H71R   V123I  V187A  D214N  V235L  T282G  Y314F
W73Y   V125L  V187I

activity in a small number of screening samples, and very large improvements in single rounds of evolution when traditional screening efforts are applied.

high throughput with thousands of variants because of the computational cost of these calculations. With the continuous improvements in force fields and computational power, these methodologies will likely be incorporated into the enzyme design workflows in the near future. As the science behind the mutation identification predictions improves, and as importantly, the understanding and reduction in the frequency of inactivating mutations also improves, the number of simultaneous positive mutations incorporated into newly designed sequences can increase, enabling larger jumps in activity performance. A 3−5 fold improvement in predictive power in both the positive variants and in the control of inactivating mutations would fully enable this predictive combination approach as a primary strategy for directed evolution efforts.



MATERIALS AND METHODS

A sequence-based library consisting of 95 variant sequences and one wild-type control was designed by ATUM through their proprietary Protein GPS Engineering technology (sequence-based R1), similar to the imagabalin transaminase effort.34 This method relies on sequence alignments from a large collection of homologues and their own algorithms to identify specific amino acid substitutions to investigate.18 In brief, multiple sequence alignments of 1250 homologues were performed to identify natural diversity as prevetted accepted mutations. An analysis of the mutations in conjunction with the associated pattern of site-specific and branch-specific variations was performed using a phylogenetic tree generated from the alignment, as previously described in detail,35 resulting in a numerical scoring of each observed mutation related to its acceptability in the sequence. All possible substitutions (∼1900 in this case) seen in the alignment were rank ordered and a set of 250 were identified as candidate mutations for the directed evolution process. The 66 highest ranking mutations were selected for evaluation in Round 1 (Table 4). Because of the conservative nature of these mutations, three substitutions were evaluated simultaneously in each sequence. Substitutions were grouped such that each mutation was seen 5 times in the collection of 95 variants, each time with a different context of partnered substitutions.22 Thus, all substitutions are equally represented in the designed variants with no random mutations. From this design ATUM then created the DNA sequences, expressed the enzyme variants, and produced clarified lysates,36 which were then screened on THTP-BDO and analyzed as described below. Each of the constructs made was used to transform E. coli strain BL21(DE3) (Agilent). Transformants were selected on LB agar with 30 mg/L kanamycin at 37 °C.
Transformants were picked to 0.5 mL LB medium containing 30 mg/L kanamycin, and cultures were incubated with shaking for 16 h at 37 °C. Ten microliters of each culture were used to inoculate 0.5 mL of fresh LB media containing 30 mg/L kanamycin and the resulting cultures were incubated at 37 °C with shaking until an OD of 0.6 at 600 nm. Cultures were cooled to 25 °C and induced with the addition of 1 mM IPTG. Induced cultures were incubated with shaking at 25 °C for 19 h, and cells were harvested by centrifugation at 5000g for 10 min at 4 °C. Final ODs at 600 nm were 7.0 on average. Cell pellets were frozen at −20 °C, thawed at 25 °C, and then fully lysed with lysozyme and detergent treatment in a total volume of 0.4 mL as follows. Cells were resuspended in 0.2 mL of 200 mM Tris-HCl (pH 7.5), 20% sucrose, 1 mM EDTA, 1 mM PMSF and 30U/μL Ready-Lyse lysozyme (Epicenter) and incubated for 15 min at 25 °C. To this mixture, 0.2 mL of 10 mM Tris-HCl (pH 7.5), 50 mM KCl, 1 mM EDTA, 1 mM PMSF, 20 mM MgCl2, 0.1%



CONCLUSION

The ultimate solution to the directed evolution improvement of an enzyme with some initial activity on a non-natural, industrially relevant substrate should be to examine sources of mutational diversity and, through some scientific algorithm, simply rewrite the amino acid sequence with the new substitutions that will render the enzyme suitably effective, screening one or at most a few variants for confirmation of the desired activity. While this goal appears impossible, great strides have already been made in this direction. Starting from the Nature-inspired technique of random mutagenesis and screening of 10 000 variants, many scientists have moved to smart libraries that identify smaller subsets of amino acid positions that are randomly mutated (“site directed, random substitutions”) that reduce the screening burden to the order of 1000 variants. Here we scouted the next advance: we examined two existing methods for determining “site directed, specific substitutions” that successfully identified mutations that improve the ATA-117 transaminase’s activity on a non-natural substrate, each while examining on the order of 100 variants. The two orthogonal methods, one based on the evaluation of sequence alignments and the other on structure-based in silico prediction, provided hit rates of 1 positive variant for every 11 and 5.5 mutations predicted, respectively. These frequencies are already valuable from a directed evolution perspective, rapidly enabling 7- and 9-fold improvements in transaminase activity by taking advantage of recombination strategies. Additionally, subsets of the structure-based in silico prediction data reached one positive variant in every two sequences.
By combining these two enzyme design approaches, the orthogonal diversity generated here could potentially serve as a way to accelerate an evolution campaign by limiting the number of variants to screen in each round and by reducing the number of rounds necessary to achieve a desired fold improvement. Improving the predictive science of both methods through better analysis of sequence information and improved computational techniques should lead to significant improvements in evolving enzyme


Table 5. 93 in Silico Mutants Selected As Structure-Based R1

I10P   S49Y   D81V   S124M  D139C  A169M  F190Y  G224C  I246L  T282M  P297V
T13H   S54M   E85Q   S124Q  H143Y  Q173R  G193C  G224L  I246Y  T282W  S299R
T13R   D57N   E85R   S124R  P145M  S174Q  G193I  G224M  R248W  T282Y  G301Y
D25T   L61M   E92H   S126Y  Q146R  T178M  I196R  G239H  R248Y  A284M
E27R   L61W   E92R   P135Q  Y150C  T178W  D204W  G245F  E256R  G285C
D29S   H62R   K106R  P135W  Y150I  S181I  A209H  G245I  D274H  G285I
P33R   H62Y   K115R  G136N  Q155M  N189R  G217L  G245L  D274I  G285Y
E42T   T66W   F122Y  G136R  Q155Y  N189T  E221I  G245N  D274R  G286I
P48I   V69I   S124I  G136Y  P159F  F190R  E221Q  G245Y  G280I  V293Y

puting Group, Montreal, Canada. Details in SI).37 For each software package (Bioluminate and MOE) and for each score (ligand binding free energy and protein stability), the top mutants were selected (excluding mutations to catalytic residues) from the quadrant where both the stability and binding scores were improved relative to ATA-117, up to a maximum of five mutants per position (SI, Figure S2). Additionally, for active site residues (residues within 5 Å of ligand) not selected with either program/score, the best single mutant was also selected taking into account both programs and scores simultaneously. The 93 unique structure-based mutants are shown in Table 5. ATUM created the DNA sequences, expressed wild type and the enzyme variants and produced clarified lysates, which were then screened on THTP-BDO and analyzed as described above. The top mutants were recombined into a structure-based R2 combinatorial library. The recombination of variants chosen as described in the text for the structure-based R2 combinatorial library was accomplished by multiple PCR overlap extensions (SOEing) and ligation. A combination of terminal primers and primers located in between every mutation were used to individually synthesize all possible variations of mutation sequences as described in detail in the Supporting Information. These sequences were transformed and expressed, and clarified lysates were generated and screened as described in the SI and above.
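The quadrant-based selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow actually run in Bioluminate or MOE. It assumes that a negative score change means an improvement relative to wild type, and that mutants are ranked within a position by the score under consideration. The `ScoredMutant` type, the function name, and the example scores are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ScoredMutant(NamedTuple):
    position: int
    mutation: str         # label for the substitution (hypothetical labels below)
    ddg_binding: float    # change in ligand-binding score vs wild type
    ddg_stability: float  # change in stability score vs wild type


def select_top_mutants(scored, catalytic_positions, max_per_position=5,
                       rank_key=lambda m: m.ddg_binding):
    """Keep mutants improved in BOTH scores (assumed: negative = improved),
    skip catalytic residues, and cap the number kept per position."""
    by_position = defaultdict(list)
    for m in scored:
        if m.position in catalytic_positions:
            continue  # catalytic residues are excluded from mutation
        if m.ddg_binding < 0 and m.ddg_stability < 0:
            by_position[m.position].append(m)
    selected = []
    for pos in sorted(by_position):
        ranked = sorted(by_position[pos], key=rank_key)  # most negative first
        selected.extend(ranked[:max_per_position])
    return selected


# Made-up scores, for illustration only.
example = [ScoredMutant(100, f"mut{i}", -0.1 * i, -0.1) for i in range(1, 8)]
example.append(ScoredMutant(100, "destabilizing", -0.3, 0.8))  # wrong quadrant
example.append(ScoredMutant(200, "catalytic", -2.0, -1.0))     # excluded site

picked = select_top_mutants(example, catalytic_positions={200})
print([m.mutation for m in picked])
```

Passing `rank_key=lambda m: m.ddg_stability` would rank by the stability score instead, mirroring the separate selections made for each score.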

deoxycholate, and 80U of OmniCleave endonuclease (Epicenter) were added and incubation at 25 °C was continued for 15 min. The soluble crude lysate fraction was recovered after centrifugation at 5000g for 10 min at 4 °C, combined with an equal volume of glycerol and stored at −20 °C. Lysate fractions were analyzed by quantitative PAGE using densitometry relative to protein concentration standards after staining with Simply Blue (Invitrogen). Full-length transaminase was estimated for each sample. For purification, cells were resuspended in buffer (25 mM Bis-Tris pH 6.5, 300 mM NaCl) and lysed by sonication. Cell debris was discarded by centrifugation, and the supernatant was loaded onto a HiTrap Q HP column (GE Healthcare Lifescience). The resin was washed with buffer, then proteins were eluted with NaCl using a 10 column volume linear gradient. The transaminase fractions were collected and desalted into phosphate buffered saline (PBS) using PD-10 spin columns. SDS-PAGE or capillary electrophoresis (PerkinElmer GXII) confirmed that the protein was >90% pure, and finally enzyme concentration was determined by A280 (Nanodrop). E. coli lysates were screened using the reaction system in Scheme 1 described previously such that all reagents other than the transaminase were supplied in significant excess.23 Ketone and amine standards were made as described previously.21 The final concentration of all added reaction components was 0.38 mM nicotinamide adenine dinucleotide (NAD), 2 mM pyridoxal-5-phosphate (PLP), 0.15 mg/mL lactate dehydrogenase-101, 0.5 mg/mL glucose dehydrogenase-105, 0.6 M D-alanine, 0.3 M glucose, 5 vol % methanol and 18 mM substrate in 0.1 M phosphate buffer. Reactions were run at pH 7.5 overnight. Lysate concentrations were 40 vol % in round 1 screens and 20 vol % in round 2 screens.
The samples were evaluated on a Waters UPLC using an Atlantis T3 column run isocratically at 95% solution A (0.05% trifluoroacetic acid, TFA) and 5% solution B (acetonitrile with 0.05% TFA) over 2 min, followed by a 95% B wash for 0.3 min. The amine eluted at 1.26 min, and the ketone eluted at 1.49 min. A second library was constructed using a structure-based approach (structure-based R1). To generate all possible single-point mutations computationally, we first docked the PLP-aldimine intermediate of THTP-BDO (Figure 3) into the wild-type cofactor-bound structure (3WWH.pdb)24 using Glide (Schrödinger Release 2015-1: Glide, Schrödinger, LLC, New York, NY, 2015). We then performed a full residue scan (i.e., mutating each amino acid to all others) and calculated both the change in ligand-binding free energy and stability scores using two different software packages: Bioluminate from Schrödinger (Release 2016-3) and the protein design module in MOE (Molecular Operating Environment 2013.08, Chemical Com-



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssynbio.7b00359. Sequence-based approach round 1 and round 2 sequences, round 1 and round 2 screening results and activities; description of mutation deconvolution algorithm; structure-based approach computational methods and calculation results; structure-based round 1 and round 2 sequences, distribution analysis of mutations; recombination methodology for generating round 2 sequences; round 2 screening results and deconvolution of mutations; evaluation of sequence and structure method orthogonality (PDF)



AUTHOR INFORMATION

Corresponding Author

*Tel.: 732-594-4836. E-mail: jeff[email protected].

ORCID

Jeffrey C Moore: 0000-0002-9807-6315
Matthew D. Truppo: 0000-0003-4917-7838


Present Addresses

(9) Truppo, M. D. (2017) Biocatalysis in the Pharmaceutical Industry: The Need for Speed,. ACS Med. Chem. Lett. 8, 476−480. (10) Wijma, H. J., Floor, R. J., Jekel, P. A., Baker, D., Marrink, S. J., and Janssen, D. B. (2014) Computationally designed libraries for rapid enzyme stabilization. Protein Eng., Des. Sel. 27, 49−58. (11) Beier, A., Bordewick, S., Genz, M., Schmidt, S., van den Bergh, T., Peters, C., Joosten, H. J., and Bornscheuer, U. T. (2016) Switch in Cofactor Specificity of a Baeyer-Villiger Monooxygenase. ChemBioChem 17, 2312−2315. (12) Bendl, J., Stourac, J., Sebestova, E., Vavra, O., Musil, M., Brezovsky, J., and Damborsky, J. (2016) HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering. Nucleic Acids Res. 44, W479−487. (13) Wijma, H. J., Floor, R. J., Bjelic, S., Marrink, S. J., Baker, D., and Janssen, D. B. (2015) Enantioselective enzymes by computational design and in silico screening. Angew. Chem., Int. Ed. 54, 3726−3730. (14) Schwarte, A., Genz, M., Skalden, L., Nobili, A., Vickers, C., Melse, O., Kuipers, R., Joosten, H. J., Stourac, J., Bendl, J., Black, J., Haase, P., Baakman, C., Damborsky, J., Bornscheuer, U., Vriend, G., and Venselaar, H. (2017) NewProt - a protein engineering portal. Protein Eng., Des. Sel. 30, 1−7. (15) Nobili, A., Tao, Y., Pavlidis, I. V., van den Bergh, T., Joosten, H. J., Tan, T., and Bornscheuer, U. T. (2015) Simultaneous use of in silico design and a correlated mutation network as a tool to efficiently guide enzyme engineering. ChemBioChem 16, 805−810. (16) Chen, C. Y., Georgiev, I., Anderson, A. C., and Donald, B. R. (2009) Computational structure-based redesign of enzyme activity. Proc. Natl. Acad. Sci. U. S. A. 106, 3764−3769. (17) Nobili, A., Gall, M. G., Pavlidis, I. V., Thompson, M. L., Schmidt, M., and Bornscheuer, U. T. 
(2013) Use of ’small but smart’ libraries to enhance the enantioselectivity of an esterase from Bacillus stearothermophilus towards tetrahydrofuran-3-yl acetate. FEBS J. 280, 3084−3093. (18) Minshull, J., Ness, J. E., Gustafsson, C., and Govindarajan, S. (2005) Predicting enzyme function from protein sequence. Curr. Opin. Chem. Biol. 9, 202−209. (19) Denard, C. A., Ren, H., and Zhao, H. (2015) Improving and repurposing biocatalysts via directed evolution. Curr. Opin. Chem. Biol. 25, 55−64. (20) Sebestova, E., Bendl, J., Brezovsky, J., and Damborsky, J. (2014) Computational tools for designing smart libraries. Methods Mol. Biol. 1179, 291−314. (21) Savile, C. K., Janey, J. M., Mundorff, E. C., Moore, J. C., Tam, S., Jarvis, W. R., Colbeck, J. C., Krebber, A., Fleitz, F. J., Brands, J., Devine, P. N., Huisman, G. W., and Hughes, G. J. (2010) Biocatalytic asymmetric synthesis of chiral amines from ketones applied to sitagliptin manufacture. Science 329, 305−309. (22) Liao, J., Warmuth, M. K., Govindarajan, S., Ness, J. E., Wang, R. P., Gustafsson, C., and Minshull, J. (2007) Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol. 7, 16. (23) Girardin, M., Ouellet, S. G., Gauvreau, D., Moore, J. C., Hughes, G., Devine, P. N., O’shea, P. D., and Campeau, L. C. (2013) Convergent Kilogram-Scale Synthesis of Dual Orexin Receptor Antagonist. Org. Process Res. Dev. 17, 61−68. (24) Guan, L. J., Ohtsuka, J., Okai, M., Miyakawa, T., Mase, T., Zhi, Y., Hou, F., Ito, N., Iwasaki, A., Yasohara, Y., and Tanokura, M. (2015) A new target region for changing the substrate specificity of amine transaminases. Sci. Rep. 5, 10753. (25) Jones, B. J., Lim, H. Y., Huang, J., and Kazlauskas, R. J. (2017) Comparison of Five Protein Engineering Strategies for Stabilizing an alpha/beta-Hydrolase. Biochemistry 56, 6521−6532. (26) van den Bergh, T., Tamo, G., Nobili, A., Tao, Y., Tan, T., Bornscheuer, U. T., Kuipers, R. K. P., Vroling, B., de Jong, R. 
M., Subramanian, K., Schaap, P. J., Desmet, T., Nidetzky, B., Vriend, G., and Joosten, H. J. (2017) CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 12, e0176427. (27) Wijma, H. J., Marrink, S. J., and Janssen, D. B. (2014) Computationally efficient and accurate enantioselectivity modeling by

# K. L.: Denali Therapeutics, 151 Oyster Point Blvd, South San Francisco, CA 94080.
∞ EMD Serono Research and Development Institute, Inc., 45A Middlesex Turnpike, Billerica, MA 01821, USA

Author Contributions

J.C.M. and M.D.T. conceived the study; J.C.M. designed the experiments. S.G. designed the sequence-based libraries; A.C., A.R.-G., and K.L. designed and executed the structure-based library calculations. M.W. produced and characterized enzyme samples. K.H. executed the structure-based round 2 library recombination and protein production. J.C.M. and N.M. screened the reactions. J.C.M., S.G., A.R.-G., and A.C. analyzed the data. J.C.M., A.R.-G., A.C., and K.H. wrote the manuscript with input from the other authors.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was funded entirely by Merck Research Laboratories of Merck, Sharp, and Dohme (MSD). In silico calculations were performed within the MSD internal modeling environment and relied heavily upon the resources of our High Performance Computing group.



ADDITIONAL NOTES

a This is calculated as the combination of ways two mutations can be arrayed within three positions times the probability of a positive ($P_{\mathrm{mut}}$) or nonpositive mutation ($P_{\mathrm{no\,mut}}$) at each position, that is, ${}_{3}C_{2} \cdot P_{\mathrm{mut}} \cdot P_{\mathrm{mut}} \cdot P_{\mathrm{no\,mut}} = 3 \times 0.091 \times 0.091 \times 0.909 = 0.0226$, or 2.3%.

b As before, except positive sequences arise only when $P_{\mathrm{no\,mut}}$ is not inactivating (inactivating mutations were 30% of the original library), i.e., ${}_{3}C_{2} \cdot P_{\mathrm{mut}} \cdot P_{\mathrm{mut}} \cdot (P_{\mathrm{no\,mut}} - P_{\mathrm{inact\,mut}}) = 3 \times 0.091 \times 0.091 \times (0.909 - 0.30) = 0.0151$, or 1.5%.

c This probability is equal to 1 minus the probability of the sequence with zero positive mutations, or $1 - (0.909 \times 0.909 \times 0.909) = 0.249$, or 25%.
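The three footnote calculations can be reproduced with a short Python sketch. It assumes, as the footnotes do, that the mutations in a three-mutation sequence behave independently; the function names are illustrative.

```python
from math import comb

P_MUT = 0.091    # frequency of positive mutations (sequence-based R1)
P_INACT = 0.30   # frequency of inactivating mutations (sequence-based R1)
N_SITES = 3      # substitutions per designed sequence


def prob_exactly_two_positive(p_mut: float) -> float:
    """Footnote a: exactly two of three mutations are positive."""
    return comb(N_SITES, 2) * p_mut * p_mut * (1 - p_mut)


def prob_two_positive_third_tolerated(p_mut: float, p_inact: float) -> float:
    """Footnote b: as above, but the third mutation must not be inactivating."""
    return comb(N_SITES, 2) * p_mut * p_mut * ((1 - p_mut) - p_inact)


def prob_at_least_one_positive(p_mut: float) -> float:
    """Footnote c: one minus the probability of zero positive mutations."""
    return 1 - (1 - p_mut) ** N_SITES


print(f"a: {prob_exactly_two_positive(P_MUT):.4f}")
print(f"b: {prob_two_positive_third_tolerated(P_MUT, P_INACT):.4f}")
print(f"c: {prob_at_least_one_positive(P_MUT):.3f}")
```

The printed values round to 0.0226, 0.0151, and 0.249, matching footnotes a, b, and c.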



REFERENCES

(1) Chen, K. Q., and Arnold, F. H. (1991) Enzyme engineering for nonaqueous solvents: random mutagenesis to enhance activity of subtilisin E in polar organic media. Bio/Technology 9, 1073−1077. (2) Bornscheuer, U. T., Huisman, G. W., Kazlauskas, R. J., Lutz, S., Moore, J. C., and Robins, K. (2012) Engineering the third wave of biocatalysis. Nature 485, 185−194. (3) Davids, T., Schmidt, M., Bottcher, D., and Bornscheuer, U. T. (2013) Strategies for the discovery and engineering of enzymes for biocatalysis. Curr. Opin. Chem. Biol. 17, 215−220. (4) Reetz, M. T. (2017) Recent Advances in Directed Evolution of Stereoselective Enzymes, In Directed Enzyme Evolution: Advances and Applications (Alcalde, M., Ed.), pp 69−99, Springer International Publishing, Cham. (5) Jochens, H., and Bornscheuer, U. T. (2010) Natural diversity to guide focused directed evolution. ChemBioChem 11, 1861−1866. (6) Ebert, M. C., and Pelletier, J. N. (2017) Computational tools for enzyme improvement: why everyone can - and should - use them. Curr. Opin. Chem. Biol. 37, 89−96. (7) Lutz, S. (2010) Beyond directed evolution−semi-rational protein engineering and design. Curr. Opin. Biotechnol. 21, 734−743. (8) Chaparro-Riggers, J. F., Polizzi, K. M., and Bommarius, A. S. (2007) Better library design: data-driven protein engineering. Biotechnol. J. 2, 180−191.


clusters of molecular dynamics simulations. J. Chem. Inf. Model. 54, 2079−2092. (28) Sirin, S., Kumar, R., Martinez, C., Karmilowicz, M. J., Ghosh, P., Abramov, Y. A., Martin, V., and Sherman, W. (2014) A computational approach to enzyme design: predicting omega-aminotransferase catalytic activity using docking and MM-GBSA scoring. J. Chem. Inf. Model. 54, 2334−2346. (29) Wang, L., Wu, Y., Deng, Y., Kim, B., Pierce, L., Krilov, G., Lupyan, D., Robinson, S., Dahlgren, M. K., Greenwood, J., Romero, D. L., Masse, C., Knight, J. L., Steinbrecher, T., Beuming, T., Damm, W., Harder, E., Sherman, W., Brewer, M., Wester, R., Murcko, M., Frye, L., Farid, R., Lin, T., Mobley, D. L., Jorgensen, W. L., Berne, B. J., Friesner, R. A., and Abel, R. (2015) Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695−2703. (30) Ford, M. C., and Babaoglu, K. (2017) Examining the Feasibility of Using Free Energy Perturbation (FEP+) in Predicting Protein Stability. J. Chem. Inf. Model. 57, 1276−1285. (31) Garcia-Guevara, F., Avelar, M., Ayala, M., and Segovia, L. (2015) Computational Tools Applied to Enzyme Design - a review. Biocatalysis 1, 109−117. (32) van der Kamp, M. W., and Mulholland, A. J. (2013) Combined quantum mechanics/molecular mechanics (QM/MM) methods in computational enzymology. Biochemistry 52, 2708−2728. (33) Carvalho, A. T., Barrozo, A., Doron, D., Kilshtain, A. V., Major, D. T., and Kamerlin, S. C. (2014) Challenges in computational studies of enzyme structure, function and dynamics. J. Mol. Graphics Modell. 54, 62−79. (34) Midelfort, K. S., Kumar, R., Han, S., Karmilowicz, M. J., McConnell, K., Gehlhaar, D. K., Mistry, A., Chang, J. S., Anderson, M., Villalobos, A., Minshull, J., Govindarajan, S., and Wong, J. W.
(2013) Redesigning and characterizing the substrate specificity and activity of Vibrio fluvialis aminotransferase for the synthesis of imagabalin. Protein Eng., Des. Sel. 26, 25−33. (35) Ehren, J., Govindarajan, S., Moron, B., Minshull, J., and Khosla, C. (2008) Protein engineering of improved prolyl endopeptidases for celiac sprue therapy. Protein Eng., Des. Sel. 21, 699−707. (36) Gustafsson, C., Minshull, J., Govindarajan, S., Ness, J., Villalobos, A., and Welch, M. (2012) Engineering genes for predictable protein expression. Protein Expression Purif. 83, 37−46. (37) Beard, H., Cholleti, A., Pearlman, D., Sherman, W., and Loving, K. A. (2013) Applying physics-based scoring to calculate free energies of binding for single amino acid mutations in protein-protein complexes. PLoS One 8, e82849.
