Specific Predictions Enable Minimal Directed Evolution Libraries


Article

‘Site and Mutation’-Specific Predictions Enable Minimal Directed Evolution Libraries Jeffrey C Moore, Agustina Rodriguez-Granillo, Alejandro Crespo, Sridhar Govindarajan, Mark Welch, Kaori Hiraga, Katrina Lexa, Nicholas Marshall, and Matthew David Truppo ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.7b00359 • Publication Date (Web): 21 May 2018 Downloaded from http://pubs.acs.org on May 22, 2018


ACS Synthetic Biology is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Synthetic Biology

‘Site and Mutation’-Specific Predictions Enable Minimal Directed Evolution Libraries *

Jeffrey C Moore1,*, Agustina Rodriguez-Granillo2, Alejandro Crespo2, Sridhar Govindarajan3, Mark Welch3, Kaori Hiraga4, Katrina Lexa2,†, Nicholas Marshall4, Matthew D. Truppo5

1 Biocatalysis, Biochemical Engineering and Structure, MRL, Merck & Co., Inc., PO Box 2000, Rahway, NJ 07065 USA

2 Modeling and Informatics, MRL, Merck & Co., Inc., PO Box 2000, Rahway, NJ 07065 USA

3 ATUM, 37950 Central Court, Newark, CA 94560 USA

4 Protein Engineering, Biochemical Engineering and Structure, MRL, Merck & Co., Inc., PO Box 2000, Rahway, NJ 07065 USA

5 Biochemical Engineering and Structure, MRL, Merck & Co., Inc., PO Box 2000, Rahway, NJ 07065 USA

† Current address: Denali Therapeutics, 151 Oyster Point Blvd, South San Francisco, CA 94080

* To whom correspondence should be addressed.

1 ACS Paragon Plus Environment


ABSTRACT Directed evolution experiments designed to improve the activity of a biocatalyst have increased in sophistication from the early days of completely random mutagenesis. Sequence-based and structure-based methods have been developed to identify “hotspot” positions that when randomized provide a higher frequency of beneficial mutations that improve activity. These focused mutagenesis methods reduce library sizes and therefore reduce screening burden, accelerating the rate of finding improved enzymes. Looking for further acceleration in finding improved enzymes, we investigated whether two existing methods, one sequence-based (Protein GPS) and one structure-based (using Bioluminate and MOE), were sufficiently predictive to provide not just the hotspot position, but also the amino acid substitution that improved activity at that position. By limiting the libraries to variants that contained only specific amino acid substitutions, library sizes were kept to less than 100 variants. For an initial round of ATA-117 R-selective transaminase evolution, we found that the methods used produced libraries where 9% and 18% of the amino acid substitutions chosen were amino acids that improved reaction performance in lysates. The ability to create combinations of mutations as part of the initial design was confounded by the relatively large number of predicted mutations that were inactivating (30% and 45% for the sequence-based and structure-based methods respectively). Despite this, combining several mutations identified within a given method produced variant lysates 7 and 9 fold more active than the wild type lysate, highlighting the capability of mutations chosen this way to generate large advances in activity in addition to the reductions in screening.

KEYWORDS directed evolution / in silico mutagenesis / R-specific transaminase ATA-117 / Bioluminate / MOE / Protein GPS


Directed Evolution has become a foundational technology for improving a wide variety of enzyme properties, of which activity and selectivity are the two most notable for industrial biocatalysis. The technology initiated in the early 1990s as a Nature-inspired approach to improve enzyme function through random mutation of the protein’s amino acid sequence and screening for desired effect.1 Over the intervening years, Directed Evolution has become more sophisticated,2 such that today, several strategies exist that enable practitioners to reduce the randomness of the initial approach by making educated guesses as to the location of mutations likely to be responsible for improvement of activity or selectivity, which has been extensively reviewed.3-8 By limiting the random mutations to a subset of well-chosen amino acid residues (often called “hotspots”) rather than the entire protein, the pool of variant sequences decreases dramatically, reducing the number of screening samples required to identify improved enzymes, giving rise to the name “small, but smart” libraries and accelerating evolution projects often limited by screening efforts. Being able to move activity and selectivity quickly in early rounds of directed evolution experiments by increasing the magnitude of improvement, decreasing the time of each round, or both is critically important to delivering on the promise of industrial biocatalysis.9 When the method for mutation selection is well suited to the desired property improvement, such as the FRESCO method for stability, large changes in performance can be made in extremely small libraries.10 The strategies for generating small, but smart libraries for improved activity focus generally on two fundamental approaches to identify hotspots. 
The first uses the large collections of sequence information in enzyme sequence databases to identify and catalog natural diversity under the presumption that natural evolution has sampled a significant amount of mutation space and identified the critical residues, their key interactions with substrates and each other, and that the prediction of hotspots can be inferred from an analysis of these data.5, 6, 11, 12 The second approach uses 3-dimensional protein structural information combined with an understanding of the substrate, transition state or product binding within the enzyme active site to identify residues likely to control reaction progress.6, 7, 13 A recent advancement highlights the combination of tools from both strategies into a single software package, which should generate more accurate predictions of hotspots in the future.14

With few exceptions,7, 15, 16 small, but smart Directed Evolution experiments have relied on site-saturation mutagenesis as a “site directed, randomized substitution” approach to incorporate random diversity at the hotspot positions chosen and screened to identify the best substitution for the chosen position. A number of degenerate codon sets have also been developed to limit bias and control redundancy in the genetic code and in some cases to reduce the overall number of amino acids sampled to keep screening effort to a minimum.4, 11 These and other strategies for generating “small, but smart” libraries narrow the search space for improved variants from many thousands to several hundreds of variants screened.7, 17-20 Although the reduced degree of randomness in the construction of the variant pool limits the screening burden, it still creates theoretically unnecessary screening. Screening could be maximally accelerated if the correct amino acid substitution could also be inferred from the hotspot analysis.
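As an illustration of how a degenerate codon set trades coverage against redundancy, the sketch below enumerates the widely used NNK set (N = any base, K = G or T) and compares it with fully random NNN. NNK is chosen here only as a familiar example; the text does not name a specific codon set, and the helper names `expand` and `summarize` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degeneracy codes needed for the two patterns compared here.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(pattern):
    """List every concrete codon encoded by a degenerate pattern such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]


def summarize(pattern):
    """Count codons, distinct amino acids, and stop codons for a pattern."""
    encoded = [CODON_TABLE[c] for c in expand(pattern)]
    amino = {aa for aa in encoded if aa != "*"}
    return {
        "codons": len(encoded),
        "amino_acids": len(amino),
        "stops": encoded.count("*"),
    }


nnk = summarize("NNK")
nnn = summarize("NNN")
print("NNK:", nnk)
print("NNN:", nnn)
```

Both patterns still encode all 20 amino acids, so a saturation library at one position screens many redundant or non-improving variants; this is the residual screening burden that specifying the substitution itself is meant to remove.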

We were interested in understanding how close current methodologies are to correctly identifying valuable amino acid substitutions in addition to hotspot position. Small, but smart libraries were used in what was then (circa 2008) the state-of-the-art directed evolution effort of a transaminase to synthesize sitagliptin, including the initial round of substrate walking that was designed to bridge the substrate scope of the (R)-selective transaminase ATA-117 with the desired sitagliptin molecule.21 This initial round screened for activity improvements against a truncated ketone analog of sitagliptin (1-(3-(trifluoromethyl)-5,6-dihydro-[1,2,4]triazolo[4,3-a]pyrazin-7(8H)-yl)butane-1,3-dione, abbreviated THTP-BDO, Scheme 1) and examined site-saturation mutagenesis to randomize the amino acids at 12 positions, resulting in the identification of 8 positions where at least 1 amino acid substitution improved reaction rate. Since this work was done, predictive tools have improved with continued development of sequence alignment analyses (3DM, ATUM (formerly DNA2.0), HotSpot Wizard) alongside the expansion of sequences in the databases, and with improved structural computational capabilities. Here we re-investigated this initial round of sitagliptin evolution using “site directed, specific substitution” capabilities by examining two existing hotspot prediction methodologies representative of the sequence-based and structure-based strategies that also provide amino acid substitution predictions and testing whether the predicted substitutions improve the activity of the enzyme.

SCHEME 1

We evaluated both total and specific activity of variant sequences to validate the predictions and to maximally move the activity of the catalyst produced. At this early stage of an evolution effort, moving the activity forward is paramount, and as a result activity improvements from a variety of mechanisms (expression, stability if it improves activity, reduced inhibition, etc.) are valuable.

RESULTS & DISCUSSION The sequence-based strategy used multiple sequence alignments of 1250 homologues of the ATA-117 gene to identify ~1900 natural variations as diversity potentially useful for substitution. Using ATUM’s Protein GPS Engineering technology as described in the Methods section, 66 unique mutations (Table 4, Methods Section) were prioritized for evaluation and arranged into 95 sequences. One of the advantages of this small library approach is that the relatively small number of samples allows for more detailed data collection to provide higher quality data overall. In this case, it allowed for quantitation of enzyme expression in each lysate, providing information on a significant source of variability and enabling access to higher quality specific activity data. Under the screening conditions, where the substrate concentration is much less than the KM, the specific activity data are a scaled measurement of an apparent catalytic efficiency (kcat/KM)app for this reaction.
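The last statement follows from the low-substrate limit of the Michaelis-Menten rate law. The short derivation below is a sketch that assumes a simple single-substrate form holds for the ketone at fixed amine donor concentration (an assumption made here for illustration; a transaminase uses two substrates, which is why the efficiency is labeled "apparent"). The symbols v, [E] and [S] denote rate, enzyme concentration and substrate concentration and are not defined in the text.

```latex
\begin{align}
v &= \frac{k_{cat}\,[E]\,[S]}{K_M + [S]} \\
[S] \ll K_M \;\Rightarrow\; v &\approx \left(\frac{k_{cat}}{K_M}\right)_{app} [E]\,[S] \\
\frac{v}{[E]} &\approx \left(\frac{k_{cat}}{K_M}\right)_{app} [S]
\end{align}
```

Because every well is screened at the same [S], the rate per unit enzyme (the specific activity) differs between variants only through (kcat/KM)app, so fold improvements in specific activity can be read as fold improvements in apparent catalytic efficiency.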


This sequence-based Round 1 (R1) library was arranged with 3 mutations per sequence (allowing each mutation to be observed in 5 different variants, SI-Table S1) under the premise that the prescreening of these mutations in homologous sequences in Nature would provide a high degree of tolerability in the resulting variant set. Screening this sequence-based R1 library (SI-Figure S1A) identified 3 lysates with more activity than the wild type lysate, including the best variant comprising mutations I84L, F122Y and S223P that was 4.6 fold improved in total activity, 3.9 fold improved in specific activity and 1.2 fold improved in expression (Figure S1B, Table S2). Twelve additional variants had specific activities between 1.2-fold and 2.2-fold improved relative to wild type, although expression of these variants was sufficiently poor that total activity was reduced relative to the wild type lysate. The premise that the pre-screening of substitutions in Nature would provide a majority of well tolerated substitutions and ensure reasonable activity in the collection did not hold for this library. This is a direct result of having every mutation seen in 5 variants: if a given mutation is deleterious, 5 inactive variants are expected. Sixty-three of the 95 variant sequences displayed activity below the threshold of detection and were deemed inactive. Of these, 39 sequences had low expression, with levels 0-10% of the wild type transaminase as measured by gel electrophoresis (SI-Figure S1B). When further examined, 20 substitutions (~30%) were not present in any active sequence; these were presumed responsible for inactivating the variant sequences. Unfortunately, the small number of sequences with measurable activity (32) relative to the number of mutations introduced (66) made deconvolution of the impact of individual mutations unfeasible.

To fully evaluate the original amino acid analysis, 26 mutations responsible for inactive or nonexpressing variants were dropped from the original 66, and the remaining 40 mutations were rearrayed in 95 sequences at 2-6 substitutions per sequence (sequence-based R2). No new mutations were introduced in the sequence-based R2. The resynthesis of the genes in this collection also allowed for a re-weighting of mutations, such that mutations appearing in the three sequences that produced more activity than wild type in R1 were more heavily weighted in this second library, appearing in more than 10 sequences each (SI-Table S3).22 This library did not attempt to compile any clones that combined only the best mutations from R1, but modulated the frequencies of amino-acid changes seen in the library to provide an overall improved library.

This R2 library performed much better than the original (Figure 1A), with only 17 sequences (vs. 63 in the first case) showing non-detectable activity, and no mutations showing up only in inactive sequences. Not surprisingly, a large proportion of sequences have specific activities greater than wild type because the mutation(s) responsible for the improved variants in the first round were incorporated into more sequences. Although this round has greater scatter in protein expression (Figure 1B), this had little effect on the ranking of activities as the top 5 variants in total and specific activity were identical, with only a slight rank ordering difference between them (Table S4). Two improved variants have activities above the R1 best variant; the first at 4.7 fold improved specific activity containing all three mutations from the R1 variant (I84L, F122Y and S223P) and additionally A91C, A114R, and V125L, and the second at 4.5 fold improved specific activity containing two mutations from the R1 variant (F122Y, S223P), and the two additional mutations Y60F and V235L.

FIGURE 1

Both of these variants additionally benefit from improvements in expression, providing another 1.5 fold improvement over the wild type lysate activity.

If we assume that the effects of each mutation are additive, a mathematical model can quickly determine the impact of each individual mutation (the calculation details are given in the SI and results are shown in Figure 2). Deconvolution of the sequence-based R2 total activity data (Figure 2, black bars) indicated that 6 mutations of the original 66 are greater than 20% improved over wild type, corresponding to 9% (1 in 11) of the original library. The mutations in order of degree of improvement are: S223P, F122Y, Y60F, T282S, V69G and A91C. Similar deconvolution of the specific activity (Figure 2, white bars) identified the same residues, and in addition highlighted 3 more: V227I, V278I and C281T. Although V278I and C281T were the 3rd and 4th best specific activity substitutions according to the deconvolution, they did not appear in any improved sequences. They also have the largest difference between low total activity and high specific activity, suggesting that these residues have significant negative impact on enzyme expression, and might need to be installed in a construct that is already highly expressed. These are mutations that would be revisited in future rounds of evolution.

FIGURE 2

The structure-based substitutions were identified in silico by making every possible single amino acid substitution in an enzyme crystal structure containing the docked PLP aldimine intermediate (Figure 3), followed by ranking based on stability and binding free energy scores using both Bioluminate and MOE computational programs as described in the Methods section. The enzyme is known to be (R)-selective, and the aldimine intermediate allowed the modeling of the correct chirality in the enzyme active site. We believed the catalytic machinery of the transaminase was sufficiently active on preferred substrates based on previous use of ATA-117,23 and that the THTP-BDO substrate simply did not fit well in the active site and was therefore poorly accepted. Thus, optimizing intermediate binding should be a reasonable predictor of activity improvement, justifying the use of binding free energy score calculations to predict improvement. We also reasoned new mutations should not destabilize the enzyme structure, which justified using improvements in stability scores to rank variants. All substitutions chosen were calculated to be improved in both binding free energy and in stability (SI-Figure 2a,b). Because a large number of predicted substitutions occupied the improved binding and improved stability space (SI-Figure 2c,d), we chose to sample two groups of mutations for improved activity: those that ranked highest in binding energy and those that ranked highest in stability. In addition, preferential treatment was given to residues within 5Å of the substrate, making sure that at least one amino acid substitution was represented at each of these residues.

FIGURE 3
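Returning to the additive deconvolution used for the sequence-based R2 data (Figure 2), the idea can be illustrated with a small least-squares sketch. The actual calculation is described in the SI and may differ; here additivity is assumed on the log fold-improvement scale (one common choice), the wild type is fixed at a fold improvement of 1, and the mutation names `mutA` to `mutC` and their activities are invented toy values, not data from this study.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("mutation effects are not identifiable from these variants")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def deconvolute(variants):
    """Estimate a per-mutation fold effect from multi-mutation variants.

    variants: list of (set_of_mutations, fold_improvement_over_wild_type).
    Fits log(fold) = sum of per-mutation terms by ordinary least squares
    (normal equations), then converts each term back to a fold effect.
    """
    muts = sorted({m for ms, _ in variants for m in ms})
    x = [[1.0 if m in ms else 0.0 for m in muts] for ms, _ in variants]
    y = [math.log(fold) for _, fold in variants]
    k, n = len(muts), len(variants)
    xtx = [[sum(x[r][i] * x[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(x[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return {m: math.exp(b) for m, b in zip(muts, beta)}


# Toy library (hypothetical): each mutation appears in several variants.
toy_variants = [
    ({"mutA", "mutB"}, 3.0),
    ({"mutA", "mutC"}, 1.6),
    ({"mutB", "mutC"}, 1.2),
    ({"mutA", "mutB", "mutC"}, 2.4),
]
effects = deconvolute(toy_variants)
for name, fold in sorted(effects.items()):
    print(f"{name}: {fold:.2f}-fold")
```

The fit is only possible when there are enough active variants relative to the number of mutations, which is consistent with the observation that R1 (32 active sequences, 66 mutations) could not be deconvoluted while R2 could.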


The 93 mutations selected for structure-based R1 and the summary of their selection are shown in Table 4 (Methods section); the screening results are shown in Figure 4a.

FIGURE 4
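The selection logic described for the structure-based library (keep substitutions predicted to improve both binding and stability, take the top-ranked members of each group, then make sure every residue within 5 Å of the substrate is represented) might be sketched as follows. This is not the authors' pipeline: the `Candidate` fields, the sign convention (negative score change taken to mean improvement), the fallback of choosing the best-binding substitution at an uncovered near-substrate residue, and all numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    position: int
    substitution: str
    d_binding: float          # hypothetical score change; negative = improved
    d_stability: float        # hypothetical score change; negative = improved
    dist_to_substrate: float  # in Å

    @property
    def label(self):
        return f"{self.position}{self.substitution}"


def select_library(cands, n_binding, n_stability, cutoff=5.0):
    """Pick top-ranked substitutions, then cover residues near the substrate."""
    improved = [c for c in cands if c.d_binding < 0 and c.d_stability < 0]
    by_binding = sorted(improved, key=lambda c: c.d_binding)[:n_binding]
    by_stability = sorted(improved, key=lambda c: c.d_stability)[:n_stability]
    chosen = list(dict.fromkeys(by_binding + by_stability))  # de-duplicate, keep order

    covered = {c.position for c in chosen}
    near = sorted({c.position for c in cands if c.dist_to_substrate <= cutoff})
    for pos in near:
        if pos not in covered:
            # Assumed fallback: best predicted binding at that residue.
            best = min((c for c in cands if c.position == pos), key=lambda c: c.d_binding)
            chosen.append(best)
    return chosen


# Toy candidates (invented values).
candidates = [
    Candidate(10, "A", -3.0, -0.5, 12.0),
    Candidate(20, "R", -1.0, -4.0, 15.0),
    Candidate(30, "Y", -0.2, -0.1, 3.5),   # near substrate, modest scores
    Candidate(30, "W", 0.5, -2.0, 3.5),    # near substrate, worse binding
    Candidate(40, "K", -2.0, 1.0, 20.0),   # destabilizing, filtered out
]
chosen = select_library(candidates, n_binding=1, n_stability=1)
print([c.label for c in chosen])
```

In the toy run, residue 30 is included only because of the proximity rule, mirroring the preferential treatment given to residues close to the substrate.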

In this single mutation library, 17 of 93 lysates demonstrated greater than 20% improvement over the wild type lysate (18%, or 1 in 5.5); 15 of 93 variants met this criterion for improved specific activity (Table 1). H62Y and F122Y were the top two variants in both specific activity (5.7 and 1.8 fold improved respectively) and in total lysate activity (5.3 and 2.2 fold improved respectively). Keeping the library as single substitution sequences proved a correct choice as 42 single mutations (45%) were below the limit of detection in the assay and were deemed inactive, although only a few variants were poorly expressed (10 variants with levels 0-10% of WT, Figure 5b). In addition to the fold improvement, Table 1 also highlights from which program and score mutation selections were derived.

Table 1: Mutation analysis for the structure-based R1 library (variants with FIO WT > 1.2 only).

          Fold Improvement
Variant   Total Activity  Specific Activity  Expression  Program          Score
H62Y*     5.33            5.67               0.94        MOE              Active Site/Binding
F122Y*    2.21            1.81               1.22        MOE              Active Site
Q155Y     1.15            1.79               0.64        MOE              Active Site
P135Q     1.23            1.75               0.71        Bioluminate      Stability
E42T      0.43            1.68               0.26        Bioluminate      Binding
T13H*     1.25            1.53               0.82        Bioluminate      Binding
D274I     0.60            1.42               0.42        Bioluminate      Binding
Q146R     0.64            1.41               0.45        Bioluminate      Binding
E85R      1.29            1.40               0.92        Bioluminate      Stability
A169M*    1.55            1.37               1.13        Bioluminate      Stability
S49Y      1.22            1.31               0.93        Bioluminate      Stability
F190Y*    1.71            1.25               1.37        MOE              Active Site
E92R      1.26            1.24               1.01        Bioluminate      Binding
I10P      0.95            1.23               0.77        Bioluminate      Stability
L61M      0.73            1.22               0.60        Bioluminate      Binding
G136N*    1.59            1.18               1.35        Bioluminate      Binding
T13R*     1.34            1.11               1.21        Bioluminate      Stability
E256R     1.33            1.06               1.25        Bioluminate      Stability
G136R*    1.34            1.02               1.31        Bioluminate      Binding
G239H     1.22            0.93               1.31        Bioluminate      Binding
A209H     1.32            0.89               1.49        Bioluminate      Stability
I196R     1.27            0.86               1.48        Bioluminate      Stability
G136Y*    1.23            0.72               1.70        Bioluminate/MOE  Stability

Comparing the two programs within the structure-based R1 library (SI-Figure S3), Bioluminate significantly outperforms MOE in the use of improved binding free energy and improved stability to predict improvements in overall catalytic activity. Twenty of the 60 variants (33.3%) selected by Bioluminate’s calculations were more active than wild type (9 of 34 or 26% of variants improved from binding free energy and 11 of 21 or 52% improved from stability group) compared with 2 of 29 for MOE. Additionally, 79% of MOE predicted variants in the binding energy and stability categories were completely inactive compared with 27% of Bioluminate’s. However, MOE was markedly better than Bioluminate for active site residue substitution, with 4 of 8 or 50% amino acid substitutions generating more active variants compared to 0 of 8 improved variants from Bioluminate. In addition, the MOE active site collection contained the top 3 mutations found overall. This underscores the importance of selecting active site mutations regardless of their relative scores with respect to the entire collection of possible mutations.

The evaluation of the computationally derived mutations was completed by confirming that the single mutations identified in the structure-based R1 library would successfully work in combination. Therefore, the top 6 positions containing the top 7 substitutions plus 2 additional substitutions (improved variants at same 6 positions, highlighted with a * in Table 1) were combined in a PCR-based synthesis of all 192 possible combinations of mutations, followed by pooling the resulting DNA, cloning, transformation and plating. Seven 96 well plates were picked from the transformants and screened. The 30 variants showing 1.5x or more improvement relative to the H62Y parent from each plate were collected into one plate (structure-based R2), sequenced and rescreened in triplicate using the WT as reference (SI-Figure S4, sequences in Table S6). The best resulting variant lysate was 9-fold improved in total activity and contained 3 mutations (T13R/H62Y/F122Y).
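The count of 192 combinations can be reproduced by enumerating the starred substitutions in Table 1 at their 6 positions. The sketch below assumes that the wild-type residue is also allowed at every position (so the wild type and the single mutants are part of the 192); this reading is an inference from the arithmetic, not something the text states, and the function name is hypothetical.

```python
from itertools import product

# position -> (wild-type residue, starred substitutions from Table 1)
OPTIONS = {
    13: ("T", ["H", "R"]),
    62: ("H", ["Y"]),
    122: ("F", ["Y"]),
    136: ("G", ["N", "R", "Y"]),
    169: ("A", ["M"]),
    190: ("F", ["Y"]),
}


def enumerate_library(options):
    """Return every combination of wild-type or substituted residues."""
    positions = sorted(options)
    choices = [[options[p][0]] + options[p][1] for p in positions]
    library = []
    for combo in product(*choices):
        muts = [
            f"{options[p][0]}{p}{aa}"
            for p, aa in zip(positions, combo)
            if aa != options[p][0]
        ]
        library.append("/".join(muts) if muts else "WT")
    return library


library = enumerate_library(OPTIONS)
print(len(library))  # 3 * 2 * 2 * 4 * 2 * 2 combinations
print("T13R/H62Y/F122Y" in library)
```

Seven 96-well plates correspond to 672 picked colonies, roughly 3.5 picks per designed combination, so most combinations would be expected to be sampled at least once if the pool were reasonably unbiased.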


Deconvolution of these data is provided in SI-Figure S5. Interestingly, when arrayed in combination H62Y is still the most improved mutation, but F122Y and F190Y are no longer second and third. T13R is a better performer than T13H and as expected, both are preferred to the wild type amino acid. However, the mutations of G136 have completely inverted in order of improvement, with G136Y being the best and G136N providing a slight negative contribution to activity.

The best variants along both evolution paths were scaled up and kinetically characterized both in lysates and in purified form in order to understand the nature of the improvements. The solubility of the substrate in the reaction system was approximately 100 mM, which is well below the apparent KM of the wild type enzyme, making accurate predictions of discrete values for both kcat and KM scientifically unsound, and as a result only the catalytic efficiency (kcat/KM) is generally reported. The lysates were prepared using the same chemical lysis method as the screening samples and transaminase protein content measured by densitometry of stained SDS-PAGE gels. The lysate data from the scaled-up fermentations agree generally with the fold improvement trends in the original screening data (Table 2), although the wild type enzyme was less active, resulting in larger than expected fold improvements. The purified enzyme variants were created using a mechanical lysis method and then chromatographed to >90% purity. The catalytic efficiencies between the pairs of lysate and purified protein samples are generally in good agreement except for the purified wild type enzyme, which is 5.4 fold more active than expected based on the lysate data. Repeating the lysate preparations and purifications reproduced this result. Mechanically lysed wild type cultures were not similarly negatively affected in lysate form, suggesting that a chemical lysis component might be responsible for the wild type enzyme inactivation. Diluting the purified wild type enzyme in the lysis solution, as well as in solutions of each potentially inhibiting component (PMSF, EDTA, 0.1% deoxycholate, buffers, lysozyme) independently had negligible effect on the purified enzyme activity (except for the 0.1% deoxycholate, which reduced all variant activities by ~50%). Diluting purified enzyme in negative control (no expressed enzyme) lysate and in the poorly active wild type lysate also had negligible effect. Additionally, little to no change in thermal stability between the purified enzyme samples was seen. Taken together, this suggests that the chemical lysis method indirectly inactivated or inhibited the wild type enzyme, perhaps by liberating inhibiting components from the E. coli host that were not seen in the mechanical lysis method or by interfering with the dimerization maturation process required to develop active enzyme.

Table 2. Kinetic Analysis of Lysates and Purified Enzyme Variants (- : not reported)

              Lysate                                       Purified
Enzyme        kcat (s-1)  KM (M)  (kcat/KM)app  FIOWT      kcat (s-1)  KM (M)  kcat/KM  FIOWT
wt            -           -       0.19          1.00       -           -       1.01     1.00
Seq-R1:S1     -           -       1.26          6.70       -           -       1.15     1.14
Seq-R2:S2     -           -       1.91          10.1       -           -       1.70     1.69
Seq-R2:S1     -           -       1.51          8.03       -           -       1.10     1.08
Struct-R1:S2  -           -       0.36          1.92       -           -       0.51     0.51
Struct-R1:S1  0.11        0.043   2.57          13.6       -           -       1.48     1.46
Struct-R2:S1  0.24        0.046   5.17          27.4       0.33        0.1     3.29     3.26
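The comparison between lysate and purified samples can be made explicit with a few lines of arithmetic on the catalytic efficiencies in Table 2. The values below are transcribed from the table as printed (units are not stated there); recomputed fold improvements differ slightly from the tabulated FIOWT values, presumably because the table was calculated from unrounded data. The function names are hypothetical.

```python
# kcat/KM values as they appear in Table 2 (lysate values are apparent).
lysate = {
    "wt": 0.19, "Seq-R1:S1": 1.26, "Seq-R2:S2": 1.91, "Seq-R2:S1": 1.51,
    "Struct-R1:S2": 0.36, "Struct-R1:S1": 2.57, "Struct-R2:S1": 5.17,
}
purified = {
    "wt": 1.01, "Seq-R1:S1": 1.15, "Seq-R2:S2": 1.70, "Seq-R2:S1": 1.10,
    "Struct-R1:S2": 0.51, "Struct-R1:S1": 1.48, "Struct-R2:S1": 3.29,
}


def fold_over_wt(values):
    """Fold improvement over wild type within one sample type."""
    return {name: v / values["wt"] for name, v in values.items()}


def purified_to_lysate(purified_values, lysate_values):
    """Ratio of purified to lysate efficiency for each enzyme."""
    return {name: purified_values[name] / lysate_values[name] for name in lysate_values}


fio_lysate = fold_over_wt(lysate)
fio_purified = fold_over_wt(purified)
ratio = purified_to_lysate(purified, lysate)

for name in lysate:
    print(f"{name:13s} lysate FIOWT {fio_lysate[name]:5.1f}  "
          f"purified FIOWT {fio_purified[name]:4.2f}  purified/lysate {ratio[name]:4.2f}")
```

Only the wild type shows a purified-to-lysate ratio far above 1, which is the pattern the text attributes to loss of wild-type activity under chemical lysis; the large lysate fold improvements of the sequence-based variants largely reflect recovery of that lost activity.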

Regardless of the mechanism, the loss of activity because of the chemical lysis is improved by evolution. The sequence-based R1 library generated one variant with significantly improved activity in lysate. The mutations in this variant were partially re-assorted into the R2 library, and the resulting sequences with those mutations (F122Y or S223P) comprised 30 of the 34 positive sequences (Table S4). Thus the beneficial mutations discovered in round 1 led to a heritable improvement in population performance in round 2, despite the fact that the catalytic efficiency of the purified enzyme (Seq-R1:S1, I84L_F122Y_S223P) is improved over the wild type enzyme only slightly (1.14 fold). Additionally when purified, a second of the sequence-based variants (Seq-R2:S1: I84L_A91C_A114R_F122Y_V125L_S223P) has similar catalytic efficiency compared to the wild type (1.08 fold), but much larger apparent catalytic efficiency than wild type in lysate (8 fold), further illustrating specific targeting of inhibition by key mutations. As a result, the course of the evolution and the impact of the mutations are best understood from the context of the apparent catalytic efficiency in lysates, and the corresponding specific activity fold improvements in the screening data.


Addressing systemic improvements is no less important to the overall success of an evolution program than catalytic efficiency improvements. Chemical lysis methods are the primary approach to lysing even small libraries of variants in a parallel manner. Improvements that increase activity in the screening lysate allow for the necessary increasing of screening pressure toward the desired reaction in subsequent rounds in exactly the same manner that kinetic improvements do. As an example, the previous sitagliptin work on this substrate and using this wild type enzyme reported that the mutation S223P also found in these lysate-only improvements was responsible for a similar 12-fold improvement using chemically lysed cultures. More importantly, the next round of evolution, which involved switching substrates to the more demanding pro-sitagliptin ketone, could not be advanced without the S223P mutation due to lack of activity.21

In addition to the variants whose effects are primarily seen in the lysate screening, one of the two top variants from the sequence-based round 2 (the Seq-R2:S2 variant) showed a 1.69-fold increase in catalytic efficiency in addition to a full recovery of the activity lost by the wild type enzyme in the lysate. The structure-based libraries also demonstrated significantly improved catalytic efficiency, with R1 and R2 variants improving catalytic efficiency 1.46 and 3.26 fold respectively. The greater improvements in catalytic efficiency of these variants might be a direct result of the structural focus on substrate-enzyme interaction in the mutation design. These variants improved substrate binding sufficiently that kcat and KM values could be estimated.

The mutations from both methods were mapped onto the 3D structure of the protein. Most mutations from the sequence-based method address the inhibition of activity in the lysate. Of these, 4 are in the active site (S223P and F122Y as the two most significant, and V69G and T282S), A91C is in the core of the protein, and Y60F is located in the dimer interface, likely improving interactions between the two monomers (Figure 5). Of those in the active site, S223P and T282S are located in a loop and turn adjacent to the PLP cofactor and help structure the cofactor binding site; F122Y is expected to interact with the contents of the active site; and V69G creates additional space for substrates (Figure 5b). Four mutations of the 66 in the entire library were located in positions that model as interacting with the substrate (V69G, V69T, F122Y, F190L); 2 of these (50%) were identified as more active in lysates.

The mutations in more active variants (FIO WT > 1.2; Table 1) identified with the structure-based design are mostly located in the active site or on the surface of the protein (Figure 5c). The top active mutants (H62Y, F122Y, F190Y, G136N/R/Y and P135Q) correspond to first shell positions within the active site, participating in direct contacts with the substrate. In the structural models, H62Y improves the contacts with the CF3 group, consistent with its improvement of KM, whereas G136N/R and P135Q donate a hydrogen bond to the N of the triazole ring (Figure 3d). Mutations of G136 are known to exert conformational control over a substrate-determining loop from residues 129 to 145.24 Of 6 active mutations on the surface of the protein (T13R, E256R, E85R, E92R, G239H, S49Y), 4 are to Arg, and 3 of these make a salt bridge with nearby Asp or Glu residues in the structural models. A169M improves the packing of the hydrophobic core, A209H in the second shell of the active site donates a hydrogen bond to a nearby backbone O, and I196R is located in the dimer interface and engages in a salt bridge with an Asp residue from the other monomer.

The ATA-117 transaminase enzyme that we chose to evolve in this evaluation is 330 amino acids long. With 19 possible amino acid substitutions at each of the 330 positions, there are 6270 possible single amino acid substitutions. To screen this library completely, assuming a randomization step, would require in excess of 10,000 samples.
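The size of the single-substitution space and the oversampling that random screening demands can be checked with a short calculation. The sketch below is illustrative only: it assumes clones are drawn uniformly at random with replacement, which is an idealization not stated in the text, and the function names are hypothetical.

```python
AA_ALTERNATIVES = 19  # non-wild-type amino acids available at each position


def single_substitution_count(n_positions):
    """Number of distinct single amino acid substitutions."""
    return n_positions * AA_ALTERNATIVES


def expected_coverage(n_variants, n_screened):
    """Expected fraction of distinct variants observed at least once when
    n_screened clones are drawn uniformly at random with replacement
    (an idealized randomization step)."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_screened


full_space = single_substitution_count(330)   # all positions of ATA-117
tenth_space = single_substitution_count(33)   # one tenth of the positions

print(full_space, tenth_space)
print(f"coverage at 10,000 samples: {expected_coverage(full_space, 10_000):.2f}")
```

Under this idealized model, 10,000 random samples still leave a noticeable fraction of the 6270 substitutions unobserved, which is consistent with the statement that complete screening requires in excess of 10,000 samples.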
Reducing the positions for mutation 10-fold through rational means and randomizing each position (33 positions x 19 possible substitutions = 627) reduces the screening burden 10-fold, to on the order of 1,000 samples. Here we evaluated whether two standard methods for making predictions about position and amino acid substitution were sufficiently advanced to provide significant improvement to transaminase activity in lysates
against a non-natural substrate, while at the same time limiting the number of mutations evaluated and therefore reducing the screening burden of the evolution effort by an additional order of magnitude, to fewer than 100 samples for each library. Table 3 summarizes a comparison of the results.

Table 3: Summary of evolution results from the two methods evaluated.

| | Sequence-Based Model | Structure-Based Model |
|---|---|---|
| Mutation space physically evaluated | 1.05% | 1.48% |
| Mutations identified as positive | 6/66 = 9% | 17/93 = 18% |
| Mutations identified as inactivating | 20/66 = 30% | 42/93 = 45% |
| Number of unique positions substituted | 58 | 62 |
| Amino acids not used in mutation design | H, N, Q, W | A, D, E, G, K |
| Mutations > 2-fold improved in activity | 2 | 2 |
| Highest lysate activity, single amino acid contribution | 2.4 fold | 5.6 fold |
| Highest lysate activity, recombined | 7 fold | 9 fold |
| Best variant | I84L, A91C, A114R, F122Y, V125L, S223P | T13R, H62Y, F122Y |

Both methods significantly reduce the number of mutations made and the corresponding screening burden. The sequence-based method was capped at about 1% of the overall possible mutations, while the structure-based method was designed to fill 1 entire 96-well plate (1.5% of the possible mutations). Despite the orthogonality of the two methods (SI-Figure S6), both were successful at identifying positive variants at high frequency, making the two methods highly complementary. The sequence-based prediction method generated amino acid substitutions more active than wild type in 9% of the substitutions selected. The structure-based method provided positive amino acid substitutions in 18% of predicted mutations. Each method identified 2 amino acid substitutions that resulted in > 2-fold improvement in activity under the reaction conditions (S223P and F122Y for the sequence-based R1 library; H62Y and F122Y for the structure-based R1 library). Two-fold improvements are the typical benchmark seen historically per round of directed evolution, although early in evolution programs, as in this experiment, larger jumps in activity per successful mutation are seen. In this effort, the best mutations in each case are 2.4-fold and 5.6-fold improved. Both libraries identified the substitution F122Y as the second of the > 2-fold improved mutations, and this was the only mutation overlap between the two methods.

During the first round of the sitagliptin evolution on this same substrate, 8 mutations were identified.21 Four of those 8 mutations were identified by the two libraries designed here, including 3 of the top 5. Mutations at position 122, such as F122Y, were not identified in the original sitagliptin evolution, as that work focused significantly on the large binding pocket; mutations at position 122 were instead found as part of opening up the small binding pocket to fit the entire sitagliptin molecule. However, 7 new positive activity positions were identified in these two libraries that have not been previously seen, including position T13, where T13R contributes a 1.85-fold improvement in total activity to the most improved variant resulting from structure-based R2.

The frequencies of positive mutations, at 9% and 18%, are sufficiently high to incorporate multiple mutations per sequence in order to accelerate evolution progress. In fact, the 9% positive mutation frequency from the original library was sufficient to locate the best double point mutant (F122Y, S223P) in the initial screening despite the high number of inactivating mutations. Finding a few “double point mutants” is likely in this screening scenario, as the likelihood of finding any 2 positive mutations in each 3-mutation sequence is 2.3% (footnote 1), although this falls to 1.5% (footnote 2) when incorporating the effects of a potential inactivating mutation as seen in the original library.
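The double-mutant likelihoods quoted above (2.3% and 1.5%) follow from a binomial-style count. The sketch below reproduces that arithmetic; the function name and argument names are hypothetical, and it assumes each substitution is independently positive, inactivating, or neutral with fixed probabilities, as the footnoted calculation does.

```python
from math import comb


def prob_k_positive(n_sites, k, p_pos, p_inact=0.0):
    """Probability that exactly k of n_sites substitutions are positive and
    the remaining ones are neither positive nor inactivating.

    p_pos   : probability that a substitution is positive
    p_inact : probability that a substitution is inactivating
              (0.0 ignores inactivation entirely)
    """
    p_neutral = 1.0 - p_pos - p_inact
    return comb(n_sites, k) * p_pos ** k * p_neutral ** (n_sites - k)


# Sequence-based R1 library: 3 substitutions per sequence, ~9.1% positive.
print(f"{prob_k_positive(3, 2, 0.091):.4f}")        # inactivation ignored
print(f"{prob_k_positive(3, 2, 0.091, 0.30):.4f}")  # 30% inactivating
```

Setting `p_inact` to the observed 30% removes every sequence whose third substitution would have abolished activity, which is why the expected frequency of observable double mutants drops.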
Additionally, given these underlying positive mutation frequencies and assuming a library where every triple mutant could be made, triple mutants consisting of 3 positive mutations would be expected to occur between 1 in every 164 ($0.183^3$) and 1 in every 1330 ($0.091^3$), which is easily within reach of a more conventional screening paradigm where larger sampling is allowed. These double and triple mutant scenarios highlight the main advantage of being able to predict with greater accuracy “site directed, specific substitutions”: the ability to compile multiple mutations in single sequences in a single round of evolution with the intent to move the evolution along faster than a single positive mutation each round.

Footnote 1: This is calculated as the number of ways 2 mutations can be arrayed within 3 positions times the probability of a positive ($P_{\text{mut}}$) or non-positive ($P_{\text{no mut}}$) mutation at each position, i.e. $\binom{3}{2} \cdot P_{\text{mut}} \cdot P_{\text{mut}} \cdot P_{\text{no mut}} = 3 \times 0.091 \times 0.091 \times 0.909 = 0.0226$, or 2.3%.

Footnote 2: As before, except positive sequences arise only when the non-positive mutation is not inactivating (inactivating mutations were 30% of the original library), i.e. $\binom{3}{2} \cdot P_{\text{mut}} \cdot P_{\text{mut}} \cdot (P_{\text{no mut}} - P_{\text{inact mut}}) = 3 \times 0.091 \times 0.091 \times (0.909 - 0.30) = 0.0151$, or 1.5%.

To further underscore this point, the two ‘second round’ libraries, constructed with higher frequencies of positive mutations and with inactivating mutations removed, each successfully combined three positive mutations derived from their respective initial libraries to produce 7- and 9-fold more active variant lysates. To make this multiple-mutation-per-sequence approach a strategy that moves evolution programs to rapid successful conclusions without additional recombination steps, the successful mutations need to be as prevalent as the non-successful ones, so that accumulating mutations occur as often or more often than they are omitted, requiring a hit rate of 50% or greater. This represents a 2.7-fold improvement in the structure-based hit rate and a 5.5-fold improvement in the sequence-based hit rate over the predictions seen here. A suggestion of this level of improvement exists in this data set already, as the active site predictions made by the sequence-based method and by MOE in the structure-based method were beneficial in improving activity in 2 of 4 (50%) and 4 of 8 (50%) amino acid substitutions, respectively. Further, Bioluminate predictions using the binding free energy and stability changes were improved in activity in 20 of 55 mutations, including 11 improved actives in 21 predictions (52%) that utilized the stability scoring function. This warrants additional investigation in the future to understand whether these are general phenomena or whether they are specific to this example. The prediction of mutations that improve activity is not yet as successful as the prediction of those that improve stability.
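The triple-mutant frequencies and the hit-rate improvements discussed above can be recomputed directly from the observed positive fractions (6/66 and 17/93). This is a minimal sketch with hypothetical function names; it assumes independent substitutions and ignores inactivating mutations, matching the idealized triple-mutant scenario in the text.

```python
def screens_per_all_positive(p_pos, n_sites=3):
    """Average number of sequences per sequence in which all n_sites
    substitutions are positive, assuming independent substitutions."""
    return 1.0 / p_pos ** n_sites


def fold_needed(current_hit_rate, target_hit_rate=0.5):
    """Fold improvement in hit rate required to reach the target."""
    return target_hit_rate / current_hit_rate


print(f"{screens_per_all_positive(0.183):.0f}")   # structure-based, ~18.3% positive
print(f"{screens_per_all_positive(0.091):.0f}")   # sequence-based, ~9.1% positive
print(f"{fold_needed(17 / 93):.1f}")              # structure-based hit rate to 50%
print(f"{fold_needed(6 / 66):.1f}")               # sequence-based hit rate to 50%
```

The values agree with the quoted figures to within rounding of the input frequencies.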
As a comparison to this work, a study that examined predictive methods for improving the stability of an α/β-hydrolase enzyme to temperature and to urea found that the sequence-based methods employed successfully predicted 15/19 (79%) stabilizing mutations, while the computational methods were successful in 13/28 (46%) of predictions; each mutation averaged a 2.6-fold improvement in stability.25 Further, a round of recombination of 26 positive mutations at 24 positions produced several variants containing between 7 and 19 substitutions, each greater than 40-fold more stable than wild type. At the 79% positive prediction frequency, screening sequences containing 10 mutations each from the outset should produce sequences where about 1 in every 10 contains only beneficial mutations ($0.79^{10} \approx 0.095$). These sequences should readily produce stabilities approaching 20-40-fold improvements within 100-200 variants screened in a single evolution experiment, exactly the rapid evolution result desired.

The naturally pre-screened substitutions lead to an expectation of a low frequency of inactivating mutations. This was not true of the sequence-based prediction in this case (sequence-based R1 library) and serves to highlight the detrimental impact of the inactivating mutations. A 9% beneficial mutation success frequency suggests that 25% of the sequences should contain at least one beneficial mutation (footnote 3), and 2.3% should contain 2 (see footnote 1). The high inactivation frequency of 30% eliminated many of the improved sequences from consideration, leaving only 3 sequences improved in activity relative to wild type in the initial round: 2 single amino acid improvements and the double mutation described above. This also made the deconvolution of mutations impossible to solve, as the number of equations (non-zero screening data) dropped below the number of unknowns (mutations). The structure-based library’s inactivation frequency is much worse at 45%, requiring larger additional improvements in inactivating mutation elimination in order to produce these mutations in combination. Under circumstances where the number of detrimental changes is high, creating additional constructs in R2 that eliminated detrimental mutations and screening an additional ~100 variants provided an improved library while keeping the number of variants screened below 200.
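The remark that deconvolution fails once the number of equations drops below the number of unknowns can be illustrated with a generic least-squares model. The sketch below is not the deconvolution algorithm described in the Supporting Information; it assumes, purely for illustration, that mutation effects are additive on a log2 scale, and it uses invented data for four hypothetical mutations.

```python
import numpy as np


def deconvolve(design, fold_vs_wt):
    """Estimate per-mutation log2 effects by least squares.

    design     : (variants x mutations) 0/1 matrix, 1 = mutation present
    fold_vs_wt : lysate activity of each variant relative to wild type
    Returns (effects, rank, fully_determined).
    """
    design = np.asarray(design, dtype=float)
    fold = np.asarray(fold_vs_wt, dtype=float)
    usable = fold > 0                      # inactive variants give no log-scale data
    X, y = design[usable], np.log2(fold[usable])
    effects, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    return effects, int(rank), int(rank) == X.shape[1]


# Synthetic example: 4 hypothetical mutations, every pair screened once.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
design = np.zeros((len(pairs), 4))
for row, (i, j) in enumerate(pairs):
    design[row, [i, j]] = 1
true_log2 = np.array([1.0, 0.5, 0.0, -0.3])   # invented effects
fold = 2.0 ** (design @ true_log2)

effects, rank, ok = deconvolve(design, fold)              # 6 equations, 4 unknowns
fold_lossy = fold.copy()
fold_lossy[3:] = 0.0                                      # three variants inactivated
_, rank_lossy, ok_lossy = deconvolve(design, fold_lossy)  # 3 equations, 4 unknowns
print(ok, ok_lossy)
```

With all six variants active the system is fully determined and the invented effects are recovered; once half of the variants are inactive, the rank falls below the number of mutations and individual contributions can no longer be separated.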
Combining the orthogonal approaches in R2 could lead to better libraries, screening fewer variants and providing more improved variants.

Footnote 3: This probability is equal to 1 minus the probability of a sequence with zero positive mutations, or $1 - (0.909 \times 0.909 \times 0.909) = 0.249$, or 25%.

Improvements in the sequence-based predictions might come from the increasing numbers of sequences filling the databases and from refinements to the analysis of selection methods. An exciting new tool from Bio-Prodict called CorNet deduced not just variable positions, but also the residues that co-vary in a correlated network indicated for enantioselectivity within the α/β-hydrolase family.26 Variants from within this pool were improved relative to wild type in enantioselectivity at a frequency of 17%, about twice as often as seen here for activity with the sequence-based method.

Structure-based predictions would certainly be improved with bound substrate crystal structures rather than docked substrates, and possibly be improved with more computationally demanding methods described in the literature. Molecular Dynamics (MD) simulations have been used to filter for beneficial mutations in high throughput in silico screens for enzyme activity or selectivity.13, 27 In these cases, MD simulations were shown to increase the chance of successfully combining beneficial mutations, and thus appear to be the next logical step to incorporate into current enzyme design workflows. MD simulations followed by an MMGBSA calculation on an ensemble of structures, as has been previously employed for a related transaminase in silico mutagenesis study,28 would be able to more accurately capture the effect of mutations that induce significant conformational changes. Additionally, Free Energy Perturbation29 is another promising method to interrogate changes in stability30 and substrate binding free energy for individual and combinatorial mutations. Finally, by optimizing for substrate binding only, the enzyme turnover rate (kcat), which contributes to the overall efficiency of the enzyme, has not been considered. To this end, each transition state of the multi-step reaction would need to be optimized using QM/MM hybrid methods to estimate the activation barriers for each mutant (reviewed in refs 31-33). To our knowledge, these have not been evaluated in high throughput with thousands of variants because of the computational cost of these calculations.
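To indicate how a computed activation barrier would translate into a predicted change in turnover, the sketch below applies the standard transition state theory relation between a barrier change and a rate ratio. It is an illustration only, not part of the workflow described here: it assumes a single rate-limiting step and an unchanged pre-exponential factor, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcat_fold_change(ddg_activation_kcal, temp_k=298.15):
    """Predicted kcat(mutant) / kcat(wild type) for a change in activation
    free energy (mutant minus wild type, kcal/mol), assuming one
    rate-limiting step and an unchanged pre-exponential factor."""
    return math.exp(-ddg_activation_kcal / (R_KCAL * temp_k))


# A barrier lowered by 1 kcal/mol at 25 degrees C:
print(f"{kcat_fold_change(-1.0):.1f}-fold")
```

Because the dependence is exponential, barrier errors of even 1 kcal/mol shift the predicted rate several-fold, which hints at the accuracy such calculations would need in order to rank mutants.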
With the continuous improvements in force fields and computational power, these methodologies will likely be incorporated into enzyme design workflows in the near future. As the science behind the mutation identification predictions improves and, just as importantly, as the understanding of inactivating mutations improves and their frequency is reduced, the number of simultaneous positive mutations incorporated into newly designed sequences can increase, enabling larger jumps in activity performance. A 3-5 fold improvement in predictive power, both for the positive variants and in the control of inactivating mutations, would fully enable this predictive combination approach as a primary strategy for directed evolution efforts.

CONCLUSION

The ultimate solution to the directed evolution improvement of an enzyme with some initial activity on a non-natural, industrially relevant substrate should be to examine sources of mutational diversity and, through some scientific algorithm, simply re-write the amino acid sequence with the new substitutions that will render the enzyme suitably effective, screening one or at most a few variants for confirmation of the desired activity. While this goal appears impossible, great strides have already been made in this direction. Starting from the Nature-inspired technique of random mutagenesis and screening of 10,000 variants, many scientists have moved to smart libraries that identify smaller subsets of amino acid positions that are randomly mutated (“site directed, random substitutions”), reducing the screening burden to the order of 1,000 variants. Here we scouted the next advance: we examined two existing methods for determining “site directed, specific substitutions” that successfully identified mutations that improve the ATA-117 transaminase’s activity on a non-natural substrate, each while examining on the order of 100 variants. The two orthogonal methods, one based on the evaluation of sequence alignments and the other on structure-based in silico prediction, provided hit rates of 1 positive variant for every 11 and 5.5 mutations predicted, respectively. These frequencies are already valuable from a directed evolution perspective, rapidly enabling 7- and 9-fold improvements in transaminase activity by taking advantage of recombination strategies. Additionally, subsets of the structure-based in silico prediction data reached 1 positive variant in every 2 sequences. By combining these two enzyme design approaches, the orthogonal diversity generated here could potentially serve as a way to accelerate an evolution campaign by limiting the number of variants to screen in each round and by reducing the number of rounds necessary to achieve a desired fold improvement. Improving the predictive science of both methods through better analysis of sequence information and improved computational techniques should lead to significant improvements in evolving enzyme activity in a small number of screening samples, and very large improvements in single rounds of evolution when traditional screening efforts are applied.

MATERIALS AND METHODS

A sequence-based library consisting of 95 variant sequences and one wild type control was designed by ATUM through their proprietary Protein GPS Engineering technology (sequence-based R1), similar to the imagabalin transaminase effort.34 This method relies on sequence alignments from a large collection of homologues and their own algorithms to identify specific amino acid substitutions to investigate.18 In brief, multiple sequence alignments of 1250 homologues were performed to identify natural diversity as pre-vetted accepted mutations. An analysis of the mutations in conjunction with the associated pattern of site-specific and branch-specific variations was performed using a phylogenetic tree generated from the alignment, as previously described in detail,35 resulting in a numerical scoring of each observed mutation related to its acceptability in the sequence. All possible substitutions (~1900 in this case) seen in the alignment were rank ordered and a set of 250 were identified as candidate mutations for the directed evolution process. The 66 highest ranking mutations were selected for evaluation in Round 1 (Table 4).

Table 4: Mutations selected on the basis of homolog diversity

A44E   I84L   V125P  N189S  A220T  R236Y  T282N
R52K   E85D   S126I  F190L  S223P  A242I  T282S
I55V   L87F   S126V  G193L  F225A  P244E  A284T
F56L   N90S   Y131V  L195N  F225S  H260I  G285A
Y60F   A91C   E137L  I196V  V227I  L269R  G286E
V69G   A91L   Y148I  H203Y  V228F  V278I  F290V
V69T   A114R  Y154I  F207A  V229A  G280L  I298V
F70V   F122Y  V171T  A209E  V234K  C281T  R313A
H71R   V123I  V187A  D214N  V235L  T282G  Y314F
W73Y   V125L  V187I

Because of the conservative nature of these mutations, 3 substitutions were evaluated simultaneously in each sequence. Substitutions were grouped such that each mutation was seen 5 times in the collection of
95 variants, each time with a different context of partnered substitutions.22 Thus, all substitutions are equally represented in the designed variants with no random mutations. From this design, ATUM created the DNA sequences, expressed the enzyme variants and produced clarified lysates,36 which were then screened on THTP-BDO and analyzed as described below.

Each of the constructs made was used to transform E. coli strain BL21(DE3) (Agilent). Transformants were selected on LB agar with 30 mg/L kanamycin at 37 °C. Transformants were picked into 0.5 mL LB medium containing 30 mg/L kanamycin and cultures were incubated with shaking for 16 h at 37 °C. Ten microliters of each culture were used to inoculate 0.5 mL of fresh LB medium containing 30 mg/L kanamycin and the resulting cultures were incubated at 37 °C with shaking until an OD at 600 nm of 0.6 was reached. Cultures were cooled to 25 °C and induced by addition of 1 mM IPTG. Induced cultures were incubated with shaking at 25 °C for 19 h and cells were harvested by centrifugation at 5000 x g for 10 min at 4 °C. Final ODs at 600 nm were 7.0 on average.

Cell pellets were frozen at -20 °C, thawed at 25 °C and then fully lysed with lysozyme and detergent treatment in a total volume of 0.4 mL as follows. Cells were resuspended in 0.2 mL of 200 mM Tris-HCl (pH 7.5), 20% sucrose, 1 mM EDTA, 1 mM PMSF and 30 U/µL Ready-Lyse lysozyme (Epicentre) and incubated for 15 min at 25 °C. To this mixture, 0.2 mL of 10 mM Tris-HCl (pH 7.5), 50 mM KCl, 1 mM EDTA, 1 mM PMSF, 20 mM MgCl2, 0.1% deoxycholate, and 80 U of OmniCleave endonuclease (Epicentre) were added and incubation at 25 °C was continued for 15 min. The soluble crude lysate fraction was recovered after centrifugation at 5000 x g for 10 min at 4 °C, combined with an equal volume of glycerol and stored at -20 °C. Lysate fractions were analyzed by quantitative PAGE using densitometry relative to protein concentration standards after staining with Simply Blue (Invitrogen). Full-length transaminase was estimated for each sample.
For purification, cells were resuspended in buffer (25 mM Bis-Tris pH 6.5, 300 mM NaCl) and lysed by sonication. Cell debris was removed by centrifugation, and the supernatant was loaded onto a HiTrap Q HP column (GE Healthcare Lifescience). The resin was washed with buffer, then proteins were eluted with NaCl using a 10 column volume linear gradient. The transaminase fractions were collected and desalted into phosphate buffered saline (PBS) using PD-10 spin columns. SDS-PAGE or capillary electrophoresis (Perkin Elmer GXII) confirmed that the protein was >90% pure, and finally enzyme concentration was determined by A280 (Nanodrop).

E. coli lysates were screened using the reaction system in Scheme 1, described previously, such that all reagents other than the transaminase were supplied in significant excess.23 Ketone and amine standards were made as described previously.21 The final concentrations of all added reaction components were 0.38 mM nicotinamide adenine dinucleotide (NAD), 2 mM pyridoxal-5-phosphate (PLP), 0.15 mg/mL lactate dehydrogenase-101, 0.5 mg/mL glucose dehydrogenase-105, 0.6 M D-alanine, 0.3 M glucose, 5 vol% methanol and 18 mM substrate in 0.1 M phosphate buffer. Reactions were run at pH 7.5 overnight. Lysate concentrations were 40 vol% in round 1 screens and 20 vol% in round 2 screens. The samples were evaluated on a Waters UPLC using an Atlantis T3 column run isocratically at 95% solution A (0.05% trifluoroacetic acid, TFA) and 5% solution B (acetonitrile with 0.05% TFA) over 2 minutes, followed by a 95% B wash for 0.3 minutes. The amine eluted at 1.26 minutes and the ketone at 1.49 minutes.

A second library was constructed using a structure-based approach (structure-based R1).
To generate all possible single-point mutations computationally, we first docked the PLP-aldimine intermediate of THTP-BDO (Figure 4) into the wild-type cofactor-bound structure (3WWH.pdb)24 using Glide (Schrödinger Release 2015-1: Glide, Schrödinger, LLC, New York, NY, 2015). We then performed a full residue scan (i.e. mutating each amino acid to all others) and calculated both the change in ligand-binding free energy and stability scores using two different software packages: Bioluminate from Schrödinger (Release 2016-3) and the protein design module in MOE (Molecular Operating Environment 2013.08, Chemical Computing Group, Montreal, Canada; details in SI).37 For each software package (Bioluminate and MOE) and for each score (ligand binding free energy and protein stability), the top mutants were selected (excluding mutations to catalytic residues) from the quadrant where both the stability and binding scores were improved relative to ATA-117, up to a maximum of 5 mutants per position (SI Figure 2). Additionally, for active site residues (residues within 5 Å of the ligand) not selected with either program/score, the best single mutant was also selected, taking into account both programs and scores simultaneously. The 93 unique structure-based mutants are shown in Table 4.

Table 4. 93 in silico mutants selected as structure-based R1.

I10P   T13H   T13R   D25T   E27R   D29S   P33R   E42T   P48I   S49Y
S54M   D57N   L61M   L61W   H62R   H62Y   T66W   V69I   D81V   E85Q
E85R   E92H   E92R   K106R  K115R  F122Y  S124I  S124M  S124Q  S124R
S126Y  P135Q  P135W  G136N  G136R  G136Y  D139C  H143Y  P145M  Q146R
Y150C  Y150I  Q155M  Q155Y  P159F  A169M  Q173R  S174Q  T178M  T178W
S181I  N189R  N189T  F190R  F190Y  G193C  G193I  I196R  D204W  A209H
G217L  E221I  E221Q  G224C  G224L  G224M  G239H  G245F  G245I  G245L
G245N  G245Y  I246L  I246Y  R248W  R248Y  E256R  D274H  D274I  D274R
G280I  T282M  T282W  T282Y  A284M  G285C  G285I  G285Y  G286I  V293Y
P297V  S299R  G301Y

ATUM created the DNA sequences, expressed the wild type and the enzyme variants and produced clarified lysates, which were then screened on THTP-BDO and analyzed as described above. The top mutants were recombined into a structure-based R2 combinatorial library.
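The quadrant-based selection described above (both binding and stability scores improved, at most 5 mutants per position) can be sketched as a simple filter. Several details here are assumptions rather than the authors' procedure: negative score changes are taken to mean improvement, candidates are ranked within a position by the sum of the two changes, and all scores in the example are invented. In the actual work the selection was carried out per software package and per score, as detailed in the SI.

```python
from collections import defaultdict


def select_quadrant(scores, max_per_position=5):
    """Keep mutants whose binding AND stability score changes are both
    improved (assumed here to mean negative), then keep the best few
    per position.

    scores: iterable of (mutation, position, dd_bind, dd_stab)
    """
    by_position = defaultdict(list)
    for mutation, position, dd_bind, dd_stab in scores:
        if dd_bind < 0 and dd_stab < 0:
            # Ranking by the summed change is an illustrative choice only.
            by_position[position].append((dd_bind + dd_stab, mutation))
    selected = []
    for position in sorted(by_position):
        best = sorted(by_position[position])[:max_per_position]
        selected.extend(mutation for _, mutation in best)
    return selected


# Invented scores for illustration; not values from the calculations.
example = [
    ("H62Y", 62, -1.2, -0.4),
    ("H62A", 62, -0.3, 0.8),    # stability worsens: outside the quadrant
    ("G136N", 136, -0.9, -0.1),
    ("G136R", 136, -1.1, -0.2),
    ("T13A", 13, 0.1, -0.6),    # binding worsens: outside the quadrant
]
print(select_quadrant(example))
```

Capping the number of mutants per position keeps any single site from dominating a library that must fit on one 96-well plate.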

The recombination of variants chosen as described in the text for the structure-based R2 combinatorial library was accomplished by multiple PCR overlap extensions (SOEing) and ligation. A combination of terminal primers and primers located in between every mutation was used to individually synthesize all possible variations of mutation sequences, as described in detail in the Supplemental Information. These sequences were transformed and expressed, and clarified lysates were generated and screened as described in the SI and above.
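The size of a combinatorial library that contains "all possible variations" of a set of mutations can be enumerated directly. The sketch below is a hypothetical illustration: the helper name is invented, and the three mutations are those of the best structure-based variant rather than the full set recombined in R2, which is listed in the SI. Each position contributes its wild type residue plus its candidate substitutions, and variants are named in the underscore style used for the variants in this work.

```python
from itertools import product


def enumerate_library(options_by_position):
    """List every combination of wild type or one substitution per position.

    options_by_position: dict mapping position -> list of mutation names
    Returns variant names such as 'T13R_H62Y'; the unmutated parent is 'WT'.
    """
    positions = sorted(options_by_position)
    # None stands for keeping the wild type residue at that position.
    choices = [[None] + list(options_by_position[p]) for p in positions]
    return [
        "_".join(m for m in combo if m) or "WT"
        for combo in product(*choices)
    ]


library = enumerate_library({13: ["T13R"], 62: ["H62Y"], 122: ["F122Y"]})
print(len(library), library)
```

Positions with several alternative substitutions multiply the library size by the number of alternatives plus one, so the count grows quickly as positions are added.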

SUPPORTING INFORMATION

Sequence-based approach round 1 and round 2 sequences, round 1 and round 2 screening results and activities; description of mutation deconvolution algorithm; structure-based approach computational methods and calculation results; structure-based round 1 and round 2 sequences, distribution analysis of mutations; recombination methodology for generating round 2 sequences; round 2 screening results and deconvolution of mutations; evaluation of sequence and structure method orthogonality.

AUTHOR INFORMATION

Corresponding Author
*Jeffrey C. Moore, Merck & Co., Inc., PO Box 2000 RY800-C363, Rahway, NJ 07065. Phone 732-594-4836. E-mail: [email protected]

ORCID
Jeffrey C. Moore: 0000-0002-9807-6315

Author Contributions
J.C.M. and M.D.T. conceived the study; J.C.M. designed the experiments. S.G. designed the sequence-based libraries; A.C., A.R.-G., and K.L. designed and executed the structure-based library calculations. M.W. produced and characterized enzyme samples. K.H. executed the structure-based round 2 library recombination and protein production. J.C.M. and N.M. screened the reactions. J.C.M., S.G., A.R.-G., and A.C. analyzed the data. J.C.M., A.R.-G., A.C., and K.H. wrote the manuscript with input from the other authors.

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work was funded entirely by Merck Research Labs of Merck, Sharp and Dohme (MSD). In silico calculations were performed within the MSD internal modeling environment and relied heavily upon the resources of our High Performance Computing group.


FIGURE LEGENDS

Scheme 1: Truncated sitagliptin ketone (THTP-BDO) to amine reaction, shown with pyruvate as the amine donor and with lactate dehydrogenase, NAD, glucose, and glucose dehydrogenase to irreversibly shift the equilibrium toward product formation.

Figure 1: (A) Fold improvement over wild type (FIO WT) activity obtained by screening the sequence-based R2 library. The WT variant is highlighted in white. (B) The same FIO WT activity data plotted against the associated expression data. The WT variant is circled. The diagonal line represents the activity expected from wild type across the range of expression levels seen in the screen. Variants above this line represent specific activity improvements, while variants with a FIO WT activity greater than 1.0 represent total activity improvements.

Figure 2: Calculated contribution of each single-point mutation for the sequence-based R2 library, obtained using the Solver routine of Excel. Dark bars reflect total activity contributions; white bars reflect specific activity contributions. The bold line indicates the activity of the WT amino acid.

Figure 3: PLP-aldimine of THTP-BDO, the transaminase reaction intermediate that was docked into the transaminase structure for the computational calculations.

Figure 4: (A) FIO WT conversion obtained by screening the structure-based R1 library. WT variants are highlighted in white; all sequences are provided in Table S5. (B) The same conversion data plotted against the FIO WT protein expression level, with WT variants circled. The diagonal line is the conversion theoretically expected from wild type across the entire range of enzyme concentrations. Data above this line represent specific activity improvements; data with a FIO WT activity above 1.0 represent total activity improvements.

Figure 5: (A) 3D homodimeric structure of the WT ATA-117 with the positions identified as beneficial by the sequence-based approach depicted as yellow spheres. The two intermediate-cofactor complexes are rendered as sticks. (B) Close-up view of the active site showing the 4 positions identified as beneficial. (C) 3D homodimeric structure of the WT ATA-117 with the positions identified as beneficial by the structure-based R1 library depicted as yellow spheres. (D) Close-up view of the active site showing the 5 positions identified as beneficial. The G136N and P135Q mutations show a hydrogen bond to the triazole ring.
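The reading of panels 1B and 4B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both axes are taken to be normalized to wild type, so the diagonal is assumed to be the line activity = expression, and the variant names and values below are hypothetical rather than data from the screens.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str                 # hypothetical label, not a variant from the paper
    fio_wt_activity: float    # activity divided by WT activity
    fio_wt_expression: float  # expression divided by WT expression


def classify(variant: Variant) -> dict:
    """Classify a variant relative to WT on an activity-vs-expression plot.

    Assumption: WT activity scales linearly with expression, so the diagonal
    is activity == expression when both axes are normalized to WT.
    """
    # Activity per unit of expressed enzyme, relative to WT.
    specific = variant.fio_wt_activity / variant.fio_wt_expression
    return {
        "specific_fio_wt": specific,
        # Above the horizontal line at 1.0: more total activity than WT.
        "total_improved": variant.fio_wt_activity > 1.0,
        # Above the diagonal: more activity than WT at the same expression.
        "specific_improved": specific > 1.0,
    }


# Hypothetical example values, chosen only to show the three cases.
examples = [
    Variant("V1", fio_wt_activity=1.5, fio_wt_expression=0.75),
    Variant("V2", fio_wt_activity=0.8, fio_wt_expression=0.5),
    Variant("V3", fio_wt_activity=1.2, fio_wt_expression=1.5),
]
results = {v.name: classify(v) for v in examples}
```

In this toy set, V2 lies above the diagonal but below 1.0 (better enzyme, poorer expression), while V3 lies above 1.0 but below the diagonal (the gain comes from expression alone).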

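One way to picture how per-mutation contributions like those in Figure 2 can be extracted from combinatorial screening data is the sketch below. It is not the authors' spreadsheet: the legend does not state the model form, so a log-additive model (each mutation multiplies FIO WT activity by a fixed factor, with WT fixed at 1.0) is assumed here, and an ordinary least-squares fit in plain Python stands in for the Excel Solver routine. Mutation names and activities are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: mutations are not separable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_contributions(variants, fio_wt, mutations):
    """Least-squares fit of log(FIO WT) = sum of per-mutation terms.

    Assumed model, not taken from the paper. No intercept is fitted because
    WT (no mutations) has FIO WT = 1.0, i.e. log activity 0, by definition.
    """
    # Design matrix: 1.0 where a variant carries a mutation, else 0.0.
    design = [[1.0 if mut in v else 0.0 for mut in mutations] for v in variants]
    y = [math.log(a) for a in fio_wt]
    k = len(mutations)
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    # Report each contribution as a fold change relative to the WT residue.
    return {mut: math.exp(b) for mut, b in zip(mutations, beta)}


# Hypothetical screen: five variants built from three mutations.
mutations = ["M1", "M2", "M3"]
variants = [{"M1"}, {"M2"}, {"M1", "M2"}, {"M2", "M3"}, {"M1", "M3"}]
fio_wt = [2.0, 0.5, 1.0, 0.75, 3.0]
contributions = fit_contributions(variants, fio_wt, mutations)
```

A fitted value above 1.0 marks a mutation as beneficial relative to the WT residue, which plays the role of the bold WT line in the figure. Fitting specific activity instead of total activity would only require replacing `fio_wt` with expression-normalized values.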
