Statistical Analysis of the Benefits of Focused Saturation Mutagenesis


Article

Statistical Analysis of the Benefits of Focused Saturation Mutagenesis in Directed Evolution Based on Reduced Amino Acid Alphabets Aitao Li, Ge Qu, Zhoutong Sun, and Manfred T. Reetz ACS Catal., Just Accepted Manuscript • DOI: 10.1021/acscatal.9b02548 • Publication Date (Web): 19 Jul 2019 Downloaded from pubs.acs.org on July 19, 2019



ACS Catalysis

Statistical Analysis of the Benefits of Focused Saturation Mutagenesis in Directed Evolution Based on Reduced Amino Acid Alphabets

Aitao Li,†# Ge Qu,‡# Zhoutong Sun‡* and Manfred T. Reetz‡ξ¶*

†State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative Innovation Center for Green Transformation of Bio-resources, Hubei Key Laboratory of Industrial Biotechnology, College of Life Sciences, Hubei University, 368 Youyi Road, Wuchang Wuhan, 430062, China
‡Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West 7th Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
ξMax-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
¶Chemistry Department, Philipps-University, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany

ABSTRACT: Directed evolution of stereo-, regio- and chemoselective enzymes has enriched the toolbox of synthetic organic chemistry. Among the different gene mutagenesis techniques, saturation mutagenesis (SM) at sites lining the enzyme’s binding pocket has emerged as a particularly viable approach to control selectivity and activity. Traditionally, NNK codon degeneracy encoding all 20 canonical amino acids is used, but as the size of the randomization site increases beyond a single residue, the oversampling of transformants needed to ensure ≥95% library coverage rapidly reaches astronomical dimensions that are impossible to screen in practice. Therefore, many groups have been content with screening only a small segment of the designed protein sequence space, but this means that the best mutants will be missed. Alternatively, it has been shown that the use of highly reduced amino acid alphabets allows the generation of small and smart libraries requiring less screening. Here we address the question of which approach is more efficient. Two different enzyme types serve as model systems in stereoselective reactions: limonene epoxide hydrolase (LEH) and P450-BM3. Equal numbers of transformants were screened for differently sized SM libraries that were constructed using NNK codon degeneracy and other codons corresponding to only one or just a few amino acids. Conversion as a rough indication of activity was used as the parameter in primary screening, followed by GC-based ee-determination. Statistical analyses clearly show that it is more efficient to opt for rationally designed reduced amino acid alphabets because this approach results in a distinctly higher frequency of active mutants, stereoselectivity also being notably higher.

KEYWORDS: Protein engineering; iterative saturation mutagenesis; stereoselectivity; reduced amino acid alphabets; ee-screening; oversampling algorithms

INTRODUCTION

Directed evolution of active and stereoselective enzymes has provided a plethora of useful biocatalysts for different applications in synthetic organic chemistry and biotechnology.1,2 The general technique of laboratory evolution has also been extended to protein engineering of artificial metalloenzymes.3 Since screening is the labor-intensive step (the bottleneck of directed evolution),4 methodology development in the quest to generate smaller, higher-quality mutant libraries continues to be the focus of current research. Saturation mutagenesis (SM) at sites lining the binding pocket, as the basis of the Combinatorial Active-site Saturation Test (CAST) and Iterative Saturation Mutagenesis (ISM), has emerged as a viable strategy.5 SM is a stochastic process which can be analyzed statistically either by the Patrick/Firth algorithm for estimating the degree of oversampling necessary for ensuring a defined %-coverage of a mutant library,6 or by the Nov metric for obtaining the nth best mutant therein.7 The mathematical relationship between

the two approaches has been pointed out.8 The Patrick/Firth metric GLUE (or GLUE-IT) is a useful tool for calculating “(1) the expected number of distinct variants represented in a given library, (2) the library size required to sample a given fraction of the variants, or (3) the library size required to have a given probability of sampling all possible variants.”6c The mathematics underlying the algorithm can be found in the original papers.6 Pelletier et al. have extended the statistical analysis of mutant libraries.6d The advantage of ensuring essentially full library coverage or of obtaining the very best mutant is obvious, but as the size of the randomization site increases, screening efforts rapidly reach astronomically high numbers of assayed transformants. For example, SM using traditional NNK codon degeneracy encoding all 20 canonical amino acids as building blocks at a 10-residue randomization site requires the screening of 10^15 transformants for 95% library coverage, and even a 4-residue site poses practical problems for such coverage (>3 × 10^6).5,9 This problem can be sidestepped by screening only a minute fraction of such high numbers, but then uncertainties arise. For



example, in an early study of the lipase from Pseudomonas aeruginosa as the catalyst in the hydrolytic kinetic resolution of a racemic ester, the screening of only 5,000 mutants generated by NNK-based SM at a 4-residue site lining the binding pocket led to a variant with moderately improved stereoselectivity,10 but at the time a statistical analysis was not performed. Today it is clear that only a very small segment of the respective protein sequence space was actually screened, which means that most of the potentially improved variants were never considered. In subsequent research, the use of reduced amino acid alphabets was introduced in SM-based directed evolution, accompanied by statistical analyses using the CASTER computer aid11 (available free of charge on the author’s homepage: http://www.kofo.mpg.de/en/research/biocatalysis), which is based on the Patrick/Firth metric.6 Rather than employing NNK codon degeneracy encoding all 20 canonical amino acids, the use of reduced amino acid alphabets means choosing degenerate codons that encode a smaller number of amino acids as building blocks in SM-based randomization.5,9,11 The smaller the reduced amino acid alphabet, the less oversampling is needed for a defined library coverage, be it 95% or any lower value.
When opting for a reduced amino acid alphabet, the specific choice needs to be supported by one or all of the following guides:1b
• Structural and mechanistic data
• Phylogenetic analysis (consensus approach)
• Molecular dynamics (MD) computations
• In silico techniques
• Machine learning

The only comparative study to date involved the Aspergillus niger epoxide hydrolase (ANEH) as the catalyst in the hydrolytic kinetic resolution of a bulky racemic disubstituted epoxide substrate which is not accepted by the wildtype (WT) enzyme.12 Two mutant SM libraries were generated for comparison, one by conventional NNK codon degeneracy, and the other by NDT codon degeneracy, which encodes only 12 amino acids (Phe, Leu, Ile, Val, Tyr, His, Asn, Asp, Cys, Arg, Ser, Gly). In each case 5,000 transformants were screened, corresponding to 15% versus 95% library coverage, respectively. The comparison of the two libraries showed that the quality of the NDT-based library is better, meaning a higher frequency of improved variants.12 However, it was not certain whether this observation was restricted to this particular case, so that a general conclusion could not be drawn. Nevertheless, it


motivated us and other groups to apply NDT as well as a variety of other codon degeneracies corresponding to even smaller amino acid alphabets in the directed evolution of other enzymes in attempts to influence stereoselectivity.13,14 In some studies, 95% library coverage was ensured, but in other likewise successful cases considerably less screening was performed. It should be noted that in the Patrick-Firth statistics (and similar metrics), amino acid bias resulting from a number of factors, including amino acid inequality in the PCR step, is assumed to be absent, which however is generally not the case.1b,13,14 In the present study, we address the fundamental question whether striving for high or even maximum library coverage using appropriately chosen reduced amino acid alphabets is superior to ensuring maximum structural diversity by including all 20 canonical amino acids as building blocks but covering only a small segment of the respective protein sequence space. We utilize limonene epoxide hydrolase (LEH), previously characterized by X-ray crystallography,15 as the catalyst in the hydrolytic desymmetrization of a prochiral epoxide, the plan being the generation of similarly sized SM libraries using NNK codon degeneracy and different codon degeneracies corresponding to defined reduced amino acid alphabets. A statistical analysis of the results was expected to provide a sound basis for assessing the two different approaches. We also planned to utilize a second experimental platform using a structurally and mechanistically more complicated enzyme, namely P450-BM3 as the catalyst in the regio- and stereoselective oxidative hydroxylation of a prochiral ketone. The respective oversampling numbers using NNK versus other codon degeneracies corresponding to reduced amino acid alphabets necessary for 95% coverage are listed in Tables 1a-b. 
We note that the amino acid positions in a given randomization site need not be consecutive, i.e., they can be spatially separated. The process of grouping such individual amino acid positions into multi-residue CAST randomization sites should be supported by the same guides that are used in choosing optimal reduced amino acid alphabets (see above). It should also be noted that success depends upon the quality of the inherent decisions concerning the nature of the reduced amino acid alphabet and the grouping issue. If a sub-optimal decision is made, optimal results cannot be expected. Further information with useful hints in methodology development of directed evolution of selective enzymes can be found in a recent review.1b
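The degenerate codons compared in this study (NNK, NDT and smaller ones) can be expanded into the concrete codons and amino acids they encode. The following is a minimal sketch, assuming the standard genetic code and the standard IUPAC nucleotide ambiguity symbols; the function names `expand` and `encoded_amino_acids` are illustrative and are not taken from CASTER or any other tool mentioned in the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with codons enumerated in TCAG order; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """Return every concrete codon covered by a degenerate codon such as 'NDT'."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*choices)]


def encoded_amino_acids(degenerate_codon):
    """Return the set of one-letter amino acids ('*' = stop) that the codon encodes."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


for name in ("NNK", "NDT"):
    codons = expand(name)
    amino_acids = "".join(sorted(encoded_amino_acids(name)))
    print(f"{name}: {len(codons)} codons -> {amino_acids}")
```

Under these assumptions NNK expands to 32 codons covering all 20 amino acids plus one stop codon (TAG), and NDT expands to 12 codons encoding the 12 amino acids listed above.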

Table 1a. Oversampling necessary for 95% library coverage as a function of the size of the randomization site and NNK versus NDT codon degeneracy, relevant in the directed evolution of the epoxide hydrolase LEH.

number of residues (amino acid positions) at one randomization site | NNK codons | NNK transformants needed | NDT codons | NDT transformants needed
1 | 32 | 94 | 12 | 34
2 | 1 024 | 3 066 | 144 | 430
3 | 32 768 | 98 163 | 1 728 | 5 175
4 | > 1.0 × 10^6 | > 3.1 × 10^6 | 20 736 | 62 118
5 | > 3.3 × 10^7 | > 1.0 × 10^8 | 248 832 | 745 433
6 | > 1.0 × 10^9 | > 3.2 × 10^9 | > 2.9 × 10^6 | > 8.9 × 10^6
7 | > 3.4 × 10^10 | > 1.0 × 10^11 | > 3.5 × 10^7 | > 1.1 × 10^8
8 | > 1.0 × 10^12 | > 3.3 × 10^12 | > 4.2 × 10^8 | > 1.3 × 10^9
9 | > 3.5 × 10^13 | > 1.0 × 10^14 | > 5.1 × 10^9 | > 1.5 × 10^10
10 | > 1.1 × 10^15 | > 3.4 × 10^15 | > 6.1 × 10^10 | > 1.9 × 10^11

Table 1b. Oversampling necessary for 95% library coverage as a function of the size of the randomization site and three further codon degeneracies.

number of amino acid positions at one randomization site | Quadruple code KWC a) codons | Quadruple code KWC a) transformants needed | Double code b) codons | Double code b) transformants needed | Single code c) codons | Single code c) transformants needed
1 | 5 | 15 | 3 | 9 | 2 | 6
2 | 25 | 75 | 9 | 27 | 4 | 12
3 | 125 | 275 | 27 | 81 | 8 | 24
4 | 625 | 1872 | 81 | 243 | 16 | 48
5 | 3125 | 9362 | 243 | 728 | 32 | 96
6 | 15625 | 46808 | 729 | 2184 | 64 | 192
7 | 78125 | 234042 | 2187 | 6552 | 128 | 384
8 | 390625 | 1170208 | 6561 | 19655 | 256 | 767
9 | 1953125 | 5851040 | 19683 | 58965 | 512 | 1534
10 | 9765625 | 29255198 | 59049 | 176895 | 1024 | 3068

a) Codon for saturation mutagenesis encoding 4 amino acids, in addition to WT.
b) Codon for saturation mutagenesis encoding 2 amino acids, in addition to WT.
c) Codon for saturation mutagenesis encoding 1 amino acid, in addition to WT.
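The entries in Tables 1a and 1b can be approximated with a few lines of code. The sketch below assumes that all V codon combinations at a randomization site are equally likely (no amino acid bias) and that 95% coverage means each variant has a 95% chance of being sampled at least once; the text does not state which expression was used for the tables. Under these assumptions the number of transformants is T = ln(1 − F)/ln(1 − 1/V) exactly, and T ≈ −V ln(1 − F) ≈ 3V for large V. The function names are hypothetical.

```python
import math


def transformants_exact(n_variants, coverage=0.95):
    """Transformants T solving 1 - (1 - 1/V)**T = coverage (requires V > 1)."""
    return math.log(1.0 - coverage) / math.log(1.0 - 1.0 / n_variants)


def transformants_approx(n_variants, coverage=0.95):
    """Large-library approximation T = -V * ln(1 - coverage), roughly 3V at 95%."""
    return -n_variants * math.log(1.0 - coverage)


# Codons per position for two of the degeneracies discussed in the text.
codons_per_position = {"NNK": 32, "NDT": 12}

for name, codons in codons_per_position.items():
    for positions in (1, 2, 3, 4):
        n_variants = codons ** positions
        exact = transformants_exact(n_variants)
        approx = transformants_approx(n_variants)
        print(f"{name}, {positions} position(s): V = {n_variants}, "
              f"exact T = {exact:.0f}, approximate T = {approx:.0f}")
```

With these assumptions the exact expression gives 94 and 34 transformants for a single NNK or NDT position, and the approximation gives 15 for a single position with five codons, in line with the tabulated values.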

RESULTS AND DISCUSSION

Limonene epoxide hydrolase (LEH) as the model system. In the present study, we employed the LEH-catalyzed hydrolytic desymmetrization of cyclohexene oxide (1) with formation of (R,R)- and (S,S)-2 as the model reaction (Scheme 1). WT LEH is a poor catalyst in this reaction (ee = 4% in slight favor of (S,S)-2). In previous studies, LEH was already subjected to directed evolution.16 This time, 10 CAST residues lining the LEH binding pocket were considered for SM and grouped into a single large combinatorial randomization site (Figure 1).

Scheme 1. Model hydrolytic desymmetrization catalyzed by WT LEH and mutants generated by saturation mutagenesis (SM).
Cyclohexene oxide (1) + H2O → (R,R)-2 + (S,S)-2, catalyzed by LEH.

Figure 1. Binding pocket of LEH based on the reported crystal structure (PDB code 1NWW),15 surrounded by 10 residues L74, F75, M78, I80, L103, L114, I116, F134, F139 and L147 (all in gray sticks) which were considered for combinatorial SM-based randomization. The catalytic residues Y53, N55, R99, D101, W130 and D132 are shown in purple lines.

In initial experiments, we focused SM on the 10-residue randomization site (Figure 1) and employed several different codon degeneracies. At this stage, we were interested only in the relative number of active hits following the screening of a defined number of transformants. Active hits were identified in preliminary and approximate form by application of Reymond’s adrenaline-assay,17 a convenient on-plate color test, in combination with GC analysis. The activity threshold was defined as ≥10% conversion of 5 mM substrate 1 within 16 hours. The results together with estimations of library coverage are summarized in Table 2.


Table 2. Results of comparing the number of active LEH hits discovered in five different SM mutant libraries in which the randomization site was composed of 10 residues lining the binding pocket (CAST residues).

degenerate code a) | number of amino acids | amino acids used as building blocks in SM | screened plates b) | transformants screened | number of active variants | theoretical library size c) | library coverage (%) | nth best variant d)
NNK | 20 | all 20 amino acids | 16 | 1472 | 45 | 3.4 × 10^15 | 1.3 × 10^-10 | 10^15nd
NDT | 12 | F,L,I,V,Y,H,N,D,C,R,S,G | 16 | 1472 | 48 | 1.9 × 10^11 | 2.3 × 10^-6 | 1.0 × 10^11nd
KWC | 5 | D,V,F,Y,WT | 16 | 1472 | 90 | 2.9 × 10^7 | 1.5 × 10^-2 | 1.2 × 10^4nd
GTG/TTT | 3 | V,F,WT | 16 | 1472 | 88 | 1.8 × 10^5 | 2.5 | 120th
GTG | 2 | V,WT | 16 | 1472 | 260 | 3068 | 76.3 | 3rd
GTG | 2 | V,WT | 35 | 3220 | 533 | 3068 | 95.7 | 1st

a) N = A/C/G/T; K = G/T; D = A/G/T; W = A/T. The designed primers were mixed in ratios according to the number of codons contained in each degenerate codon, which ensured that each codon is introduced with equal stoichiometry.
b) 96-well microtiter plates.
c) Calculated by Patrick/Firth algorithm for 95% library coverage.
d) Calculated by Nov metric.
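The coverage column of Table 2 can be checked with a short calculation. This is a minimal sketch, assuming equally likely codon combinations and treating coverage as the probability that a given variant is sampled at least once, F = 1 − (1 − 1/V)^T; `expected_coverage` is an illustrative name, and `expm1`/`log1p` are used only to keep the result accurate when F is extremely small.

```python
import math


def expected_coverage(n_transformants, codons_per_position, n_positions):
    """Expected fraction of the V = codons**positions variants sampled at least once."""
    n_variants = codons_per_position ** n_positions
    # 1 - (1 - 1/V)**T, written so that tiny coverages do not round to zero.
    return -math.expm1(n_transformants * math.log1p(-1.0 / n_variants))


# Codons per position for the 10-residue LEH randomization site (Table 2).
libraries = {"NNK": 32, "NDT": 12, "KWC": 5, "GTG/TTT": 3, "GTG": 2}

for name, codons in libraries.items():
    coverage = expected_coverage(1472, codons, 10)
    print(f"{name}: {100 * coverage:.2g} % coverage after 1472 transformants")

# Extended screening of the valine library (35 plates, 3220 transformants).
print(f"GTG: {100 * expected_coverage(3220, 2, 10):.1f} % after 3220 transformants")
```

For the valine library this gives about 76% coverage after 1472 transformants and about 96% after 3220, consistent with the last two rows of Table 2.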

Table 2 reveals a clear trend which provides an initial answer to the central question posed in this study. Upon performing the same amount of screening work (16 microtiter plates), quite different results emerge, depending upon the nature of the (reduced) amino acid alphabet. In the case of NNK and NDT codon degeneracies, library coverage amounted to only 1.3 × 10^-10 % and 2.3 × 10^-6 %, respectively, and 45 and 48 active hits were identified. In contrast, high library coverage with limited structural diversity, as defined by the chosen reduced amino acid alphabet, leads to considerably higher numbers of active LEH hits. Particularly upon going from NNK to GTG codon degeneracy using valine as the sole building block (in addition to WT) and again assaying 16 microtiter plates in both cases (1.3 × 10^-10 % versus 76% library coverage), the number of active hits (260) increases by a factor of ≈6. It needs to be pointed out that the use of a reduced amino acid alphabet as such, coupled with high library coverage, is not the sole reason for the observed better performance. As already alluded to, the optimal choice of the reduced codon as well as the proper choice of the CAST residues and their grouping is a prerequisite for obtaining the best results and for making logical comparisons with the use of NNK at low library coverage. As will be seen below, for this reason we specifically included in our comparisons a “bad” choice of an amino acid as the sole building block in SM. Before analyzing sequences and enantioselectivities of the active hits, we first consider the influence of the nature of the chosen amino acid in Single Codon Saturation Mutagenesis (SCSM)18 based on the use of a single amino acid as building block (in addition to WT).
As in the previous study, valine with its fairly bulky lipophilic isopropyl sidechain was chosen because the crystal structure15,18a reveals that essentially all of the amino acids lining the LEH binding pocket (CAST residues) have hydrophobic character. A phylogenetic (consensus) analysis likewise points to the clear dominance of hydrophobic residues that surround the active site.18a,20 Consequently, the choice of the amino acid is crucial, which means that other amino acids can be expected to provide very different results in analogous experiments. In the cases studied herein, high library coverage amounting to 76-95% was strived for. Of the tested amino acids when using KWC codon degeneracy (encoding amino acids D, V, F, Y, WT), GTG (V, WT) or GTG/TTT (V, F, WT), valine (V) is indeed the best

candidate (see Figure 2 and Table S1). The valine library GTG harbors the best enantioselective variants (up to 86% ee in favor of (S,S)-2 and 76% ee for (R,R)-2). Out of curiosity, as delineated above, we also included other libraries such as the serine library, because the polar sidechain of this amino acid can hardly be expected to enhance the probability of obtaining active mutants. Indeed, this library harbored few active hits (only 16 were identified at 76% library coverage, the best variant slightly favoring (R,R)-selectivity with 15% ee) (Table S2, Table S3 and Table S4). Due to these poor results, we did not continue screening up to 95% library coverage as in the case of the Val library.
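The ee values quoted here come from GC analysis of the two enantiomeric diols. A minimal sketch of the underlying arithmetic follows, assuming that the two GC peak areas are proportional to the amounts of (R,R)-2 and (S,S)-2; the function name and the example peak areas are hypothetical and are not data from this study.

```python
def enantiomeric_excess(area_rr, area_ss):
    """Return (ee in %, favored enantiomer) from the peak areas of (R,R)-2 and (S,S)-2."""
    total = area_rr + area_ss
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    ee_percent = 100.0 * abs(area_rr - area_ss) / total
    if area_rr > area_ss:
        favored = "(R,R)"
    elif area_ss > area_rr:
        favored = "(S,S)"
    else:
        favored = "racemic"
    return ee_percent, favored


# Hypothetical peak areas: a 48:52 mixture corresponds to 4% ee in favor of (S,S)-2,
# the value reported for WT LEH.
print(enantiomeric_excess(48.0, 52.0))
# A 95:5 enantiomeric ratio corresponds to 90% ee.
print(enantiomeric_excess(95.0, 5.0))
```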

Figure 2. Relationship between codon usage encoding single amino acids in SCSM and the number of active LEH hits in the respective library. The % values indicate the degree of library coverage resulting from the number of screened transformants.

Some of the results were analyzed statistically as shown in Figure 3. Enantioselectivity was measured only for the active hits, which means that variants characterized by extremely low activity, but possibly high enantioselectivity, were not considered. Upon going from extremely low library coverage amounting to a mere 1.3 × 10^-10 % (NNK with 45 active hits) to 2.5% library coverage (GTG/TTT with 88 active hits) (Table 2), the number of stereoselective variants in favor of (S,S)- or (R,R)-2 increases markedly; 70 and 18 hits were identified, respectively. Although the whole GTG library was not analyzed for enantioselectivity by GC in the same way, which would have required excessive analytical work, the results demonstrate once again the benefits of employing highly reduced amino acid alphabets at higher degrees of library coverage. All of the active hits identified from the screening plates are shown in Figure 3.

Figure 3. The enantiomeric excess (ee) distribution of active LEH hits obtained from libraries NNK, NDT, KWC, V/F (valine and phenylalanine) and V (valine). N indicates the number of active hits found in the respective library (Table 2).

Keeping in mind the enantioselectivity thresholds of active hits (≥80% ee for (S,S)-selective variants and ≥15% ee for (R,R)-selective variants), some of them were sequenced. Table 3 reveals that (S,S)-enantioselectivity amounting to ≥90% ee occurs only in libraries which were created using a highly limited number of amino acids as building blocks, and which were screened with high coverage (95%). This led to several variants showing 91-97% ee with excellent conversion under the reaction conditions. Active hits in the NNK and NDT libraries, requiring the same screening effort but corresponding to only 1.3 × 10^-10 % library coverage, showed only poor (S,S)-enantioselectivity. In the case of (R,R)-selectivity, high degrees of stereoselectivity amounting to >90% ee were not reached in any of the libraries, although the same trend regarding codon preference prevails (Table 3). Variant SZ677 (L74F/L114F/I116V/L147F) from the V/F library proved to have the best enantioselectivity (49% ee). Notably, none of the active mutants in the NNK and NDT libraries showed more than 25% ee (Table 3). We wish to refrain from speculations as to the reason for these observations, deep-seated MD computations being necessary for deriving explanations on a molecular level in future studies.

Table 3. Best LEH variants from SM libraries NNK, NDT, KWC and VF.

library | code | mutation | ee% | conversion% | favored enantiomer (2)
NNK | SZ793 | L147K | 16 | 84 | (R,R)
NDT | SZ795 | L147D | 25 | 55 | (R,R)
 | SZ796 | L74F/F75L/M78F/I80F | 77 | 99 | (S,S)
KWC | SZ797 | L74V/M78Y/I80F/L103Y | 10 | 47 | (R,R)
 | SZ798 | L74F/M78Y/I80V/I116V/L147D | 30 | 99 | (S,S)
VF | SZ669 | L74F/M78F/I80V/L114V/I116V/F139V/L147V | 88 | 99 | (S,S)
 | SZ670 | L74F/L103V/L114V/I116V/F139V | 91 | 99 | (S,S)
 | SZ671 | L74V/M78Y/I80V/I116V/L147V | 40 | 96 | (R,R)
 | SZ672 | L74V/M78F/I80V/L103V/I116V | 16 | 94 | (R,R)
 | SZ673 | L74F/M78F/I80F/L114V/I116V/F139V | 97 | 98 | (S,S)
 | SZ674 | L74F/I80F/L103V/L114V/I116V/F139V/L147F | 86 | 81 | (S,S)
 | SZ675 | L74F/M78V/I80V/L103V/L114V/L147V | 19 | 94 | (R,R)
 | SZ676 | L74F/L103F/L114V/F139V | 42 | 56 | (R,R)
 | SZ677 | L74F/L114F/I116V/L147F | 49 | 65 | (R,R)
 | SZ678 | L74F/I80F/L114V/I116V/F139V | 95 | 97 | (S,S)
 | SZ679 | L74V/M78F/I80V/L103V/I116V/F139V | 38 | 97 | (R,R)
 | SZ681 | L74F/M78V/L103V/L114V/I116V/F139V | 86 | 96 | (S,S)
 | SZ682 | L74F/L103V/L114F/L147F | 40 | 92 | (R,R)

In addition to the frequently employed Patrick/Firth algorithm,6 the Nov metric has also emerged as a useful guide in SM-based directed evolution,7 which was recently extended to include extremely large multi-residue randomization sites.13i As already pointed out in the Introduction, the Nov approach focuses on the nth best mutant, not % library coverage. In a previous study we pointed out the quantitative relationship between the two analytical tools,8 which is illustrated here in extended form up to the 20th best mutant (Figure 4). In the present investigation, we applied the Nov metric and compared it to the results of applying the Patrick/Firth algorithm. For example, when library coverage is >95%, the very best variant (“number one”) should appear. According to the Nov statistics,7 when it is only 76%, one or all of the 3 best variants should appear. Table 2 includes the results of applying the Nov metric.

Figure 4. The relationship between % library coverage (Patrick/Firth algorithm) and the nth best mutants (Nov metric).
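The link between library coverage and the nth best mutant can be illustrated numerically. The sketch below is an assumption-laden reading of the Nov idea rather than the published formulation (for which ref 7 should be consulted): all V variants are taken to be equally likely, and "nth best" is read as the smallest n for which at least one of the n best variants is sampled with 95% probability, P = 1 − (1 − n/V)^T. Both function names are hypothetical.

```python
import math


def prob_top_n_sampled(n, n_variants, n_transformants):
    """Probability that at least one of the n best of V variants is among T transformants."""
    return -math.expm1(n_transformants * math.log1p(-n / n_variants))


def smallest_n(n_variants, n_transformants, confidence=0.95):
    """Smallest n whose top-n set is hit with at least the given confidence."""
    # Solve 1 - (1 - n/V)**T >= confidence for n and round up to an integer.
    fraction = -math.expm1(math.log(1.0 - confidence) / n_transformants)
    return max(1, math.ceil(n_variants * fraction))


# Valine-only library at the 10-residue LEH site: V = 2**10 = 1024 variants.
n_variants = 2 ** 10
for transformants in (1472, 3220):
    n = smallest_n(n_variants, transformants)
    p = prob_top_n_sampled(n, n_variants, transformants)
    print(f"T = {transformants}: n = {n}, P(at least one of the top {n}) = {p:.3f}")
```

Under these assumptions 1472 transformants of the valine library correspond to n = 3 and 3220 transformants to n = 1, which agrees with the two GTG rows of Table 2; the sketch is not claimed to reproduce the entries for the much larger libraries.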


P450-BM3 as the model system. In order to corroborate the conclusions drawn from the above LEH model study, it was important to test a mechanistically and structurally more complex enzyme type. Therefore, we chose cytochrome P450-BM3 (ref 19) as the catalyst in the regio- and stereoselective hydroxylation of cyclohexanone (3) with formation of the acyloins (R)- and (S)-4 (Scheme 2).

Scheme 2. Model oxidative hydroxylation catalyzed by P450-BM3 mutants.
Cyclohexanone (3) + O2 + NAD(P)H → (R)-4 + (S)-4, catalyzed by P450-BM3.

WT P450-BM3 does not accept this small substrate, probably due to the large binding pocket of this monooxygenase. In a recent P450-BM3 study,21 we performed Triple Code Saturation Mutagenesis (TCSM),20 in this case using Asn-Ile-Phe as the three amino acids (NIF code) which led to active mutants showing either (R)- or (S)-selectivity, followed by ISM for final improvements (up to 95:5 and 5:95 enantiomeric ratios, respectively).21 At the time, the choice of the NIF code was made as follows: Following exploratory NNK-based SM experiments at 8 known CAST sites,19 the screening of which revealed the best amino acid exchange events, and docking substrate 3 into the binding pocket, the decision for Asn-Ile-Phe as the reduced amino acid alphabet (NIF) was made.21 In the present SM investigation, we compared the NNK and NDT codes at a 4-residue randomization site (Figure 5) with the use of an NIF code.

Figure 5. Binding pocket of wild type P450-BM3 based on the reported crystal structure (PDB code 1JPZ),22 surrounded by 4 residues V78, A82, A328 and T438 (all in purple) which were considered for combinatorial SM randomization, substrate cyclohexanone (3) being shown in white.

It is well known that these four residues (V78, A82, A328 and T438) play particularly important roles in defining the size and the shape of the binding pocket of P450-BM3.19f,21 In all cases, 8 microtiter plates (96-format) were screened, amounting to the following values for the % library coverage: NNK (0.07%), NDT (3.5%) and NIF (94.4%). The activity threshold was defined as ≥5% conversion of 5 mM substrate 3 within 18 hours, measured by GC. The results are shown in Table 4. It can be seen that screening the NIF library with 94.4% coverage is superior to screening only 0.07% of the NNK library or 3.5% of the NDT library in terms of both the absolute number and the fraction of active variants. The data in Table 4 and in Figure 6 lead to the same general conclusion as in the case of the LEH study, namely that high library coverage using a degenerate codon corresponding to a properly chosen reduced amino acid alphabet is distinctly better than using full genetic diversity (NNK) or moderately reduced genetic diversity at extremely reduced library coverage.

Table 4. Results of comparing the number of active P450-BM3 hits discovered in three different mutant libraries in which the randomization site is composed of 4 residues lining the binding pocket.

degenerate code a) | number of amino acids | amino acids used as building blocks in SM | screened plates b) | transformants screened | active variants | theoretical library size c) | fraction of active variants (%) d) | coverage (%)
NNK | 20 | All 20 amino acids | 8 | 736 | 42 | 4.8 × 10^5 | 5.7 | 0.07
NDT | 12 | F,L,I,V,Y,H,N,D,C,R,S,G | 8 | 736 | 24 | 6.2 × 10^4 | 3.3 | 3.5
AWC/TTT | 4 | N,I,F,WT | 8 | 736 | 103 | 768 | 14.0 | 94.4

a) N = A/C/G/T; K = G/T; D = A/G/T; W = A/T.
b) 96-well microtiter plates.
c) Calculated by Patrick/Firth algorithm for 95% library coverage.
d) Calculated based on the ratio of active variants to number of transformants screened.
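The fraction of active variants in Table 4 (footnote d) is a simple ratio, and the same numbers give the enrichment of active hits relative to the NNK library. A small worked sketch using only the counts reported in Table 4; the function and variable names are illustrative.

```python
def active_fraction(n_active, n_screened):
    """Percentage of screened transformants that passed the activity threshold."""
    return 100.0 * n_active / n_screened


# Counts taken from Table 4 (8 plates, 736 transformants per library).
n_screened = 736
active_hits = {"NNK": 42, "NDT": 24, "NIF (AWC/TTT)": 103}

reference = active_hits["NNK"]
for library, n_active in active_hits.items():
    fraction = active_fraction(n_active, n_screened)
    enrichment = n_active / reference
    print(f"{library}: {fraction:.1f} % active, {enrichment:.2f}x relative to NNK")
```

On these counts the NIF library yields roughly 2.5 times as many active variants as the NNK library for the same screening effort.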


Figure 6. Relationship between the usage of different codon degeneracies and the number of active P450-BM3 hits in the oxidative hydroxylation of substrate 3. The % values indicate the degree of library coverage resulting from the number of screened microtiter plates.

As in the case of LEH, the enantioselectivities of active hits were measured by GC. Generally, a higher frequency of the most active (R)-selective mutants relative to (S)-selective mutants was found in the NIF library compared with the NNK and NDT libraries. The mutants with the best catalytic performance from all three libraries were selected and sequenced (Table 5).

Table 5. Best P450-BM3 variants from SM libraries using NNK, NDT and NIF codon degeneracies.

library | code | mutation | ee% | conversion% | favored enantiomer
NNK | NNK-2-09 | V78L/A82S/A328N/T438L | 78 | 10 | R
NNK | NNK-4-44 | V78L/A82F | 75 | 12 | R
NNK | NNK-5-23 | V78N/A82F | 77 | 15 | R
NNK | NNK-7-95 | V78Y/A82Y | 74 | 18 | R
NNK | NNK-8-73 | V78N/A82V/A328F | 65 | 18 | S
NDT | NDT-1-20 | V78W/A82F | 72 | 11 | R
NDT | NDT-2-81 | V78L/A82F | 75 | 14 | R
NDT | NDT-2-88 | V78I/A82F | 71 | 66 | R
NDT | NDT-3-54 | A82F | 77 | 20 | R
NDT | NDT-3-60 | V78C/A82F | 78 | 26 | R
NDT | NDT-7-59 | A82F/A328L | 61 | 8 | S
NIF | NIF-1-25 | A82N/A328N | 64 | 39 | R
NIF | NIF-1-33 | A82I/A328N | 70 | 19 | R
NIF | NIF-1-61 | V78N/A82I/A328N | 74 | 27 | R
NIF | NIF-2-04 | V78N/A82F/A328F | 54 | 49 | S
NIF | NIF-2-61 | V78I/A328N | 74 | 12 | R
NIF | NIF-2-89 | A328N | 75 | 10 | R
NIF | NIF-3-38 | V78N/A82F/A328F/T438I | 54 | 17 | S
NIF | NIF-4-09 | V78N/A328F | 50 | 29 | S
NIF | NIF-4-61 | V78N/A82F/A328N | 69 | 44 | R
NIF | NIF-5-96 | V78F/A82F/A328F/T438I | 64 | 5 | S
NIF | NIF-6-37 | V78F/A82F/A328N/T438I | 78 | 33 | R
NIF | NIF-6-93 | V78I/A82F/A328F/T438I | 75 | 14 | S
NIF | NIF-7-08 | V78I/A82F/A328F | 44 | 55 | S
NIF | NIF-8-94 | V78I/A82I/A328N | 73 | 35 | R

The experimental results were then analyzed statistically as in the LEH case. Figure 7 reveals for the P450-BM3 case study the notably better result of using SM based on a rationally chosen highly reduced amino acid alphabet (NIF). More active variants at relatively high library coverage were identified compared with the randomized NNK and NDT libraries. However, upon going from NDT to NIF, the ratio of (R)- to (S)-selective variants does not increase much. For the NDT library, the number of active hits decreases compared with NNK. This may be due to the fact that the wild-type amino acids at residues A328, T438 and A82 are theoretically excluded by the designed NDT degenerate codon. Such highly focused smart libraries ensure a higher chance of getting active hits relative to the alternative approach to SM library generation using the full structural diversity of the amino acids, which prevents full library coverage.
It is important to note that in the SM-based directed evolution of P450-BM3 using a variety of different substrates for regio- and stereoselective oxidative hydroxylation, NDT codon degeneracy was often used with respectable results.13i,19
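The exclusion of wild-type residues by NDT noted above follows directly from the genetic code. The minimal Python sketch below expands a degenerate codon into the amino acids it encodes, using the standard genetic code and IUPAC nucleotide symbols. The function names are illustrative and are not taken from the study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate nucleotide symbols needed for NNK and NDT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "D": "AGT"}


def expand(degenerate_codon):
    """Return every concrete codon covered by a degenerate codon such as 'NDT'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def encoded_amino_acids(degenerate_codon):
    """Return the set of encoded amino acids (one-letter code, '*' = stop)."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


for name in ("NNK", "NDT"):
    aas = encoded_amino_acids(name)
    print(name, len(expand(name)), "codons:", "".join(sorted(aas)))

# Alanine (A) and threonine (T) are absent from NDT, so a position whose
# wild-type residue is Ala or Thr cannot remain wild-type in an NDT library.
print("A" in encoded_amino_acids("NDT"), "T" in encoded_amino_acids("NDT"))
```

Under these definitions NNK spans 32 codons (all 20 amino acids plus one stop codon), whereas NDT spans 12 codons for 12 amino acids, none of which is alanine or threonine.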

Figure 7. The enantiomeric excess (ee) distribution of active P450-BM3 hits obtained from libraries NNK, NDT, and NIF (asparagine, isoleucine and phenylalanine) screened in the oxidative hydroxylation of cyclohexanone (3). N indicates the number of active hits found in the library.

CONCLUSIONS AND PERSPECTIVES

Methodology development in directed evolution of stereo- and regioselective enzymes as catalysts in organic chemistry and biotechnology has proven to be crucial for success.1b In this ongoing endeavor, saturation mutagenesis (SM) at first- and second-sphere residues lining the enzyme's binding pocket (often, but not consistently, referred to as CAST sites) has emerged as a particularly viable strategy.1-3,9,11-14,16,18 Unfortunately, a fundamental question with a profound influence on the efficiency of directed evolution has been largely ignored to date: Is it best to apply NNK codon degeneracy encoding all 20 canonical amino acids, while covering for practical reasons only a very small segment of the defined protein sequence space, or is it better to choose a rationally conceived degenerate codon corresponding to a designed, highly reduced amino acid alphabet which allows maximal library coverage of up to 95%? In our study we present unequivocal experimental evidence that the latter option leads to superior results, based on the use of two different enzyme types, an epoxide hydrolase and a P450 monooxygenase. This conclusion rests on the assumption that an optimal choice of the designed reduced amino acid alphabet and of the CAST randomization sites has been made, supported by such guides as X-ray structural and mechanistic data, phylogenetic (consensus) analysis, molecular dynamics computations, in silico techniques and machine learning.1b Some of these guides can be used as such to predict mutation sites (hot spots) and even specific mutations at those sites, three of many studies1b being cited here.13l,23,24 At this stage it is too early to provide a rigorous comparison of the different approaches, which was also not the goal of the present study.
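The trade-off between amino acid diversity and library coverage raised by this question can be made concrete with a short calculation. The sketch below assumes that every codon combination is equally likely and uses the common Poisson estimate F = 1 − exp(−T/V) for the coverage F reached with T transformants and V possible gene variants. The function names and the four-residue example are illustrative assumptions, not values from the study.

```python
import math


def library_size(codons_per_site, n_sites):
    """Number of distinct gene variants when n_sites are randomized together."""
    return codons_per_site ** n_sites


def coverage(transformants, variants):
    """Expected fraction of variants sampled at least once (Poisson estimate)."""
    return 1.0 - math.exp(-transformants / variants)


def transformants_needed(variants, target=0.95):
    """Transformants to screen for a target coverage (about 3x oversampling for 95%)."""
    return math.ceil(-variants * math.log(1.0 - target))


# Illustrative comparison for a hypothetical four-residue randomization site.
for name, codons in (("NNK", 32), ("NDT", 12)):
    v = library_size(codons, 4)
    print(f"{name}: {v} variants, {transformants_needed(v)} transformants for 95% coverage")
```

Because the required screening effort scales with the number of codons raised to the number of randomized positions, shrinking the alphabet reduces the effort far more steeply than shrinking the number of sites by one.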
In our first case study, limonene epoxide hydrolase (LEH)15 was employed as the catalyst in the hydrolytic desymmetrization of cyclohexene oxide with formation of the two enantiomeric products, (R,R)- and (S,S)-cyclohexane-1,2-diol. Several highly reduced amino acid alphabets were chosen, the smallest one involving a single amino acid (in addition to WT), their respective codon degeneracies being used in SM at a 10-residue randomization site lining the binding pocket. Library coverage in the range 76% to >95% was ensured. These libraries were compared to the use of NNK codon degeneracy, in which case only 3 × 10⁻¹⁰ % of the theoretical protein sequence space was covered. The result of a statistical analysis clearly demonstrates that it is better to aim for maximal library coverage despite the fact that the structural diversity of the amino acid building blocks is distinctly lower. Such small and “smartly” designed libraries are characterized by a higher frequency of active and stereoselective variants. Similar experimental results were observed in the case of P450-BM3 as the catalyst in the stereo- and regioselective oxidative α-hydroxylation of cyclohexanone, leading to the same general conclusion. When small focused SM libraries are rationally designed based on optimally chosen reduced amino acid alphabets, which allows for high-coverage screening, the likelihood of redundant and non-useful codons as well as nonsense codons is reduced. In the two experimental platforms (LEH and P450-BM3), the rational choice of optimal reduced amino acid alphabets and the decisions on how to group single CAST residues into multi-residue randomization sites had been made earlier;18a,21 they are briefly recapitulated as guidelines in the present study. Further details useful in future directed evolution studies can be found in a recent review.1b

MATERIALS AND METHODS

Materials
KOD Hot Start DNA Polymerase was obtained from Novagen. The oligonucleotides were synthesized by Life Technologies. The plasmid preparation kit was ordered from Zymo Research and the PCR purification kit was purchased from QIAGEN. Mutants were sequenced by GATC Biotech. All commercial chemicals were purchased from Sigma-Aldrich. Lysozyme and DNase I were purchased from AppliChem.

PCR-based methods for library construction
Saturation mutagenesis (SM) libraries were constructed using the overlap PCR and megaprimer approach2 with KOD Hot Start polymerase. Reaction mixtures (50 µl) typically contained 30 µl water, 5 µl KOD Hot Start polymerase buffer (10×), 3 µl 25 mM MgSO4, 5 µl 2 mM dNTPs, 2.5 µl DMSO, 0.5 µl (50–100 ng) template DNA, 0.5 µl each of the 100 µM primer mixes and 1 µl KOD Hot Start polymerase. The PCR conditions for the short fragment were: 95 °C for 3 min; 32 cycles of 95 °C for 30 s, 56 °C for 30 s and 68 °C for 40 s; 68 °C for 120 s; 16 °C for 30 min. For the mega-PCR: 95 °C for 3 min; 24 cycles of 95 °C for 30 s, 60 °C for 30 s and 68 °C for 5 min 30 s; 68 °C for 10 min; 16 °C for 30 min. The PCR products were analyzed by agarose gel electrophoresis and purified using a QIAGEN PCR purification kit. 5 µl NEB CutSmart™ Buffer and 2 µl Dpn I were added in a 50 µl volume, and the reactions were incubated at 37 °C for more than 3 h. After Dpn I digestion, the PCR products were either purified again or 1 µl was directly transformed into electrocompetent E. coli BL21(DE3) to create the final library for Quick Quality Control (QQC)25 and screening.

Screening Procedure
Limonene epoxide hydrolase (LEH): Colonies were individually picked and deposited in the wells of 96-deep-well microtiter plates containing 300 µl LB medium with 50 µg/ml carbenicillin and cultured overnight at 37 °C with shaking.
An aliquot of 120 µl was transferred to a glycerol stock plate and stored at -80 °C. Then, 600 µl TB medium with 0.5% (m/v) lactose and 50 µg/ml carbenicillin was added directly to the culture plate, which was shaken for 7–8 h at 28 °C for protein expression. The cell pellets were harvested, washed with 400 µl 50 mM potassium phosphate buffer (pH 7.4) and centrifuged for 10 min at 4000 rpm and 4 °C. The pellets were then resuspended in 400 µl of the same buffer containing 6 U DNase I and 1 mg/ml lysozyme and shaken at 30 °C for 1 h to lyse the cells. The crude lysate was centrifuged for 30 min at 4000 rpm and 4 °C. 40 µl of the supernatant was used for an adrenaline assay according to our previous report.16 The remaining 300 µl of supernatant from the active transformants was transferred into new deep-well plates for reaction with 5 mM substrate 1 and 5% acetonitrile as co-solvent for 14–16 h at 30 °C in a final volume of 400 µl. The product and remaining substrate were extracted using an equal volume of ethyl acetate (EtOAc) for GC analysis on a chiral column.
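Once the two enantiomeric products are resolved on the chiral GC column, the enantiomeric excess follows from the integrated peak areas. The sketch below applies the standard definition ee = |A_R − A_S| / (A_R + A_S) × 100, where "R" and "S" stand for the two enantiomeric products. The peak areas are hypothetical and the function name is illustrative.

```python
def enantiomeric_excess(area_r, area_s):
    """Return (ee in %, favored enantiomer) from two GC peak areas."""
    total = area_r + area_s
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    ee = abs(area_r - area_s) / total * 100.0
    if area_r == area_s:
        favored = "racemic"
    else:
        favored = "R" if area_r > area_s else "S"
    return ee, favored


# Hypothetical integrated peak areas (arbitrary units) for one well.
ee, favored = enantiomeric_excess(area_r=89.0, area_s=11.0)
print(f"ee = {ee:.0f}% ({favored})")
```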


P450-BM3 monooxygenase: Colonies were picked and transferred into deep-well plates containing 400 μl LB medium with 50 μg/ml kanamycin and cultured overnight at 37 °C with shaking. An aliquot of 120 μl was transferred to a glycerol stock plate and stored at -80 °C. The expression culture was inoculated by transferring 100 μl of overnight culture into 900 μl TB medium containing 0.2 mM IPTG and 50 μg/ml kanamycin (final concentrations). After 20 h of expression at 25 °C and 220 rpm, the cell pellets were harvested and washed with 400 μl 100 mM potassium phosphate buffer (pH 8.0), followed by centrifugation at 4 °C and 4000 rpm for 10 min. The supernatant was discarded and the cell pellets were resuspended by vortexing in 400 μl of the same buffer containing 100 mM glucose. The reaction was started by adding 10 μl cyclohexanone stock solution (200 mM in DMSO; final concentration 5 mM), and the plates were shaken for 18 h at 800 rpm and 30 °C. The reaction was stopped by adding 3 × 160 μl ethyl acetate (EtOAc). Phase separation was achieved by centrifugation for 30 min at 4 °C and 4000 rpm. The organic phase was transferred into a microtiter plate (MTP, ABgene, AB-0796) with a Tecan robot system and subjected to GC analysis using a chiral column.20
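The stated final substrate concentration can be checked with a simple dilution calculation. The sketch below assumes that the 10 μl of stock is added to the 400 μl cell suspension, giving a total volume of 410 μl; the function name is illustrative.

```python
def final_concentration(stock_mm, stock_ul, base_ul):
    """Concentration (mM) after adding stock_ul of stock to base_ul of suspension."""
    return stock_mm * stock_ul / (base_ul + stock_ul)


# Values from the protocol: 10 ul of 200 mM cyclohexanone in DMSO into 400 ul.
substrate_mm = final_concentration(stock_mm=200.0, stock_ul=10.0, base_ul=400.0)
dmso_percent = 10.0 / (400.0 + 10.0) * 100.0

print(f"cyclohexanone: {substrate_mm:.2f} mM")  # close to the nominal 5 mM
print(f"DMSO: {dmso_percent:.1f}% (v/v)")
```

Under this assumption the substrate concentration comes out slightly below the nominal 5 mM, and the DMSO co-solvent content is roughly 2.4% (v/v).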

AUTHOR INFORMATION

Corresponding Author
* Correspondence should be addressed to [email protected]; [email protected]
# Authors contributed equally to this work.

ASSOCIATED CONTENT

Supporting Information. Supporting materials and methods, tables and figures are provided: primer design and library creation for LEH and P450-BM3, respectively, results of library screening, data collection and list of primers. This information is available free of charge on the ACS Publications website.

ACKNOWLEDGMENT

MTR thanks the Max-Planck-Society for generous support. ZS is grateful for financial support from the CAS Pioneer Hundred Talents Program (No. 2016-053), the Biological Resources Programme, Chinese Academy of Sciences (No. KFJ-BRP-009) and the Key Research Program of the Chinese Academy of Sciences (KFZD-SW-215). AL acknowledges financial support from the National Natural Science Foundation of China (Grant No. 21702052) and the State Key Laboratory of Biocatalysis and Enzyme Engineering.

REFERENCES

1. Reviews of directed evolution with emphasis on stereo- and regioselectivity: (a) Reetz, M. T. Directed Evolution of Selective Enzymes: Catalysts for Organic Chemistry and Biotechnology; Wiley-VCH: Weinheim, 2016; (b) Qu, G.; Li, A.; Acevedo-Rocha, C. G.; Sun, Z.; Reetz, M. T. The Crucial Role of Methodology Development in Directed Evolution of Selective Enzymes. Angew. Chem. Int. Ed. 2019, 58, DOI: 10.1002/anie.201901491.
2. General reviews of directed evolution: (a) Turner, N. J. Directed Evolution Drives the Next Generation of Biocatalysts. Nat. Chem. Biol. 2009, 5, 567–573; (b) Currin, A.; Swainston, N.; Day, P. J.; Kell, D. B. Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently. Chem. Soc. Rev. 2015, 44, 1172–1239; (c) Denard, C. A.; Ren, H.; Zhao, H. Improving and Repurposing Biocatalysts via Directed Evolution. Curr. Opin. Chem. Biol. 2015, 25, 55–64; (d) Hammer, S. C.; Knight, A. M.; Arnold, F. H. Design and Evolution of Enzymes for Non-natural Chemistry. Curr. Opin. Green Sustain. Chem. 2017, 7, 23–30; (e) Zeymer, C.; Hilvert, D. Directed Evolution of Protein Catalysts. Annu. Rev. Biochem. 2018, 87, 131–157.
3. Reviews of artificial metalloenzymes: (a) Reetz, M. T. Directed Evolution of Artificial Metalloenzymes: A Universal Means to Tune the Selectivity of Transition Metal Catalysts? Acc. Chem. Res. 2019, 52, 336–344; (b) Renata, H.; Wang, Z. J.; Arnold, F. H. Expanding the Enzyme Universe: Accessing Non-Natural Reactions by Mechanism-Guided Directed Evolution. Angew. Chem. Int. Ed. 2015, 54, 3351–3367.
4. (a) Reetz, M. T. Select Protocols of High-Throughput ee-Screening Systems for Assaying Enantioselective Enzymes. Methods Molec. Biol. 2003, 230, 283–290; (b) Reymond, J.-L. Enzyme Assays: High-Throughput Screening, Genetic Selection and Fingerprinting; Wiley-VCH: Weinheim, 2006; (c) Wójcik, M.; Telzerow, A.; Quax, W. J.; Boersma, Y. L. High-Throughput Screening in Protein Engineering: Recent Advances and Future Perspectives. Int. J. Mol. Sci. 2015, 16, 24918–24945; (d) Xiao, H.; Bao, Z.; Zhao, H. High Throughput Screening and Selection Methods for Directed Enzyme Evolution. Ind. Eng. Chem. Res. 2014, 54, 4011–4020; (e) Acevedo-Rocha, C. G.; Agudo, R.; Reetz, M. T. Directed Evolution of Stereoselective Enzymes Based on Genetic Selection as Opposed to Screening Systems. J. Biotechnol. 2014, 191, 3–10.
5. Review of CAST/ISM: Reetz, M. T. Laboratory Evolution of Stereoselective Enzymes: A Prolific Source of Catalysts for Asymmetric Reactions. Angew. Chem. Int. Ed. 2011, 50, 138–174.
6. (a) Firth, A. E.; Patrick, W. M. GLUE-IT and PEDEL-AA: New Programmes for Analyzing Protein Diversity in Randomized Libraries. Nucleic Acids Res. 2008, 36, W281–W285; (b) Patrick, W. M.; Firth, A. E. Strategies and Computational Tools for Improving Randomized Protein Libraries. Biomol. Eng. 2005, 22, 105–112; (c) Firth, A. E.; Patrick, W. M. Statistics of Protein Library Construction. Bioinformatics 2005, 21, 3314–3315; (d) Denault, M.; Pelletier, J. N. In Protein Engineering Protocols; Arndt, K. M., Müller, K. M., Eds.; Humana Press: Totowa, NJ, 2007; pp 127–154.
7. (a) Nov, Y. When Second Best is Good Enough: Another Probabilistic Look at Saturation Mutagenesis. Appl. Environ. Microbiol. 2012, 78, 258–262; (b) Nov, Y. Fitness Loss and Library Size Determination in Saturation Mutagenesis. PLOS ONE 2013, 8, e68069; (c) Nov, Y. Probabilistic Methods in Directed Evolution: Library Size, Mutation Rate, and Diversity. Methods Molec. Biol. 2014, 1179, 261–278.
8. Hoebenreich, S.; Zilly, F. E.; Acevedo-Rocha, C. G.; Zilly, M.; Reetz, M. T. Speeding up Directed Evolution: Combining the Advantages of Solid-Phase Combinatorial Gene Synthesis with Statistically Guided Reduction of Screening Effort. ACS Synth. Biol. 2015, 4, 317–331.
9. (a) Acevedo-Rocha, C. G.; Reetz, M. T. In Understanding Enzymes: Function, Design, Engineering and Analysis; Svendsen, A., Ed.; Pan Stanford Publishing Pte. Ltd.: Singapore, 2016; pp 613–642; (b) Acevedo-Rocha, C. G.; Kille, S.; Reetz, M. T. Iterative Saturation Mutagenesis: A Powerful Approach to Engineer Proteins by Simulating Darwinian Evolution. Methods Molec. Biol. 2014, 1179, 103–128.
10. Reetz, M. T.; Wilensek, S.; Zha, D.; Jaeger, K.-E. Directed Evolution of an Enantioselective Enzyme through Combinatorial Multiple-Cassette Mutagenesis. Angew. Chem. Int. Ed. 2001, 40, 3589–3591.
11. Reetz, M. T.; Carballeira, J. D. Iterative Saturation Mutagenesis (ISM) for Rapid Directed Evolution of Functional Enzymes. Nat. Protoc. 2007, 2, 891–903.
12. Reetz, M. T.; Kahakeaw, D.; Lohmer, R. Addressing the Numbers Problem in Directed Evolution. ChemBioChem 2008, 9, 1797–1804.


13. Selected examples of using reduced amino acid alphabets in SM-based directed evolution: (a) Zhang, W.; Modén, O.; Tars, K.; Mannervik, B. Structure-Based Redesign of GST A2-2 for Enhanced Catalytic Efficiency with Azathioprine. Chem. Biol. 2012, 19, 414–421; (b) van Leeuwen, J. G.; Wijma, H. J.; Floor, R. J.; van der Laan, J. M.; Janssen, D. B. Directed Evolution Strategies for Enantiocomplementary Haloalkane Dehalogenases: from Chemical Waste to Enantiopure Building Blocks. ChemBioChem 2012, 13, 137–148; (c) Sandström, A. G.; Wikmark, Y.; Engström, K.; Nyhlén, J.; Bäckvall, J.-E. Combinatorial Reshaping of the Candida antarctica Lipase A Substrate Pocket for Enantioselectivity Using an Extremely Condensed Library. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 78–83; (d) Nobili, A.; Gall, M. G.; Pavlidis, I. V.; Thompson, M. L.; Schmidt, M.; Bornscheuer, U. T. Use of 'Small but Smart' Libraries to Enhance the Enantioselectivity of an Esterase from Bacillus stearothermophilus Towards Tetrahydrofuran-3-yl Acetate. FEBS J. 2013, 280, 3084–3093; (e) Agudo, R.; Roiban, G.-D.; Reetz, M. T. Induced Axial Chirality in Biocatalytic Asymmetric Ketone Reduction. J. Am. Chem. Soc. 2013, 135, 1665–1668; (f) Zhang, Z.-G.; Lonsdale, R.; Sanchis, J.; Reetz, M. T. Extreme Synergistic Mutational Effects in the Directed Evolution of a Baeyer-Villiger Monooxygenase as Catalyst for Asymmetric Sulfoxidation. J. Am. Chem. Soc. 2014, 136, 17262–17272; (g) Sun, Z.; Wikmark, Y.; Bäckvall, J.-E.; Reetz, M. T. New Concepts for Increasing the Efficiency in Directed Evolution of Stereoselective Enzymes. Chem. Eur. J. 2016, 22, 5046–5054; (h) Li, G.; Yao, P.; Gong, R.; Li, J.; Liu, P.; Lonsdale, R.; Wu, Q.; Lin, J.; Zhu, D.; Reetz, M. T. Simultaneous Engineering of an Enzyme's Entrance Tunnel and Active Site: the Case of Monoamine Oxidase MAO-N. Chem. Sci. 2017, 8, 4093–4099; (h) Kan, S. B. J.; Huang, X.; Gumulya, Y.; Chen, K.; Arnold, F. H. Genetically Programmed Chiral Organoborane Synthesis. Nature 2017, 552, 132–136; (i) Acevedo-Rocha, C. G.; Gamble, C.; Lonsdale, R.; Li, A.; Nett, N.; Hoebenreich, S.; Lingnau, J. B.; Wirtz, C.; Fares, C.; Hinrichs, H.; Deege, A.; Mulholland, A. J.; Nov, Y.; Leys, D.; McLean, K. J.; Munro, A. W.; Reetz, M. T. P450-Catalyzed Regio- and Diastereoselective Steroid Hydroxylation: Efficient Directed Evolution Enabled by Mutability Landscaping. ACS Catal. 2018, 8, 3395–3410; (j) Chen, K.; Huang, X.; Kan, S. B. J.; Zhang, R. K.; Arnold, F. H. Enzymatic Construction of Highly Strained Carbocycles. Science 2018, 360, 71–75; (k) Zhang, J.; Huang, X.; Zhang, R. K.; Arnold, F. H. Enantiodivergent α-Amino CH-Fluoroalkylation Catalyzed by Engineered Cytochrome P450s. J. Am. Chem. Soc. 2019, 141, 9798–9802; (l) Currin, A.; Kwok, J.; Sadler, J. C.; Bell, E. L.; Swainston, N.; Ababi, M.; Day, P.; Turner, N. J.; Kell, D. B. GeneORator: An Effective Strategy for Navigating Protein Sequence Space More Efficiently through Boolean OR-Type DNA Libraries. ACS Synth. Biol. 2019, 8, 1371–1378.
14. (a) Li, A.; Acevedo-Rocha, C. G.; Sun, Z.; Cox, T.; Xu, J. L.; Reetz, M. T. Beating Bias in the Directed Evolution of Proteins: Combining High-Fidelity on-Chip Solid-Phase Gene Synthesis with Efficient Gene Assembly for Combinatorial Library Construction. ChemBioChem 2018, 19, 221–228; (b) Li, A.; Sun, Z.; Reetz, M. T. Solid-Phase Gene Synthesis for Mutant Library Construction: The Future of Directed Evolution? ChemBioChem 2018, 19, 2023–2032.
15. Arand, M.; Hallberg, B. M.; Zou, J.; Bergfors, T.; Oesch, F.; van der Werf, M. J.; de Bont, J. A. M.; Jones, T. A.; Mowbray, S. L. Structure of Rhodococcus erythropolis Limonene-1,2-Epoxide Hydrolase Reveals a Novel Active Site. EMBO J. 2003, 22, 2583–2592.
16. Zheng, H.; Reetz, M. T. Manipulating the Stereoselectivity of Limonene Epoxide Hydrolase by Directed Evolution Based on Iterative Saturation Mutagenesis. J. Am. Chem. Soc. 2010, 132, 15744–15751.


17. Wahler, D.; Reymond, J.-L. The Adrenaline Test for Enzymes. Angew. Chem. Int. Ed. 2002, 41, 1229–1232.
18. (a) Sun, Z.; Lonsdale, R.; Kong, X.-D.; Xu, J.-H.; Zhou, J.; Reetz, M. T. Reshaping an Enzyme Binding Pocket for Enhanced and Inverted Stereoselectivity: Use of Smallest Amino Acid Alphabets in Directed Evolution. Angew. Chem. Int. Ed. 2015, 54, 12410–12415; (b) Sun, Z.; Lonsdale, R.; Li, G.; Reetz, M. T. Comparing Different Strategies in Directed Evolution of Enzyme Stereoselectivity: Single- versus Double-Code Saturation Mutagenesis. ChemBioChem 2016, 17, 1865–1872.
19. Reviews of P450 monooxygenases: (a) Urlacher, V. B.; Girhard, M. Cytochrome P450 Monooxygenases in Biotechnology and Synthetic Biology. Trends Biotechnol. 2019, DOI: 10.1016/j.tibtech.2019.01.001; (b) Ortiz de Montellano, P. R. Hydrocarbon Hydroxylation by Cytochrome P450 Enzymes. Chem. Rev. 2010, 110, 932–948; (c) King-Smith, E.; Zwick, R.; Renata, H. Applications of Oxygenases in the Chemoenzymatic Total Synthesis of Complex Natural Products. Biochemistry 2018, 57, 403–412; (d) O’Reilly, E.; Kohler, V.; Flitsch, S. L.; Turner, N. J. Cytochrome P450 as Useful Biocatalysts: Addressing the Limitations. Chem. Commun. 2011, 47, 2490–2501; (e) Dong, I. I.; Fernandez-Fueyo, E.; Hollmann, F.; Paul, C.; Pesic, M.; Schmidt, S.; Wang, Y.; Younes, S.; Zhang, W. Biocatalytic Oxidation Reactions: A Chemist’s Perspective. Angew. Chem. Int. Ed. 2018, 57, 9238–9261; (f) Whitehouse, C. J. C.; Bell, S. G.; Wong, L.-L. P450BM3 (CYP102A1): Connecting the Dots. Chem. Soc. Rev. 2012, 41, 1218–1260.
20. Sun, Z.; Lonsdale, R.; Wu, L.; Li, G.; Li, A.; Wang, J.; Zhou, J.; Reetz, M. T. Structure-Guided Triple-Code Saturation Mutagenesis: Efficient Tuning of the Stereoselectivity of an Epoxide Hydrolase. ACS Catal. 2016, 6, 1590–1597.
21. Li, A.; Ilie, A.; Sun, Z.; Lonsdale, R.; Xu, J.-H.; Reetz, M. T. Whole-Cell-Catalyzed Multiple Regio- and Stereoselective Functionalizations in Cascade Reactions Enabled by Directed Evolution. Angew. Chem. Int. Ed. 2016, 55, 12026–12029.
22. (a) Narhi, L. O.; Fulco, A. J. Characterization of a Catalytically Self-Sufficient 119,000-Dalton Cytochrome P-450 Monooxygenase Induced by Barbiturates in Bacillus megaterium. J. Biol. Chem. 1986, 261, 7160–7169; (b) Munro, A. W.; Leys, D. J.; McLean, K. J.; Marshall, K. R.; Ost, T. W. B.; Daff, S.; Miles, C. S.; Chapman, S. K.; Lysek, D. A.; Moser, C. C.; Page, C. C.; Dutton, P. L. P450 BM3: The very Model of a Modern Flavocytochrome. Trends Biochem. Sci. 2002, 27, 250–257; (c) Jovanovic, T.; Farid, R.; Friesner, R. A.; McDermott, A. E. Thermal Equilibrium of High- and Low-Spin Forms of Cytochrome P450 BM3: Repositioning of the Substrate? J. Am. Chem. Soc. 2005, 127, 13548–13552; (d) Haines, D. C.; Tomchick, D. R.; Machius, M.; Peterson, J. A. Pivotal Role of Water in the Mechanism of P450BM-3. Biochemistry 2001, 40, 13456–13465.
23. Moore, J. C.; Rodriguez-Granillo, A.; Crespo, A.; Govindarajan, S.; Welch, M.; Hiraga, K.; Lexa, K.; Marshall, N.; Truppo, M. D. “Site and Mutation”-Specific Predictions Enable Minimal Directed Evolution Libraries. ACS Synth. Biol. 2018, 7, 1730–1741.
24. Xu, J.; Cen, Y.; Singh, W.; Fan, J.; Wu, L.; Lin, X.; Zhou, J.; Huang, M.; Reetz, M. T.; Qu, Q. Stereodivergent Protein Engineering of a Lipase to Access All Possible Stereoisomers Bearing Multiple Stereocenters. J. Am. Chem. Soc. 2019, 141, 7934–7945.
25. Bougioukou, D. J.; Kille, S.; Taglieber, A.; Reetz, M. T. Directed Evolution of an Enantioselective Enoate-Reductase: Testing the Utility of Iterative Saturation Mutagenesis. Adv. Synth. Catal. 2009, 351, 3287–3305.
