P450-Catalyzed Regio- and Diastereoselective Steroid Hydroxylation


P450-Catalyzed Regio- and Diastereoselective Steroid Hydroxylation: Efficient Directed Evolution Enabled by Mutability Landscaping Carlos G. Acevedo-Rocha, Charles Gamble, Richard Lonsdale, Aitao Li, Nathalie Nett, Sabrina Hoebenreich, Julia B. Lingnau, Cornelia Wirtz, Christophe Farès, Heike Hinrichs, Alfred Deege, Adrian J. Mulholland, Yuval Nov, David Leys, Kirsty J. McLean, Andrew W. Munro, and Manfred T. Reetz ACS Catal., Just Accepted Manuscript • DOI: 10.1021/acscatal.8b00389 • Publication Date (Web): 08 Mar 2018 Downloaded from http://pubs.acs.org on March 8, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

ACS Catalysis is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036. Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


ACS Catalysis

ACS Paragon Plus Environment


P450-Catalyzed Regio- and Diastereoselective Steroid Hydroxylation: Efficient Directed Evolution Enabled by Mutability Landscaping

Carlos G. Acevedo-Rocha(a,b)†, Charles G. Gamble(c), Richard Lonsdale(a,b,d), Aitao Li(a,b,e), Nathalie Nett(b), Sabrina Hoebenreich(b), Julia B. Lingnau(a), Cornelia Wirtz(a), Christophe Farès(a), Heike Hinrichs(a), Alfred Deege(a), Adrian J. Mulholland(d), Yuval Nov(f), David Leys(c), Kirsty J. McLean(c), Andrew W. Munro(c) and Manfred T. Reetz(a,b)*

(a) Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Muelheim, Germany
(b) Department of Chemistry, Hans-Meerwein-Strasse 4, Philipps-University, 35032 Marburg, Germany
(c) Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, Manchester, M1 7DN, UK
(d) Centre for Computational Chemistry, School of Chemistry, University of Bristol, Cantock's Close, Bristol, BS8 1TS, UK
(e) Hubei Collaborative Innovation Center for Green Transformation of Bio-resources, Hubei Key Laboratory of Industrial Biotechnology, College of Life Sciences, Hubei University, 368 Youyi Road, Wuchang, Wuhan 430062, China
(f) Department of Statistics, University of Haifa, Haifa 31905, Israel

ABSTRACT: Cytochrome P450 monooxygenases play a crucial role in the biosynthesis of many natural products and in the human metabolism of numerous pharmaceuticals. This has inspired synthetic organic and medicinal chemists to exploit them as catalysts in the regio- and stereoselective C-H-activating oxidation of structurally simple and complex organic compounds such as steroids. However, levels of regio- and stereoselectivity as well as activity are not routinely high enough for practical applications. Protein engineering using rational design or directed evolution has helped in many respects, but simultaneous engineering of multiple catalytic traits such as activity, regioselectivity and stereoselectivity, while overcoming tradeoffs and diminishing returns, remains a challenge. Here we show that exploiting information derived from mutability landscapes and molecular dynamics simulations to rationally design iterative saturation mutagenesis constitutes a viable directed evolution strategy. This combined approach is illustrated by the evolution of P450BM3 mutants that enable nearly perfect regio- and diastereoselective hydroxylation of five different steroids specifically at the C16 position with unusually high activity, while avoiding activity-selectivity tradeoffs and keeping the screening effort relatively low. The C16 alcohols are of practical interest as components of biologically active glucocorticoids.

KEYWORDS: directed evolution; cytochrome P450 monooxygenase; regioselectivity; stereoselectivity; mutability landscapes; iterative saturation mutagenesis; steroids; C-H activation.

INTRODUCTION

Cytochrome P450 (CYP) monooxygenases belong to a large protein superfamily (>20,000 members) that is involved in the biosynthesis of complex natural products including steroids, terpenes, alkaloids, flavonoids and vitamins,1 in the metabolism of therapeutic drugs,1,2 and in the breakdown of pollutants.2 Since the early 1950s, many different fungal and bacterial strains containing CYPs have also been used as catalysts in organic and medicinal chemistry, as first demonstrated in the landmark industrial semi-synthetic preparation of cortisol.3 Protein engineering of many different enzyme types based on rational design4 or directed evolution5 allows the enhancement and reversal of enantioselectivity, although the activity of the best mutants may not be optimal. The situation in the case of CYP-catalyzed hydroxylation of steroids or other natural products such as terpenes or alkaloids is even more challenging, because the desired catalytic profile is more complex, encompassing regioselectivity, diastereoselectivity and activity.6 In parallel to these efforts, organic chemists have responded to the need for late-stage C-H-activating hydroxylation of distinct types of compounds by developing novel reagents and catalysts, but limitations in terms of regioselectivity persist.7


When faced with the problem of CYP-based selective hydroxylation of substrates not studied previously, screening the wildtype (WT) enzyme or mutants evolved earlier for other purposes may fortuitously reveal a certain regio- and stereoselectivity at a desired (or undesired) position in the substrate, which, if necessary, can then be improved by further mutagenesis.6 An impressive recent example concerns the late-stage transformation of an allylic CH2 group into the respective α,β-unsaturated ketone in the natural product synthesis of nigelladine A by the groups of Stoltz and Arnold.8 Upon testing CYP mutant libraries previously generated for other sterically demanding substrates, a mutant was identified that ensures a regioselectivity of 3.8 : 1 in favor of the desired hydroxylation at position C7, followed by chemical oxidation to the ketone in the final step.8 In other work, P450BM3, a widely used self-sufficient fatty acid CYP from Bacillus megaterium,9 was elegantly employed by Fasan and co-workers in the late-stage hydroxylation of the antimalarial drug artemisinin.10a Transformations at chemically easily accessible positions such as C9 and C10 had been performed earlier using synthetic reagents and catalysts.11 In contrast, it was known that P450-mediated metabolism occurs, inter alia, at the chemically difficult-to-access C6 and C7 positions.11 To reach this difficult goal by P450BM3 catalysis, a mutant carrying 16 point mutations, FL#62, was first tested; it had been identified earlier in a fingerprint process based on saturation mutagenesis (SM) and color-based screening of 12,500 mutants using bulky surrogate substrates.12 Gratifyingly, this variant showed in the reaction of artemisinin a selectivity of C7(S) : C7(R) : C6a = 83 : 10 : 7, calling for improvement by directed evolution.10a Using the technique of iterative saturation mutagenesis (ISM),13 final selectivities of 100% C7(S), 100% C7(R) and 94% C6a were achieved, although in the latter
case with a 3-fold decrease in total turnover number (TTN) relative to the parent enzyme.10a Later, the selective hydroxylation of the antileukemic agent parthenolide was also accomplished, although one mutant showed a strong activity tradeoff (17-fold loss).10b Tradeoff effects have been observed in other directed evolution studies of CYPs,6 as in protein engineering of other enzyme types.5,14 On the other hand, Wong and co-workers reported the highly regioselective (96%) hydroxylation of testosterone at C15 with excellent substrate conversion (83%) and little screening effort, using a panel of only ~100 P450BM3 variants.15 Although this particular position was not deliberately targeted, C15-selectivity of this kind is of significant synthetic value. Some time ago, we reported the directed evolution of P450BM3 in the selective hydroxylation of testosterone.16 Since WT accepts neither this substrate nor steroids in general, the known variant F87A (ref 6b) was tested, which led to a ~1 : 1 mixture of the respective 2β- and 15β-products. For further improvement, 20 positions surrounding the extensive binding pocket were identified, grouped into small multi-residue randomization sites, and subjected to ISM using relatively large amino acid alphabets (AAAs).16a Mutants with good conversion (ca. 80%) and 97% 2β- or 96% 15β-selectivity were found, although activity was sub-optimal. It was also necessary to assay a total of ~9,000 transformants per substrate by automated HPLC analysis, which constitutes a considerable screening effort.

In contrast to C2- and C15-selectivity, steroid functionalization at position C16 is of particular synthetic value because the respective alcohols are components of biologically active glucocorticoids.17 Using rational design based on site-directed mutagenesis, Commandeur and co-workers engineered P450BM3 mutants for the hydroxylation of testosterone at positions 16α/β with selectivities of 60-85%, but with sub-optimal substrate conversion and a 10-fold activity loss.18 Other CYPs have also been engineered or used to control regio- and stereoselective steroid hydroxylation at other positions.6h,19a-d However, they generally suffer from the same issue: high selectivity is observed only at low substrate conversion, while high substrate conversion compromises selectivity.12 A different approach that avoids mutagenesis is to use protected or functionalized substrates according to the Griengl concept or to employ additives, but this technique has not been applied to steroids and cannot be expected to be general for this class of substrates.19d-f Thus, evolving both regio- and stereoselectivity as well as high activity for CYP-catalyzed steroid hydroxylation at position C16 (or at other targeted positions) remains a daunting task. In some of our previous "steroid mutant libraries", a few weakly C16-selective P450BM3 mutants of very low activity were identified.16a Rather than using these mutants as starting templates for SM and applying the same ISM-based procedure to evolve improvements, we decided to explore an alternative strategy that we hoped would be more efficient while minimizing the screening effort. The basic idea is to use mutability landscapes (MLs)20 as a rational guide for choosing optimal reduced AAAs in ISM experiments, in some cases complemented by molecular dynamics (MD) computations for additional assistance.
This approach is different from previous strategies in which (R)- and/or (S)-enantioselectivities of other enzyme types were evolved by applying SM at individual residues using, inter alia, NNK codon degeneracy encoding all 20 canonical amino acids.5,13 This enables positive mutations to be identified. The conventional procedure of combining positive mutations can then be applied, although this does not always work. Recently, we have used the information from such NNK scans for further genetic optimization by employing the respective newly introduced amino acids as building blocks in reduced AAAs for SM at multi-residue sites.21 This technique, in contrast to simply combining positive mutations, focuses on a more extensive yet rationally chosen region of protein sequence space and reduces the screening effort drastically. While it has been successful in several cases, it should be remembered that only positive mutants are sequenced, and only these enter subsequent decision-making. When focusing on enantioselectivity as a typical parameter, for example, no information is provided about mutations that influence other parameters such as activity, whether in a deleterious or a positive manner. In contrast to the traditional procedure based on NNK-encoded SM at individual residues lining the binding pocket,13 MLs generate extensive additional information of significant value, if exploited properly. The massive sequencing data of MLs provide a complete fingerprint of the influence of all theoretically possible amino acid mutations at each position on selectivity as well as activity, thereby revealing both positive and negative effects. As
already delineated, mutations that increase activity may exert an opposing effect on regio- or stereoselectivity, and vice versa; this fundamental problem also pertains when considering regio- versus stereoselectivity. MLs have been used in an increasing number of directed evolution studies of other proteins.20 In some cases, positive mutations were combined to generate multi-mutational variants. Our combined approach in the present study is fundamentally different because we utilized the ML-derived information in designing SM-based evolutionary pathways for efficient directed evolution. We chose testosterone (1) as the model substrate in a multi-parameter
genetic optimization procedure encompassing activity, regioselectivity and diastereoselectivity at position C16 (Scheme 1). Minimization of tradeoffs was a central theme. We also aimed for some degree of generality regarding C16-selectivity by investigating four additional steroids without further mutagenesis experiments. Finally, by obtaining crystal structures of mutants, we hoped to gain insight into the origin of the altered catalytic profiles. We note that simultaneous multi-parameter optimization constitutes a challenge not only in CYP-catalyzed reactions, but in the directed evolution of selective enzymes in general.5,6

Scheme 1. Model reaction for regio- and diastereoselective control of testosterone (1) hydroxylation to generate the respective 16α (2) or 16β (3) alcohols.

Brief Analysis of Current Directed Evolution Strategies. Before turning to the results, we briefly review the present status of directed evolution techniques, because this puts our study into the proper framework. Enzymes have long been used for enantio- and regioselective transformations in organic chemistry and biotechnology, but this approach has suffered from traditional limitations such as often poor or incorrect stereoselectivity, limited substrate scope and insufficient stability.22 Directed evolution has progressed during the last two decades to a point where these problems can generally be addressed and solved.5,6,13 The most common gene mutagenesis methods in directed evolution are error-prone PCR (epPCR, a shotgun technique), DNA shuffling (a recombination-based method) and SM with generation of focused mutant libraries.5 Despite impressive advances, reliable prediction of the influence of even a single mutation on protein function remains difficult,23 let alone that of several in combination. Random methods like epPCR are often employed, especially when structural or functional knowledge is not available. However, such approaches usually require screening of large protein combinatorial libraries.24 In contrast, SM at sites lining the enzyme binding pocket (CAST) allows the creation of smaller libraries that are more likely to contain improved variants.13 If necessary, ISM can be employed, meaning that the best mutant identified in a library at a given site is used as the template for SM-based randomization at another site.13 Indeed, it has been shown that SM is more efficient than epPCR and DNA shuffling for improving selectivity and/or activity.25 In view of Emil Fischer’s lock-and-key hypothesis and Daniel Koshland’s induced-fit model, this is not surprising.
However, when targeting stereo- and regioselectivity, the labor-intensive screening step is the bottleneck,5,13 because selection systems such as phage display and related techniques do not work in this area of directed evolution.26 Efficient methods for creating small and smart libraries are therefore needed. Different strategies have been developed to create such focused libraries:5,13,21 1) split large randomization sites for SM into smaller ones and/or use reduced AAAs; 2) identify beneficial mutations by acquiring experimental mutational data; 3) employ computational approaches to minimize library size in silico. A more extensive analysis with additional references is provided in the Supporting Information.
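The ISM principle outlined above (the best variant from one randomized site becomes the template for randomizing the next site) amounts to a greedy walk through sequence space. The Python sketch below illustrates only that logic under stated assumptions: each site's library is enumerated exhaustively rather than sampled by screening transformants, and the toy fitness function, residue positions and amino acid alphabet are hypothetical placeholders, not data from this study.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Genotype = Dict[int, str]  # residue position -> one-letter amino acid code


def saturate_site(template: Genotype, site: Sequence[int], alphabet: str,
                  fitness: Callable[[Genotype], float]) -> Tuple[Genotype, float]:
    """Randomize all positions of one site over the alphabet; return the best variant."""
    best, best_score = dict(template), fitness(template)
    for combo in product(alphabet, repeat=len(site)):
        candidate = dict(template)
        candidate.update(zip(site, combo))
        score = fitness(candidate)
        if score > best_score:  # keep the parent unless a variant is strictly better
            best, best_score = candidate, score
    return best, best_score


def iterative_saturation_mutagenesis(template: Genotype, sites: List[Sequence[int]],
                                     alphabet: str,
                                     fitness: Callable[[Genotype], float]):
    """Visit sites in order; the best hit of each library templates the next library."""
    current = dict(template)
    path = []
    for site in sites:
        current, score = saturate_site(current, site, alphabet, fitness)
        path.append((tuple(site), score))
    return current, path


# Hypothetical toy fitness: additive effects plus one cooperative (epistatic) pair.
EFFECTS = {(1, "L"): 2, (2, "W"): 1, (3, "G"): 1}


def toy_fitness(g: Genotype) -> float:
    score = sum(EFFECTS.get((pos, aa), 0) for pos, aa in g.items())
    if g[1] == "L" and g[2] == "W":
        score += 2  # synergy only when both mutations are present together
    return score


if __name__ == "__main__":
    best, path = iterative_saturation_mutagenesis(
        {1: "A", 2: "A", 3: "A"}, [(1, 2), (3,)], "AGLW", toy_fitness)
    print(best, path)
```

Grouping positions 1 and 2 into one site lets the sketch capture their synergy in a single library, which mirrors the rationale for randomizing spatially close residues together; the outcome of a real ISM pathway also depends on the order in which sites are visited.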

RESULTS AND ANALYSES

Learning from Sequence-Function Relationships. As reported earlier,16a we started with mutant F87A, which unselectively leads to equimolar amounts of 2β- and 15β-hydroxytestosterone. Previously, we used this template to build three combinatorial libraries for possible ISM experiments: libraries A (R47/T49/Y51) and C (M185/L188) were randomized using NDT codon degeneracy (12 amino acids), whereas NNK was used for constructing library B (V78/A82). Residues R47 and Y51 interact with the carboxylate group of the natural fatty acid substrates near the mouth of the substrate entry cavity, whereas residues V78/A82 are located at the active site of the heme domain (see below). Since mutations to small amino acids (Gly/Cys/Ser) at residues M185/L188 (library C) correlated mostly with high conversion of 1 (50-80%) but not with selectivity,16 we have now investigated in more detail the role of each amino acid substitution at the five residues of previous libraries A and B (R47, T49, Y51, V78, A82) by building an ML with mutant F87A as the template. In total, 95 mutants (19 substitutions at each of the 5 positions) were constructed and screened using whole cells as described in the SI (Figure 1). Interestingly, mutations R47W, T49E, and A82L/F/E/Q enhance substrate conversion ~2-3-fold, whereas polar substitutions (T/Y/H/K/R/D/E/Q/N) at position 78 significantly diminish or completely abolish activity regardless of side-chain length. Moreover, regioselectivity for position 2β or 15β is controlled by many diverse substitutions at positions Y51 or V78/A82, respectively. In contrast, A82W is the only mutation that shifts regioselectivity significantly (3%→41%) towards position 16β without compromising activity. These results indicate that only a minor fraction of the mutations are beneficial, while the majority are neutral and many are entirely deleterious in terms of activity and regioselectivity. Moreover, the emergence of mutation A82W for shifting regioselectivity confirms its importance regardless of genetic background, as substantiated by subsequent data (see below). These lessons are important for designing reduced AAAs.
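The size of the mutability landscape follows directly from its definition: each position is replaced by every one of the other 19 canonical amino acids, so 5 positions give 5 × 19 = 95 single mutants. A minimal Python sketch that enumerates the mutation labels for such a library is shown below; the helper name is our own, and cloning and screening are of course outside its scope. The parental residues are the five ML positions named in the text.

```python
from typing import Dict, List

CANONICAL_AAS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def mutability_landscape(parent_residues: Dict[int, str]) -> List[str]:
    """Return labels (e.g. 'A82W') for every single substitution at the given positions."""
    return [
        f"{wt}{pos}{aa}"
        for pos, wt in sorted(parent_residues.items())
        for aa in CANONICAL_AAS
        if aa != wt  # skip the parental residue itself
    ]


# Parental residues at the five ML positions of template F87A.
ML_POSITIONS = {47: "R", 49: "T", 51: "Y", 78: "V", 82: "A"}
library = mutability_landscape(ML_POSITIONS)
print(len(library), library[:3])
```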

bridges and hydrophobic contacts and are thus not ideal biocatalysts for industrial applications.27 For these reasons, we hypothesized that ISM-based CASTing starting from the WT enzyme would yield highly active, selective and stable biocatalysts.

Library Design and Statistical Analysis. Having analyzed sequence-function relationships at 11 residues, systematically (R47, T49, Y51, V78, A82) or partially (S72, F87, M185, L188, A330 and L437), we considered more extensive SM. However, it may not be necessary to mutagenize all residues. Since R47W and T49E both improve activity significantly, but the former mutation exhibits higher selectivity for product 3 (dubbed 3-selectivity) (Figure 1), position T49 was not considered. Given that L181, L188 and M185 are close to each other, we decided to eliminate M185 because L188 is present in mutants M01/M11/F87A, while keeping L181 due to its proximity to V78 and A82 (see below), which may enable synergistic interactions.28 In total, 10 residues were chosen. Before considering residue grouping, AAA size and identity, we applied the Patrick and Firth29 as well as the Nov30 metrics, which have been used successfully for various enzymes,5,13,21 to two library designs in order to estimate the screening effort at different degrees of library coverage, assuming the absence of amino acid bias (Table S2). Table S2 shows the effort required for screening combinatorial libraries of up to 10 residues randomized to 2 or 4 amino acids. With a binary AAA (2-AAA), the screening effort can be quite low (767 samples) at 53% coverage, but it increases about 4-fold (3,076 samples) for 95% library coverage. Such a library maximizes the chances of reshaping a binding pocket while decreasing the structural diversity at each site. On the other hand, an AAA of 4 members (4-AAA) maximizes a more localized structural diversity, but it increases the screening effort dramatically, irrespective of library coverage.
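The coverage figures quoted above can be approximately reproduced with the widely used oversampling relation T = -V ln(1 - F), where V is the number of possible variants (AAA size raised to the number of randomized residues), F is the desired coverage and T is the number of transformants to screen, assuming every variant is equally likely. The Python sketch below implements only this relation; we assume, but do not claim, that it matches the Patrick and Firth calculation behind Table S2, and the Nov metric is not reproduced. Small deviations from the tabulated values (about 3,068 here vs 3,076 samples) presumably reflect details of the exact calculation used there. The 8 min per sample HPLC time is taken from the text.

```python
import math


def library_size(alphabet_size: int, residues: int) -> int:
    """Number of distinct variants when each residue is randomized over the alphabet."""
    return alphabet_size ** residues


def transformants_needed(variants: int, coverage: float) -> float:
    """Oversampling T = -V ln(1 - F), assuming every variant is equally likely."""
    return -variants * math.log(1.0 - coverage)


def expected_coverage(variants: int, transformants: int) -> float:
    """Inverse relation: expected fraction of variants sampled at least once."""
    return 1.0 - math.exp(-transformants / variants)


def screening_hours(samples: float, minutes_per_sample: float = 8.0) -> float:
    """Instrument time for serial HPLC screening at a fixed time per sample."""
    return samples * minutes_per_sample / 60.0


if __name__ == "__main__":
    v10 = library_size(2, 10)  # 10 residues, binary AAA -> 1024 variants
    v5 = library_size(4, 5)    # 5 residues, 4-AAA -> also 1024 variants
    v4 = library_size(4, 4)    # 4 residues, 4-AAA -> 256 variants
    print(v10, v5, v4)
    print(f"coverage with 767 samples: {expected_coverage(v10, 767):.0%}")
    print(f"samples for 95% coverage (1024 variants): {transformants_needed(v10, 0.95):.0f}")
    print(f"samples for 95% coverage (256 variants): {transformants_needed(v4, 0.95):.0f}")
    print(f"HPLC time for 767 samples: {screening_hours(767):.0f} h")
```

Because 4^5 = 2^10 = 1024, the five-residue 4-AAA design and the ten-residue binary design need the same number of samples for the same coverage; they differ only in how the diversity is distributed over the positions.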
To reduce the screening effort, ISM could be applied by screening two or more multi-residue sites, each composed of 4 residues randomized with a 4-AAA, as only 766 samples are then necessary for 95% coverage of each library. Although this is a practical number, the effort rises proportionally with the number of small libraries. In view of previous observations that extensive library coverage is not necessary for obtaining acceptable mutants,16b we divided the 10 target residues into two multi-residue sites (A and B), each composed of 5 residues. Their randomization with a reduced 4-AAA requires 767 samples for 53% library coverage, which is a practical number given our low-throughput HPLC screening method (8 min per sample; see SI). This decision raised the question of which residues should be included in groups A and B. The grouping was chosen so that the respective residues are spatially close to one another, maximizing the probability of cooperative effects. However, primer costs should also be considered if non-degenerate codons are used (especially if two or more contiguous codons are targeted simultaneously with a large AAA). Since mutagenesis of F87 toward a small residue is crucial for accepting substrate 1, this residue was assigned to group A. This means that we planned to start with WT P450BM3 as the parent gene. Residues S72 and A82 were also assigned to this first group because mutations S72I and A82W are required for the formation of products 2 and 3, respectively. We also included R47 and L437 in this group because the former could act as

Figure 1. Mutability landscape of P450BM3 mutant F87A for testosterone (1) hydroxylation. The five target residues are indicated on the left side, and the traits under investigation are shown as percentages on the right: substrate conversion (black) and selectivity towards 2β- (red), 15β- (blue) or 16β-hydroxytestosterone (green). The values of the parent enzyme are shown in squares. Other oxidation sites occur minimally at positions 19, 1β and at other unknown positions ( 2β- > unknown products; 6α-, 15α-, 19, 6β, 7α-, 2α-, 11α-, 1β, 11β, 2β or 4-hydroxytestosterone were not detected. The values are the mean of three independent experiments; error bars show the standard error of the mean (n = 3). Reaction conditions: 1 mM 1 in 600 µL KPi buffer, pH 8.0, 24 h, 220 rpm, 37 °C. Average OD values: ~3.0.
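The error bars in Figure 1 are described as the standard error of the mean of three independent experiments, i.e., the sample standard deviation divided by the square root of n. A minimal Python sketch follows; the triplicate values in it are hypothetical and serve only to show the calculation.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM requires at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical triplicate conversions (%) for one mutant, for illustration only.
mean, sem = mean_and_sem([40.0, 42.0, 44.0])
print(f"{mean:.1f} +/- {sem:.1f} %")
```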

We next chose the 2-selective LIFI (R47L/S72I/A82F/F87I) and 3-selective WWV (R47W/A82W/F87V) mutants as templates for ISM. We generated two more B libraries, this time using a two-step QuikChange™ reaction for SM, but the initial quality control revealed suboptimal results, even after various optimization attempts (Figure S3). We nevertheless screened 950 colonies per library (1,900 transformants in total). Interestingly, the addition of mutations Y51W/L181C to LIFI (LIFI-WC) enhanced both 2-selectivity (83→93%) and activity (83→90%), while the single L181Q mutation in WWV (WWV-Q) enhanced 3-selectivity (71→85%) but somewhat decreased conversion (91→84%) (Table 2). Although these catalytic profiles are already quite good, further improvements seemed possible. At this stage, instead of more extensive screening, we turned to MD simulations with the aim of understanding the role of the mutations and in the hope of gaining computational guides for additional protein engineering. It should be noted that in simultaneous genetic multi-parameter optimization, it is not optimal to choose for further mutagenesis the mutant with the very best score in one property (e.g., activity). Rather, as practiced here and noted earlier,33 rational compromises in terms of balanced (minimal) tradeoffs need to be made. For instance, mutants LIFI-WQM and WWW-M show the highest conversion values of 1 towards 2 (98%) and 3 (92%), respectively, but their selectivity is ca. 4% lower than that of mutants LIFI-WC and WWV-Q.

Molecular Dynamics Simulations. Traditionally, MD simulations are performed to investigate the origin of the stereo- and regioselectivity of biocatalysts,5,6 but seldom to guide the actual engineering process, with a few recent exceptions.34 MD simulations were performed on the LIFI-WC (R47L/S72I/A82F/F87I-Y51W/L181C) and WWV-Q (R47W/A82W/F87V-L181Q) mutants. Preliminary simulations of the apo enzyme models were carried out to allow relaxation of the protein in the presence of the introduced mutations. Three unrestrained 50 ns simulations were performed, and the resulting geometries were clustered.
The root mean square deviation (RMSD) of the backbone α-carbon atoms from the initial structure was measured for the MD simulations of both mutants (LIFI-WC and WWV-Q) with 1 (Figures S4-S5). The values are small and oscillate around a fixed value for most of the simulations, suggesting stable model systems. The root mean square fluctuations (RMSF) of the backbone atoms of the individual amino acids were averaged over the simulation; the regions of high RMSF correspond to the loops between the α-helices (Figures S6-S7). Docking of 1 to the representative structures from the clustering calculation was performed before further MD simulations were carried out on the enzyme/substrate complexes. As a control, docking and simulations were also performed with the WT enzyme; no productive binding modes of 1 were observed (data not shown), in agreement with the lack of activity of WT P450BM3 towards testosterone. This is due to steric hindrance around the heme caused by the F87 side chain. In contrast, productive binding orientations consistent with the observed selectivity were found for 1 in the LIFI-WC and WWV-Q mutants (Figure 3), i.e., poses in which hydrogen abstraction leading to the formation of 2 and 3, respectively, would be favored (Tables S3-S4 and Movies S1-S2). Moreover, a hydrogen bond is observed between the carbonyl oxygen of 1 and the Q73 side chain in the LIFI-WC variant (Figure S8), whereas in WWV-Q a hydrogen bond is observed between the carbonyl oxygen atom of 1 and the S72 side chain (Figure S9).
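The RMSD and RMSF analyses described above reduce to simple averages over trajectory coordinates. The NumPy sketch below is not the analysis code used in this work; it assumes that the frames have already been superimposed on the reference structure (least-squares fitting, e.g. with the Kabsch algorithm, is normally done first and is omitted here) and that the arrays contain only the selected atoms, such as the Cα atoms. The two-atom trajectory is synthetic.

```python
import numpy as np


def rmsd_per_frame(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """RMSD of each frame to a reference.

    traj: (n_frames, n_atoms, 3) coordinates, already fitted onto ref
    ref:  (n_atoms, 3) reference coordinates (e.g., the initial structure)
    """
    diff = traj - ref[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def rmsf_per_atom(traj: np.ndarray) -> np.ndarray:
    """RMSF of each atom about its time-averaged position."""
    mean_pos = traj.mean(axis=0)
    diff = traj - mean_pos[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))


# Synthetic example: 2 atoms, 2 frames; atom 0 moves 1 Å along x in frame 1.
ref = np.zeros((2, 3))
traj = np.array([np.zeros((2, 3)), [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
print(rmsd_per_frame(traj, ref))  # frame 0: 0.0, frame 1: sqrt(0.5)
print(rmsf_per_atom(traj))        # atom 0: 0.5, atom 1: 0.0
```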

Figure 3. Substrate docking in models of the engineered P450BM3 mutants. Active sites of the a) LIFI-WC and b) WWV-Q mutants are shown with testosterone (green) docked in the highest-scoring poses. The mutated residues are shown in space-fill format (magenta). Mutant models were created based on the 1BU7 WT crystal structure.35 Both models display 1 in binding poses consistent with the experimentally observed selectivity, with the hydrogen atom to be abstracted closest to the catalytically active oxidant, Compound I.36,37

Starting from the above docking poses, the hydrogen bonds observed between the carbonyl group of 1 and the side chains of the mutant enzymes do not persist throughout the MD trajectories (Figures S10-S11). This suggests that the active site is not fully optimized for binding 1 in orientations that would result in exclusive hydroxylation at the respective position. This is further supported by the data in Table S5, which show the average distance between the O atom of the catalytically active species, the heme-FeIV=O porphyrin radical cation (Cpd I),1 and the hydrogen atoms at the 16α and 16β positions. Apart from one of the three LIFI-WC simulations, all simulations show a smaller average O-H16β distance than O-H16α distance. Remarkably, the docking and MD calculations can explain the roles of most (e.g., R47, S72, A82, F87, L181Q) but not all of the mutated residues (Y51, L181C). For instance, residues Q73 (as a result of mutation S72I) and S72 form a H-bond to the carbonyl group of 1, positioning the substrate in the right orientation in mutants LIFI-WC and WWV-Q, respectively. Likewise, mutation L181Q enables formation of a H-bond to residue T436 in
mutant WWV-Q (see below), but the role of mutation L181C in LIFI-WC is not clear. Finally, introducing L or W at position 47 breaks a salt bridge to Q352, whereas substitutions I/V at position F87 remove steric congestion around the heme, and F/W at position A82 reshape the active site, as reported elsewhere.9f However, in the latter cases it is unclear why one specific mutation is required in each mutant. To better understand these "side-chain differences", we swapped, one at a time, each of these residues for its counterpart in the other mutant: L47 was mutated to W in LIFI-WC and W47 to L in WWV-Q, and so on for residues 51, 82, 87, and 181. The data are listed at the bottom of Table 2. Importantly, mutation L47W increases 2-selectivity (93→96%) and conversion (90→95%), whereas the inverse mutation W47L decreases both 3-selectivity (85→82%) and conversion (84→66%), indicating that Trp at position 47 is crucial for efficient oxidation of 1 regardless of selectivity. The remaining four swaps in LIFI-WC severely compromise activity without affecting 2-selectivity, except when I87 was mutated to Val (LIFV-WC), in which case selectivity changed to a ~2:1 mixture of 16β- and 15β-hydroxy-1. In the WWV-Q-derived mutants, both enzymatic traits are compromised, except in mutant WVF-Q (W82F), which shows an 8% increase in conversion of 1 but a severe decrease in 3-selectivity (85→19%), indicating strong tradeoffs and that other mutations elsewhere are needed to overcome these effects. At this stage, one of the two goals had been reached, with mutant WIFI-WC (R47W/S72I/A82F/F87I-Y51W/L181C) being 96% selective for 2 while catalyzing almost complete conversion of 1 (95%). Mutant WWV-Q, however, still required improvement.
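The template choices discussed above weigh selectivity against conversion. One simple way to make such a compromise explicit, offered here as an illustrative heuristic and not as the selection procedure used in this work, is to discard variants that are no better than another variant in both traits (Pareto dominance) and then rank the remainder by the expected yield of the desired product, approximated as conversion × selectivity. The Python sketch uses the 3-selectivity and conversion values quoted in the text for WWV, WWV-Q, WWV-Q carrying W47L, and WVF-Q (conversion taken as 84 + 8 = 92%); the yield approximation assumes that selectivity is expressed relative to all oxidation products.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str
    selectivity: float  # % of products that is the 16beta-alcohol 3
    conversion: float   # % of substrate 1 consumed


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b in both traits and strictly better in one."""
    no_worse = a.selectivity >= b.selectivity and a.conversion >= b.conversion
    better = a.selectivity > b.selectivity or a.conversion > b.conversion
    return no_worse and better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Variants not dominated by any other variant (the tradeoff frontier)."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


def expected_yield(v: Variant) -> float:
    """Approximate % yield of 3 as conversion x selectivity."""
    return v.conversion * v.selectivity / 100.0


variants = [
    Variant("WWV", 71, 91),
    Variant("WWV-Q", 85, 84),
    Variant("WWV-Q/W47L", 82, 66),
    Variant("WVF-Q", 19, 92),
]
front = pareto_front(variants)
best = max(front, key=expected_yield)
print([v.name for v in front], best.name)
```

WVF-Q stays on the frontier because it has the highest conversion, but its low selectivity gives it by far the lowest estimated yield, which is the kind of strong tradeoff described in the text.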
During the MD simulations of variant WWV-Q, a hydrogen bond was observed between the amide oxygen atom of the newly introduced glutamine at position 181 and the hydroxyl group of residue T436 (Figure 4), as supported by their average distance (Figure S12) compared with that between the side-chain nitrogen atom of Q181 and the hydroxyl oxygen atom of T436 (Figure S13). Hydrogen bonding is also observed between a side-chain amide hydrogen atom of Q181 and the sulfur atom of M177. Because these interactions are likely to influence substrate orientation above the heme Fe(IV)=O, we performed mutagenesis experiments at these positions with the goal of improving 3-selectivity and activity.
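The MD arguments in this section rest on two simple per-trajectory statistics: the mean distance between two atoms and the fraction of frames in which a contact is formed. A minimal pure-Python sketch follows. The coordinates and per-frame distances are a hypothetical toy trajectory, the 3.5 Å cutoff is an assumed distance-only criterion rather than the one used in the study, and in practice the coordinates would come from a trajectory-analysis package rather than hand-written lists.

```python
import math
from statistics import mean

Coord = tuple[float, float, float]


def distances(atom_a: list[Coord], atom_b: list[Coord]) -> list[float]:
    """Per-frame Euclidean distance (in Å) between two atoms along a trajectory."""
    if len(atom_a) != len(atom_b):
        raise ValueError("both atoms must have one coordinate per frame")
    return [math.dist(a, b) for a, b in zip(atom_a, atom_b)]


def occupancy(per_frame: list[float], cutoff: float = 3.5) -> float:
    """Fraction of frames in which the distance is at or below `cutoff` (Å).

    Distance-only criterion; the 3.5 Å default is an assumption.
    """
    return sum(d <= cutoff for d in per_frame) / len(per_frame)


# Hypothetical 4-frame toy trajectory: Q181 side-chain O and T436 hydroxyl O.
q181_o = [(0.0, 0.0, 0.0)] * 4
t436_o = [(2.8, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.2, 0.0, 0.0)]

d = distances(q181_o, t436_o)
print(f"mean distance = {mean(d):.2f} Å, occupancy = {occupancy(d):.2f}")

# Comparing mean Cpd I O-to-H distances (toy per-frame values in Å), in the
# spirit of the O-H16α versus O-H16β comparison of Table S5.
o_h16a = [3.9, 4.2, 4.0]
o_h16b = [2.9, 3.1, 3.3]
closer = "H16α" if mean(o_h16a) < mean(o_h16b) else "H16β"
print(f"closer on average: {closer}")
```

An occupancy well below 1 is the kind of result that the text summarizes as an interaction that does not persist throughout the trajectory.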

Figure 5. Mutability landscapes (MLs) of 3-selective mutants. a) Mutant WWV-Q was used as the template to construct all possible single mutants at position T436. Mutations S/I/V/P/G reduce activity but increase selectivity, E/Y/F/W reduce selectivity and K/L/M reduce both, whereas R slightly increases activity and C/D/Q/A/H increase both. b) Mutant WWV-QR was used as the template to create all variants at position M177. Most mutations to small non-polar (G) and polar amino acids (S/D/N/E/K/T/Q) enhance both activity and selectivity, regardless of side-chain size. Non-polar substitutions with medium (I/L) and large (W/F) side chains retain activity but impair selectivity. Error bars show the standard error of the mean (n=3). Reaction conditions: 1 mM 1 in 600 µL KPi buffer pH 8.0 for 24 h/220 rpm/37 °C. Average OD values: ~3.0.

Figure 4. MD simulations guide engineering efforts. Hydrogen bonding is observed between the side-chain oxygen atom of Q181 and the hydroxyl hydrogen atom of T436, as well as between a side-chain amide hydrogen atom of Q181 and the sulfur atom of M177, suggesting that these residues are important for substrate orientation in the active site and are therefore potential mutagenesis targets.

NNK randomization is typically performed in such endeavors, but we constructed MLs20 using the parent enzyme WWV-Q to gain deeper insight into the reactivity of the substrate in the active site. Residue T436 was addressed first (Figure 5a). Highly selective mutants were found (89-93%), but their activity decreased significantly (45-72%), indicating strong tradeoffs. In contrast, mutation T436R enhances conversion (84→92%) without compromising 3-selectivity (85%). We therefore chose the most active variant with the weakest tradeoff effects, i.e., WWV-QR (WWV-Q plus T436R). Next, we constructed another ML, this time at position M177 (Figure 5b). Amino acids with small side chains (S/G) improved 3-selectivity (85→92%) and conversion (92→95%), while large ones (F/W/Y) retained conversion (87-91%) at the expense of 3-selectivity (64-71%), illustrating the existence of strong diminishing returns: the closer to the optimum, the smaller the benefit per mutation. At this stage, we had achieved our second goal, namely a highly active (92% conversion of 1) and 3-selective (92%) enzyme, WWV-QRS (R47W/A82W/F87V/L181Q/T436R/M177S).
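How much screening an ML saves relative to NNK randomization at a single position can be estimated with the usual oversampling relation T = -V ln(1 - F), where V is the number of equiprobable library members and F the desired coverage. The sketch below assumes equal codon representation, 95% coverage and one defined clone per non-parental amino acid for the ML; it ignores NNK codon bias and replicate measurements (Figure 5 uses n=3, which would triple the ML count).

```python
import math

NNK_CODONS = 32   # NNK: 4 x 4 x 2 codons per randomized position (incl. one stop)
ML_CLONES = 19    # mutability landscape: one defined clone per non-parental amino acid


def oversampling(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that any given one of n equiprobable variants is
    sampled with probability `coverage` (equivalently, the expected library
    coverage): T = -n * ln(1 - coverage), rounded up."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(-n_variants * math.log(1.0 - coverage))


def screening_load(positions: int, coverage: float = 0.95) -> tuple[int, int]:
    """(NNK clones, ML clones) for `positions` separate single-site libraries."""
    return positions * oversampling(NNK_CODONS, coverage), positions * ML_CLONES


# Two positions addressed one at a time, as for T436 and then M177.
nnk, ml = screening_load(positions=2)
print(f"NNK (95% coverage): {nnk} clones; ML: {ml} sequenced clones")
```

The ML requires roughly five times fewer clones per position in this idealized comparison, at the price of building and sequencing each variant individually, which is the cost balance discussed later in the text.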

Biochemical, X-ray and EPR Structural Characterization of Best Mutants. Using purified enzymes (Figure S14), we first determined the dissociation constants (Kd) for binding of 1 to both the full-length and heme-domain versions of the enzymes by UV-Vis spectroscopic titration (Figure S15). WIFI-WC has Kd values of 0.4 ± 0.1 and 0.49 ± 0.06 µM, and WWV-QRS has Kd values of 0.99 ± 0.11 and 1.12 ± 0.03 µM (full length and heme domain, respectively), indicating that the CPR domain does not affect substrate binding in the heme domain. Regardless of the construct, the 2-selective mutant binds 1 ca. 2-fold more tightly than the 3-selective one, suggesting that the former may have a better catalytic profile. Steady-state kinetics were also determined for the full-length proteins using an NADPH regeneration system (Figure S16). WIFI-WC gives a typical Michaelis–Menten saturation curve for formation of 2, with Km, kcat and kcat/Km values of 0.28 ± 0.02 µM, 11.56 ± 0.19 s^-1 and 4.13 × 10^7 M^-1 s^-1, respectively. In contrast, WWV-QRS shows a different behavior for generation of 3: a relatively high turnover frequency (ca. 13 s^-1) at low substrate concentrations (ca. 2 µM) but decreased rates at higher concentrations of 1, with Ki, Km, kcat and kcat/Km values of 0.73 ± 0.19 µM, 2.53 ± 0.69 µM, 62.12 ± 14 s^-1 and 2.45 × 10^7 M^-1 s^-1, respectively. This suggests substrate inhibition, i.e., tight binding of 1 and/or the hydroxylated product(s),38 a phenomenon that the Sligar group has observed in steroid oxidation by other CYPs.39 To establish whether NADPH oxidation is well coupled to steroid oxidation in these mutants, we quantified hydroxylated product formation relative to NADPH oxidation. First, NADPH oxidation in the absence of steroid was measured to determine the “leak rate”, i.e., how fast oxygen is reduced as a result of electron transfer through the P450BM3 flavin and heme cofactors.

Rate constants were then compared for the 1-bound enzymes, in particular to characterize the 3-selective mutant (Table S16). In the absence of substrate, mutant WWV-QRS shows a ca. 6-fold higher NADPH leak rate (NLR) than WIFI-WC (38 vs. 6 nmol s^-1). In the presence of 1, however, the values are similar (24 vs. 22 nmol s^-1). The rate of formation of 3 is ca. 4.5 times that of 2 (9 vs. 2 nmol s^-1), resulting in higher final conversions of 1 (33 vs. 8%) with 1 equivalent of NADPH. Within two minutes, ~154 nmol of NADPH are consumed, but only 16 and 65 nmol of 1 are converted, corresponding to coupling efficiencies of 10% and 42% for WIFI-WC and WWV-QRS, respectively. Thus, although WWV-QRS has a higher NLR than WIFI-WC, its coupling efficiency in the presence of 1 is ~4-fold higher, indicating not only that non-specific NADPH oxidation is minimized in the 1-bound form but also that this enzyme is more efficient than WIFI-WC when NADPH is limiting. Given the uncoupling of NADPH oxidation from substrate oxidation in the WIFI-WC and WWV-QRS mutants, catalytic systems for producing hydroxylated steroids are unlikely to function efficiently if the reaction is driven simply by adding NADPH. In the standard whole-cell reactions, NADPH was regenerated with a system based on NADP+, glucose and glucose dehydrogenase. To determine the catalytic profile with this regeneration system, the mutants were tested in vitro under the same conditions as the whole-cell reactions (1 mM substrate), giving TTNs of 8,660 and 9,044, conversions of 92 and 97%, and selectivities of 94 and 93% for 2 and 3 with mutants WIFI-WC and WWV-QRS, respectively. These data show that the NADPH regeneration system effectively improves the performance of the engineered enzymes. Differential scanning calorimetry (DSC) indicated that the steroid-free WWV-QRS mutant has a Tm of 52.4 ± 0.1 °C, rising to 55.4 ± 0.1 °C on binding 1. The steroid-free WIFI-WC mutant has a Tm of 55.1 ± 0.1 °C, which decreases to 52.3 ± 0.1 °C on binding 1, possibly because this mutant has two binding sites for 1. These values are ~10 °C lower than that of the WT heme domain in the presence of its native substrate,40 but still acceptable for practical applications in whole-cell catalysis.

The binding of 1 to both enzymes was also investigated by electron paramagnetic resonance (EPR) spectroscopy (Figure S17). X-band EPR analyses reveal rhombic spectra for both the WIFI-WC and WWV-QRS mutants in their native forms, after addition of ethanol (2%, as solvent for 1), and after subsequent addition of 1. In all cases, the spectra are characteristic of correctly folded, low-spin ferric P450 enzymes with cysteine thiolate as the proximal ligand to the heme iron. The low-spin spectra of both native mutants are perturbed most significantly by ethanol addition, but no significant high-spin ferric heme iron signal is observed for either mutant, likely because of the low temperature required for EPR analysis of P450s and other heme proteins. To obtain further information on the binding modes of 1, we attempted to co-crystallize the heme domains of the 2-selective WIFI-WC and 3-selective WWV-QRS mutants in complex with 1. Crystals of the WWV-QRS heme domain bound to 1 could not be obtained, but the WIFI-WC heme domain was successfully crystallized in complex with 1 (Figure 6). The WIFI-WC heme domain crystals diffracted to 2.1 Å resolution (see Table S7 in SI for details). The data reveal additional electron density in the active site of both monomers in the asymmetric unit that clearly corresponds to two molecules of 1. One of these is located directly above the heme (1A), with C16 near the heme iron (Figure 6a). The second (1B) lies adjacent to 1A and occupies the more distant region of the active site near the R47-Y51 fatty acid carboxylate binding motif. The 2-selective mutant-1 complex most closely resembles the structure of the WT P450BM3 heme domain in complex with a “decoy” molecule,41 N-perfluorododecanoyl-L-tryptophan, and an overlay reveals that 1A occupies the perfluorododecanoyl binding pocket while 1B lies near the L-tryptophan binding region of this decoy molecule (Figure 6b). The overlay illustrates how individual mutations contribute to binding of 1: S72I, A82F and F87I form the 1A binding pocket, while R47W, Y51W and S72I contribute to the hydrophobic 1B binding site. There is no direct contact between the two testosterone molecules and L181C.

Compared with the WT N-perfluorododecanoyl-L-tryptophan complex, the N-terminal region of the heme domain B’-helix adopts an extended loop structure in the 2-selective mutant-1 complex. This allows the active site to expand to accommodate 1A; mutation S72I lies in this region. Figure S18 shows a stereoview of the binding modes of 1A and 1B in the substrate binding cavity of the WIFI-WC heme domain.

Comparison of ISM and Rational Design. To compare our engineered enzymes with the M01- and M11-derived mutants reported by Commandeur et al.,18 we created by site-directed mutagenesis not only their highly active 2- or 3-selective variants but also intermediate mutants (Table 3). All variants were screened using whole cells as described in the experimental section. The most selective and active mutants reported are M01/S72I/A82W (M01-IW) and M01/A82W (M01-W), which in purified form are ca. 85% 2- and 3-selective, respectively, but 2.6- and 8.3-fold less active than M11, the most active variant reported.18e In whole cells, mutants M01-IW and M01-W show 49% and 67% 2- and 3-selectivity, respectively, and 37% and 79% conversion of 1. Clearly, these enzymes show poor selectivity and conversion in whole cells, which are often the preferred industrial format for biocatalysts because additional costly steps such as cell disruption, centrifugation and separation are avoided. Moreover, the product formation rates reported for M01-IW and M01-W towards 2 and 3 are 0.048 and 0.11 nmol s^-1, respectively.18e Our WIFI-WC and WWV-QRS mutants form the 2 and 3 alcohols at 2 and 9 nmol s^-1, respectively, i.e., 42- and 80-fold enhancements compared with M01-IW and M01-W.
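Several derived quantities in the biochemical characterization follow directly from the reported constants. The sketch below recomputes kcat/Km and the coupling efficiencies and, assuming the common substrate-inhibition rate law v/[E] = kcat·S/(Km + S + S²/Ki) (the text reports Ki but does not state which equation was fitted), the substrate concentration and turnover at which the WWV-QRS rate would peak. Function names are hypothetical.

```python
import math


def kcat_over_km(kcat_per_s: float, km_uM: float) -> float:
    """Specificity constant in M^-1 s^-1 (Km converted from µM to M)."""
    return kcat_per_s / (km_uM * 1e-6)


def rate_with_substrate_inhibition(s_uM: float, kcat_per_s: float,
                                   km_uM: float, ki_uM: float) -> float:
    """Turnover (s^-1) for the assumed form v/[E] = kcat*S / (Km + S + S^2/Ki)."""
    return kcat_per_s * s_uM / (km_uM + s_uM + s_uM ** 2 / ki_uM)


def peak_substrate(km_uM: float, ki_uM: float) -> float:
    """Substrate concentration (µM) maximizing that rate law: sqrt(Km*Ki)."""
    return math.sqrt(km_uM * ki_uM)


def coupling(product_nmol: float, nadph_nmol: float) -> float:
    """Fraction of the NADPH oxidized that ends up in hydroxylated product."""
    return product_nmol / nadph_nmol


# Values reported in the text for the full-length enzymes.
print(f"WIFI-WC kcat/Km = {kcat_over_km(11.56, 0.28):.3g} M^-1 s^-1")
print(f"WWV-QRS kcat/Km = {kcat_over_km(62.12, 2.53):.3g} M^-1 s^-1")

s_peak = peak_substrate(km_uM=2.53, ki_uM=0.73)
v_peak = rate_with_substrate_inhibition(s_peak, 62.12, 2.53, 0.73)
print(f"WWV-QRS peak: {v_peak:.1f} s^-1 at {s_peak:.2f} µM")

print(f"coupling WIFI-WC = {coupling(16, 154):.0%}, WWV-QRS = {coupling(65, 154):.0%}")
```

With these inputs the modeled peak lies near 1.4 µM at roughly 13 s^-1, in line with the ca. 13 s^-1 reported at low substrate concentration; the WWV-QRS specificity constant evaluates to about 2.46 × 10^7 M^-1 s^-1, matching the reported 2.45 × 10^7 within rounding.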

Overall, our results indicate that whole-cell systems can significantly improve the performance of suboptimal enzymes. Although ISM as originally developed appears to be more successful than epPCR and rational design for engineering highly active and selective biocatalysts, its efficiency had to be improved further in the present study: our strategy required screening of about 3,000 samples (Figure S19), 3-fold fewer than previous ISM approaches.16 By comparison, starting from the semi-rationally designed triple mutant R47L/F87V/L188Q, 1,400 colonies were screened over 3 epPCR rounds to obtain the M01 and M11 parent mutants.18 However, these enzymes are somewhat unstable27 and might require extensive screening to improve not only stability but also activity and selectivity. Finally, we assessed the potential utility of our enzymes evolved via ISM, MLs and MDs versus those obtained by epPCR and rational design in upscaled reactions with 30 mg of 1 in 100 mL (ca. 1 mM). Mutants WIFI-WC and WWV-QRS convert almost all substrate (84-99% conversion) into products 2 and 3 (91-95% selectivity) within 6 h, whereas mutants M01-IW and M01-W18e show low conversion (40-50%) and 72% and 82% selectivity, respectively, and these values did not improve even after a 30 h reaction (Figures S20-S24). Using our best mutants, we isolated 23.3 and 7.7 mg (yields of 77% and 27%) of 2 and 3, respectively. We faced technical issues during product isolation, but the procedure was not optimized because the main goal of this experiment was to compare the catalytic profiles of our evolved enzymes with those of the reported ones on a larger scale.18

Figure 6. X-ray structure of the WIFI-WC mutant and comparison of its binding mode of 1 with that of N-perfluorododecanoyl-L-tryptophan. a) Cartoon representation of the 2-selective mutant-testosterone complex. Electron density for both testosterone ligands is shown as a blue mesh contoured at 1 sigma. The C-alpha atoms of the 6 mutations are shown as magenta spheres. b) Overlay of the active-site regions of the 2-selective mutant-testosterone complex and the WT P450BM3 N-perfluorododecanoyl-L-tryptophan complex (Watanabe, et al., PDB 5B2W, DOI: 10.2210/pdb5b2w/pdb). The individual mutations (blue) and corresponding WT residues (green) are shown as sticks, with the ligands in yellow (testosterone) and magenta (N-perfluorododecanoyl-L-tryptophan).

Table 3. Subset of mutants reported to be 2- and 3-selective in a cell-free system.18e

                                                       Regio- and stereoselectivity (%)
Code    Target residues and mutations  Conv. of 1 (%)  ?      15β    16α    16β    Others*
WT      (none)                         Inactive
I       F87I                           1±1             0      17±9   0      83±9   0
V       F87V                           9±3             14±4   49±8   0      36±6   1±1
LV      R47L/F87V                      14±2            16±2   58±8   0      25±3   1±1
LVQ     R47L/F87V/L188Q                39±5            15±1   66±9   0      17±2   2±1
LVQS    R47L/F87V/L188Q/G415S          45±7            17±2   63±10  0      18±3   3±1
M01     R47L/F87V/L188Q/E267V/G415S    52±5            23±2   35±4   1±1    31±4   10±2
M01-W   M01 + A82W                     79±8            1±1    15±7   1±1    67±6   16±4
M01-IW  M01 + S72I/A82W                37±7            14±4   14±9   49±8   6±1    16±6
M01-VW  M01 + S72V/A82W                2±2             0      0      81±9   11±7   8±8
M01-WI  M01 + A82W/V87I                26±5            2±1    2±1    3±1    82±14  11±2
M11     M01 + E64G/F81I/E143G          77±2            14±1   30±15  2±1    18±1   38±18
M11-I   M11 + S72I                     28±6            38±8   43±9   5±1    4±1    10±2
M11-II  M11 + S72I/V87I                13±3            12±3   3±1    44±10  9±2    33±6

*The major species are unknown, followed by 11α-, 2α- and 19-hydroxytestosterone. Values are means of three independent experiments ± standard error (n=3). Reaction conditions: 1 mM 1 in 600 µL KPi buffer pH 8.0 for 24 h / 220 rpm / 37 °C. Average OD values: ~3.0.
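Table 3 reports conversion and product distribution separately, so a variant can look selective yet deliver little product. A rough way to combine the two is an apparent yield, conversion × selectivity / 100, which assumes the selectivity percentages are shares of all converted 1 (no over-oxidation or unaccounted losses). The sketch below applies it to a few rows; the 16α and 16β columns correspond to products 2 and 3, and the dictionary and function names are hypothetical.

```python
# Table 3 means: code -> (conversion of 1 in %, 16α-OH in %, 16β-OH in %)
TABLE3 = {
    "M01": (52, 1, 31),
    "M01-W": (79, 1, 67),
    "M01-IW": (37, 49, 6),
    "M01-VW": (2, 81, 11),
    "M11": (77, 2, 18),
}
PRODUCT_INDEX = {"16α": 1, "16β": 2}


def apparent_yield(conversion_pct: float, selectivity_pct: float) -> float:
    """Approximate yield (%) of one alcohol = conversion x selectivity / 100.

    Assumes selectivity is a share of all converted substrate.
    """
    return conversion_pct * selectivity_pct / 100.0


def best_for(product: str) -> tuple[str, float]:
    """Variant with the highest apparent yield of `product` ('16α' or '16β')."""
    idx = PRODUCT_INDEX[product]
    return max(
        ((code, apparent_yield(row[0], row[idx])) for code, row in TABLE3.items()),
        key=lambda item: item[1],
    )


for product in PRODUCT_INDEX:
    code, y = best_for(product)
    print(f"{product}: {code} ({y:.1f}% apparent yield)")
```

M01-VW is the most 16α-selective entry in this subset (81%), but at 2% conversion its apparent yield of 2 stays below 2%, which is the activity/selectivity tradeoff the text emphasizes.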


Scheme 2. Efficient mutants found for the C16-selective hydroxylation of other steroids without additional mutagenesis experiments: androstenedione (4), nandrolone (7), boldenone (10) and norethindrone (13), and their respective alcohols at position C16. Mutant identities are listed in Table 2. Conversion and selectivity values (%, determined by HPLC analysis) are taken from the original data sets in Tables S8-S11 (n=3). Reaction conditions: 1 mM 4/7/10/13 in 600 µL KPi buffer pH 8.0 for 24 h/220 rpm/37 °C. Average OD values: ~3.0.

Exploring Substrate Acceptance by Using Four Other Steroids. Although universal catalysts for selective transformations cannot exist for fundamental reasons, organic chemists seek catalysts that are efficient for more than a single substrate. We therefore tested four other steroids, androstenedione (4), nandrolone (7), boldenone (10) and norethindrone (13), aiming at regio- and diastereoselective hydroxylation at position C16. Instead of screening all 3,000 mutants, we focused on the 24 improved mutants listed in Table 2. In general, many different mutants from each ISM round showed excellent conversion and selectivity values (Tables S8-S11). The most efficient mutants are summarized in Scheme 2. For 4, variants WIFI-WC and WWV-HQM display excellent selectivity (95 and 100%) and conversion (85 and 93%) towards the 16α (5) and 16β (6) alcohols, respectively. Substrate 7 was converted to the 16α alcohol (8) with high diastereoselectivity (98%) and acceptable conversion (71%) by LIFI-CW, whereas WWV-Q produced 16β-hydroxy-nandrolone (9) with good selectivity (90%) and conversion (91%). Substrate 10 was hydroxylated at position 16α (11) or 16β (12) with 97% selectivity/70% conversion or 72% selectivity/71% conversion by mutants LIWI-CW and WWV-Q, respectively. Although no mutant yielded the 16α alcohol of substrate 13, mutant WMI gave the 16β alcohol (14) with 91% selectivity and 59% conversion. In total, only 96 samples were screened (24 mutants × 4 substrates), a remarkable outcome given that no additional mutagenesis was required. Notably, to our knowledge 4 of the 7 products obtained have never been reported: 9, 11, 12 and 14. These compounds were characterized by NMR and LC-MS to determine their stereoconfiguration and molecular mass (see methods). Further fine-tuning could be achieved either by screening all 3,000 mutants or by using the best mutants as templates for further ISM combined with MLs and/or MDs.

DISCUSSION AND OUTLOOK

The goal of this study was to demonstrate how optimally chosen directed evolution techniques can be combined into an efficient strategy that generates small, high-quality libraries of active P450BM3 mutants for steroid hydroxylation at the specifically targeted C16 position. The strategy makes use of known mechanistic information such as X-ray structural data, exploratory mutational experiments based on mutability landscapes (MLs)20 for choosing optimal reduced amino acid alphabets (AAAs) for saturation mutagenesis (SM) at residues lining the binding pocket (CAST),13 and iterative saturation mutagenesis (ISM)13,16,21 in which the evolutionary pathways were chosen semi-rationally. At appropriate points, MD simulations served as additional guides. A few CAST residues had been targeted in earlier SM studies, but the specific mutations reported there did not guide our decision making. As a result, unusually active variants were evolved that hydroxylate steroids regioselectively at the desired C16 position with pronounced α- or β-diastereoselectivity, as required. Thus, the multi-parameter optimization problem was solved, at least in the present study. Along the various evolutionary trajectories, we observed pronounced tradeoffs and diminishing returns in both enzymatic traits, activity and selectivity. These phenomena, which are common in protein engineering and directed evolution and still constitute widespread bottlenecks,14 illustrate the complexity of simultaneously evolving proficient biocatalysts for more than one catalytic trait. In other ISM studies, this issue has been addressed by exploring many possible evolutionary pathways,13 but this approach typically requires extensive screening of combinatorial libraries. Such screening-intensive strategies do not contribute to “greener” directed evolution. With reduced screening effort, MLs also provide a deeper understanding of which landscape regions are more prone to large tradeoffs than others. This is why we chose mutant WWV-QR (showing diminishing returns in activity without selectivity tradeoffs) as the template for the next ISM cycle. Although MLs are more expensive to construct than NNK-based SM or even epPCR libraries, they provide extensive information that enables an excellent understanding of sequence-function relationships (as exemplified by mutation R47W, which is present in two different mutants). A universal strategy to minimize total costs cannot be proposed, because each project is likely to be different, depending on factors such as the screening method, expenditure for primers and specific labor costs in academia versus industry. If the screening step is expensive, MLs can be cost-effective.32a Moreover, since the cost of DNA synthesis and sequencing is decreasing, we expect MLs as described herein to become cheaper in the future, so that users can benefit from the information on sequence-function relationships as a basis for rationally designing directed evolution experiments.20 Because each unique mutant is sequenced and located in a defined well of a microtiter plate, MLs enable fast screening with new substrates, or for other traits such as stability, with minimal screening effort. In contrast to the hits we chose for further SM experiments, the most selective mutants exhibited strong tradeoffs in conversion, suggesting that many more engineering steps would be required to increase activity. Again, focusing solely on selectivity is not the optimal strategy.16,33 Overall, our combined directed evolution concept proved superior to previous protein engineering approaches. Most importantly, the tradeoffs between selectivity and activity that commonly occur were avoided. Mutants WIFI-WC and WWV-QRS display kcat/Km values of 4.13 × 10^7 and 2.45 × 10^7 M^-1 s^-1, respectively, which are close to those of WT P450BM3 towards its natural substrates, long-chain fatty acids (5-6 × 10^7 M^-1 s^-1).6b,42 Both mutants exhibit TTNs close to 9,000, suggesting their potential for industrial applications. The structure of the 1-bound WIFI-WC P450BM3 mutant heme domain shows that 1 occupies two different positions in the P450. One of them is clearly catalytically relevant: 1A lies roughly coplanar with the heme, positioning its C16 close to the heme iron. The second molecule (1B) lies in the active site entry channel, oriented orthogonally to 1A. The 1B molecule may stabilize the binding of 1A but is also likely to impede product dissociation and thus slow catalysis. In CYP21A2, two progesterone molecules were likewise located in the access channel and active site,43 suggesting that future MD studies may require docking of more than one substrate molecule. Although steroid-specific hydroxylating CYPs of eukaryotic and bacterial origin exist, the former frequently show poor recombinant expression and low catalytic efficiency, while the latter preferentially attack the β “face”.44 Upon screening 213 bacterial CYPs, only 24 (~11%) showed high stereo- and regioselectivity at various positions, of which only three were 2-selective.44 Inspired by this study, two new 2-selective CYP154s were recently characterized:45 CYP154C5 from the bacterium Nocardia farcinica enables the formation of 2, yet its TTN of