The extreme structural plasticity in the CYP153 subfamily of P450s


Article

The extreme structural plasticity in the CYP153 subfamily of P450s directs development of designer hydroxylases Filippo Fiorentini, Anna-Maria Hatzl, Sandy Schmidt, Simone Savino, Anton Glieder, and Andrea Mattevi Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.8b01052 • Publication Date (Web): 06 Nov 2018 Downloaded from http://pubs.acs.org on November 7, 2018



Biochemistry

The extreme structural plasticity in the CYP153 subfamily of P450s directs development of designer hydroxylases

Filippo Fiorentini (a), Anna-Maria Hatzl (b), Sandy Schmidt (b), Simone Savino (a), Anton Glieder (b), Andrea Mattevi (a,*)

(a) Department of Biology and Biotechnology, University of Pavia, via Ferrata 9, 27100 Pavia, Italy

(b) Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria

(*) Correspondence to: Andrea Mattevi, Department of Biology and Biotechnology, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy; [email protected] and Institute of Molecular Biotechnology, NAWI Graz, Graz University of Technology, Petersgasse 14, 8010, Graz, Austria; [email protected]

Heading/Running Title: A toolbox of CYP153 enzymes

Abbreviations: CYP, heme-dependent monooxygenase domain of cytochrome P450 enzymes; Fdx, ferredoxin; FdR, ferredoxin reductase; cand_1, CYP153 from Pseudomonas sp. 19-rlim; cand_10, CYP153 from Phenylobacterium zucineum; cand_15, CYP153 from Novosphingobium aromaticivorans; SRS, substrate recognition site.

Keywords: substrate recognition, drug metabolism, conformational change, heme, monooxygenase


ABSTRACT

CYP153s are bacterial class I P450 enzymes traditionally described as alkane hydroxylases with high terminal regioselectivity. More recently, they have also been shown to catalyze hydroxylations at non-activated carbon atoms of small heterocycles. The aim of our work was to perform an extensive characterization of this subfamily in order to deliver a toolbox of CYP153 enzymes for further development as biocatalysts. Through the screening of recently sequenced bacterial genomes, twenty CYP153s were selected, comprising 17 single monooxygenase domains and three multi-domain variants, in which the monooxygenase domain is naturally fused to its redox partners in a single polypeptide chain. The 20 novel variants were heterologously expressed and their activity was screened towards octane and small heterocycles. A more extended substrate characterization was then performed on three representative candidates, and their crystal structures were determined and compared with those of the known CYP153A7 and CYP153A33. The tested enzymes displayed a wide range of activities, from ω and ω-1 hydroxylations of lauric acid to indigo-generating indole modification. The comparative analysis highlighted a conserved architecture and amino acid composition of the catalytic core close to the heme, together with a very high degree of structural plasticity and flexibility in the regions hosting the substrate recognition sites. Although this type of conformational variability adds a layer of complexity and difficulty to structure-based protein engineering, such diversity in substrate acceptance and recognition promotes the investigated CYP153s as a prime choice for tailoring designer hydroxylases.

INTRODUCTION

Hydroxyl-functionalized hydrocarbons are widely employed in the chemical industry as versatile solvents, plasticizers, surfactants, and diol precursors for polymer intermediates, and in the synthesis of fine chemicals and pharmaceuticals.1 The production and application of these modified hydrocarbons involves the hydroxylation of non-activated carbon atoms, a reaction which is often hard or impossible to attain with the desired regio- and stereo-selectivity.2 In this context, the cytochrome P450 enzymes of the CYP153 subfamily are particularly attractive because they are endowed with the natural ability to selectively hydroxylate medium-sized alkanes using dioxygen as a green oxidant.3-5

CYP153s are bacterial class I P450 enzymes that operate as three-component systems, comprising the heme-dependent monooxygenase core (CYP) and two additional redox proteins and/or domains, namely an iron-sulfur electron carrier (ferredoxin, Fdx) and a FAD-containing reductase (ferredoxin reductase, FdR), which transfer electrons from NAD(P)H to the monooxygenase active site.4 CYP153s are regarded as promising biocatalysts thanks to their superior (>95%) regioselectivity for the ω-position compared to other CYP families, including engineered CYP102A1 (P450-BM3) and CYP102A3.3,6-13 Although the hydroxylation of alkanes has been viewed as the prototypical reaction of CYP153s, various cyclic and aromatic compounds (e.g. limonene, cyclohexene, styrene) as well as medium- and long-chain fatty acids (e.g. lauric acid and palmitic acid) were also shown to be substrates for hydroxylation and/or epoxidation reactions by these enzymes.4,14-20 Recently, the crystal structures of CYP153A7 from Sphingomonas sp. HXN200 and CYP153A33 from Marinobacter aquaeolei guided the engineering of these two biocatalysts towards, respectively, the highly stereoselective 3′-hydroxylation of N-benzyl pyrrolidine and the conversion of lauric acid to the ω-hydroxylated product.9,19 In addition to these catalytic properties, a further valuable feature is that CYP153s are water-soluble proteins, in contrast to the membrane-bound P450s and AlkB-related enzymes that catalyze hydrocarbon hydroxylations.

With the aim of expanding our knowledge of the CYP153 subfamily and generating a toolbox of biocatalysts acting on a pool of diverse substrates, ranging from aliphatic molecules to heterocyclic compounds, we have characterized a large set of new CYP153s. This work led us to discover that this P450 subfamily comprises a highly conserved catalytic core which, by contrast, is decorated by a set of very flexible protein regions that determine versatile substrate preferences associated with distinct regio- and stereo-selectivities. These features highlight a unique combination of structural plasticity and diverse substrate acceptance that makes CYP153s very attractive for oxygen functionalization reactions of diverse molecules.

MATERIALS AND METHODS

Gene cloning. Accession IDs from NCBI are listed in Table 1. The twenty genes coding for the CYP153s listed in Table 1 and for CYP153A6, CYP153A7, CYP153A33, and Fdx and FdR of CYP153A6 were purchased as gBlocks gene fragments from IDT (Integrated DNA Technologies). The cDNAs were amplified by PCR and inserted into the desired expression plasmids by the Gibson Assembly technique. The gene for each candidate was inserted into modified pBID plasmids designed either for the co-expression (under P_tac promoter) with the redox partners of CYP153A6 (under P_trp promoter) or for the expression of the protein fused in a single polypeptide chain with the cytochrome reductase domain of CYP505 from Aspergillus fumigatus (under P_tac promoter). The genes of cand_1, cand_10, cand_15, CYP153A33, and CYP153A7 were cloned into pET-28 vectors in order to add an N-terminal His6 purification tag. Final clones were selected through ampicillin resistance and plasmid sequencing.

Protein expression. E. coli BL21 cells were transformed by heat shock with the appropriate plasmids. Cells from a single colony were pre-inoculated into LB broth containing 100 μg/ml ampicillin and grown overnight at 37 °C. The following day, the culture was inoculated into LB broth (1:50) supplemented with 100 μg/ml ampicillin. Cultures were grown at 37 °C, 180 rpm until OD ≈ 0.7 and then induced with isopropyl β-D-1-thiogalactopyranoside (0.5 mM final concentration). After induction, the temperature was lowered to 20 °C and growth was allowed to continue for a total time of 24 h. Cells were harvested by centrifugation (5,000 × g, 15 min, 4 °C). Fdx and FdR were expressed in transformed E. coli BL21. Cells from a colony were pre-inoculated into M9 mineral medium containing 100 μg/ml ampicillin at 37 °C, 180 rpm. The following day, the culture was inoculated into M9 mineral medium (1:50) supplemented with 100 μg/ml ampicillin. Cultures were grown at 37 °C, 180 rpm until OD ≈ 0.8 and then induced by addition of 3-indoleacrylic acid to a final concentration of 50 µg/ml. After induction, the temperature was lowered to 20 °C and growth was allowed to continue for a total time of 20 h. Cells were harvested by centrifugation (5,000 × g, 15 min, 4 °C).

Cell-free extract preparations. To prepare cell-free extracts for activity screening (either from cultures co-expressing the CYP153 candidate with CYP153A6 Fdx and FdR or from cultures expressing the candidate fused with CYP505 from Aspergillus fumigatus), cells were resuspended (1:2) in lysis buffer (50 mM potassium phosphate buffer pH 7.5, 1 mM phenylmethylsulfonyl fluoride, 10% (v/v) glycerol) and disrupted by sonication (3 runs, 1.5 min each, pulse mode, 60% amplitude). Lysed cells were then centrifuged for 1 h at 50,000 × g to remove cell debris. The resulting supernatant (cell-free lysate) was passed through 0.2 μm filters, flash-frozen in liquid nitrogen, and stored at -20 °C before use.

Protein purification. Cells expressing His6-tagged CYP153s were resuspended (1:5) in lysis buffer (25 mM potassium phosphate buffer pH 7.5, 250 mM NaCl, 1 mM phenylmethylsulfonyl fluoride, 10 µM leupeptin, 10 µM pepstatin, DNAse I 5 μg/g of cell paste) and disrupted with a high-pressure homogenizer. After removing the cell debris (45 min, 75,000 × g, 4 °C), the cell-free extract was loaded by an Äkta system (GE Healthcare) equipped with a multi-wavelength detector (set at 280/350/420 nm) onto a nickel-affinity HisTrap column (GE Healthcare) pre-equilibrated with lysis buffer. After washing (5 column volumes) with lysis buffer and two further wash steps with 45 mM and 90 mM imidazole in lysis buffer, His6-tagged CYP153s were eluted with 180 mM imidazole. The buffer was exchanged to 20 mM Tris-HCl buffer pH 9 (at 4 °C) by a HiTrap Desalting Column (GE Healthcare).
The sample was then loaded onto a Source 15Q 4.6/100 PE column and eluted with a linear ascending gradient up to 1 M NaCl in 20 mM Tris-HCl buffer pH 9 at 4 °C. Fractions containing the purified protein were pooled and loaded onto a gel filtration column (Superdex 200 10/300, GE Healthcare) pre-equilibrated with storage buffer (50 mM HEPES pH 7.5, 50 mM NaCl, 5% (v/v) glycerol) to obtain a higher degree of purity and evaluate the oligomerization state of the protein. Protein peak fractions were pooled and concentrated with a centrifugal filter unit (retaining proteins ≥ 30 kDa) up to 30 mg/ml. The protein sample was monitored from initial extraction to the final purification step by SDS-PAGE and UV/VIS spectrophotometry. Purified CYP153 concentrations were calculated by UV/VIS spectrophotometry on the basis of the following extinction coefficient: ε418 = 90.000 mM⁻¹ cm⁻¹. Among the twenty candidates, His6-tagged cand_1, cand_6, cand_10, cand_13, and cand_15 were selected for purification and crystallization experiments.

Protein crystallization, data collection, structure determination, and refinement. Extensive sparse matrix screening was performed with several commercial kits (Jena Bioscience and Qiagen) using an Oryx 8 robot (Douglas Instruments) in sitting drop plates (Swissci, Molecular Dimensions). Crystallization droplets were prepared with a 1:1 volume ratio by mixing 30 mg/ml protein sample in 50 mM HEPES pH 7.5, 50 mM NaCl, 5% (v/v) glycerol with reservoir solution. Promising conditions were optimized in manually prepared sitting drop plates (Cryschem, Hampton). The optimized conditions were as follows: cand_1, 0.2 M di-ammonium tartrate, 20% (w/v) PEG 3350; cand_10, 0.1 M tri-sodium citrate pH 5.6, 1 M ammonium phosphate; cand_15, 0.1 M HEPES pH 7.5, 10% (w/v) PEG 8000. Crystals were harvested and cryoprotected with 20% (v/v) PEG400 and/or 20% (v/v) glycerol before cryo-cooling in liquid nitrogen. Samples were shipped to the ESRF, Grenoble, France for data collection.21 Results of data collection were transferred to our laboratory and the automatically generated .HKL files were used to run XDS.22 After a run with Aimless in the CCP4 package,23 structure solution was performed by molecular replacement with Phaser, using PDB 3RWL as the search model.24 Atomic models were refined with Refmac5 and Coot25,26 and their quality was checked using MolProbity.27 The crystals of cand_15 exhibited an unusual degree of disorder affecting the quality of the electron density of subunit B. This resulted in refinement and crystallographic statistics of lower quality (Table 2). Subunit A was used for model analysis and generation of all pictures. Table 2 lists the PDB codes.

Conversion assays by cell-free extracts. The activity screening using octane and N-BOC-pyrrolidine as substrates was performed in a final volume of 1 ml containing cell-free extract (total protein concentration ≈ 100 mg/ml in 50 mM MOPS pH 7.4, 10% glycerol (w/v), 1 mM DTT, 0.1 mM EDTA), 5 mM NADH, and 10 mM substrate. Substrates were purchased from Sigma in analytical grade or higher and used without further purification. Reaction mixtures were transferred into a 50 ml conical-bottom tube and incubated in an orbital shaker (110 rpm, 28 °C, 24 h). For compound extraction, 10 µl of 5 M HCl and 1 ml of ethyl acetate with 0.01% tetradecane as an internal standard were added and vigorously mixed. The aqueous and organic phases were separated by centrifugation (4000 × g, 10 min) and the ethyl acetate layer was withdrawn, dried over anhydrous Na2SO4, and stored in a capped GC vial before analysis.

Conversion assays by purified CYP153s. All reactions and controls were performed in duplicate with substrates dissolved in ethanol. Substrates were purchased from Sigma in analytical grade or higher and used without further purification.
Reactions were performed in potassium phosphate buffer at pH 7.5 in a final volume of 0.5 ml, comprising 5 µM purified enzyme, 10 mM NADH, 5 mM substrate (from a 0.5 M stock in methanol), and 120 µl of 100 mg/ml cell-free extract from CYP153A6 FdR/Fdx overexpression. Blanks were run in the absence of CYP153 enzyme. Reaction mixtures were extracted by slightly different procedures according to the type of substrate tested. For octane and N-BOC-pyrrolidine, reactions were stopped by addition of 5 µl of 5 M HCl and then extracted with ethyl acetate (2 × 250 µl) in the presence of 0.01% tetradecane. For tetrahydrofuran, reactions were stopped by addition of 5 µl of 5 M HCl and extracted with CH2Cl2 (2 × 250 µl) in the presence of 0.01% tetradecane. For pyrrolidine, derivatization was achieved by addition of 30 µl triethylamine and 15 µl acetic anhydride; reactions were then extracted with ethyl acetate (2 × 250 µl) in the presence of 0.01% tetradecane. For lauric acid, reactions were stopped by addition of 5 µl of 5 M HCl and extracted with ethyl acetate (2 × 250 µl) in the presence of 5 mM octanoic acid; upon solvent evaporation, reactions were derivatized by addition of 60 µl of N,O-bistrifluoroacetamide (30 min incubation at 80 °C) and then diluted by addition of 200 µl of ethyl acetate. For all substrates, the organic phases were dried over anhydrous Na2SO4 before injection into the GC. For indole, reactions were diluted 1:12 in DMSO in order to completely solubilize the indigo formed; these conversions were then analyzed by UV spectrophotometry.
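
The arithmetic behind setting up such a reaction can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes that the extinction coefficient quoted above is to be read as 90 mM⁻¹ cm⁻¹ (decimal point), that a 1 cm cuvette is used, and it relies on an absorbance reading and dilution factor that are purely hypothetical. Only the 5 µM enzyme, 5 mM substrate, 0.5 M stock, and 0.5 ml volume come from the text.

```python
# Illustrative sketch only; values marked "hypothetical" are not from the paper.

# Assumption: "90.000 mM-1 cm-1" is read as 90 mM^-1 cm^-1.
EPSILON_418_PER_MM_CM = 90.0


def heme_concentration_um(a418, dilution_factor=1.0, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), returned in micromolar."""
    c_mm = a418 / (EPSILON_418_PER_MM_CM * path_cm)
    return c_mm * 1000.0 * dilution_factor


def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """C1 * V1 = C2 * V2 solved for V1 (both concentrations in the same unit)."""
    return final_conc * final_volume_ul / stock_conc


# Hypothetical measurement: A418 = 0.45 on a 1:100 dilution of the enzyme stock.
stock_um = heme_concentration_um(0.45, dilution_factor=100.0)

# Volumes for a 0.5 ml (500 µl) reaction as described in the text.
enzyme_ul = stock_volume_ul(5.0, stock_um, 500.0)    # 5 µM purified enzyme
substrate_ul = stock_volume_ul(5.0, 500.0, 500.0)    # 5 mM from a 0.5 M (500 mM) stock

print(f"enzyme stock: {stock_um:.0f} µM")
print(f"enzyme volume: {enzyme_ul:.1f} µl, substrate volume: {substrate_ul:.1f} µl")
```

With these inputs the substrate stock contributes 5 µl, that is 1% (v/v) of organic solvent in the reaction; a different absorbance reading changes only the enzyme volume.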


Analytical methods for conversion determination and quantification. All measurements were performed at least in duplicate and analyzed via GC-MS or GC-FID. All compounds and enantiomers were identified by comparison with commercially available authentic reference materials. Reported concentrations are always based on calibration curves with authentic standards of the reagents (using internal standards). Samples of 1 µl were injected into the GC instrument. The GC oven temperature program was as follows: 70 °C held for 5 min, 4 °C/min to 140 °C, then 40 °C/min to 300 °C, held for 5 min. The injection temperature was 250 °C and the column flow was 1.16 ml/min with helium as carrier gas. GC-FID analysis was performed with a Shimadzu GC-2010 or GC-2030 (Shimadzu, Kyoto, Japan) equipped with the columns specified in Table S2 and flame-ionization detection (FID). GC-MS analysis was performed with a Shimadzu GCMS-QP2010 SE (Shimadzu, Kyoto, Japan) equipped with a ZB5MSi column (30 m × 0.25 mm × 0.25 µm, Phenomenex). The MS settings were as follows: 250 °C ion source temperature, 320 °C interface temperature, and a scan speed of 5000.
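
As a quick check of the oven program quoted above, the run time per injection can be worked out as follows. This is a minimal Python sketch under two assumptions that the text does not state explicitly: there is no hold at 140 °C, and equilibration or cool-down time between injections is ignored. The function and variable names are illustrative.

```python
def program_minutes(initial_temp_c, initial_hold_min, ramps):
    """Total duration of a GC oven program.

    ramps is a list of (rate_c_per_min, target_temp_c, hold_min) steps.
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in ramps:
        total += abs(target - temp) / rate + hold  # ramp time plus hold at target
        temp = target
    return total


# 70 °C for 5 min, 4 °C/min to 140 °C (assumed: no hold), 40 °C/min to 300 °C, 5 min hold.
OVEN_PROGRAM = [(4.0, 140.0, 0.0), (40.0, 300.0, 5.0)]
run_time = program_minutes(70.0, 5.0, OVEN_PROGRAM)

print(f"oven program: {run_time} min per injection")  # 5 + 17.5 + 4 + 5 = 31.5
```

The slow first ramp accounts for more than half of the run, which is where the hydroxylated products and the internal standard would be expected to be resolved.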

RESULTS

Sequence analysis and candidate selection

Twenty candidates were selected among the sequences identified in a previous bioinformatics study in which 3,979 microbial genomes and 137 metagenomes from terrestrial, freshwater, and marine environments were screened for the presence of new CYP153 enzymes.28 Based on sequence alignment, our selection strategy aimed at covering a wide set of source organisms while providing variability among those residues which were shown to be part of the catalytic site and/or involved in substrate acceptance, based on the known crystal structures of CYP153A7 from Sphingomonas sp. HXN200 and CYP153A33 from Marinobacter aquaeolei.9,19 The list of the selected candidates is reported in Table 1 and the alignment of their amino acid sequences is shown in Figure S1. Among the 20 candidates, 17 represented single monooxygenase domains (hereafter named cand_1, cand_2, etc.), whereas three candidates were predicted to be multi-domain proteins comprising an N-terminal CYP domain, an FdR domain, and a C-terminal Fdx domain (named fusion_1, fusion_2, and fusion_3, respectively; Table 1). The relative sequence identity among the selected variants varies between 27% and 65% (Table S1). We did not select variants with pairwise identities above ~60% in order to ensure sufficient diversity in the activity preferences and structural features (Table 2).
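
The identity-based filtering described above can be illustrated with a small Python sketch. It is not the authors' selection procedure, which also weighed source organisms and active-site residues; the greedy strategy, the identity definition (matches over columns where neither sequence has a gap), and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def select_diverse(aligned, max_identity=60.0):
    """Greedy filter: keep a sequence only if its identity to every
    previously kept sequence does not exceed max_identity."""
    kept = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy alignment (not real CYP153 sequences).
toy_alignment = {
    "seq_a": "MKTAYIAKQR",
    "seq_b": "MKTAYIAKQK",  # 9 of 10 positions identical to seq_a
    "seq_c": "MRS-FLGKHR",  # 3 of 9 ungapped positions identical to seq_a
}

selected = select_diverse(toy_alignment)
print(selected)
```

A greedy filter depends on the input order, so in practice the candidates of greatest interest would be listed first; other identity definitions (for example, dividing by the alignment length) give somewhat different percentages.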

Design of expression strategies for redox partners

The requirement for auxiliary redox proteins that deliver reducing equivalents from NAD(P)H to the heme often hinders the biochemical characterization and biocatalytic exploitation of cytochromes P450. Bearing this issue in mind, and aiming at finding a versatile and generally applicable redox system for enzyme characterization, we devised two expression strategies for the 17 single-domain candidates (Figure 1). The first strategy was based on the co-expression of each candidate with Fdx and FdR from Mycobacterium sp. HXN-1500 (the redox partners of CYP153A6).6 The second strategy relied on the chimeric fusion with the cytochrome reductase domain of CYP505 from Aspergillus fumigatus.29 As gathered from the SDS-PAGE analysis of the cell-free extracts, most enzymes were very well expressed in E. coli BL21 using both expression strategies (Figure 2A-B). Furthermore, the three “natural-fusion” enzymes also proved to be generally well produced as single proteins by E. coli. Nevertheless, in the course of many purification attempts that included the use of various sets of protease inhibitors, these naturally fused CYP153s were invariably found to undergo spontaneous cleavage generating two main species, identified according to their molecular weight as the monooxygenase CYP domain alone (~50 kDa) and the linked FdR and Fdx domains (~38 kDa; Figure 2C). Because such spontaneous proteolysis was observed for all three “natural-fusion” enzymes, it might be speculated that protein cleavage into redox and monooxygenase subunits represents a physiological maturation process.

Substrate conversion screening

The cell-free extracts for the 17 CYP153 candidates obtained by the two expression protocols, plus the three natural-fusion variants, were initially tested for the conversion of two representative substrates, namely octane and pyrrolidine. The terminal hydroxylation of aliphatic alkanes, such as octane, is indeed considered to be the preferred reaction for the CYP153 subfamily and can be of high value for bioremediation and synthetic purposes.3-5 Likewise, the hydroxylation of N-benzyl pyrrolidine at the non-activated 3′ position was recently demonstrated for CYP153A7 and represents a valuable reaction for the production of versatile functionalized products to be used for the synthesis of pharmaceuticals and fine chemicals.9 For comparison, our studies also included CYP153A6, a well-characterized member of the CYP153 subfamily, which we used as a benchmark enzyme.3,6-7 The conversion of octane to octanol was detected with seven out of the 20 candidates and only from the co-expression strategy (Figure 3A). Octanol is promptly over-oxidized to octanoic acid by endogenous E. coli oxidases. It is important to note that the activity of cand_13 (from Mycobacterium intracellulare MOTT02) proved to be higher than that of CYP153A6, until now considered the most active enzyme for this substrate. No conversion could be registered from any of the CYP153 variants with unprotected pyrrolidine as substrate, either from the co-expression or from the chimeric-fusion strategy. The same was observed with other non-protected five-membered heterocyclic compounds, such as tetrahydrofuran and thiophane. In light of these results, we tested the conversion of N-BOC-protected pyrrolidine with the idea that the presence of a hydrophobic group protecting the heteroatom favors the acceptance and the correct positioning of pyrrolidine in the CYP153 active sites.
We found that, although with generally low enantioselectivity, four out of the 20 variants and the reference CYP153A6 could catalyze the regioselective hydroxylation of protected pyrrolidine in the electronically unfavored 3′ position (Figure 3B). As in the experiments with octane, only the co-expression strategy proved to sustain the N-BOC-pyrrolidine conversion. We also observed the presence of indigo in the cells over-expressing two CYP153 variants (cand_1 and cand_3). As described for other CYPs, the non-heme iron Rieske dioxygenases, and some flavin-dependent monooxygenases, the production of this blue dye derives from a specific oxygenation of indole, a compound generated from the tryptophan metabolism in E. coli cells.30-32 Thus, members of the CYP153 family are now also shown to catalyze indole oxygenation yielding indigo dye. Finally, it is noteworthy that seven single-domain candidates (cand_5, 7, 8, 9, 12, 16, 17) and all three natural-fusion enzymes were inactive on all tested substrates (Figure 3). These data may hint at very different substrate preferences. However, the lack of activity can also arise from poor protein expression and/or insufficient compatibility with the CYP153A6 redox partners, which nevertheless proved to be a versatile system for several of the many different enzymes that were tested.

Crystallographic studies on cand_1, cand_10, and cand_15

In an effort to delve deeper into the sequence and structural features underlying the variability in substrate acceptance among CYP153s, we attempted the structure determination of selected candidates (cand_1, cand_6, cand_10, cand_13, and cand_15), chosen on the basis of their expression levels in E. coli, purity yields, and activity in the oxygenation of octane and/or N-BOC-pyrrolidine and/or indole. Rewardingly, three proteins, cand_1 (from Pseudomonas sp. 19-rlim, active on N-BOC-pyrrolidine and indole), cand_10 (from Phenylobacterium zucineum, active on octane), and cand_15 (from Novosphingobium aromaticivorans, active on both N-BOC-pyrrolidine and octane), produced well-diffracting crystals that enabled structure determination at 1.8, 2.9, and 2.9 Å resolution, respectively (Table 3 and Figure 4A). The structures were initially solved by molecular replacement using CYP153A7 as a search model (PDB code 3RWL; 61%, 61%, and 48% sequence identity to cand_1, cand_10, and cand_15, respectively). In all three cases, the asymmetric unit contained multiple copies of enzyme subunits related by non-crystallographic symmetry (for each structure, we refer to subunit A for model description and analysis). The array of new CYP153 structures (cand_1, PDB code 6HQD; cand_10, PDB code 6HQG; cand_15, PDB code 6HQW) and their resemblance to the known CYP153A7 (PDB code 3RWL) and CYP153A33 (PDB code 5FYG, complex with ω-hydroxy lauric acid) are shown in Figure 4B-C. As expected, the newly solved CYP153 structures exhibit the overall fold typical of P450 hydroxylases, with the heme located at the bottom of the active-site tunnel. Pairwise Cα atom superpositions of the core regions return root-mean-square deviations below 1.2 Å (Figure 5A).
Unlike cand_1, where only residues 1-6 were not visible in the electron density, the structures of cand_10 and cand_15 lack rather large portions for which no electron density was visible. Specifically, residues 1-6, 174-204, and 225-249 could not be modelled in cand_10, whereas the structure of cand_15 lacks residues 1-38, 201-203, 215-217, 249-252, and 444-445 (Table 3). As will be discussed below in more detail, these amino acids are located in highly flexible regions taking part in substrate recognition and binding (Figure 5B). Although we repeatedly performed crystal soaking experiments, we could not obtain any structure with a bound substrate. However, the presence of distinct electron density in front of the heme of cand_1 was interpreted as a molecule of bound PEG. This ligand turned out to represent an insightful substrate analogue bound to the enzyme active site (Figure 4A). In summary, the crystallographic analyses of three CYP153s provided comprehensive and complementary information about the structural diversity and the mode of substrate recognition within this enzyme subfamily.
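
To put the extent of disorder in perspective, the unmodelled segments listed above can be tallied per structure. The short Python sketch below uses only the residue ranges quoted in the text for subunit A; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Residue ranges (inclusive) reported as not visible in the electron density.
UNMODELLED = {
    "cand_1": [(1, 6)],
    "cand_10": [(1, 6), (174, 204), (225, 249)],
    "cand_15": [(1, 38), (201, 203), (215, 217), (249, 252), (444, 445)],
}


def count_residues(ranges):
    """Number of residues covered by a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


totals = {name: count_residues(ranges) for name, ranges in UNMODELLED.items()}

for name, n in totals.items():
    print(f"{name}: {n} residues not modelled")
```

The tally gives 6, 62, and 50 residues for cand_1, cand_10, and cand_15, respectively. In cand_10 the 56 missing internal residues fall in two long stretches, whereas most of the missing residues in cand_15 belong to the N-terminus.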

Structural comparison reveals a high degree of plasticity

As depicted in Figure 6A, the first element of structural divergence is the B/C-loop (or Ω-loop) which, due to its high flexibility, could not be fully modeled in CYP153A7 either (residues 88 to 96 are missing from the refined structure).9 The length of this loop varies from 16 residues in CYP153A33 and cand_10 up to 27 amino acids in cand_15. Of interest, cand_1 shows a double turn at the top of the loop, whereas the B/C-loop of cand_15, the longest among the variants analyzed here, stretches out without showing any secondary structure or structural motif apart from an α-helical turn. An influence on the arrangement of the B/C-loop in cand_10 may also come from the β11/1-2 loop formed by residues 52-61 which, unlike in all the other models, protrudes with two extra amino acids (H57 and E5; Figure 6A). This small loop, through interactions with the B/C-loop, is suggested here to take part in forming the gate leading to the active site. The most astonishing structural variability certainly concerns the conformation of the F and G helices together with the F/G-loop (Figure 6B). Known to interact with the substrate and provide substrate recognition,9,19 this region displays highly diverging arrangements among the different CYP153 variants. Whereas the F and G helices and the F/G-loop could be fully traced for cand_1 (as for CYP153A7 and CYP153A33) and in large part for cand_15, the corresponding amino acids could hardly be modelled in cand_10, as their electron density was mostly missing in all three crystallographically independent subunits (Table 3). Specifically, only one of the two helices, from N205 to E224, could be traced in the electron density.
Surprisingly, although the sequence alignment assigns this region to the G helix (Figure 4C), it lies diagonally between the positions occupied by the F and G helices in all the other CYP153 structures and runs in the direction opposite to that of all the other G helices (schematically depicted in Figure 6B). Such a distinctive trait, observed in the three crystallographically independent subunits, together with the lack of clearly interpretable electron density for the remaining portions, hints at a completely different arrangement of this region in cand_10, most likely engaging two long flexible loops (from residue 174 to 204 and from residue 225 to 249) to connect the G helix to the rest of the structure. Although the case of cand_10 stands out for its peculiarity, the orientation of the F and G helices also diverges significantly among the other CYP153 variants, clustering cand_1, cand_15, and CYP153A33 together and leaving CYP153A7 apart due to its shifted arrangement involving both helices. The third distinctive element among the different CYP153 models is the I helix, especially the orientation of its initial portion, which is part of the active site (Figure 6C). The shift of the I helix, together with the orientation of the short H helix, contributes to enlarging or restricting the active site, from a wider pocket in CYP153A7 and cand_10 to a narrower one in cand_1 and cand_15 (Figure 7A).


All the differences described so far are summarized in Figure 5B. From this collective view, it becomes clear that the variability displayed by the CYP153 structures is concentrated in the regions that shape the front door leading substrates to the active site. The concerted arrangement of these flexible regions may therefore strongly influence substrate selection and access to the active site. The lack of some segments in the models of cand_10, cand_15, and CYP153A7 might relate to a higher flexibility associated with the absence of bound ligands. Conversely, the Ω-C12OH bound to CYP153A33 and the PEG molecule bound to cand_1 are consistent with the more rigid and closed conformation of these enzymes (Figure 4A).

Comparative analysis of substrate preferences of cand_1, cand_10, and cand_15

To extend the correlation between substrate preferences and the structure/sequence variations identified, the enzymatic activities of purified cand_1, cand_10, and cand_15 were evaluated. For comparison, CYP153A7 and CYP153A33 were included because their crystal structures are known (Figure 4B). All the enzymes were expressed as His6-tagged proteins and purified to homogeneity so that their activities could be compared at the same protein concentration (5 µM) in the reaction mixtures. FdR and Fdx from CYP153A6 were co-expressed and added to the reaction mixture as cell-free lysate, which also supplied the endogenous E. coli NADH-recycling enzymes. A set of prototypical substrates (octane, dodecanoic (lauric) acid, N-BOC-pyrrolidine, and indole) was evaluated for each of the five enzymes; conversion details are reported in Table 4 and Figure 7B. Overall, the screening revealed that linear substrates (octane and lauric acid) are well accepted by all tested CYP153s, whereas the cyclic and bulkier compounds (N-BOC-pyrrolidine and indole) are converted almost exclusively by cand_1 and CYP153A7. In more detail, the overnight conversion of 5 mM octane was close or equal to 100% for all enzymes except CYP153A33. Interestingly, using the biocatalyst in a highly purified form revealed an activity of cand_1 towards octane that could not be detected in the preliminary screening based on cell-free extracts (Figure 3). An almost complete conversion was also found for lauric acid. Most interestingly, regio-divergent oxygen insertion yielded either a selective Ω-hydroxylation (CYP153A33 and cand_10) or a mixture of Ω-OH and (Ω-1)-OH fatty acid products (CYP153A7, cand_1, and cand_15).
Notably, cand_1 showed a net preference for the (Ω-1)-OH product, as identified by GC-MS (Figure S2), and therefore proved to be an effective Ω-1 fatty-acid hydroxylase. In the conversion of N-BOC-pyrrolidine, cand_1 and CYP153A7 stand out for their excellent yields, in both cases leading to complete substrate consumption (Table 4 and Figure 7B). Although regioselective for the non-activated carbon atom in position three, all five tested CYP153s showed poor enantioselectivity, probably because the substrate fits loosely into the catalytic site. The highest enantiomeric excess recorded was 57% (S) for cand_1, whereas the ee of CYP153A7 did not exceed 19% (R). Interestingly, replacing the N-benzyl substituent used in previous studies with an N-BOC protecting group inverts the stereoselectivity of CYP153A7, which catalyzes the 3’-hydroxylation of N-benzyl-pyrrolidine with a net preference for the S enantiomer.8-11

The hydroxylation of indole to form indoxyl is catalyzed exclusively by CYP153A7 and cand_1. Although the latter gives a much higher yield, the conversion does not exceed 50%, probably because the rapid accumulation of highly hydrophobic indigo destabilizes the enzyme in solution to the point of stopping its activity after a short time.
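To make the reported enantiomeric excesses more tangible, the sketch below converts an ee value into the corresponding shares of the major and minor enantiomers, using the standard definition ee = (major − minor)/(major + minor) × 100. The function names are illustrative assumptions; the input values (57% for cand_1, and 19% as the upper bound for CYP153A7) are the ones quoted in the text.

```python
def enantiomer_fractions(ee_percent):
    """Convert an enantiomeric excess (%) into (major, minor) shares of the product (%)."""
    if not 0 <= ee_percent <= 100:
        raise ValueError("ee must lie between 0 and 100 %")
    major = (100 + ee_percent) / 2
    minor = (100 - ee_percent) / 2
    return major, minor


def enantiomeric_excess(major, minor):
    """Inverse operation: ee (%) from the amounts of the two enantiomers."""
    return 100 * (major - minor) / (major + minor)


cand_1 = enantiomer_fractions(57)    # (S) enantiomer in excess
cyp153a7 = enantiomer_fractions(19)  # (R) enantiomer in excess, upper bound

print(f"cand_1:   {cand_1[0]}% S / {cand_1[1]}% R")
print(f"CYP153A7: {cyp153a7[0]}% R / {cyp153a7[1]}% S")
```

An ee of 57% therefore still corresponds to roughly one product molecule in five being the minor enantiomer, which illustrates why the text describes the enantioselectivity as poor.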

Insights into structure/activity-relationship among the CYP153 variants

Since the substrate characterization revealed a common high activity towards linear substrates (octane and lauric acid) and a more restricted one towards N-BOC-pyrrolidine and indole (almost exclusively limited to CYP153A7 and cand_1), we sought to investigate the catalytic residues and the structural elements underlying this variability. The combined analysis of the active-site structures and the sequence alignment revealed that the amino-acid composition of the inner segment of the active site is very well conserved among all 20 variants that we screened (Table 2 and Figure 7C). In this framework of high conservation, cand_1 stands out because of a few characteristic amino-acid replacements: L106, I306, and L407 in place of conserved Ile, Leu, and Phe residues, respectively. These changes, although subtle, can affect the positioning of the substrate and thereby explain the distinct regio- and enantioselectivity of this variant. Indeed, cand_1 predominantly converts lauric acid to the (Ω-1)-OH fatty acid product (Table 4). The PEG molecule modeled in the cand_1 electron density is bent at its terminal atom towards the I306 residue (Figure 4A). This conformation likely mimics the orientation of a fatty acid substrate, exposing the Ω-1 carbon to the heme iron to afford Ω-1 hydroxylation. Mutation of L354 to Ile in CYP153A33 led to a drop in its selectivity for the Ω position and to the formation of (Ω-1)-OH fatty acid.15,19 I306 of cand_1 matches the position of L354 of CYP153A33 and thereby recapitulates the L354I mutation studied in CYP153A33 (Figure 8A). In addition, the presence of L407 in cand_1 instead of a bulkier Phe residue may give the substrate more freedom in positioning within the active site and so allow inverted orientations of fatty acid substrates.
This can explain the α-, β-, and γ-OH fatty acid products detected by GC-MS in trace amounts from the conversion of lauric acid by cand_1 (data not shown). Together with F86, this same characteristic pattern of active-site residues (L106, I306, and L407) may explain the inverted enantioselectivity in the 3’-hydroxylation of N-BOC-pyrrolidine displayed by cand_1 compared to CYP153A7 (Table 4). The two enzymes feature an opposite and reciprocal arrangement of this cluster of residues, namely a phenylalanine (F403 in CYP153A7 and F86 in cand_1), a leucine (L302 in CYP153A7 and L106 in cand_1), and an isoleucine (I102 in CYP153A7 and I306 in cand_1), which, though weakly, might promote the exposure of the pro-R hydrogen in CYP153A7 and of the pro-S hydrogen in cand_1 (Figure 8B). Although they can account for specific differences in regio- and enantioselectivity, these minimal alterations in the inner segment of the active site can hardly explain the conspicuous variations in N-BOC-pyrrolidine and indole acceptance by the different CYP153s (Figure 7B-C). These variations likely reflect the concerted effects of many residues scattered over the more external and flexible regions located at or near the entrance of the active-site tunnel. Moving along the tunnel from the heme to the protein exterior, the variability in amino-acid composition increases progressively and is reflected in the sweeping conformational changes of those portions of the B/C-loop, the F/G-loop, and the F and G helices that face the active site (Figure 5B). Not surprisingly, most of the CYP substrate recognition sites (SRSs) are known to localize in these areas. SRS1 generally lies in the highly variable loop region between the B and C helices (B/C-loop), SRS2 is usually located at the C-terminal end of the F helix, SRS3 and SRS4 are spanned by the N-terminal regions of the G and I helices, whereas β1-4 and β4-1 house SRS5 and SRS6, respectively.33

To understand how such conformational changes could affect the accessibility and positioning of substrates in the different CYP153 variants, we dissected the shape of the substrate-binding site of each of the five CYP153 enzymes by cutting off the protein surface (Figure 7A). The comparison highlights that all binding pockets share a common core, represented by the conserved inner tier of the catalytic site. From this shared core outwards, the architecture of the cavity changes extensively, ranging from an open funnel in CYP153A7 and cand_10 to a narrower and tapering shape in CYP153A33, cand_1, and cand_15. Although a bigger and more open cavity could be predicted to facilitate the entrance of bulkier substrates (such as N-BOC-pyrrolidine and indole), similar substrate preferences could not be matched to similar geometries of the binding pockets (e.g. cand_10 vs cand_15 or cand_1 vs CYP153A7; see Figure 7A-B). This region must therefore be regarded as very dynamic and adaptable: it cannot be depicted by a single conformation, and its flexibility is instrumental to substrate recognition and selection. This is clearly a case of extreme structural plasticity.
How such dynamic substrate-recognition elements control substrate selection and binding remains an open and truly challenging question, with far-reaching implications for enzyme-based technologies.

DISCUSSION

With the aim of expanding the characterization of the CYP153 subfamily of P450s, we screened 20 novel variants and developed a versatile co-expressed redox system. This allowed us to identify new biocatalysts that are differently active in the hydroxylation of non-activated carbon atoms of alkanes, fatty acids, and heterocyclic compounds. Solving the crystal structures of three new CYP153 variants (cand_1 from Pseudomonas sp. 19-rlim, cand_10 from Phenylobacterium zucineum, and cand_15 from Novosphingobium aromaticivorans) and comparing them with the previously available ones from the same subfamily (CYP153A7 and CYP153A33) allowed us to perform a comprehensive structural analysis, which revealed a strong dichotomy: a highly conserved and rigid scaffold, comprising both the regions behind the heme and the core of the enzyme (i.e. the inner tier of the catalytic site), contrasts sharply with the large alterations and the plasticity, best highlighted by cand_10, of the structural elements shaping the outer segment of the active site. Whereas the differences in enantio- and regioselectivity displayed by cand_1 can be explained by a few subtle changes in the catalytic residues, the different acceptance of heterocyclic compounds among the CYP153s analyzed here cannot be reduced to the active site itself but must rather arise from more distant elements involved in substrate recognition and selection. Our structural analysis identifies these elements in the regions that exhibit the most pronounced conformational changes and sequence variability: the B/C-loop, the F, G, and H helices, and the F/G-loop. However, no clear relationship can be inferred between the different architectures shaped by these flexible regions and the different substrate preferences recorded. Therefore, in the effort to tailor the activity and selectivity of CYP153s for industrial processes, this apparent lack of structure/function correlation discourages purely rational approaches based on site-directed mutagenesis, whose success has been proven with other monooxygenases or oxidases (e.g. the flavin-dependent ones) in which substrate selection is accomplished exclusively through fitting/coordination within the catalytic site.34 The elusive and more complex traits underlying CYP153 activity instead favor approaches based on directed evolution, which can rely on focused libraries inferred from our structural analysis as well as on a toolbox of different enzyme scaffolds with distinct flexibility and plasticity. Though challenging, the structural complexity and plasticity, coupled to the substrate diversity and catalytic versatility, render the CYP153 subfamily especially attractive for the development of designer hydroxylase enzymes.

ACKNOWLEDGMENTS

The financial support of the Fondazione Cariplo (grant 2015-0406) is gratefully acknowledged.

SUPPORTING INFORMATION

Details about the GC and MS experiments together with more extensive sequence alignments are described in the Supporting Information.

REFERENCES

1. Baumann, M., Baxendale, I.R., Ley, S.V., Nikbin, N. (2011) An overview of the key routes to the best-selling 5-membered ring heterocyclic pharmaceuticals. Beilstein J. Org. Chem. 7, 442-495.
2. Guengerich, F.P., Yoshimoto, F.K. (2018) Formation and cleavage of C-C bonds by enzymatic oxidation-reduction reactions. Chem. Rev. 118, 6573-6655.
3. Funhoff, E.G., Bauer, U., García-Rubio, I., Witholt, B., van Beilen, J.B. (2006) CYP153A6, a soluble P450 oxygenase catalyzing terminal-alkane hydroxylation. J. Bacteriol. 188, 5220-5227.
4. Funhoff, E.G., Salzmann, J., Bauer, U., Witholt, B., van Beilen, J.B. (2007) Hydroxylation and epoxidation reactions catalyzed by CYP153 enzymes. Enzyme and Microbial Technology 4, 806-812.
5. Scheps, D., Malca, S.H., Hoffmann, H., Nestl, B.M., Hauer, B. (2011) Regioselective ω-hydroxylation of medium-chain n-alkanes and primary alcohols by CYP153 enzymes from Mycobacterium marinum and Polaromonas sp. strain JS666. Org. Biomol. Chem. 9, 6727-6733.


6. Gudiminchi, R.K., Randall, C., Opperman, D.J., Olaofe, O.A., Harrison, S.T., Albertyn, J., Smit, M.S. (2012) Whole-cell hydroxylation of n-octane by Escherichia coli strains expressing the CYP153A6 operon. Appl. Microbiol. Biotechnol. 96, 1507-1516.
7. Olaofe, O.A., Fenner, C.J., Gudiminchi, R.K., Smit, M.S., Harrison, S.T. (2013) The influence of microbial physiology on biocatalyst activity and efficiency in the terminal hydroxylation of n-octane using Escherichia coli expressing the alkane hydroxylase, CYP153A6. Microb. Cell Fact. 12, 8.
8. Tang, W.L., Li, Z., Zhao, H. (2010) Inverting the enantioselectivity of P450pyr monooxygenase by directed evolution. Chem. Comm. 46, 5461-5463.
9. Pham, S.Q., Pompidor, G., Liu, J., Li, X.D., Li, Z. (2012) Evolving P450pyr hydroxylase for highly enantioselective hydroxylation at non-activated carbon atom. Chem. Comm. 48, 4618-4620.
10. Yang, Y., Chi, Y.T., Toh, H.H., Li, Z. (2015) Evolving P450pyr monooxygenase for highly regioselective terminal hydroxylation of n-butanol to 1,4-butanediol. Chem. Comm. 51, 514-517.
11. Yang, Y., Liu, J., Li, Z. (2014) Engineering of P450pyr hydroxylase for the highly regio- and enantioselective subterminal hydroxylation of alkanes. Angew. Chem. Int. Ed. Engl. 53, 3120-3124.
12. Meinhold, P., Peters, M.W., Hartwick, A., Hernandez, A.R., Arnold, F.H. (2006) Engineering cytochrome P450 BM3 for terminal alkane hydroxylation. Adv. Synth. Catal. 348, 763-772.
13. Lentz, O., Urlacher, V., Schmid, R.D. (2004) Substrate specificity of native and mutated cytochrome P450 (CYP102A3) from Bacillus subtilis. J. Biotechnol. 108, 41-9.
14. Li, Z., Feiten, H.J., Chang, D., Duetz, W.A., van Beilen, J.B., Witholt, B. (2001) Preparation of (R)- and (S)-N-protected 3-hydroxypyrrolidines by hydroxylation with Sphingomonas sp. HXN-200, a highly active, regio- and stereoselective, and easy to handle biocatalyst. J. Org. Chem. 6, 8424-8430.
15. Scheps, D., Honda Malca, S., Richter, S.M., Marisch, K., Nestl, B.M., Hauer, B. (2013) Synthesis of ω-hydroxy dodecanoic acid based on an engineered CYP153A fusion construct. Microb. Biotechnol. 6, 694-707.
16. Kirtz, M., Klebensberger, J., Otte, K.B., Richter, S.M., Hauer, B. (2016) Production of ω-hydroxy octanoic acid with Escherichia coli. J. Biotechnol. 230, 30-33.
17. Jung, E., Park, B.G., Ahsan, M.M., Kim, J., Yun, H., Choi, K.Y., Kim, B.G. (2016) Production of ω-hydroxy palmitic acid using CYP153A35 and comparison of cytochrome P450 electron transfer system in vivo. Appl. Microbiol. Biotechnol. 100, 10375-10384.


18. Jung, E., Park, B.G., Yoo, H.W., Kim, J., Choi, K.Y., Kim, B.G. (2018) Semi-rational engineering of CYP153A35 to enhance ω-hydroxylation activity toward palmitic acid. Appl. Microbiol. Biotechnol. 102, 269-277.
19. Hoffmann, S.M., Danesh-Azari, H.R., Spandolf, C., Weissenborn, M.J., Grogan, G., Hauer, B. (2016) Structure-guided redesign of CYP153AM.aq for the improved terminal hydroxylation of fatty acids. ChemCatChem 20, 3234–3239.
20. Notonier, S., Gricman, Ł., Pleiss, J., Hauer, B. (2016) Semirational Protein Engineering of CYP153AM.aq.-CPRBM3 for Efficient Terminal Hydroxylation of Short- to Long-Chain Fatty Acids. ChemBioChem 17, 1550-1557.
21. Svensson, O., Malbet-Monaco, S., Popov, A., Nurizzo, D., Bowler, M.W. (2015) Fully automatic characterization and data collection from crystals of biological macromolecules. Acta Crystallogr. D Biol. Crystallogr. 71, 1757-1767.
22. Kabsch, W. (2010) Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D Biol. Crystallogr. 66, 125-132.
23. Winn, M.D., Ballard, C.C., Cowtan, K.D., Dodson, E.J., Emsley, P., Evans, P.R., Keegan, R.M., Krissinel, E.B., Leslie, A.G., McCoy, A., McNicholas, S.J., Murshudov, G.N., Pannu, N.S., Potterton, E.A., Powell, H.R., Read, R.J., Vagin, A., Wilson, K.S. (201) Acta Crystallogr. D Biol. Crystallogr. 67, 235-242.
24. McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C., Read, R.J. (2007) Phaser crystallographic software. J. Appl. Cryst. 40, 658-674.
25. Emsley, P., Cowtan, K. (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126-2132.
26. Murshudov, G., Vagin, A., Dodson, E. (1996) Macromolecular Refinement: Proceedings of the CCP4 Study Weekend. 75–84.
27. Chen, V.B., Arendall, W.B., Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S., Richardson, D.C. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12-21.
28. Nie, Y., Chi, C.Q., Fang, H., Liang, J.L., Lu, S.L., Lai, G.L., Tang, Y.Q., Wu, X.L. (2014) Diverse alkane hydroxylase genes in microorganisms and environments. Sci. Rep. 4, 4968.


29. Chen, W., Lee, M.K., Jefcoate, C., Kim, S.C., Chen, F., Yu, J.H. (2014) Fungal cytochrome P450 monooxygenases: their distribution, structure, functions, family expansion, and evolutionary origin. Genome Biol. Evol. 6, 1620-16234.
30. Gillam, E.M., Notley, L.M., Cai, H., De Voss, J.J., Guengerich, F.P. (2000) Oxidation of indole by cytochrome P450 enzymes. Biochemistry 39, 13817-13824.
31. America, S.P., Jung, H.S., Kim, H.S., Han, S.S., Kim, H.S., Lee, J.H. (2015) Characterization of a flavin-containing monooxygenase from Corynebacterium glutamicum and its application to production of indigo and indirubin. Biotechnol. Lett. 37, 1637-1644.
32. Parales, R.E., Lee, K., Resnick, S.M., Jiang, H.Y., Lessner, D.J., Gibson, D.T. (2000) Structural investigations of the ferredoxin and terminal oxygenase components of the biphenyl 2,3-dioxygenase from Sphingobium yanoikuyae B1. J. Bacteriol. 182, 1641-1649.
33. Sirim, D., Widmann, M., Wagner, F., Pleiss, J. (2010) Prediction and analysis of the modular structure of cytochrome P450 monooxygenases. BMC Struct. Biol. 10, 34.
34. Dijkman, W.P., Binda, C., Fraaije, M.W., Mattevi, A. (2015) Structure-based enzyme tailoring of 5-hydroxymethylfurfural oxidase. ACS Catal. 5, 1833–1839.


FIGURES AND TABLES

Table 1. The twenty selected CYP153 variants. The candidate number, the corresponding organism of origin, and the NCBI accession number are reported.

CANDIDATE NUMBER   DETAILS
cand_1             >gi|346421746|gb|AEO27390.1| cytochrome P450 [Pseudomonas sp. 19-rlim]
cand_2             >gi|499806432|ref|WP_011487166.1| cytochrome P450 [Paraburkholderia xenovorans]
cand_3             >gi|696581171|ref|WP_033112350.1| cytochrome P450 [Dickeya dadantii]
cand_4             >gi|499220412|ref|WP_010917952.1| cytochrome P450 [Caulobacter vibrioides]
cand_5             >gi|698183153|gb|AIT82675.1| cytochrome P450 (plasmid) [Novosphingobium pentaromativorans US6-1]
cand_6             >gi|499470532|ref|WP_011157172.1| cytochrome P450 [Rhodopseudomonas palustris]
cand_7             >gi|499984974|ref|WP_011665692.1| cytochrome P450 [Rhodopseudomonas palustris]
cand_8             >gi|737675938|ref|WP_035645129.1| cytochrome P450 [Bradyrhizobium sp. ORS 285]
cand_9             >gi|504305193|ref|WP_014492295.1| cytochrome P450 [Bradyrhizobium japonicum]
cand_10            >gi|501513438|ref|WP_012521375.1| cytochrome P450 [Phenylobacterium zucineum]
cand_11            >gi|504392332|ref|WP_014579434.1| cytochrome P450 [Marinobacter adhaerens]
cand_12            >gi|497491214|ref|WP_009805412.1| cytochrome P450 [Pseudooceanicola batsensis]
cand_13            >gi|378804531|gb|AFC48666.1| putative Linalool 8-monooxygenase [Mycobacterium intracellulare MOTT-02]
cand_14            >gi|495522599|ref|WP_008247244.1| cytochrome P450 [Limnobacter sp. MED105]
cand_15            >gi|499763148|ref|WP_011443882.1| cytochrome P450 [Novosphingobium aromaticivorans]
cand_16            >gi|488692970|ref|WP_002617102.1| cytochrome P450 [Stigmatella aurantiaca]
cand_17            >gi|229318679|gb|EEN84537.1| unspecific monooxygenase [Rhodococcus erythropolis SK121]
fusion_1           >gi|504129982|ref|WP_014361693.1| cytochrome P450 [Gordonia polyisoprenivorans VH2]
fusion_2           >gi|358242504|dbj|GAB11608.1| cytochrome P450 [Gordonia araii NBRC 100433]
fusion_3           >gi|359317904|dbj|GAB21416.1| putative cytochrome P450 [Gordonia polyisoprenivorans NBRC 16320 = JCM 10675]
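The DETAILS entries of Table 1 follow the legacy NCBI FASTA header layout (gi number, source database, accession, description, organism in square brackets). As a hedged illustration of how such entries could be split into fields, the sketch below uses a regular expression; the pattern and the helper name `parse_header` are assumptions made for this example and only cover the header shapes that appear in Table 1.

```python
import re

# Pattern for headers such as:
# >gi|346421746|gb|AEO27390.1| cytochrome P450 [Pseudomonas sp. 19-rlim]
HEADER = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>\w+)\|(?P<accession>[\w.]+)\|\s*"
    r"(?P<description>.*?)\s*\[(?P<organism>[^\]]+)\]$"
)


def parse_header(line):
    """Split one Table 1 header into its named fields (raises on unexpected input)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return match.groupdict()


cand_1 = parse_header(
    ">gi|346421746|gb|AEO27390.1| cytochrome P450 [Pseudomonas sp. 19-rlim]"
)
cand_5 = parse_header(
    ">gi|698183153|gb|AIT82675.1| cytochrome P450 (plasmid) "
    "[Novosphingobium pentaromativorans US6-1]"
)
print(cand_1["accession"], "|", cand_1["organism"])
print(cand_5["accession"], "|", cand_5["description"])
```

Keeping the accession and the organism as separate fields makes it straightforward to cross-reference the candidates with the columns of Table 2.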


Table 2. Structure/sequence-based alignment comparing the residues located in the vicinity of the heme or in the gorge leading to the active site of our twenty CYP153 candidates, CYP153A6, CYP153A7, and CYP153A33. The protein region to which the residues belong is reported in the additional column on the right.

CYP153A7  CYP153A6  CYP153A33 Cand_1    Cand_2    Cand_3    Cand_4    Cand_5    Cand_6    Cand_7    Cand_8    Cand_9    Cand_10   Cand_11   Cand_12   Cand_13   Cand_14   Cand_15   Cand_16   Cand_17   Fusion_1  Fusion_2  Fusion_3
A77(S)    A         P128      H81       V         Q         A         A         S         Q         V         S         P86       P         A         P         P         W102      R         -         P         P         P
I82       I         I130      F86       M         I         I         A         I         I         I         I         I88       -         L         I         I         I107      H         I         I         I         -
I83(H)    T         I131      V87       L         K         T         I         S         S         M         G         V89       I         R         I         I         T108      R         R         I         V         -
L98       L         -         -         R         N         L         L         W         W         W         W         L99       I         G         I         V         -         E         L         I         I         I
P99       P         E142      P103      T         V         P         E         P         P         P         P         P100      E         V         A         E         P127      A         P         E         E         E
N100      M         M143      N104      S         G         M         M         S         S         S         S         M101      T         G         M         M         M128      S         Q         M         M         M
I102      I         I145      L106      I         I         I         I         I         I         I         I         I103      I         I         I         I         I130      L         I         I         I         I
A103      A         A146      G107      N         T         A         A         A         A         A         A         A104      A         T         A         A         A131      S         D         A         A         A
S182      S         V222      T183      S         S         S         S         S         S         S         S         -         S         S         S         S         T207      S         S         T         T         T
D183      D         W224      W185      D         D         D         D         D         D         D         D         -         D         D         D         D         W209      D         D         E         D         D
T185      T         -         -         V         L         A         T         A         A         S         A         W222      A         I         A         L         -         L         L         T         V         V
T186      T         M228      I189      N         M         T         T         T         T         T         T         -         A         T         T         A         A213      I         V         A         A         A
L251      L         L303      L255      L         L         L         L         L         L         L         L         L258      L         L         L         L         L279      L         L         L         L         L
V254      V         V306      V258      V         I         V         V         V         V         V         V         V261      V         V         V         V         V282      I         I         V         V         V
G255      G         G307      G259      G         G         G         G         G         G         G         G         G262      G         G         G         G         G283      A         G         G         G         G
D258      D         D310      D262      D         D         D         D         D         D         D         D         D265      D         D         D         D         D286      E         E         D         D         D
T259      T         T311      T263      T         T         T         T         T         T         T         T         T266      T         T         T         T         T287      S         T         T         T         T
L302      L         L354      I306      V         V         L         L         L         L         L         L         L309      L         L         L         L         L330      F         I         L         L         L
M305(Q)   M         M357      M309      M         M         M         M         M         M         M         M         M312      M         M         M         M         M333      M         M         M         M         M
F403      F         F455      L407      N         F         F         L         F         F         F         F         F410      F         F         F         F         F431      F         F         F         F         F
V404      V         V456      V408      I         L         V         V         V         V         V         I         V411      V         V         V         V         V432      V         V         V         V         V

Region column (top to bottom): B/C-loop; F-helix; I helix; inner catalytic segment; inner/mid catalytic segment
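The statement that cand_1 carries L106, I306, and L407 in place of conserved Ile, Leu, and Phe residues can be checked directly against three rows of Table 2. The sketch below is only an illustration under stated assumptions: the row strings are transcribed from Table 2 (columns ordered CYP153A7, CYP153A6, CYP153A33, Cand_1 to Cand_17, Fusion_1 to Fusion_3), the rows are keyed by CYP153A7 numbering, and the helper `compare_to_consensus` is a hypothetical name.

```python
from collections import Counter

COLUMNS = (
    ["CYP153A7", "CYP153A6", "CYP153A33"]
    + [f"Cand_{i}" for i in range(1, 18)]
    + [f"Fusion_{i}" for i in range(1, 4)]
)

# Three rows of Table 2, keyed by the CYP153A7 residue.
ROWS = {
    "I102": "I102 I I145 L106 I I I I I I I I I103 I I I I I130 L I I I I",
    "L302": "L302 L L354 I306 V V L L L L L L L309 L L L L L330 F I L L L",
    "F403": "F403 F F455 L407 N F F L F F F F F410 F F F F F431 F F F F F",
}


def compare_to_consensus(row, column="Cand_1"):
    """Return (most common residue, its count, residue found in `column`)."""
    residues = [token[0] for token in row.split()]  # drop residue numbers
    if len(residues) != len(COLUMNS):
        raise ValueError("row does not match the number of columns")
    consensus, count = Counter(residues).most_common(1)[0]
    return consensus, count, residues[COLUMNS.index(column)]


summary = {pos: compare_to_consensus(row) for pos, row in ROWS.items()}
for pos, (consensus, count, cand_1) in summary.items():
    print(f"{pos}: consensus {consensus} ({count}/{len(COLUMNS)}), cand_1 has {cand_1}")
```

At each of the three positions the most common residue (Ile, Leu, Phe) differs from the one found in cand_1 (Leu, Ile, Leu), in line with the replacements discussed in the text.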


Table 3. Crystallographic statistics for structures of CYP153 cand_1, cand_10, and cand_15 (a)

PDB code                      Cand_1 (6HQD) (c)             Cand_10 (6HQG) (d)            Cand_15 (6HQW) (e)
Resolution range              44 - 1.8                      49.31 - 2.9                   50 - 2.9
Space group                   P21                           C2                            P32
Unit cell (Å), (°)            87.3 97.6 87.8 90 119.2 90    105.5 89.2 201.8 90 96.6 90   113.1 113.1 73.0 90 90 120
Total reflections             781031                        108438                        72726
Unique reflections            116816 (5829)                 40389 (4594)                  22850 (3723)
Multiplicity                  6.7 (6.9)                     2.7 (2.6)                     3.2 (3.3)
Completeness (%)              99.2 (99.9)                   97.6 (98.3)                   98.7 (99.6)
Mean I/sigma (I)              13.6 (1.5)                    5.4 (1.2)                     12.6 (2.4)
R-merge (%)                   0.077 (1.15)                  0.162 (0.857)                 0.61 (0.6)
CC1/2 (b)                     0.99 (0.54)                   0.96 (0.511)                  0.95 (0.745)
R-work (%)                    17.0                          21.5                          24.8
R-free (%)                    20.0                          27.2                          29.9
RMS (bonds) (Å)               0.013                         0.014                         0.009
RMS (angles) (°)              1.6                           1.7                           1.4
Ramachandran favoured (%)     97.3                          97.0                          92.1
Ramachandran allowed (%)      2.4                           2.5                           7.1
Ramachandran outliers (%)     0.2                           0.5                           0.8
Average B-factor              24.4                          41.6                          69.5

(a) Statistics for the highest-resolution shell are shown in parentheses.
(b) A cut-off criterion for resolution limits was applied on the basis of the mean intensity correlation coefficient of half-subsets of each dataset.
(c) The model contains three subunits in the asymmetric unit. Chain A contains residues 7-420; chain B contains residues 9-420; chain C contains residues 6-420.
(d) The model contains four subunits in the asymmetric unit. Chain A contains residues 7-173, 205-224, 250-425; chain B contains residues 7-173, 204-224, 250-425; chain C contains residues 7-95, 98-173, 204-224, 250-425; chain D contains residues 7-96, 99-173, 205-224, 249-425.
(e) The model comprises two subunits in the asymmetric unit. Chain A comprises residues 39-200, 204-214, 218-248, 253-443; chain B comprises residues 39-102, 105-110, 128-195, 275-443. The electron density of subunit B is much weaker and of lower quality than that of subunit A, which was therefore used for structural analysis.
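Two of the rows in Table 3 are linked by simple arithmetic: the multiplicity is the ratio of total to unique reflections, and the gap between R-free and R-work is a commonly inspected indicator of overfitting. The sketch below recomputes both from the tabulated values; it is a plain consistency check written for this document, not output from any crystallographic program, and the dictionary layout is an assumption of the example.

```python
# Values transcribed from Table 3 (overall statistics only).
TABLE_3 = {
    "cand_1": {"total": 781031, "unique": 116816, "multiplicity": 6.7,
               "r_work": 17.0, "r_free": 20.0},
    "cand_10": {"total": 108438, "unique": 40389, "multiplicity": 2.7,
                "r_work": 21.5, "r_free": 27.2},
    "cand_15": {"total": 72726, "unique": 22850, "multiplicity": 3.2,
                "r_work": 24.8, "r_free": 29.9},
}

checks = {}
for name, stats in TABLE_3.items():
    # Multiplicity = total observations / unique reflections.
    multiplicity = round(stats["total"] / stats["unique"], 1)
    # Difference between R-free and R-work, in percentage points.
    r_gap = round(stats["r_free"] - stats["r_work"], 1)
    checks[name] = (multiplicity, r_gap)
    agrees = multiplicity == stats["multiplicity"]
    print(f"{name}: multiplicity {multiplicity} (table agrees: {agrees}), "
          f"R-free minus R-work = {r_gap}")
```

The recomputed multiplicities match the tabulated ones for all three data sets, which supports the row assignment used for the unlabeled reflection counts.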


Table 4. Conversion of octane, lauric acid, N-BOC pyrrolidine, and indole by cand_1, cand_10, cand_15, CYP153A7, and CYP153A33. octanea,b

lauric acida,b

N-BOC-pyrrolidineb

conv. (%)

(Ω-1)-OH (%)

(Ω)-OHb (%)

overall conv. (%)

CYP153A7

96 ± 1.7

7

93

100

100

CYP153A33

38 ± 11.8

n.d.

29

29 ± 13.8