
Article

Characterization of a UGT84 family glycosyltransferase provides new insights into substrate binding and reactivity of galloylglucose ester-forming UGTs Alexander E Wilson, Xiaoxue Feng, Nadia N Ono, Doron Holland, Rachel Amir, and Li Tian Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b00946 • Publication Date (Web): 15 Nov 2017 Downloaded from http://pubs.acs.org on November 16, 2017


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Biochemistry

Characterization of a UGT84 family glycosyltransferase provides new insights into substrate binding and reactivity of galloylglucose ester-forming UGTs

Alexander E. Wilson,† Xiaoxue Feng,† Nadia N. Ono,† Doron Holland,‡ Rachel Amir,# and Li Tian*,†,§,⊥

† Department of Plant Sciences, University of California, Davis, CA 95616
‡ Institute of Plant Sciences, Agricultural Research Organization, Newe Ya’ar Research Center, Ramat Yishay 30095, Israel
# Migal Galilee Technology Center, P.O. Box 831, Kiryat Shmona 11016, Israel
§ Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
⊥ Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai, 201602, China

*To whom correspondence should be addressed: Dr. Li Tian, Department of Plant Sciences, Mail Stop 3, University of California, Davis, CA 95616. Telephone: (530) 752-0940; Fax: (530) 752-9659; E-mail: [email protected]. ORCID iD: 0000-0001-6461-6072.

Keywords: enzyme mutation, gallic acid, glucose ester, glycosyltransferase, group L, homology modeling, site-directed mutagenesis, UGT, UGT84

Abstract

Galloylated plant specialized metabolites play important roles in plant-environment interactions as well as the promotion of human and animal health. The galloylation reactions are mediated by the formation of galloylglucose esters from gallic acid and UDP-glucose, catalyzed by the plant UGT84 family glycosyltransferases. To explore and exploit the structural determinants of UGT84 activities, we performed homology modeling and substrate docking of PgUGT84A23, a galloylglucose ester-forming family 84 UGT, as well as sequence comparisons of PgUGT84A23 with other functionally characterized plant UGTs. By employing site-directed mutagenesis of candidate amino acids, enzyme assays with analogous substrates, and kinetic analysis, key amino acid sites for PgUGT84A23 substrate binding and reactivity were elucidated. The galloylglucose ester-forming UGT84s have not been shown to glycosylate genistein (an isoflavonoid) in vivo. Unexpectedly, amino acids highly conserved among UGT84s were identified that specifically affect the binding of genistein, but not gallic acid and other tested sugar acceptors. This result suggests that genistein may resemble the substrate profile of the ancestral enzyme of the galloylglucose ester-forming UGTs, which was recruited during the transition from a general to a more specialized defense function. Overall, a better understanding of the structure-function relationship of UGT84s will facilitate enzyme engineering to produce pharmaceutically and industrially valuable glycosylated compounds.


Introduction

UDP-sugar dependent glycosyltransferases (UGTs) catalyze the glycosylation of specialized plant metabolites, which greatly expands the chemical diversity, functionality, and bioactivity of the core aglycone molecules. 1 Although glycosidic linkages (e.g. C-O-C) can be forged between the sugar donor and acceptor molecules to form glycosides, aromatic and aliphatic acids can also conjugate with the sugar donor to generate ester linkages (e.g. C=O-O-C) and form glucose esters. Generation of glucose esters has been proposed to neutralize the acid functional group of the aglycone and produce a high-energy compound (i.e. a glucose ester) that serves as an intermediate for the subsequent acyl transfer reactions. 2, 3 In addition, galloylglucose esters made from gallic acid (GA) and UDP-glucose are used for the production of galloylated compounds with applications in the food industry as well as manufacturing of bio-based surfactants, cosmetics, and pharmaceutical drugs. 4

Of the 15 phylogenetic groups of plant UGTs (A-M, O and P, grouped based on sequence homology), only group L UGTs have been shown to produce glucose esters of phenolic acids (Figure 1). There are three UGT families within group L: UGT74s, UGT75s, and UGT84s. Only UGT84s exhibit galloylglucose ester-forming activities. Although none of the UGT84 proteins have been crystallized, the structures of two UGT74s from Oryza sativa and Arabidopsis thaliana, Os79 and AtUGT74F2, respectively, have recently been solved using X-ray crystallography. 5, 6 Therefore, the spatial arrangement of the critical residues in UGT84s could be investigated through homology modeling using the UGT74 structures as templates.

In general, the C-terminus of plant UGTs contains the conserved, 44 amino-acid Plant Secondary Product Glycosyltransferase (PSPG) motif that interacts with the sugar donor (Figure 2). 7, 8 The N-terminal domain of plant UGTs is mostly involved in binding the sugar acceptor.
Determinants of sugar acceptor specificity are much more complex and varied than those for the sugar donor, but usually include the structure of the binding pocket’s peptide backbone and the environment set by the size, shape, and hydrophobicity of amino acids present therein. 7

To gain a better understanding of the structure-function relationship of UGT84s and facilitate the exploitation of these enzymes, we conducted homology modeling and substrate docking for PgUGT84A23, a recently isolated and functionally characterized galloylglucose ester-forming UGT84. 9 In addition, sequence comparisons of UGTs that produce glucosides and/or glucose esters of phenolic acids (including gallic acid) were performed. Such analysis identified conserved amino acids in the galloylglucose ester-forming UGTs that are located outside of the substrate binding pockets in the PgUGT84A23 homology model. Site-directed mutagenesis and kinetic analysis of the candidate amino acids were carried out to assess their roles in catalysis and interactions with groups of substrates possessing similar structures. These studies collectively provided new insights into the catalytic mechanism of galloylglucose ester-forming UGTs and the structural profile of the ancestral substrates for these UGTs.

Materials and Methods

Homology-based modeling of PgUGT84A23

SWISS-MODEL, 10 PHYRE2, 11 I-TASSER, 12 and MODELLER 13 (with the UCSF Chimera interface) were each used for homology modeling of PgUGT84A23 with the crystal structures of plant UGTs available in the Protein Data Bank (PDB) as templates, including MtUGT71G1 (2ACV), AtUGT72B1 (2VG8), MtUGT78G1 (3HBF), CtUGT78K6 (4WHM), MtUGT85H2 (2PQ6), Os79 (5TMD), and VvGT1 (2C1X). 6, 14-20 The homology models were evaluated for quality by examining parameters of the Ramachandran plot, as well as with the publicly available software programs Verify3D (three-dimensional/3D profile analysis), 22 Errat (statistical analysis of non-bonded interactions between different atom types), 23 and QMEAN (composite scoring of the geometrical aspects of protein structures). 24 After the initial modeling using different programs and template structures, the homology model of PgUGT84A23 based on Os79, generated by SWISS-MODEL, was selected for downstream analysis and ligand docking. Of the crystallized plant UGTs, Os79 is the most similar to PgUGT84A23 based on secondary structure prediction, phylogenetic analysis (both Os79 and PgUGT84A23 belong to group L), and protein sequence identity (33.71% identical). The measures of quality for the selected PgUGT84A23 model are 92.2% in the favored region, 4.4% in the allowed region, and 3.3% in the outlier region of the Ramachandran plot, a quality score of 88.5% for Verify3D, a quality score of 93.1% for Errat, and a QMEAN Z-score of -3.27. Modeling of PgUGT84A23 was also conducted using AtUGT74F2 (PDB entry 5U6M; 38.94% identical to PgUGT84A23) as a template. The quality measures of the AtUGT74F2-based model were similar to those of the Os79-based model, including 93.9% in the favored region, 4.4% in the allowed region, and 1.8% in the outlier region of the Ramachandran plot, a quality score of 87.3% for Verify3D, a quality score of 84.5% for Errat, and a QMEAN Z-score of -3.27.

Figure 1. Phylogenetic analysis of plant UGTs with characterized activities toward phenolic acids. Bootstrap values (10,000 replicates) greater than 60% are shown next to the branches. The group L and galloylglucose ester-forming UGTs are indicated. PgUGT84A23 is highlighted in bold. The GenBank accession numbers for the UGTs are: PgUGT84A23 (ANN02875.1), PgUGT84A24 (ANN02877.1), FaGT2 (Q66PF4.1), VlRSgt (ABH03018.1), VvgGT1 (AEW31187.1), VvgGT2 (AEW31188.1), VvgGT3 (NP_001267849.1), QrUGT84A13 (AHA54051.1), CsUGT84A22 (ALO19890.1), FaGT5 (Q2V6K1.1), DgUGT1 (BAO66179.1), AtUGT84A1 (NP_193283.2), AtUGT84A2 (NP_188793.1), AtUGT84A3 (OAP00592.1), AtUGT84A4 (OAO98847.1), AsUGT84C2 (ACD03236.1), AtUGT84B1 (OAP11221.1), AtUGT75C1 (AAL69494.1), AtUGT75D1 (NP_567471.1), AtUGT75B1 (NP_001320882.1), AtUGT75B2 (NP_172044.1), Os79 (5TMD_A), OsSGT1 (AAF18438.1), AtUGT74E2 (OAP17332.1), AtUGT74D1 (AAM61249.1), AtUGT74C1 (NP_180738.1), AtUGT74B1 (NP_173820.1), AtUGT74F1 (NP_181912.1), AtUGT74F2 (OAP07463.1), LuUGT74S1 (AGD95005.1), NtSGT (NP_001312643.1), VvUFGT (2C1X_A), AtUGT78D1 (NP_564357.1), AtUGT78D2 (NP_197207.1), AtUGT78D3 (NP_197205.1), CsUGT78A14 (ALO19888.1), MtUGT78G1 (XP_003610163.1), CtUGT78K6 (3WC4_A), MtUGT85H2 (2PQ6_A), AtUGT73B1 (NP_567955.1), AtUGT73B2 (NP_567954.1), AtUGT73B3 (NP_567953.1), AtUGT73B4 (NP_001189529.1), AtUGT73B5 (NP_179150.3), AtUGT73C5 (NP_181218.1), AtUGT73C6 (OAP07438.1), GmUGT73F2 (NP_001237242.2), AtUGT89A2 (NP_195969.1), AtUGT89B1 (NP_177529.2), AtUGT72E2 (OAO95244.1), AtUGT72E3 (NP_198003.1), AtUGT72B1 (OAP00532.1), AtUGT72B3 (NP_001322773.1), AtUGT71C4 (NP_563784.2), AtUGT71C3 (NP_172206.1), AtUGT71C2 (NP_180535.1), AtUGT71C1 (NP_180536.1), MtUGT71G1 (2ACV_A), AtUGT71B1 (NP_188812.1), AtUGT71B6 (OAP05506.1), AtUGT71B7 (BAB02841.1), AtUGT71B8 (OAP07002.1), AtUGT71D1 (NP_180534.1), AtUGT71C5 (NP_172204.1), AtUGT71B5 (NP_001328018.1), AtUGT76D1 (NP_180216.1), AtUGT76E12 (NP_566885.1), AtUGT76E2 (NP_200767.1), AtUGT76E11 (NP_190251.1), AtUGT76E1 (NP_200766.2), and AtUGT88A1 (NP_566550.1). FvGT2 and RiGT2 sequences were obtained from the original publication 21 as they had not been deposited in the GenBank database. As, Avena strigosa; At, Arabidopsis thaliana; Cs, Camellia sinensis; Ct, Clitoria ternatea; Dg, Delphinium grandiflorum; Fa, Fragaria x ananassa; Fv, F. vesca; Gm, Glycine max; Lu, Linum usitatissimum; Mt, Medicago truncatula; Nt, Nicotiana tabacum; Os, Oryza sativa; Pg, Punica granatum; Qr, Quercus robur; Ri, Rubus idaeus; Vl, Vitis labrusca; Vv, V. vinifera.

Molecular docking analysis

AutoDock Vina 25 (version 1.1.2) was used for the ligand docking analysis of the PgUGT84A23 homology model with sugar donors and acceptors. The protein models were prepared using AutoDockTools to add polar hydrogens and obtain the grid box coordinates for docking. The location for docking analysis of the PgUGT84A23 model was selected based on alignments with the crystallized plant UGTs in complex with their respective sugar acceptors and/or UDP or UDP-glucose. For the first round of docking, an additional two angstroms in each direction was added to the area occupied by the sugar acceptor to allow for differences between the PgUGT84A23 and Os79 sugar acceptor binding pockets.
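The padding step described above can be illustrated with a short sketch. This is not the authors' procedure (they obtained the grid box coordinates with AutoDockTools); it only shows, under stated assumptions, how a search box enclosing a reference acceptor can be enlarged by two angstroms in each direction. The atom coordinates and the function name `padded_grid_box` are hypothetical. The resulting center and size correspond to the `center_x/y/z` and `size_x/y/z` values that AutoDock Vina takes as input.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def padded_grid_box(coords: Iterable[Point], padding: float = 2.0):
    """Return (center, size) of an axis-aligned box that encloses all
    coordinates and extends `padding` angstroms beyond them on every side."""
    pts = list(coords)
    if not pts:
        raise ValueError("no coordinates given")
    mins = [min(p[i] for p in pts) - padding for i in range(3)]
    maxs = [max(p[i] for p in pts) + padding for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size


# Hypothetical acceptor atom coordinates (angstroms), for illustration only;
# these are not taken from the Os79 structure or the PgUGT84A23 model.
acceptor_atoms = [(10.0, 4.0, -2.0), (12.5, 6.0, -1.0), (11.0, 5.0, 1.0)]

center, size = padded_grid_box(acceptor_atoms, padding=2.0)
print("center (x, y, z):", center)
print("size (x, y, z):", size)
```

A larger padding widens the search space and lets the ligand explore more of the pocket, at the cost of a higher chance of poses outside the intended site.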

Figure 2. Protein sequence alignment of galloylglucose ester-forming UGT84s and the crystallized group L UGTs. Black and gray shadings indicate conserved identical and similar amino acid residues, respectively. Gaps in sequences are shown with dashes. *, amino acids mutated in this work. The Plant Secondary Product Glycosyltransferase (PSPG) motif is underlined. At, Arabidopsis thaliana; Cs, Camellia sinensis; Fa, Fragaria x ananassa; Os, Oryza sativa; Pg, Punica granatum; Qr, Quercus robur; Ri, Rubus idaeus; Vv, Vitis vinifera.

Since the spacious UDP-glucose binding pocket is adjacent to the active site and has numerous hydrogen bonding partners, a portion of the sugar acceptor was found in the location of UDP-glucose binding in several favorable dockings. Therefore, the grid box was repositioned in these docking models to ensure that the docked acceptor molecule was not in the predicted UDP-glucose binding pocket and did not overlap with the docked UDP-glucose in the model. Various ligand conformations (including position, orientation and torsions of the ligand) were compared based on the calculated affinity (kcal/mol) as well as manual inspection in PyMOL (version 1.8.6.0; The PyMOL Molecular Graphics System). For the sugar acceptors, the ligand conformation with the strongest calculated affinity as well as the appropriate carboxyl or hydroxyl group positioned toward H20 (the catalytic residue) was selected for presentation in the figures.

Sequence alignment and phylogenetic analysis

Plant UGT sequences with demonstrated glucose ester- and/or glucoside-forming activities toward phenolic acids were retrieved from the National Center for Biotechnology Information (NCBI). The protein sequences were aligned using Multiple Sequence Comparison by Log-Expectation (MUSCLE) (Figure S1). 26 The sequence alignment was used for constructing a Neighbor Joining (NJ) tree in MEGA7, 27 which was tested with 10,000 bootstrap replicates.

Cloning, expression and purification of wild type and mutant PgUGT84A23 proteins

Site-directed mutagenesis of PgUGT84A23 was conducted following the Single-Primer Reactions IN Parallel (SPRINP) method 28 using the pHIS8-PgUGT84A23 plasmid 9 as a template.
The PCR primers used for site-directed mutagenesis are listed in Table S1. Mutations in the recombinant pHIS8-PgUGT84A23 plasmids were verified by DNA sequencing. Plasmids containing the appropriate mutations were transformed into Rosetta 2 (DE3) pLysS cells for protein expression. The bacterial cells were grown at 37°C with shaking at 225 rpm to an OD600 of 0.6-0.8. To induce protein expression, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to the cell culture to a final concentration of 0.5 mM. The cells were grown overnight at 16°C with shaking and harvested by centrifugation at 4°C. The His-tagged recombinant proteins were purified using Ni-NTA agarose beads (Qiagen, Valencia, CA) and quantified using the Bradford assay. 29 The wild type and mutant PgUGT84A23 proteins were expressed and purified at least three times for use as replicates in the enzyme assays. Yields of the purified wild type and mutant PgUGT84A23 proteins were comparable, suggesting that the amino acid mutations did not affect the stability and solubility of the proteins. Lysates of uninduced and induced cells as well as purified proteins were separated on SDS-PAGE gels to verify protein induction (Figure S2). Based on analysis of the SDS-PAGE gel images, the target protein (i.e. wild type or mutant PgUGT84A23) was between 60% and 90% of total purified proteins, with an average of 72%. Similar levels of target proteins were used in the enzyme assays after taking into consideration the quantity of total purified proteins and the percentage of co-purified proteins in each sample.

Enzyme assays and kinetic analysis of wild type and mutant PgUGT84A23 proteins

The UGT enzyme assay mixture included 50 mM MES (pH 5.0), UDP-glucose and the sugar acceptor, each at 1 mM, 14 mM 2-mercaptoethanol, and 5 µg purified target protein. The sugar acceptors tested include GA, benzoic acid, 4-hydroxybenzoic acid, 3,4-dihydroxybenzoic acid, cinnamic acid, coumaric acid, caffeic acid, apigenin, and genistein. The reaction mixture was incubated at 30°C for 30 min and terminated by adding 10 µL of 100% (w/v) trichloroacetic acid (TCA) and 100 µL of 50% methanol. High Performance Liquid Chromatography (HPLC) analysis of the reaction mixtures was carried out as previously described. 9 Glucosylated products of gallic acid, apigenin, and genistein were identified based on comparison of UV absorption spectra and retention time to the following authentic standards: 1-O-galloyl-β-D-glucose was purchased from Carbosynth Limited (Berkshire, UK); apigenin-7-O-glucoside and genistein-7-O-glucoside were obtained from Chromadex (Irvine, CA). Glucosylated products of benzoic acid, 4-hydroxybenzoic acid, 3,4-dihydroxybenzoic acid, cinnamic acid, coumaric acid, and caffeic acid were compared to the aglycone substrates, where the conjugates showed similar UV spectra but altered retention time. Glucosylated products of all the substrates tested were also analyzed by mass spectrometry as described. 9 Reactions with no enzyme, boiled protein, or no UDP-glucose were used as controls for the enzyme assays. Products were not observed in these control reactions. The wild type and mutant PgUGT84A23 enzyme activities were linear at 30°C for at least 60 min. Their enzyme activities were also linear with protein concentrations when assayed at multiple time points between 15 min and 60 min. For measuring steady-state kinetics of the wild type and mutant PgUGT84A23 proteins, the enzyme assay mixture (100 µL) contained 50 mM MES, pH 5.0, 1 mM UDP-glucose, 0.2-1 mM GA, 14 mM 2-mercaptoethanol, and 5 µg of purified recombinant protein.
Triplicate reactions were carried out at 30°C for 30 min, terminated, and analyzed on HPLC as described above. The specific activity (nkat/mg) of PgUGT84A23 proteins was calculated as nmol substrate converted/s (nkat) by 1 mg of purified protein. The kinetic parameters were determined using hyperbolic regression fitting to the Michaelis-Menten equation with a custom script in R.

Statistical analysis

Statistical analysis of the enzyme specific activities and kinetic parameters was conducted in R. Multiple comparisons (Analysis of variance/ANOVA) were performed for the wild type or mutant PgUGT84A23 enzymes with each sugar acceptor substrate. The residuals were normally distributed (based on the normal quantile-quantile plot) and the variances were homogeneous, allowing for ANOVA. Tukey Honest Significant Differences (HSDs) were calculated as a post hoc test and significance categories were determined using the multcompView package in R.

Results

Structural modeling and docking analysis of PgUGT84A23 identified candidate amino acid residues for enzyme activity and substrate binding

To identify the amino acid sites responsible for substrate binding and activity of the galloylglucose ester-forming UGT84s, a 3D homology model of PgUGT84A23 was constructed using Os79 as a template (Figure 3A). The UDP-glucose (sugar donor) binding pocket of the PgUGT84A23 model is highly consistent with the crystallized plant UGTs (Figure 3B). 6, 14-20 In addition, the docked conformation of UDP-glucose with the PgUGT84A23 model was almost completely aligned with the bound UDP-2-fluoro-2-deoxy-D-glucose (U2F; an analog of UDP-glucose) co-crystallized with Os79 (Figure 3B).

Figure 3. A homology model of PgUGT84A23 based on the crystal structure of Os79. (A) Overlay of the secondary structure of Os79 (in pink) and the homology model of PgUGT84A23 (in blue). (B) The PgUGT84A23 homology model docked with UDP-glucose. In blue, the PgUGT84A23 homology model showing side chains of residues forming the sugar donor binding pocket. In teal, predicted amino acids in the sugar donor binding pocket that were mutated in this work. In red, oxygen atom of water adjacent to the UDP-glucose binding pocket of the Os79 crystal structure. In grey, a representative docking of UDP-glucose.

Based on the docking analysis, several UDP-glucose-coordinating residues in the PgUGT84A23 model are within the PSPG motif (Figures 2 and 3B). These include W342 that stabilizes the uracil ring, H360, W363, N364, and S365 that interact with the phosphate oxygens, Q345 and E368 that make contact with the C2’ and C3’ hydroxyls of ribose, D384 that forms polar contacts with the C3’ and C4’ hydroxyls of glucose, and Q385 that forms polar contacts with the C2’ and C3’ hydroxyls of glucose. There are additional UDP-glucose-coordinating residues in the PgUGT84A23 homology model located outside of the PSPG motif (Figures 2 and 3B). These include N22 that stabilizes the region from the uracil ring through the phosphates, V144 that forms polar contacts between the oxygen of the backbone and the C6’ hydroxyl of glucose, and S284 that interacts with phosphate oxygen(s). D384 located in the PSPG motif was selected as a representative for the mutagenesis analysis to test the effectiveness of modeling and docking for UDP-glucose binding.

Unlike the conserved UDP-glucose binding pocket, the sugar acceptor binding pocket varies among different plant UGT crystal structures. 6, 14-20 In the docking of GA to the PgUGT84A23 homology model, the side chains of H20, Q145, Y153, F189, W382, and D384 directly coordinate GA (Figure 4A).
H20 is hypothesized to be the catalytic residue and forms a hydrogen bond with the carboxyl group of GA. Although not in direct contact with GA, the side chain of N122 hydrogen bonds with the side chain of H20. Q145 and Y153 are well positioned to hydrogen bond with a water that would be in the correct position to hydrogen bond with the 3-hydroxyl of GA (Figure 4A). The backbone of W382 and the side chain of D384 are in the correct position to hydrogen bond with the 4-hydroxyl of GA. Additionally, the side chains of F15, F124, I203, L199, F189, W382, and V285 surround the acceptor binding pocket, with the bulky side chains of F189 and W382 forming one side of the binding pocket (Figure 4A). H20, N122, Q145, Y153, F189, and W382 were selected for mutagenesis to assess their functions in GA binding and catalysis.

UGT sequence alignment identified additional candidate amino acids

To identify the conserved amino acids within the galloylglucose ester-forming UGTs, their protein sequences were aligned and compared with those plant UGTs that had been tested with phenolic acids as substrates (forming glucosides and/or glucose esters) (Figure S1). These UGTs also showed activities towards substrates other than phenolic acids in enzyme assays. Multiple sites are shared amongst UGTs that produce galloylglucose esters or more broadly glucose esters, but are not found in any of the UGTs that produce glycosides (Figure S1). Seven amino acids were selected for site-directed mutagenesis based on the sequence alignment (Figures 2 and S1). D219 is found exclusively in the galloylglucose ester-forming UGTs and not outside the UGT84s. Q18 and D388 are conserved in most UGT84s, but are also present in this position outside the UGT84s. G98 and C356 are only present in the group L UGTs and are absent from the other UGTs at the same position. Y154 and G159 are mostly conserved among the group L UGTs, and are located near Y153, an amino acid predicted to interact with GA by the modeling and docking analysis (Figure 4A).
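The kind of comparison described above, looking for alignment positions where one group of UGTs shares a residue that the other group lacks, can be sketched in a few lines. This is only an illustration of the idea and not the authors' workflow, which relied on inspection of the MUSCLE alignment in Figure S1. The function name `group_specific_columns` is hypothetical and the aligned fragments are invented toy sequences, not real UGT sequences. Column indices are 0-based positions in the alignment.

```python
def group_specific_columns(group_a, group_b):
    """Return [(column_index, residue)] for alignment columns where every
    sequence in group_a carries the same residue (not a gap) and no
    sequence in group_b carries that residue."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        if len(residues_a) != 1:
            continue  # position is not fully conserved within group_a
        residue = next(iter(residues_a))
        if residue == "-":
            continue  # a shared gap is not a conserved residue
        if all(seq[i] != residue for seq in group_b):
            hits.append((i, residue))
    return hits


# Toy aligned fragments, invented for illustration (not real UGT sequences).
ester_formers = ["MGSDQHV", "MGADQHV", "MGSDQHI"]
glycoside_formers = ["MASNQHV", "MPSEQHL", "MASN-HV"]

print(group_specific_columns(ester_formers, glycoside_formers))
```

The strict criterion used here (fully conserved in one group, absent in the other) would capture a site such as D219 as described in the text, whereas sites that are only "mostly conserved" would need a relaxed threshold.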

Figure 4. The PgUGT84A23 homology model docked with gallic acid. (A) In blue, the PgUGT84A23 homology model showing side chains of residues forming the sugar acceptor binding pocket. In teal, predicted amino acids in the sugar acceptor binding pocket that were mutated in this work. In red, oxygen atoms of water in the sugar acceptor binding pocket of the Os79 crystal structure. In grey, a representative docking conformation of gallic acid with the PgUGT84A23 homology model using AutoDock Vina. (B) In blue, the PgUGT84A23 homology model. In teal, amino acids selected from UGT sequence alignment that were mutated in this work. In grey, a representative docking conformation of gallic acid with the PgUGT84A23 homology model using AutoDock Vina.

Mutagenesis of the candidate amino acids revealed sites critical for PgUGT84A23 substrate binding and enzyme activity

To assess the role of the candidate amino acids in PgUGT84A23 activity and substrate binding, 20 mutations at 14 amino acid sites were generated and their catalytic activities toward 9 sugar acceptors determined (Table 1). The three groups of structurally similar compounds, including 4 benzoic acid derivatives, 3 cinnamic acid derivatives, and 2 structural isomers of (iso)flavonoids, allow comparative analysis of PgUGT84A23 activities toward different sizes, functional groups, as well as relative ring positions and orientations of the sugar acceptors (Figure 5). Of the 20 PgUGT84A23 mutants, D384A completely abolished enzyme activity with the 9 sugar acceptors (Table 1), confirming the critical role of D384 in binding UDP-glucose. In addition, 6 mutants at 4 sites, including H20A, N122A, Q145A, Q145S, Y153A, and Y153F, exhibited reduced activities (to 2-58% of the wild type protein activity) with all 9 substrates. These results indicate the importance of H20, N122, Q145, Y153, and D384 for PgUGT84A23 activity through either substrate binding or enzyme catalysis.
Furthermore, 10 mutants at 9 sites, including Q18A, H20Q, G98A, N122D, Y154A, F189A, D219A, W382A, W382Y, and D388N, had reduced activities with 1 to 8 sugar acceptors (Table 1). Notably, Q18A, G98A, and D388N mutations lowered the activities of PgUGT84A23 toward genistein, but not the other sugar acceptors. On the other hand, H20Q and W382A significantly decreased the activities toward all substrates, except for apigenin (Table 1). Although not statistically significant, H20Q showed a trend of reduced activities toward apigenin compared to the wild type enzyme. Two mutants/sites, G159A and C356A, did not affect enzyme activities for any of the substrates tested (Table 1). D219N led to a slightly increased activity for 4-hydroxybenzoic acid (4-HBA), but not other substrates. It is worth noting that, as with wild type PgUGT84A23, 9 the mutant proteins formed a single glucosylated product. In addition, none of the mutations resulted in formation of GA glucosides instead of galloylglucose esters. Furthermore, the mutant proteins were not active for the substrates that previously did not show product formation with the wild type PgUGT84A23, 9 including catechol, resveratrol, chlorogenic acid, catechin, epicatechin, cyanidin, delphinidin, and pelargonidin (data not shown).

To investigate how the mutations may affect the intrinsic activities of PgUGT84A23, kinetic measurements toward GA were carried out for the wild type enzyme as well as the H20A, N122A, Q145A, and Y153A mutants (Table 2). These 4 mutants showed largely reduced, but not abolished, activities with all 9 substrates (Table 1). H20A, N122A, Q145A, and Y153A displayed affinity for GA similar to the wild type PgUGT84A23 protein (Table 2). However, the much lower turnover of the mutant proteins led to overall compromised enzyme efficiencies (kcat/Km) that are 3-15% of the wild type enzyme (Table 2).
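The kinetic comparison above rests on hyperbolic regression to the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]), which the authors performed with a custom R script that is not shown. The following Python sketch is therefore only an illustration of one way such a fit can be done, not a reproduction of their script. It uses the fact that, for a fixed Km, the least-squares Vmax has a closed form, so only Km needs to be searched (here with a golden-section search, which assumes a single minimum within the search bounds). All rate data are synthetic and hypothetical, generated from assumed parameters; the substrate range mirrors the 0.2-1 mM GA used in the assays. The efficiency ratio is computed from Vmax/Km, which equals the kcat/Km ratio only under the assumption that both assays contain the same amount of enzyme.

```python
import math


def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=10.0, iters=200):
    """Least-squares fit of v = Vmax * S / (Km + S); returns (Km, Vmax).

    For a fixed Km the best Vmax is a linear least-squares solution, so the
    residual sum of squares is minimized over Km alone by golden-section
    search on [km_lo, km_hi] (assumed to bracket a single minimum).
    """
    def best_vmax(km):
        x = [si / (km + si) for si in s]
        return sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)

    def rss(km):
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    phi = (math.sqrt(5) - 1) / 2
    a, b = km_lo, km_hi
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if rss(c) < rss(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = (a + b) / 2
    return km, best_vmax(km)


def mm_rate(s, km, vmax):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


# Synthetic, noise-free data from assumed parameters (hypothetical values,
# not measurements from this study).
gallic_acid_mM = [0.2, 0.4, 0.6, 0.8, 1.0]
wt_rates = [mm_rate(s, km=0.5, vmax=6.0) for s in gallic_acid_mM]
mutant_rates = [mm_rate(s, km=0.5, vmax=0.6) for s in gallic_acid_mM]

km_wt, vmax_wt = fit_michaelis_menten(gallic_acid_mM, wt_rates)
km_mut, vmax_mut = fit_michaelis_menten(gallic_acid_mM, mutant_rates)

# Relative catalytic efficiency, assuming equal enzyme amounts in both assays.
relative_efficiency = (vmax_mut / km_mut) / (vmax_wt / km_wt)
print(f"WT: Km={km_wt:.3f} mM, Vmax={vmax_wt:.3f}")
print(f"Mutant: Km={km_mut:.3f} mM, Vmax={vmax_mut:.3f}")
print(f"Mutant efficiency relative to WT: {relative_efficiency:.1%}")
```

In this synthetic case the mutant keeps the same Km but has a tenfold lower Vmax, which is the pattern described in the text: similar apparent affinity, lower turnover, and therefore a proportionally lower efficiency.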

Figure 5. Chemical structures of the sugar acceptors used in the wild type and mutant PgUGT84A23 enzyme assays (A) and three-dimensional (3D) structures of apigenin and genistein (B). The 3D conformer structures were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov). The A, B, and C rings of apigenin and genistein are indicated. BA, benzoic acid; 4-HBA, 4-hydroxybenzoic acid; 3,4-DHBA, 3,4-dihydroxybenzoic acid.

Table 1. Specific activities of the wild type and mutant PgUGT84A23 proteins toward phenolic acid and (iso)flavonoid substrates.

PgUGT84A23 | Gallic Acid | BA | 4-HBA | 3,4-DHBA | Cinnamic Acid | Coumaric Acid | Caffeic Acid | Apigenin | Genistein
WT | 5.1 ± 0.2 (100 ± 5) a | 0.6 ± 0.1 (100 ± 14) ab | 6.1 ± 0.3 (100 ± 5) b | 5.2 ± 0.2 (100 ± 3) a | 2.8 ± 0.4 (100 ± 14) ab | 3.8 ± 0.3 (100 ± 9) ab | 5.3 ± 0.3 (100 ± 7) abc | 2.7 ± 0.3 (100 ± 11) abcd | 6.9 ± 0.3 (100 ± 4) ab
Q18A | 5.3 ± 0.1 (103 ± 3) a | 0.4 ± 0.1 (74 ± 9) abcde | 5.6 ± 0.2 (92 ± 3) b | 4.9 ± 0.0 (94 ± 1) a | 2.2 ± 0.3 (79 ± 10) abcd | 3.7 ± 0.3 (97 ± 8) ab | 4.4 ± 0.2 (84 ± 3) abcde | 1.9 ± 0.3 (70 ± 11) bcde | 3.4 ± 0.4 (49 ± 6) def
H20A | 0.2 ± 0.1 (4 ± 1) f | - | 0.2 ± 0.0 (3 ± 1) g | 0.3 ± 0.0 (6 ± 1) h | 0.2 ± 0.1 (6 ± 2) fg | 0.3 ± 0.1 (7 ± 2) ghi | 0.5 ± 0.3 (9 ± 4) h | 0.6 ± 0.3 (22 ± 13) e | 0.2 ± 0.0 (2 ± 1) i
H20Q | 0.4 ± 0.1 (8 ± 3) f | - | 0.2 ± 0.1 (3 ± 2) g | 0.6 ± 0.2 (12 ± 4) h | 0.2 ± 0.1 (6 ± 2) fg | 0.3 ± 0.1 (7 ± 3) hi | 0.4 ± 0.2 (8 ± 4) h | 1.3 ± 0.2 (48 ± 6) de | 0.1 ± 0.0 (1 ± 1) i
G98A | 5.3 ± 0.1 (104 ± 2) a | 0.5 ± 0.1 (83 ± 9) abcd | 5.6 ± 0.1 (92 ± 2) b | 5.1 ± 0.0 (98 ± 1) a | 2.4 ± 0.2 (85 ± 7) abcd | 3.7 ± 0.3 (99 ± 8) ab | 4.6 ± 0.2 (87 ± 3) abcde | 1.8 ± 0.3 (65 ± 11) cde | 4.4 ± 0.5 (64 ± 6) cde
N122A | 1.7 ± 0.3 def | 0.1 ± 0.1 f | 1.1 ± 0.3 fg | 2.6 ± 0.3 def | 0.3 ± 0.1 fg | 1.0 ± 0.3 efghi | 1.5 ± 0.2 gh | 1.0 ± 0.2 e | 0.9 ± 0.3 ghi


(33 ± 5)

(9 ± 9) a

N122D

4.9 ± 0.2 (96 ± 4)

Q145A

1.4 ± 0.3 (29 ± 6)

Q145S

2.0 ± 0.2 (39 ± 4)

Y153A

1.1 ± 0.3 (22 ± 5)

Y153F

2.8 ± 0.3 (55 ± 6)

Y154A

3.9 ± 0.3 (76 ± 6)

G159A

5.4 ± 0.1 (106 ± 3)

F189A

2.5 ± 0.3 (49 ± 6)

D219A

3.1 ± 0.3 (61 ± 6)

D219N

5.2 ± 0.2 (102 ± 4)

C356A

5.2 ± 0.3 (102 ± 5)

W382A

1.3 ± 0.3 (25 ± 5)

W382Y

4.5 ± 0.3 (88 ± 6) -

ef

def

ef

cde

abc

a

cde

bcd

a

a

ef

0.1±0.1 (8 ± 9)

D384A

a

D388N

5.2 ± 0.2 (102 ± 3)

g

gh

def

0.2 ± 0.1 (38 ± 9)

bcdef

0.3 ± 0.1 (52 ± 16)

ab

efg

ef

cdef

abcde

0.4 ± 0.1 (74 ± 9)

a

0.7 ± 0.2 (117 ± 25) abc

0.6 ± 0.0 (95 ± 3) -

cde

cde

ef

1.1 ± 0.3 (39 ± 9)

a

a

5.1 ± 0.0 (100 ± 1)

de

3.0 ± 0.2 (108 ± 7)

defg

3.1 ± 0.4 (51 ± 6)

defg

2.5 ± 0.3 (48 ± 5)

bc

1.0 ± 0.2 (35 ± 5)

bcde

5.3 ± 0.2 (87 ± 3)

bcdef

3.5 ± 0.5 (67 ± 9)

a

1.5 ± 0.3 (53 ± 10)

a

7.6 ± 0.5 (125 ± 7)

ab

5.2 ± 0.2 (100 ± 4)

ab

6.7 ± 0.2 (110 ± 3)

5.1 ± 0.4 (98 ± 7)

g

2.8 ± 0.1 (100 ± 3)

a

abc

2.5 ± 0.2 (89 ± 5)

fgh

1.2 ± 0.4 (23 ± 7)

0.1 ± 0.1 (4 ± 2)

abcd

2.3 ± 0.4 (38 ± 6) -

abcdef

cdefg

5.0 ± 0.3 (96 ± 6)

ef

0.1 ± 0.0 (12 ± 3) -

efg

0.7 ± 0.3 (24 ± 9)

ab

3.7 ± 0.5 (61 ± 9)

0.3 ± 0.1 (5 ± 2)

fg

0.4 ± 0.2 (13 ± 7)

3.0 ± 0.4 (58 ± 7)

6.3 ± 0.0 (103 ± 1)

0.2 ± 0.1 (39 ± 9)

fg

fgh

2.4 ± 0.3 (39 ± 5)

bc

5.1 ± 0.1 (98 ± 2)

g

efg

3.9 ± 0.2 (75 ± 4) -

5.3 ± 0.4 (87 ± 6)

fg

0.5 ± 0.1 (18 ± 2)

1.1 ± 0.3 (21 ± 5)

ab

0.6 ± 0.0 (107 ± 3)

0.2 ± 0.0 (7 ± 1)

2.3 ± 0.2 (44 ± 3)

g

(25 ± 6)

cdefg

1.4 ± 0.3 (50 ± 9)

0.9 ± 0.2 (17 ± 4)

fg

0.8 ± 0.3 (13 ± 5)

(10 ± 3)

abc

4.2 ± 0.3 (81 ± 6)

1.4 ± 0.2 (23 ± 3)

f

0.4 ± 0.1 (72 ± 9)

(50 ± 6)

cd

3.9 ± 0.3 (64 ± 4)

0.5 ± 0.0 (8 ± 1)

0.1±0.0 (20 ± 3)

ef

ab

(18 ± 4) def

0.2 ± 0.1 (28 ± 9)

0.8 ± 0.1 (28 ± 3) -

a

abcde

1.9 ± 0.3 (111 ± 9)

(29 ± 4)

cde

2.3 ± 0.3 (59 ± 6)

0.3 ± 0.1 (8 ± 2)

h

gh

0.9 ± 0.2 (23 ± 5)

0.8 ± 0.3 (22 ± 8)

h

defgh

1.4 ± 0.3 (36 ± 8)

defghi

1.6 ± 0.4 (42 ± 10)

a

4.2 ± 0.1 (111 ± 3)

bcd

2.5 ± 0.2 (65 ± 5)

def

1.8 ± 0.2 (49 ± 4)

a

4.3 ± 0.1 (113 ± 3)

ab

3.9 ± 0.2 (103 ± 4)

i

0.2 ± 0.1 (6 ± 3)

defg

1.5 ± 0.2 (39 ± 5) -

abc

3.5 ± 0.2 (93 ± 5)

(37 ± 8)

bcde

4.3 ± 0.3 (76 ± 6)

0.6 ± 0.0 (12 ± 1)

h

gh

1.7 ± 0.3 (32 ± 6)

0.9 ± 0.3 (17 ± 5)

h

fg

2.5 ± 0.3 (49 ± 5)

cdef

3.6 ± 0.5 (69 ± 9)

abcd

5.3 ± 0.1 (101 ± 2)

cdef

3.8 ± 0.3 (72 ± 5)

def

3.7 ± 0.4 (69 ± 8)

a

6.0 ± 0.3 (113 ± 4)

ab

5.9 ± 0.3 (111 ± 5) 0.7 ± 0.3 (14 ± 4)

h

ef

3.3 ± 0.3 (63 ± 6) -

abcde

4.7 ± 0.2 (89 ± 3)

(14 ± 4)

de

1.5 ± 0.2 (53 ± 8)

e

0.3 ± 0.3 (11 ± 13)

e

1.1 ± 0.5 (41 ± 17)

e

0.7 ± 0.4 (26 ± 15)

e

0.8 ± 0.3 (30 ± 11)

de

1.4 ± 0.3 (52 ± 13)

ab

3.3 ± 0.5 (122 ± 17) bcde

1.9 ± 0.3 (70 ± 11)

e

0.7 ± 0.3 (26 ± 11)

abcd

2.8 ± 0.3 (104 ± 11)

abcd

3.0 ± 0.3 (111 ± 13)

abcd

2.6 ± 0.5 (96 ± 17)

a

3.9 ± 0.2 (144 ± 6) -

abc

3.3 ± 0.3 (122 ± 11)

cd

4.7 ± 0.6 (68 ± 8) 0.4 ± 0.1 (6 ± 2)

hi

de

3.7 ± 0.6 (54 ± 8) 0.6 ± 0.4 (9 ± 6)

hi

efgh

2.4 ± 0.6 (42 ± 9)

ghi

1.0 ± 0.4 (14 ± 6)

a

7.7 ± 0.3 (112 ± 5)

ghi

0.7 ± 0.2 (10 ± 3)

fghi

1.4 ± 0.6 (20 ± 9)

bcd

5.1 ± 0.2 (74 ± 3)

abc

6.4 ± 0.2 (93 ± 3)

ghi

0.9 ± 0.2 (13 ± 3)

abc

6.0 ± 0.4 (87 ± 6) -

defg

3.0 ± 0.2 (43 ± 3)

a The specific activity of the wild type and mutant PgUGT84A23 is expressed as nmol substrate converted per sec by 1 mg of purified recombinant proteins (nmol mg-1 s-1).
b Each value is the mean result of at least three enzyme assays using independently expressed and purified proteins ± standard error.
c Percentage of wild type enzyme activity is shown in parentheses.
d -, reaction products not detectable.
e Values marked with the same letters are not significantly different from each other within a column (P > 0.01).
f WT, wild type; BA, benzoic acid; 4-HBA, 4-hydroxybenzoic acid; 3,4-DHBA, 3,4-dihydroxybenzoic acid.

Table 2. Kinetic measurement of wild type and mutant PgUGT84A23 proteins using gallic acid as substrate.

| PgUGT84A23 | Vmax (µM s-1) | Km (mM) | kcat (s-1) | kcat/Km (mM-1 s-1) |
|---|---|---|---|---|
| WT | 0.23 ± 0.03 a | 1.10 ± 0.06 a | 0.65 ± 0.05 a | 0.59 ± 0.07 a |
| H20A | < 0.01 ± 0.01 b | 1.09 ± 0.07 a | 0.02 ± 0.01 b | 0.02 ± 0.01 b |
| N122A | 0.03 ± 0.01 b | 1.05 ± 0.13 a | 0.09 ± 0.01 b | 0.09 ± 0.02 b |
| Q145A | 0.03 ± 0.01 b | 0.93 ± 0.10 a | 0.08 ± 0.01 b | 0.08 ± 0.02 b |
| Y153A | 0.01 ± 0.01 b | 0.90 ± 0.18 a | 0.02 ± 0.01 b | 0.03 ± 0.02 b |

a Values marked with the same letters are not significantly different from each other within a column (P > 0.01).
b WT, wild type.
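The efficiency comparison drawn from Table 2 can be retraced with a short sketch. It recomputes kcat/Km from the rounded kcat and Km values in the table and expresses each mutant as a percentage of the wild type; the dictionary layout and function names are hypothetical, and because the inputs are rounded, the recomputed figures may differ from the tabulated kcat/Km values in the last digit.

```python
# Rounded values transcribed from Table 2 (gallic acid as substrate).
KINETICS = {
    "WT":    {"km_mM": 1.10, "kcat_per_s": 0.65},
    "H20A":  {"km_mM": 1.09, "kcat_per_s": 0.02},
    "N122A": {"km_mM": 1.05, "kcat_per_s": 0.09},
    "Q145A": {"km_mM": 0.93, "kcat_per_s": 0.08},
    "Y153A": {"km_mM": 0.90, "kcat_per_s": 0.02},
}


def efficiency(name):
    """Catalytic efficiency kcat/Km in mM-1 s-1."""
    entry = KINETICS[name]
    return entry["kcat_per_s"] / entry["km_mM"]


def percent_of_wild_type(name):
    """Catalytic efficiency relative to the wild type enzyme, in percent."""
    return 100.0 * efficiency(name) / efficiency("WT")


if __name__ == "__main__":
    for name in KINETICS:
        print(f"{name}: kcat/Km = {efficiency(name):.2f} mM-1 s-1, "
              f"{percent_of_wild_type(name):.0f}% of WT")
```

With these rounded inputs, the four mutants fall between roughly 3% and 15% of the wild type efficiency, in line with the range stated in the text.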


Discussion

Site-directed mutagenesis and kinetic analysis provided new insights into the catalytic mechanism of PgUGT84A23, a galloylglucose ester-forming UGT

The catalytic efficiency of PgUGT84A23 for all tested substrates was dramatically reduced by alanine substitutions at H20, N122, Q145, and Y153, which are located in the sugar acceptor binding pocket (Table 1). Unexpectedly, these mutations only slowed substrate turnover and did not decrease the binding affinity of the mutated proteins for GA (Table 2). The optimal pH for the glucose ester-forming reaction is around 5 [9, 30-32]. At this acidic pH, the active site H20 is mostly protonated. Based on the homology model of PgUGT84A23 docked with GA, the protonated H20 forms a hydrogen bonding network with N122 and an oxygen of the GA carboxyl group, which is necessary to disrupt the resonance stabilization of the GA carboxyl group prior to formation of the galloylglucose ester. As such, the H20A and N122A mutations would disrupt the active site and compromise the reactivity of PgUGT84A23, without having a negative effect on GA binding. However, the H20A and N122A mutations did not completely abolish the PgUGT84A23 activity, suggesting that other catalytic residues/mechanisms are also involved in the production of galloylglucose esters. Interestingly, a recent study on AtUGT74F2 (a group L UGT that forms an ester between salicylic acid and glucose; optimal reaction pH at 5) suggested a reaction mechanism similar to that of the galloylglucose ester-forming UGT84s [5]. It also involves hydrogen bonding of the protonated active site histidine with an oxygen of the carboxyl group of salicylic acid. The carboxyl group in salicylic acid is then destabilized and liberates an oxyanion, which subsequently attacks the anomeric carbon of UDP-glucose [5].
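The statement that H20 is mostly protonated at pH 5 can be checked with a small Henderson-Hasselbalch sketch. The side chain pKa of 6.0 used here is an assumption (a typical textbook value for a histidine imidazole, not a value measured for H20 in PgUGT84A23), and the actual pKa in the active site may be shifted by the local environment.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: generic imidazole side chain pKa, not a measured value for H20.
ASSUMED_HIS_PKA = 6.0

if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.0, 8.0):
        percent = 100.0 * fraction_protonated(ph, ASSUMED_HIS_PKA)
        print(f"pH {ph}: {percent:.0f}% of the histidine side chain protonated")
```

Under this assumption about 91% of the histidine side chains carry a proton at pH 5, compared with about 1% at pH 8, which is consistent with a protonated H20 taking part in ester formation at acidic pH.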
Located at the interface of the sugar acceptor and donor binding pockets, Q145 may play a role in ensuring the correct angles and distances between the carboxyl of GA, the H20 side chain, and the C1’ hydroxyl of the glucose moiety of UDP-glucose, through direct hydrogen bonding with GA or interaction with a bound water (Figure 4A). Similarly, the aromatic side chain and the hydroxyl of Y153 may position GA in the productive orientation, with the carboxyl group of GA facing H20, through direct hydrogen bonding with GA or a bound water. Taken together, the Q145A and Y153A mutations likely reduce the catalytic efficiency of PgUGT84A23 by perturbing the positioning/orientation, but not the binding, of GA in the active site.

In the crystal structures of the glucoside-forming UGTs, D117 (AtUGT72B1) [15], D125 (MtUGT85H2) [16], and D120 (Os79) [6] occupy the position homologous to N122 (PgUGT84A23 homology model). These aspartic acids play a supporting role to a histidine residue (homologous to H20 in PgUGT84A23), which deprotonates the hydroxyl groups of the sugar acceptors to generate glucosides [6, 15, 16]. However, the N122D mutation did not shift the activity of PgUGT84A23 from galloylglucose ester to gallic acid glucoside formation (Table 1), suggesting that additional amino acids are necessary to position a hydroxyl group of GA toward H20 for deprotonation and production of gallic acid glucoside(s). This hypothesis is supported by recent observations on the formation of salicylic acid (2-hydroxybenzoic acid) glucoside by AtUGT74F1 and AtUGT74F2 [5].

Besides forming a glucose ester at the carboxyl position of phenolic acids (e.g. GA), PgUGT84A23 also glucosylated (iso)flavonoid substrates (e.g. apigenin and genistein) at the hydroxyl position (Table 1) [9]. Production of glucose esters and glucosides by PgUGT84A23 prefers acidic and basic pH, respectively [9]. The pKa of 7-OH on apigenin (~7.4) and genistein (~7.2) [33-35] closely matches the pKb of the side chain of the catalytic histidine H20 (pKb ~8) of PgUGT84A23. Therefore, the 7-OH groups of apigenin and genistein are likely to be deprotonated by H20. On the other hand, the pKa of 4’-OH on apigenin (~9) and genistein (~9.5) as well as the pKa of 5-OH on apigenin (~11.6) and genistein (~13) [33-35] are too high for the acid-base reaction (i.e. deprotonation of the hydroxyl group by H20) to occur. Consistent with this analysis, only 7-O-glucosylated products were produced when the wild type and mutant PgUGT84A23 enzymes reacted with apigenin and genistein (Figure 6).

Figure 6. PgUGT84A23 glucosylates apigenin (A) and genistein (B) in the 7-O position. The authentic glucoside standards as well as HPLC elution profiles of reaction products with or without PgUGT84A23 are shown.

Key amino acid residues were identified for substrate binding by PgUGT84A23

Since F189A, W382A, Y154A, and D219A only modified the specific activities of PgUGT84A23 for some of the sugar acceptors tested, these amino acids may function in binding, but not turnover, of the substrates (Table 1). Reducing the size of the amino acid side chain in F189A led to a substantial decrease in PgUGT84A23 enzyme activity with the benzoic acid derivatives and cinnamic acid, but not with the relatively large molecules caffeic acid, coumaric acid, and apigenin (Table 1; Figure 5). Therefore, F189 may exert a steric hindrance effect that traps the small substrates in the sugar acceptor binding pocket. However, F189A lost most of its activity with genistein (5,7,4’-trihydroxy-isoflavone), a structural isomer of apigenin (5,7,4’-trihydroxy-flavone). Genistein and apigenin differ only in the position and orientation of the B ring with respect to the A and C rings (Figure 5B). The distinct impact of F189A on genistein and apigenin suggests differential binding and/or reaction mechanisms of the two compounds by PgUGT84A23. It also indicates the importance of the relative ring position and orientation in the (iso)flavonoid substrates for PgUGT84A23 catalysis.

W382 is located at the entrance to the acceptor binding pocket (Figure 4A). W382A widens the opening to the acceptor binding pocket and dramatically decreases the activity of the enzyme with all the substrates except for apigenin (Table 1). This suggests that the bulky side chain of tryptophan may help retain the sugar acceptor molecules in the active site through steric hindrance. The limited impact of W382A on apigenin may be due to stabilization of apigenin through its interactions with additional amino acids. The W382Y and Y154A mutations did not display a clear trend of differential impact on substrate binding with regard to substrate size or the number of hydroxyl groups, suggesting that other mechanisms must be involved in binding and/or orienting of the sugar acceptors.

According to the PgUGT84A23 homology model, D219 forms hydrogen bonds with W363 and a water bound by N364; both W363 and N364 directly interact with UDP-glucose (Figures 3B and 4B). D219A, but not D219N, disrupts the hydrogen bonding network of D219, W363, water, and N364, which subsequently destabilizes the binding of UDP-glucose and causes a substantial reduction in activity for apigenin and genistein (Table 1). The moderate or no impact of D219A on the benzoic acid and cinnamic acid derivatives suggests that these molecules are less sensitive to changes in the stability of the sugar donor binding pocket than the (iso)flavonoid substrates.

Besides PgUGT84A23 characterized in this work, only one other UGT84 family glycosyltransferase, FaGT2* from the cultivated strawberry (F. ananassa), has been subjected to site-directed mutagenesis [21]. FaGT2* and FaGT2 are natural variants of the same enzyme and differ in 8 amino acids (all outside of the PSPG motif). Three amino acids in FaGT2* were mutated to the corresponding amino acids in FaGT2; the resulting mutants FaGT2*_R230S and FaGT2*_E420D_I422V (a double mutant) had increased specific activities toward GA, but were less active for other aliphatic and aromatic acids tested [21]. R230, E420, and I422 in FaGT2* correspond to H226, D421, and V423 in the PgUGT84A23 homology model, respectively. These three amino acids are located on helices distant from the sugar acceptor binding pocket in the PgUGT84A23 model and are not highly conserved in the galloylglucose ester-forming UGTs (Figure S1). It is possible that these amino acid sites affect substrate binding by altering the overall folding of the FaGT2*/FaGT2 and, possibly, PgUGT84A23 proteins. Taken together, the comparative study of FaGT2*/FaGT2 activities indicates that, in addition to protein structural modeling and substrate docking, natural variations could also be useful in revealing the amino acid sites involved in substrate binding.

Are amino acid sites retained for substrate binding by the ancestral UGT enzyme?

PgUGT84A23 produced galloylglucose esters in vitro and in plant tissues [9]. However, mutations of three highly conserved amino acid sites in UGT84s, Q18A, G98A, and D388N, attenuated the activities of PgUGT84A23 toward genistein, but not GA or other sugar acceptors (Table 1). Interestingly, of the various phenolic acids, (iso)flavonoids, and other phenylpropanoid compounds tested, the wild type PgUGT84A23 protein has the highest in vitro activity with genistein, though the genistein aglycone and glycosides are not detectable in the pomegranate (P. granatum) tissues where PgUGT84A23 is expressed [9, 36]. It is worth mentioning that Q18, G98, and D388 are not located in the sugar acceptor binding pocket based on the homology model of PgUGT84A23 docked with GA. They were identified through sequence conservation within the galloylglucose ester-forming UGTs.

Although enzymes involved in plant specialized metabolism (e.g. UGTs) are specific for their native substrates when examined in plants, they often exhibit multispecificity/promiscuity towards other substrates in in vitro enzyme assays [37]. This multispecificity of substrates, which is mediated by conformational diversity (i.e. different active site configurations) in the enzymes, has also been seen for enzymes in humans and microbes [38, 39]. Similarly, Q18, G98, and D388 identified in this work could play a role in maintaining the conformational diversity of the galloylglucose ester-forming UGTs. As such, UGT84s can also utilize genistein as a substrate in vitro, even though they form a galloylglucose ester in planta [9].

The surprising discovery of amino acid sites that are specific for binding genistein in a galloylglucose ester-forming UGT also prompts the question: what are the substrates for the ancestral enzyme of these UGTs? Recruitment of enzymes from the core metabolic pathways (ancestral enzymes) to more specialized defense functions has occurred many times during plant evolution [40, 41]. Such genome and metabolic plasticity is important for individual plants and populations to adapt to rapid changes in the environment (many specialized metabolites play defense functions in plants). For instance, the convergent evolution of the methylation reactions of caffeine biosynthesis indicates how this recruitment can occur repeatedly, and by different paths, even within the same metabolic network [42]. It was suggested that GA and other highly oxidized (e.g. containing multiple hydroxyl groups) metabolites had evolved during the transition from a general to a more specialized plant defense [43, 44]. Consequently, the UGTs that modify GA may have evolved from an ancestral UGT that work(ed) with other metabolites.
The structural profile of genistein may resemble that of the sugar acceptor for this ancient UGT enzyme. It is also possible that the galloylglucose ester-forming UGTs were recruited from UGTs that act(ed) on isoflavonoids, which are found throughout the land plants (though significant production of isoflavonoids is mostly restricted to the Fabaceae [45]).

Conclusion

Using an integrated approach that combined structural modeling, sequence comparison, and analogous substrates, the catalytic mechanism of a galloylglucose ester-forming UGT was investigated, and its substrate selectivity and reactivity were revealed. An unexpected insight is that the ancestral function of the galloylglucose ester-forming UGTs could be glucoside formation with a substrate similar to genistein (genistein does not contain a carboxyl group). The information obtained from this work can be directly applied to engineering UGT84 enzymes with enhanced/altered substrate specificity and reactivity for producing valuable bioproducts through synthetic biology.

Acknowledgements: We thank Kyle Pelot and Katherine Murphy for their assistance in cloning some of the mutated PgUGT84A23 genes. This work was supported by the National Science Foundation (MCB1120323 to LT), BARD (IS-4822-15R to RA, LT, and DH), the National Science Foundation Graduate Research Fellowship to NNO, and the UC Davis Department of Plant Sciences Graduate Research Fellowship and the Henry A. Jastro Research Fellowship to AEW and NNO. This work is dedicated to the memory of Professor Eric E. Conn (1923-2017).

Conflict of interest: The authors declare they have no conflict of interest.


Author contributions: AEW and LT designed the work; AEW, XF, and NNO performed the research; AEW and LT analyzed the data; AEW and LT wrote the draft; NNO, DH, and RA edited the manuscript.

Supporting Information

Alignment of plant UGTs with characterized activities toward phenolic acids (Figure S1). Induction and purification of wild type and mutant PgUGT84A23 proteins (Figure S2). The primers used for site-directed mutagenesis of PgUGT84A23 (Table S1).

References

[1] Bowles, D., Isayenkova, J., Lim, E., and Poppenberger, B. (2005) Glycosyltransferases: managers of small molecules, Curr Opin Plant Biol 8, 254-263.
[2] Wilson, A. E., Matel, H. D., and Tian, L. (2016) Glucose ester enabled acylation in plant specialized metabolism, Phytochem Rev 15, 1057-1074.
[3] Harborne, J., and Corner, J. (1961) The cinnamic esters of Antirrhinum majus flowers, Arch Biochem Biophys 92, 192-193.
[4] Dembitsky, V. M. (2004) Chemistry and biodiversity of the biologically active natural glycosides, Chem Biodivers 1, 673-781.
[5] George Thompson, A. M., Iancu, C. V., Neet, K. E., Dean, J. V., and Choe, J.-y. (2017) Differences in salicylic acid glucose conjugations by UGT74F1 and UGT74F2 from Arabidopsis thaliana, Sci Rep 7, 46629.
[6] Wetterhorn, K. M., Newmister, S. A., Caniza, R. K., Busman, M., McCormick, S. P., Berthiller, F., Adam, G., and Rayment, I. (2016) Crystal structure of Os79 (Os04g0206600) from Oryza sativa: A UDP-glucosyltransferase involved in the detoxification of deoxynivalenol, Biochemistry 55, 6175-6186.
[7] Osmani, S. A., Bak, S., and Møller, B. L. (2009) Substrate specificity of plant UDP-dependent glycosyltransferases predicted from crystal structures and homology modeling, Phytochemistry 70, 325-347.
[8] Hughes, J., and Hughes, M. (1994) Multiple secondary plant product UDP-glucose glucosyltransferase genes expressed in cassava (Manihot esculenta Crantz) cotyledons, DNA Seq 5, 41-49.
[9] Ono, N., Qin, X., Wilson, A., Li, G., and Tian, L. (2016) Two UGT84 family glycosyltransferases catalyze a critical reaction of hydrolyzable tannin biosynthesis in pomegranate (Punica granatum), PLoS One 11, e0156319.
[10] Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T. G., Bertoni, M., Bordoli, L., and Schwede, T. (2014) SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information, Nucl Acids Res 42, W252-W258.
[11] Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N., and Sternberg, M. J. E. (2015) The Phyre2 web portal for protein modeling, prediction and analysis, Nat Protoc 10, 845-858.
[12] Roy, A., Kucukural, A., and Zhang, Y. (2010) I-TASSER: a unified platform for automated protein structure and function prediction, Nat Protoc 5, 725-738.
[13] Webb, B., and Sali, A. (2016) Comparative protein structure modeling using MODELLER, In Curr Protoc Protein Sci, pp 2.9.1-2.9.37, John Wiley & Sons, Inc.
[14] Offen, W., Martinez-Fleites, C., Yang, M., Kiat-Lim, E., Davis, B. G., Tarling, C. A., Ford, C. M., Bowles, D. J., and Davies, G. J. (2006) Structure of a flavonoid glucosyltransferase reveals the basis for plant natural product modification, EMBO J 25, 1396-1405.
[15] Brazier-Hicks, M., Offen, W. A., Gershater, M. C., Revett, T. J., Lim, E.-K., Bowles, D. J., Davies, G. J., and Edwards, R. (2007) Characterization and engineering of the bifunctional N- and O-glucosyltransferase involved in xenobiotic metabolism in plants, Proc Natl Acad Sci 104, 20238-20243.
[16] Li, L., Modolo, L. V., Escamilla-Trevino, L. L., Achnine, L., Dixon, R. A., and Wang, X. (2007) Crystal structure of Medicago truncatula UGT85H2 - Insights into the structural basis of a multifunctional (iso)flavonoid glycosyltransferase, J Mol Biol 370, 951-963.
[17] Shao, H., He, X., Achnine, L., Blount, J. W., Dixon, R. A., and Wang, X. (2005) Crystal structures of a multifunctional triterpene/flavonoid glycosyltransferase from Medicago truncatula, Plant Cell 17, 3141-3154.
[18] Modolo, L. V., Li, L., Pan, H., Blount, J. W., Dixon, R. A., and Wang, X. (2009) Crystal structures of glycosyltransferase UGT78G1 reveal the molecular basis for glycosylation and deglycosylation of (iso)flavonoids, J Mol Biol 392, 1292-1302.
[19] Hiromoto, T., Honjo, E., Tamada, T., Noda, N., Kazuma, K., Suzuki, M., and Kuroki, R. (2013) Crystal structure of UDP-glucose:anthocyanidin 3-O-glucosyltransferase from Clitoria ternatea, J Synchrotron Radiat 20, 894-898.
[20] Hiromoto, T., Honjo, E., Noda, N., Tamada, T., Kazuma, K., Suzuki, M., Blaber, M., and Kuroki, R. (2015) Structural basis for acceptor-substrate recognition of UDP-glucose: anthocyanidin 3-O-glucosyltransferase from Clitoria ternatea, Protein Sci 24, 395-407.
[21] Schulenburg, K., Feller, A., Hoffmann, T., Schecker, J. H., Martens, S., and Schwab, W. (2016) Formation of β-glucogallin, the precursor of ellagic acid in strawberry and raspberry, J Exp Bot 67, 2299-2308.
[22] Eisenberg, D., Lüthy, R., and Bowie, J. U. (1997) VERIFY3D: Assessment of protein models with three-dimensional profiles, Methods Enzymol 277, 396-404.
[23] Colovos, C., and Yeates, T. O. (1993) Verification of protein structures: patterns of nonbonded atomic interactions, Protein Sci 2, 1511-1519.
[24] Benkert, P., Tosatto, S. C. E., and Schomburg, D. (2008) QMEAN: A comprehensive scoring function for model quality assessment, Proteins 71, 261-277.
[25] Trott, O., and Olson, A. J. (2010) AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading, J Comput Chem 31, 455-461.
[26] Edgar, R. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucl Acids Res 32, 1792-1797.
[27] Kumar, S., Stecher, G., and Tamura, K. (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets, Mol Biol Evol 33, 1870-1874.
[28] Edelheit, O., Hanukoglu, A., and Hanukoglu, I. (2009) Simple and efficient site-directed mutagenesis using two single-primer reactions in parallel to generate mutants for protein structure-function studies, BMC Biotechnol 9, 61-61.
[29] Bradford, M. (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding, Anal Biochem 72, 248-254.
[30] Cui, L., Yao, S., Dai, X., Yin, Q., Liu, Y., Jiang, X., Wu, Y., Qian, Y., Pang, Y., Gao, L., and Xia, T. (2016) Identification of UDP-glycosyltransferases involved in the biosynthesis of astringent taste compounds in tea (Camellia sinensis), J Exp Bot 67, 2285-2297.
[31] Mittasch, J., Böttcher, C., Frolova, N., Bönn, M., and Milkowski, C. (2014) Identification of UGT84A13 as a candidate enzyme for the first committed step of gallotannin biosynthesis in pedunculate oak (Quercus robur), Phytochemistry 99, 44-51.
[32] Khater, F., Fournand, D., Vialet, S., Meudec, E., Cheynier, V., and Terrier, N. (2012) Identification and functional characterization of cDNAs coding for hydroxybenzoate/hydroxycinnamate glucosyltransferases co-expressed with genes related to proanthocyanidin biosynthesis, J Exp Bot 63, 1201-1214.
[33] Favaro, G., Clementi, C., Romani, A., and Vickackaite, V. (2007) Acidichromism and ionochromism of luteolin and apigenin, the main components of the naturally occurring yellow weld: a spectrophotometric and fluorimetric study, J Fluoresc 17, 707-714.
[34] Zielonka, J., Gbicki, J., and Grynkiewicz, G. (2003) Radical scavenging properties of genistein, Free Radic Biol Med 35, 958-965.
[35] Amat, A., Clementi, C., De Angelis, F., Sgamellotti, A., and Fantacci, S. (2009) Absorption and emission of the apigenin and luteolin flavonoids: a TDDFT investigation, J Phys Chem A 113, 15118-15126.
[36] Seeram, N., Zhang, Y., Reed, J., Krueger, C., and Vaya, J. (2006) Pomegranate phytochemicals, In Pomegranates ancient roots to modern medicine (Seeram, N., Schulman, R., and Heber, D., Eds.), pp 3-29, Taylor and Francis, Boca Raton, FL.
[37] Weng, J.-K., Philippe, R. N., and Noel, J. P. (2012) The rise of chemodiversity in plants, Science 336, 1667-1670.
[38] Khersonsky, O., and Tawfik, D. S. (2010) Enzyme promiscuity: A mechanistic and evolutionary perspective, Annu Rev Biochem 79, 471-505.
[39] Erijman, A., Aizner, Y., and Shifman, J. M. (2011) Multispecific recognition: Mechanism, evolution, and design, Biochemistry 50, 602-611.
[40] Moghe, G. D., and Last, R. L. (2015) Something old, something new: Conserved enzymes and the evolution of novelty in plant specialized metabolism, Plant Physiol 169, 1512-1523.
[41] Gachon, C., Langlois-Meurinne, M., and Saindrenan, P. (2005) Plant secondary metabolism glycosyltransferases: the emerging functional analysis, Trends Plant Sci 10, 1360-1385.
[42] Huang, R., O’Donnell, A. J., Barboline, J. J., and Barkman, T. J. (2016) Convergent evolution of caffeine in plants by co-option of exapted ancestral enzymes, Proc Natl Acad Sci 113, 10613-10618.
[43] Gottlieb, O., Kaplan, M., and Kubitzki, K. (1993) A suggested role of galloyl esters in the evolution of dicotyledons, Taxon 42, 539-552.
[44] Kubitzki, K., and Gottlieb, O. R. (1984) Phytochemical aspects of angiosperm origin and evolution, Acta Bot Neerl 33, 457-468.
[45] Dewick, P. (1994) Isoflavonoids, In The flavonoids: Advances in research since 1986 (Harborne, J., Ed.), pp 117-238, Chapman and Hall, London.

