Article pubs.acs.org/biochemistry
Improving Functional Annotation in the DRE-TIM Metallolyase Superfamily through Identification of Active Site Fingerprints
Garima Kumar, Jordyn L. Johnson, and Patrick A. Frantom*
Department of Chemistry, The University of Alabama, 250 Hackberry Lane, Tuscaloosa, Alabama 35487, United States
ABSTRACT: Within the DRE-TIM metallolyase superfamily, members of the Claisen-like condensation (CC-like) subgroup catalyze C−C bond-forming reactions between various α-ketoacids and acetyl-coenzyme A. These reactions are important in the metabolic pathways of many bacterial pathogens, and the enzymes that catalyze them serve as engineering scaffolds for the production of long-chain alcohol biofuels. To improve functional annotation and identify sequences that might use novel substrates in the CC-like subgroup, a combination of structural modeling and multiple-sequence alignments identified active site residues on the third, fourth, and fifth β-strands of the TIM-barrel catalytic domain that are differentially conserved within the substrate-diverse enzyme families. Using α-isopropylmalate synthase and citramalate synthase from Methanococcus jannaschii (MjIPMS and MjCMS), site-directed mutagenesis was used to test the role of each identified position in substrate selectivity. Kinetic data suggest that residues at the β3−5 and β4−7 positions play a significant role in the selection of α-ketoisovalerate over pyruvate in MjIPMS. However, complementary substitutions in MjCMS fail to alter substrate specificity, suggesting residues in these positions do not contribute to substrate selectivity in this enzyme. Analysis of the kinetic data with respect to a protein similarity network for the CC-like subgroup suggests that evolutionarily distinct forms of IPMS utilize residues at the β3−5 and β4−7 positions to affect substrate selectivity while the different versions of CMS use unique architectures. Importantly, mapping the identities of residues at the β3−5 and β4−7 positions onto the protein similarity network allows for rapid annotation of probable IPMS enzymes as well as several outlier sequences that may represent novel functions in the subgroup.
There has been a rapid increase in the deposition of protein sequence data into public databases due in part to the large number of genome sequencing projects.1 Experiment-based functional annotations are unable to match the pace of new protein sequences being submitted, creating an obstacle to identifying novel enzyme activities in nature. Since the late 1990s, computational predictions have been the sole basis for annotations of a majority of protein sequences.2 Sequence similarity searching algorithms such as FASTA and BLAST are employed to find the homologues of a novel sequence. Examining the functions of the homologues and sequence regions common to the homologues and the novel protein helps in assignment of a function to the novel protein.3 However, using these automated tools to annotate new proteins has its shortcomings. Functional assignment on the basis of simple pairwise comparisons with a homologue can be incorrect for a number of reasons, including (i) divergence of functions of homologous proteins, (ii) omission of functions in multifunctional proteins, (iii) error in annotation of the chosen homologue, and (iv) choosing the wrong homologue for functional assignment.4 These issues are highlighted in difficulties in the assignment of function in enzyme superfamilies.5 Enzyme superfamilies are described as a set of homologous enzymes that catalyze different reactions but share a common mechanistic step facilitated by conserved active site motifs.6 Hence, sequence homology-based functional annotation is not completely reliable for such multifunctional superfamilies. To attenuate this problem and improve the quality and utility of annotation, more robust evidence in support of functional assignment is required. “Genomic enzymology” is one such approach that assimilates information from bioinformatics, structural, and kinetic studies to establish enzyme function in diverse enzyme superfamilies.7−13 This approach has been successfully applied to the enolase superfamily, one of the most extensively characterized mechanistically diverse superfamilies. Genomic enzymology could explain the presence of similar functionalities and structures arising from differentially conserved sequences occurring via divergent evolution in o-succinylbenzoate synthases and “pseudoconvergent evolution” in N-succinyl amino acid racemase and muconate lactonizing enzyme.8,10,11 In a recent publication, the DRE-TIM metallolyase superfamily was similarly investigated to understand the role of active site residues in functional diversity.12
Received: November 3, 2015 Revised: March 2, 2016
DOI: 10.1021/acs.biochem.5b01193 Biochemistry XXXX, XXX, XXX−XXX
Figure 1. Functional diversity and annotation in the CC-like subgroup of the DRE-TIM metallolyase superfamily. (A) Reactions catalyzed by each of the characterized members of the CC-like subgroup. (B) Representative protein similarity network (PSN) for the CC-like subgroup. Each node represents a group of protein sequences (denoted by node size) sharing >60% identity. Edges are drawn if the similarity between a pair of nodes is better than an E value threshold cutoff of 1 × 10⁻⁸⁰ (median percent pairwise identity of 50% and median alignment length of 381 residues). Node color is by confirmed in vitro activity and corresponds to the reactions shown in panel A. (C) Same PSN as shown in panel B, but with representative nodes colored green if they contain a sequence that has been reviewed by Swiss-Prot. Swiss-Prot-reviewed entries have been manually annotated, resulting in a level of reliability higher than that of automated annotation. Gray nodes do not contain a reviewed sequence.
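The edge rule in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes a toy list of node pairs with pairwise E values; the node names, the E values, and the function name `clusters_at_threshold` are hypothetical and are not taken from this study. Edges are kept only when the E value is at or below the cutoff, and clusters are read off as connected components. This is an illustration of the idea only, not the EFI-EST or Cytoscape implementation.

```python
from collections import defaultdict


def clusters_at_threshold(nodes, edges, max_evalue):
    """Group nodes into clusters, keeping only edges with E value <= max_evalue.

    nodes: iterable of node names (hypothetical representative nodes)
    edges: iterable of (node_a, node_b, evalue) tuples
    Returns a list of sets, one set per connected component.
    """
    adjacency = defaultdict(set)
    for a, b, evalue in edges:
        # A smaller E value means a more significant match, so a more
        # stringent cutoff (e.g., 1e-150) keeps fewer edges than 1e-80.
        if evalue <= max_evalue:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Toy data (hypothetical): four nodes linked by progressively weaker matches.
toy_nodes = ["n1", "n2", "n3", "n4"]
toy_edges = [("n1", "n2", 1e-160), ("n2", "n3", 1e-100), ("n3", "n4", 1e-90)]

loose = clusters_at_threshold(toy_nodes, toy_edges, 1e-80)
strict = clusters_at_threshold(toy_nodes, toy_edges, 1e-150)
```

With the toy data, the looser cutoff keeps all four nodes in a single cluster, whereas the more stringent cutoff leaves only the n1/n2 pair connected and splits the remaining nodes into singletons.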
(Figure 1A). On the basis of the substrate specificity, the member enzymes in this subgroup are annotated as α-isopropylmalate synthase (IPMS), citramalate synthase (CMS), homocitrate synthase (HCS), methylthioalkylmalate synthase (MAM), R-citrate synthase (R-CS), and 2-phosphinomethylmalic synthase (PMMS). A PSN of sequences in the CC-like subgroup generated at a stringent E value threshold cutoff of 1 × 10⁻⁸⁰ organizes the proteins into six different clusters: IPMS1/CMS1/MAM, IPMS2, CMS2, CMS3, HCS (Lys), and R-CS (Figure 1B).12 Sequences shown in the PSN share an average sequence identity of ∼50% within a cluster and 60% sequence identity. The new data set with 1855 representative nodes and 272156 edges was downloaded as a Cytoscape23 readable xgmml file. Mapping the network nodes with additional data was performed in Cytoscape. Initial edges were drawn between nodes that had an E value of 3 × 10⁴-fold.34 However, HCS enzymes with the Lrp/AsnC domain contain a substitution of an alanine for the active site aspartic acid. This predicts the Crenarchaeote enzymes would be insensitive to competitive inhibition by L-lysine. Additionally, it has been shown that an Lrp/AsnC domain in another protein in the Crenarchaeote Sulfolobus solfataricus functions by binding L-lysine.35 Initial attempts to overexpress, purify, and characterize the putative HCS from S. solfataricus
MjCMS and MjIPMS (Table 1 and Figure S1). One possible mechanism involves contributions from the regulatory domain in substrate selectivity. In IPMS from Arabidopsis thaliana, removal of the regulatory domain resulted in increased activity with the alternative substrate 4-methylthio-2-oxobutyrate, a substrate for the CC-like subgroup member MAM synthase.31 Directed evolution experiments are currently underway as an alternative approach to identifying second-shell and/or epistatic residues that contribute to substrate selectivity in MjCMS. Comparison of the role of the identified residues in substrate selectivity in other CMS enzymes is complicated by the fact that there are three different clusters, each with unique active site fingerprints. The fingerprint for LiCMS, a member of the CMS3 cluster, is shown in Table 1. LiCMS differs from MjCMS in every position except for the strictly conserved glutamate at position β4−7. Like MjCMS, wild-type LiCMS will not use KIV as a substrate. However, substitution at positions β3−5 and β4−7 results in enzyme variants with significant KIV-dependent activity (kcat/Km values of ∼10² M⁻¹ s⁻¹).18 This result suggests that CMS1 and CMS3 sequences have evolved unique active site contributions to substrate selectivity. Only one member of the CMS2 cluster has been characterized (CMS from Geobacter sulfurreducens), and no structural information is available.32 Because of sequence divergence, multiple-sequence alignments with other CC-like subgroup members fail to produce reliable identification of the fingerprint positions in this cluster.
Improvement in Functional Annotation from Experimental Results. With experimental confirmation that the β3−5 and β4−7 positions are important in MjIPMS substrate selectivity, the identities of residues at these locations can be mapped onto the PSN to assist in improving functional annotation. The two extremes of the state of functional annotation in the IPMS1/CMS1 cluster can be seen in Figure 1.
In Figure 1B, representative nodes are colored by reported in vitro activity representing the highest level of accuracy. In Figure 1C, representative nodes are colored green if they contain at least one curated, Swiss-Prot-reviewed annotation. While the Swiss-Prot-reviewed annotations are more likely to
(UniProt entry Q97ZE0) by this lab have resulted in insoluble inclusion bodies thus far. The 15 sequences containing the FE motif (gold nodes) are from bacterial organisms in the Clostridiales order and the Rhodobacteraceae and Synergistaceae families. In the organism Thermovirga lienii, a moderate thermophile in the Synergistaceae family, the KEGG database indicates three paralogs to the FE motif-containing sequence (UniProt entry G7V6A1). Of these three, one (G7V5Y2) is located in the IPMS1 cluster and contains an LF motif, suggesting it is a true IPMS enzyme. The other two paralogs (G7V9F6 and G7V911) are found in the HMG-CoA lyase subgroup. Overall, these results suggest that the sequences containing the FE motif may catalyze a novel reaction or have unique properties relative to those of the currently characterized IPMS enzymes. Increasing the stringency on the remaining IPMS1/CMS1 cluster to an E value of 10⁻¹⁵⁰ results in the formation of a new cluster that contains both the MjIPMS and MjCMS sequences (Figure 4C). The sequences in this new cluster are all from Archaea and have been designated as either IPMS-archaea (red nodes) or CMS-archaea (magenta nodes) on the basis of their active site fingerprints. The sequences remaining in the parent IPMS cluster contain two main motifs: the LF motif, which provides selectivity for KIV over pyruvate, and an uncharacterized LG motif. Nodes containing the LF motif, colored red in Figure 4, are predicted to have IPMS activity based on experimental results with MjIPMS. However, even at this stringent cutoff, there is still diversity in sequence motifs at the β3−5 and β4−7 positions. There are a surprising number of LG-containing nodes (green) in the larger cluster of Figure 4C (seven representative nodes, >350 unique sequences). Sequences in these nodes are bacterial in origin.
The largest representative node for this motif contains two entries (A6LDN1 and E1QWZ1) with Swiss-Prot-reviewed annotations for IPMS activity. Parabacteroides distasonis, a common bacterium in gut fauna, contains the LG motif-containing A6LDN1 and the paralog A6LDM8, found in the CMS2 cluster. Olsenella uli, a bacterium associated with periodontitis, does not contain any paralogs to E1QWZ1. On the basis of this abbreviated investigation, it is likely that sequences with the LG motif do have IPMS activity; however, it is unclear if substitution of the much smaller glycine residue for the phenylalanine affects overall substrate selectivity.
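The motif-based annotation described above can be sketched in a few lines of Python. The sketch assumes that each sequence is already aligned so that the β3−5 and β4−7 positions fall in known alignment columns, and that the two-letter motif lists the β3−5 residue first; the column indices, toy sequences, and names `MOTIF_CALLS`, `motif_at`, and `annotate` are hypothetical. The annotation strings paraphrase the tentative conclusions in the text and are not definitive functional assignments.

```python
# Tentative calls paraphrased from the discussion above; not definitive.
MOTIF_CALLS = {
    "LF": "predicted IPMS (LF motif tested in MjIPMS)",
    "LG": "probable IPMS (LG motif uncharacterized)",
    "FE": "outlier (possible novel function)",
}


def motif_at(aligned_seq, col_b3_5, col_b4_7):
    """Return the two-residue motif read from the two fingerprint columns."""
    return aligned_seq[col_b3_5] + aligned_seq[col_b4_7]


def annotate(aligned_seqs, col_b3_5, col_b4_7):
    """Map each sequence name to a tentative call based on its motif.

    aligned_seqs: dict of name -> aligned sequence string (equal lengths).
    Motifs not listed in MOTIF_CALLS are reported as 'unassigned'.
    """
    calls = {}
    for name, seq in aligned_seqs.items():
        motif = motif_at(seq, col_b3_5, col_b4_7)
        calls[name] = MOTIF_CALLS.get(motif, "unassigned")
    return calls


# Toy alignment (hypothetical): columns 2 and 4 stand in for the
# beta3-5 and beta4-7 positions, respectively.
toy_alignment = {
    "seqA": "MKLAF",
    "seqB": "MKLAG",
    "seqC": "MKFAE",
    "seqD": "MK-AE",
}
toy_calls = annotate(toy_alignment, col_b3_5=2, col_b4_7=4)
```

In practice, the resulting calls would be attached to network nodes as an attribute so that nodes can be colored by motif; sequences with gaps or unlisted motifs stay unassigned rather than being forced into a category.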
■
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.biochem.5b01193. A table of the oligonucleotide primers used in this study and figures showing the superimposed structures of CC-like subgroup members with α-ketoacid ligands bound, model structures of MjIPMS and MjCMS superimposed with NmIPMS, and the inhibition plot for KIV versus pyruvate with MjCMS (PDF)
■
AUTHOR INFORMATION
Corresponding Author
*Department of Chemistry, The University of Alabama, Box 870336, Tuscaloosa, AL 35487. Telephone: 205-348-8349. Fax: 205-348-9104. E-mail: [email protected].
Funding
This work was supported by funding from the National Science Foundation (NSF) through a CAREER Award to P.A.F. (MCB-1254077) and an NSF Graduate Research Fellowship (J.L.J.). This work was also supported by a Graduate Research Fellowship from The University of Alabama (G.K.).
Notes
The authors declare no competing financial interest.
■
ABBREVIATIONS
AcCoA, acetyl-coenzyme A; BLAST, basic local alignment search tool; CC-like, Claisen-like condensation; CMS, citramalate synthase; DTP, 4,4′-dithiodipyridine; HCS, homocitrate synthase; IPMS, α-isopropylmalate synthase; IPTG, isopropyl β-D-1-thiogalactopyranoside; KIV, α-ketoisovalerate; LiCMS, citramalate synthase from L. interrogans; MAM, methylthioalkylmalate synthase; MjCMS, citramalate synthase from Methanococcus jannaschii; MjIPMS, α-isopropylmalate synthase from Me. jannaschii; MtIPMS, α-isopropylmalate synthase from M. tuberculosis; NmIPMS, α-isopropylmalate synthase from N. meningitidis; PMMS, 2-phosphinomethylmalic synthase; PSN, protein similarity network; TEA, triethanolamine.
■
CONCLUSIONS
The problem of accurate functional annotation for genes of unknown function continues to grow as the number of genomic sequences increases. Here, we have applied the principles of genomic enzymology to begin to address this issue in the DRE-TIM metallolyase superfamily. Differentially conserved residues that contribute to substrate selection in MjIPMS were identified and experimentally verified. However, experimental results did not support a role in substrate selectivity for analogous sites in the related enzyme MjCMS, suggesting substrate selection may arise from a more complicated mechanism. Importantly, the experimental results are leveraged through use of a PSN to propose functional assignment for a large swath of sequences in this subgroup. Outlier sequences were also identified as candidate enzymes for uncharacterized activities that could support novel metabolic pathways.
■
REFERENCES
(1) Benson, D. A., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Sayers, E. W. (2015) GenBank. Nucleic Acids Res. 43, D30−35.
(2) Bork, P., and Bairoch, A. (1996) Go hunting in sequence databases but watch out for the traps. Trends Genet. 12, 425−427.
(3) Karp, P. D. (1998) What we do not know about sequence analysis and sequence databases. Bioinformatics 14, 753−754.
(4) Frishman, D. (2007) Protein annotation at genomic scale: the current status. Chem. Rev. 107, 3448−3466.
(5) Schnoes, A. M., Brown, S. D., Dodevski, I., and Babbitt, P. C. (2009) Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput. Biol. 5, e1000605.
(6) Gerlt, J. A., and Babbitt, P. C. (2001) Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annu. Rev. Biochem. 70, 209−246.
(7) Allen, K. N., and Dunaway-Mariano, D. (2004) Phosphoryl group transfer: evolution of a catalytic scaffold. Trends Biochem. Sci. 29, 495−503.
(8) Glasner, M. E., Fayazmanesh, N., Chiang, R. A., Sakai, A., Jacobson, M. P., Gerlt, J. A., and Babbitt, P. C. (2006) Evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase family of the enolase superfamily. J. Mol. Biol. 360, 228−250.
(9) Seibert, C. M., and Raushel, F. M. (2005) Structural and catalytic diversity within the amidohydrolase superfamily. Biochemistry 44, 6383−6391.
(10) Gerlt, J. A., Babbitt, P. C., Jacobson, M. P., and Almo, S. C. (2012) Divergent evolution in enolase superfamily: strategies for assigning functions. J. Biol. Chem. 287, 29−34.
(11) Odokonyero, D., Ragumani, S., Lopez, M. S., Bonanno, J. B., Ozerova, N. D., Woodard, D. R., Machala, B. W., Swaminathan, S., Burley, S. K., Almo, S. C., and Glasner, M. E. (2013) Divergent evolution of ligand binding in the o-succinylbenzoate synthase family. Biochemistry 52, 7512−7521.
(12) Casey, A. K., Hicks, M. A., Johnson, J. L., Babbitt, P. C., and Frantom, P. A. (2014) Mechanistic and bioinformatic investigation of a conserved active site helix in alpha-isopropylmalate synthase from Mycobacterium tuberculosis, a member of the DRE-TIM metallolyase superfamily. Biochemistry 53, 2915−2925.
(13) Hicks, M. A., Barber, A. E., 2nd, Giddings, L. A., Caldwell, J., O’Connor, S. E., and Babbitt, P. C. (2011) The evolution of function in strictosidine synthase-like proteins. Proteins: Struct., Funct., Genet. 79, 3082−3098.
(14) Forouhar, F., Hussain, M., Farid, R., Benach, J., Abashidze, M., Edstrom, W. C., Vorobiev, S. M., Xiao, R., Acton, T. B., Fu, Z., Kim, J. J., Miziorko, H. M., Montelione, G. T., and Hunt, J. F. (2006) Crystal structures of two bacterial 3-hydroxy-3-methylglutaryl-CoA lyases suggest a common catalytic mechanism among a family of TIM barrel metalloenzymes cleaving carbon-carbon bonds. J. Biol. Chem. 281, 7533−7545.
(15) Zhang, K., Sawaya, M. R., Eisenberg, D. S., and Liao, J. C. (2008) Expanding metabolism for biosynthesis of nonnatural alcohols. Proc. Natl. Acad. Sci. U. S. A. 105, 20653−20658.
(16) Marcheschi, R. J., Li, H., Zhang, K., Noey, E. L., Kim, S., Chaubey, A., Houk, K. N., and Liao, J. C. (2012) A synthetic recursive ″+1″ pathway for carbon chain elongation. ACS Chem. Biol. 7, 689−697.
(17) Hunter, M. F., and Parker, E. J. (2014) Modifying the determinants of alpha-ketoacid substrate selectivity in Mycobacterium tuberculosis alpha-isopropylmalate synthase. FEBS Lett. 588, 1603−1607.
(18) Ma, J., Zhang, P., Zhang, Z., Zha, M., Xu, H., Zhao, G., and Ding, J. (2008) Molecular basis of the substrate specificity and the catalytic mechanism of citramalate synthase from Leptospira interrogans. Biochem. J. 415, 45−56.
(19) Kumar, G., and Frantom, P. A. (2014) Evolutionarily distinct versions of the multidomain enzyme alpha-isopropylmalate synthase share discrete mechanisms of V-type allosteric regulation. Biochemistry 53, 4847−4856.
(20) Howell, D. M., Xu, H., and White, R. H. (1999) (R)-citramalate synthase in methanogenic archaea. J. Bacteriol. 181, 331−333.
(21) Atkinson, H. J., Morris, J. H., Ferrin, T. E., and Babbitt, P. C. (2009) Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS One 4, e4345.
(22) Gerlt, J. A., Bouvier, J. T., Davidson, D. B., Imker, H. J., Sadkhin, B., Slater, D. R., and Whalen, K. L. (2015) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta, Proteins Proteomics 1854, 1019−1037.
(23) Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498−2504.
(24) Huisman, F. H., Koon, N., Bulloch, E. M., Baker, H. M., Baker, E. N., Squire, C. J., and Parker, E. J. (2012) Removal of the C-terminal regulatory domain of alpha-isopropylmalate synthase disrupts functional substrate binding. Biochemistry 51, 2289−2297.
(25) Needleman, S. B., and Wunsch, C. D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443−453.
(26) Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004) UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605−1612.
(27) Yang, Z., Lasker, K., Schneidman-Duhovny, D., Webb, B., Huang, C. C., Pettersen, E. F., Goddard, T. D., Meng, E. C., Sali, A., and Ferrin, T. E. (2012) UCSF Chimera, MODELLER, and IMP: An integrated modeling system. J. Struct. Biol. 179, 269.
(28) Koon, N., Squire, C. J., and Baker, E. N. (2004) Crystal structure of LeuA from Mycobacterium tuberculosis, a key enzyme in leucine biosynthesis. Proc. Natl. Acad. Sci. U. S. A. 101, 8295−8300.
(29) Okada, T., Tomita, T., Wulandari, A. P., Kuzuyama, T., and Nishiyama, M. (2010) Mechanism of substrate recognition and insight into feedback inhibition of homocitrate synthase from Thermus thermophilus. J. Biol. Chem. 285, 4195−4205.
(30) Katoh, K., and Standley, D. M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772−780.
(31) de Kraker, J. W., and Gershenzon, J. (2011) From amino acid to glucosinolate biosynthesis: protein sequence changes in the evolution of methylthioalkylmalate synthase in Arabidopsis. Plant Cell 23, 38−53.
(32) Risso, C., Van Dien, S. J., Orloff, A., Lovley, D. R., and Coppi, M. V. (2008) Elucidation of an alternate isoleucine biosynthesis pathway in Geobacter sulfurreducens. J. Bacteriol. 190, 2266−2274.
(33) Deng, W., Wang, H., and Xie, J. (2011) Regulatory and pathogenesis roles of Mycobacterium Lrp/AsnC family transcriptional factors. J. Cell. Biochem. 112, 2655−2662.
(34) Bulfer, S. L., Scott, E. M., Pillus, L., and Trievel, R. C. (2010) Structural basis for L-lysine feedback inhibition of homocitrate synthase. J. Biol. Chem. 285, 10446−10453.
(35) Brinkman, A. B., Bell, S. D., Lebbink, R. J., de Vos, W. M., and van der Oost, J. (2002) The Sulfolobus solfataricus Lrp-like protein LysM regulates lysine biosynthesis in response to lysine availability. J. Biol. Chem. 277, 29537−29549.