Evolution of Enzyme Superfamilies: Comprehensive Exploration of

Nov 1, 2016 - The sequence and functional diversity of enzyme superfamilies have expanded through billions of years of evolution from a common ancesto...
0 downloads 7 Views 1MB Size
Subscriber access provided by UB + Fachbibliothek Chemie | (FU-Bibliothekssystem)

Current Topic/Perspective

Evolution of enzyme superfamilies: comprehensive exploration of sequence-function relationships Florian Baier, Janine N Copp, and Nobuhiko Tokuriki Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.6b00723 • Publication Date (Web): 01 Nov 2016 Downloaded from http://pubs.acs.org on November 4, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Evolution of enzyme superfamilies: comprehensive exploration of sequence-function relationships

Baier, F.1, Copp, J.N.1, Tokuriki N.1* 1

Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4,

Canada

*Corresponding author: [email protected]

Abbreviations SSN: sequence similarity network FCN: function connectivity network MBL: metallo-β-lactamase HAD: haloacid dehalogenase cytGST: cytosolic glutathione transferase BKACE: β-keto acid cleavage enzyme AMPS: Alpha-, Mu-, Pi-, and Sigma-like

1

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract The sequence and functional diversity of enzyme superfamilies has expanded through billions of years of evolution from a common ancestor. Understanding how protein sequence and functional “space” has expanded, at both the evolutionary and molecular level, is central to biochemistry, molecular and evolutionary biology. Integrative approaches that examine protein sequence, structure, and function have begun to provide comprehensive views of the functional diversity and evolutionary relationships within enzyme superfamilies. In this review, we outline the recent advances in our understanding of enzyme evolution and superfamily functional diversity. We describe the tools that have been used to comprehensively analyze sequence relationships, and to characterize sequence and function relationships. We also highlight recent large-scale experimental approaches that systematically determine the activity-profiles across enzyme superfamilies. We identify several intriguing insights from this recent body of work: first, promiscuous activities are prevalent among extant enzymes; second, many divergent proteins retain “function connectivity” via enzyme promiscuity, which can be used to probe the evolutionary potential and history of enzyme superfamilies; finally, we discuss open questions regarding the intricacies of enzyme divergence, as well as potential research directions that will deepen our understanding of enzyme superfamily evolution.

2

ACS Paragon Plus Environment

Page 2 of 41

Page 3 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Introduction Enzymes have evolved remarkable functional diversity, and understanding how new enzyme functions evolve is a central goal in biological science. The initial steps toward understanding enzyme evolution were taken over 80 years ago when enzymes were first structurally and biochemically characterized.1 An enormous body of work since that time has profoundly deepened our knowledge of enzyme functional diversity and their sequence-structure-function relationships.2-7 It has become clear that the diversity of structural folds and mechanistic features in the active site (catalytic residues and cofactors) are critical determinants of the extraordinarily diverse functional repertoire observed in modern enzymes.8-12 At present, the CATH (Class, Architecture, Topology and Homologous superfamilies) database classifies 235,858 structural domains that fall within 2,738 superfamilies, in which proteins share a common structural fold, sequence motif, and set of catalytic features.13 Even within a single superfamily, there exists enzymes that are capable of catalyzing a diverse range of chemical reactions, to an astonishing degree.2,14 Enzymes in the same superfamily often act on different substrates (i.e., altered size, shape and electrostatics) and/or catalyze different chemical reactions (i.e., altering the chemical bond that is broken or formed, or even utilizing completely different chemistry, such as hydrolase, isomerase and oxidase reactions).2,14 Examples of functionally diverse enzyme superfamilies include the haloacid dehalogenase (HAD), enolase, cytosolic glutathione transferase (cytGST), metallo-β-lactamase (MBL) and amidohydrolase superfamilies, each of which contain members that catalyze a wide variety of distinct chemical reactions, spanning all six E.C. classes (Figure 1).5,7,11 They provide dramatic, but not unusual, examples of evolutionarily related functional diversity.2 Recent large-scale function-profiling, combined with extensive bioinformatic studies, have revealed important details about the functional diversity and evolution of enzyme superfamilies.15-18

In this review, we summarize recent advances that have expanded our understanding of sequence and function diversity, as well as the evolutionary relationships within enzyme superfamilies. We highlight new observations of several enzyme superfamilies that come from exhaustive bioinformatics and large-scale function-profiling efforts. We analyze

3

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

previously published large-scale experimental data from superfamily-wide enzyme function-profiling studies and discuss the implications in the context of evolutionary divergence of enzyme superfamilies. In particular, we note that seemingly distant enzyme functions are commonly connected through enzyme promiscuity across a superfamily. Furthermore, we suggest that these efforts should be integrated with new approaches in order to gain further insights into how enzyme superfamilies diverge, and to reveal the molecular details underlying enzyme evolution.

Promiscuity as a route to functional divergence within protein superfamilies The modern theory of how enzymes evolve was established in 1970’s by Jacob, who postulated that: “Evolution does not produce innovations from scratch. It works on what already exists, either transforming a system to give it a new function or combining several systems to produce a more complex one”.19 In this view, a new enzyme function evolves from a pre-existing enzyme that exhibits promiscuous functions, which we define here as any latent and secondary functions additional to the enzyme’s native and physiological functions. Concurrently, Jensen conceptualized enzyme evolution as a process whereby substrate promiscuity (or ambiguity; ability to turn-over non-native substrates that represent the same enzymatic reaction to the native substrate) and catalytic promiscuity (ability to catalyze non-native reactions) is the foundation for the evolution of new functions.20 Thus, ancestral enzymes may have been multifunctional (nonspecialized) or promiscuous, and upon a change in the environmental selection pressure, a promiscuous activity could become the target of selection and be further improved through the accumulation of adaptive mutations that yield higher catalytic efficiencies for that function (Figure 2). Gene duplication (before or after adaptive mutations) eventually leads to the emergence of a new enzyme function.21 Indeed, this classical view is supported by several recent enzyme evolution studies.22-27 For example, several xenobiotic degrading enzymes, such as organophosphate hydrolase and atrazine chlorohydrolase, evolved from precursor enzymes that possessed those functions as latent promiscuous activities.28,29 Additionally, in the course of sequence and functional expansions, new catalytic activities may emerge, which subsequently enable further functional divergence, ultimately leading to the broad range of enzymatic functions

4

ACS Paragon Plus Environment

Page 4 of 41

Page 5 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

observed within contemporary enzyme superfamilies (Figure 1).

Exhaustive sequence characterization facilitates comprehensive classification of functional families The first step toward understanding the sequence and functional diversity within a superfamily, as well as the evolutionary relationships between its extant members, is to establish rigorous classification systems that assign sequences into iso-functional subgroups.30-32 The rapid increase in sequencing data over the last two decades enabled us to capture global sequence relationships within protein superfamilies.33 Traditional phylogenetic analyses can be used to identify sequence-based clusters that can be classified based on the early branches in the superfamily’s phylogeny.32 While justified, this approach can be problematic because a single superfamily can easily exceed 10,000 sequences, and amino acid sequence identities between enzymes belonging to different iso-functional subgroups can be extremely low, sometimes with only a few catalytically important residues being conserved.33 Thus, performing phylogenetic analysis with all available sequences within a superfamily is often a laborious and computationally expensive proposition that may not be feasible, particularly for researchers that are not experts in computational phylogenetics. Recently, an alternative sequence relationship characterization tool has been developed: sequence similarity networks (SSNs).34,35 Generating SSNs is computationally less demanding, and as a result considerably larger sequence sets can be successfully analyzed. SSNs visualize sequence associations as networks, where nodes represent a sequence and edges denote pair-wise sequence comparisons, e.g. using the BLAST E-value.36 Sequence clusters are separated by increasing or reducing the threshold at which the pairwise sequence comparison score is visualized (see an example of SSNs for the MBL superfamily in Figure 3).16,34 Note that sequence-based classifications (phylogenetic analysis or SSNs) are typically used to classify subgroups separated at an early stage of evolutionary history, and so are less likely to capture recent or on-going functional divergence, e.g. the evolution of xenobiotic degradation enzymes.28 Nonetheless, they provide a strong method for exhaustive sequence characterizations, i.e., grouping superfamily members according to their sequence relationships and, by extension, their evolutionary history of functional

5

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

divergence.37

Exploring the functional diversity within enzyme superfamilies Our knowledge of the range of enzyme functions is the result of a vast number of experimental efforts. Historically, the functions of individual enzymes were determined via classical genetic and biochemical approaches, e.g. genomic context analysis, phenotypic assays that investigate gene knockouts and/or overexpression, function complementation using model organisms, enzymology and structure analysis.1 Bolstering these efforts are new experimental platforms, such as microfluidics, metabolomics, activity-based proteomics and high-throughput phenotype screening, which allow for the large-scale functional characterization of enzymes.38-41 Mapping this wealth of experimental data onto phylogenies or SSNs facilitates a new global view of enzyme divergence and the relationships between sequence, structure and function (Figure 3).37 The Structure Function Linkage Database (SFLD) and Enzyme Function Initiative (EFI) are publicly accessible databases that integrate sequence, structure and function information for a number of protein superfamilies.31,42 These initiatives have also established improved classification algorithms to provide more attentive annotation of new sequences, and in doing so have revealed extraordinary functional variety within enzyme superfamilies. For example, over 100 discrete functions were identified within each of the 200 most diverse enzyme superfamilies.8 Figure 1 displays the sequence, structure and functional diversity of five representative enzyme superfamilies, highlighting in particular their functional diversity. This type of exhaustive sequence characterization and functional data mapping also enables the identification and subsequent exploration of uncharacterized enzymes, families and subgroups within superfamilies.43 Uncharacterized sequence space remains very large, even within wellstudied enzyme superfamilies like the cytGST, amidohydrolase, MBL (see also Figure 3) and enolase superfamilies.16,44 It is therefore critical that we use the existing data to prioritize our continued efforts to characterize new and unknown enzyme functions.

In addition to experimental approaches, computational analyses may allow researchers to predict the physiological functions of uncharacterized sequences. The function of closely

6

ACS Paragon Plus Environment

Page 6 of 41

Page 7 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

related sequence clusters may be a good indicator for a general type of chemical reaction, and the genomic context of the gene could indicate functions and/or pathways to which enzymes belong, i.e., if the gene is localized to an operon or surrounded by enzymes that belong to a particular metabolic pathway.45,46 For example, systematic genomic context analysis recently unveiled a new functional subgroup within the HAD superfamily that is responsible for the final step of thiamin diphosphate synthesis.47 Computational methods utilizing structural data, e.g. ligand docking and structural modeling, can also predict physiological substrates for new enzymes.2,46,48-51

Enzyme promiscuity is a biochemical marker for evolutionary and function connectivity Sequence classification, together with the mapping of biochemical and structural properties, can reveal the evolutionary relationships between functional families, and help to reveal the history of evolutionary and functional divergence. In contrast, “promiscuous functions” of enzymes can reveal “function connectivity”, and provide additional information on evolutionary relationships.15,24,29,52 As the classical Jensen model postulates, the evolution of a new function generally occurs by exploiting and optimizing promiscuous activities of existing enzymes. This suggests that enzyme promiscuity can act as a potentiating factor, revealing potential evolutionary trajectories toward novel specialized functions. Indeed, it has been shown that activities toward ancestral substrates are maintained in extant proteins as promiscuous functions; at the same time, members of progenitor enzyme families often exhibit derived activities as promiscuous functions.22 Thus, both the ancestral and potential future evolution of an enzyme superfamily are, in a sense, revealed in the promiscuous functions of its existing members.15,22-27 For example, enzymes from four different iso-functional subgroups within the alkaline phosphate superfamily (phosphatase, phosphodiesterase, phosphonate hydrolase and sulfatase) exhibit cross-wise shared promiscuous activities.24 Also, some promiscuous activities may be associated with physiologically relevant functions; In that case, these enzymes may also be called “multi-function” enzymes.53,54 In many instances, however, promiscuous activities are likely to be an intrinsic enzyme property that stems from their inherent reactivity and lack of absolute specificity for one particular substrate or chemical

7

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

reaction and so may not be relevant for cellular activities.55-57 This suggests that some promiscuous activities are serendipitous and are not directly related to evolutionary divergence, while nonetheless implying potential evolutionary and function connectivity, as demonstrated by directed evolution and enzyme engineering experiments, where weak promiscuous functions can be readily improved through adaptive mutations.58-60 Thus, systematically characterizing the promiscuity of extant enzymes may provide insights into the function connectivity and evolutionary potential between distinct subgroups within larger enzyme superfamilies. Superfamily-wide function-profiling can reveal the evolutionary pathways that led to a superfamily’s diversity Recently, several groups have conducted enzyme characterizations that went beyond the conventional one-enzyme-at-a-time level, and instead performed large-scale functionprofiling, in which a diverse set of enzymes belonging to the same enzyme superfamily is assayed against a set of substrates in an “all versus all” manner.15-18 Their systematic characterizations have provided intriguing insights into the functional repertoire and evolutionary potential contained within each superfamily. First, they identified native functions for many previously uncharacterized enzymes, which allowed a more confident classification of several iso-functional subgroups. Second, the resulting activity profiles revealed the extent to which distinct functions are connected via promiscuous activities.

In the following sections, we discuss four examples of comprehensive enzyme characterizations that focused on the MBL superfamily15, the cytGST superfamily16, the HAD superfamily17

and the β-keto acid cleavage enzyme (BKACE) family.18 The

cytGST and MBL superfamilies consist of functional groups that catalyze diverse chemical reactions, and thus the studies explored “catalytic promiscuity” of selected members of each superfamily. In the case of the BKACE family and HAD superfamily, the work primarily focused on substrate promiscuity. Note that in this review we refer to different “functions” as either different chemical reactions, which can include multiple substrates for one reaction, or activity toward different substrates, as used and classified in the original work. In order to visualize and compare function connections through

8

ACS Paragon Plus Environment

Page 8 of 41

Page 9 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

promiscuous activities, we generated “function connectivity networks” (FCNs) using the function-profiling data presented in the original publications (FCNs are described in Figure 4). FCNs visualize the relationship between enzymes that perform different functions as a result of enzyme promiscuity. For example, if a single enzyme catalyzes two different functions, then those two functions will be connected and cluster in close proximity in an FCN. By contrast, two functions will not be connected and remain separated in the FCN if there is no enzyme that is capable of carrying out both functions (Figure 4).

Metallo-β-lactamase superfamily The metallo-β-lactamase (MBL) superfamily encompasses a large enzyme superfamily, with over 25,000 sequences from all domains of life distributed across at least 24 isofunctional subgroups that are engaged in a diverse set of cellular functions, e.g., DNA, RNA and nucleotide processing, detoxification, antibiotic resistance, and quorumquenching.15,61,62 MBL enzymes primarily catalyze hydrolytic reactions, however some are also involved in non-hydrolytic reactions such as nitric oxide reduction and sulfur dioxygenation, as well as non-enzymatic functions such as DNA binding and transport.61,63-65 Members of the MBL superfamily share a common αββα-fold (MBLfold) and bi- or mono- metal active site architecture.61 Previously, we systematically characterized the functions of 24 enzymes that belong to 15 different iso-functional subgroups for activity in 10 distinct hydrolytic reactions.15 The enzymes were highly specialized toward their native reactions (kcat/KM ≥104 M-1s-1). However, most MBL enzymes exhibited some degree of promiscuous activity, albeit with comparatively low catalytic efficiency (kcat/KM ≤102 M-1s-1). Overall, 18 out of the 24 enzymes catalyzed multiple reactions and, on average, each enzyme catalyzed 2.5 out of the 10 reactions tested. The prevalence of promiscuity in this superfamily means there is high connectivity between functions, with 9 out of 10 functions being connected in the FCN (Figure 5A). Interestingly, we observe a sub-clustering of enzymatic reactions in the FCN; reactions that are related to anionic substrates (phosphodieterase, sulfatase, phosphonate hydrolase) and neutral substrates (β-lactamase, glyoxalase II and lactonase) form isolated clusters (Figure 5A). A similar clustering of sequences (enzymes)

9

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

associated with reactions using either anionic or neutral substrates was also observed in SSNs and phylogenetic analysis, which suggests that the FCN clustering reflects functional divergence in evolutionary history (Figure 3).15,66

Cytosolic glutathione transferase superfamily The cytosolic glutathione transferase (cytGST) superfamily contains over 10,000 known sequences. Its most common functions involve the utilization of glutathione as a cofactor to metabolize endogenous compounds, detoxify chemicals, and prevent oxidative stress.16,67 CytGSTs structurally consist of two domains: a smaller N-terminal thioredoxin-like fold domain that binds glutathione and a larger C-terminal domain is responsible for substrate recognition and binding.67 The superfamily catalyzes a diverse range of functions (over 140 E.C. terms) that can be categorized mechanistically into three groups based on the usage of glutathione: ‘oxidized’, ‘consumed’ or ‘not consumed’.16 Furthermore, the three groups can be sub-classified into 15 functions according to their reaction chemistry.16 Mashiyama et al. recently performed a large-scale function-profiling analysis and characterized 82 enzymes in this superfamily for 15 different functions, which were assayed using 175 different substrates with several chemically similar substrates representing one function.16 The authors also combined available literature data of an additional 174 enzymes, and thus provide a functionprofiling analysis for 256 enzymes. From this data, we see a variable level of promiscuity amongst the enzymes in this superfamily; 53% of enzymes are specific and catalyze only one reaction (136 out of 256), but ~7% of enzymes (17 out of 256) catalyze more than six reactions.16 The cytGST FCN revealed that almost all functions, 14 out of 15, are connected (Figure 5B). Interestingly, the clustering of individual functions relates to glutathione utilization, i.e., ‘oxidized’, ‘consumed’ and ‘not consumed’. Glutathione oxidized and not consumed are completely separated, and only indirectly connected through consumed (Figure 5B). In contrast, glutathione consumed connects more extensively than others; in particular, two glutathione consumed reactions, ‘conjugate addition’ and ‘nucleophilic aromatic substitution’ are positioned as central hubs that associate with 12 other reactions via promiscuous enzymes (Figure 5B).

10

ACS Paragon Plus Environment

Page 10 of 41

Page 11 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

β-keto acid cleavage enzyme (BKACE) family Until recently, only one enzymatic function had been assigned to what is now known as the BKACE family: the condensation of β-keto-5-amino-hexanoate and acetyl-CoA to produce aminobutyryl-CoA and acetoacetate within the lysine fermentation pathway.68 Formerly the family was referred to as the DUF849 family, which is comprised of 922 sequences and structurally adopts a canonical TIM β/α barrel fold.68 In 2014 Bastard et al. performed a systematic characterization to uncover and assign enzymatic functions to other members of this enzyme family.18 Based on the biochemical and structural analyses of the previously characterized enzyme, the authors selected and characterized activity with 16 other β-keto acid substrates for 124 uncharacterized enzymes from the DUF849 family. Out of the 124 enzymes, 80 were active for at least one substrate, and 15 of 17 substrates were catalyzed by at least one enzyme. These proteins led them to identify the family as one of β-keto acid cleavage enzymes, and thus, the family has now been named the “β-keto acid cleavage enzyme” or BKACE family. This systematic function-profiling revealed that over 60% of the enzymes that were tested (50 out of 80) displayed activity for more than one β-keto acid substrate, whereas 30 out of those 80 enzymes were specific. The FCN of the BKACE family leads to conclusions similar to those of the FCNs of the MBL and cytGST superfamilies (Figure 6A), in which all the functions that were tested, which correspond to different substrates of β-keto acid cleavage reaction, are connected through promiscuous enzymes. Furthermore, substrates with distinct chemical features e.g., anionic, cationic, nonionic polar and apolar, form separate clusters. In general, nonionic polar and apolar substrates are more connected through promiscuous enzymes, as highlighted by the β-ketohexanoate substrate (S11, Figure 6A), which is connected to all other substrates. In contrast, charged substrates (green border, cationic S1 and S2; red border, anionic S3 and S4, Figure 6A) form distinct clusters, which are located at the peripheries of the FCNs, and are over all less widely connected.

Haloacid dehalogenase superfamily The HAD superfamily is comprised of 120,000 sequences, which share a Rossmann fold “core” domain and a fused “cap” domain.17,69 The majority of HAD enzymes require Mg2+ and a conserved active site aspartic acid.69 Although the superfamily is named after

11

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the haloacid dehalogenase (C-Cl bond cleavage) enzyme, the majority of enzymes are phosphate hydrolases such as phosphoesterases, ATPases, sugar phosphomutases, and other phosphatases and phosphonatases involved in P-O and P-C bond cleavage.69 The cellular functions of HAD phosphatases include primary metabolism of amino acids and sugars, secondary metabolism, dNTP pool regulation, cellular housekeeping, and nutrient uptake.69 Huang and coworkers measured the activity profile of more than 200 enzymes against a diverse library of 167 phosphatase (98%) and phosphonatase (2%) substrates (grouped into 13 substrate classes).17 The authors considered an enzyme-substrate pair as active if a 30-min incubation of the reaction mixture (5µM enzyme and 1 mM substrate) provides OD650 = >0.2, using a chemical dye to detect the release of inorganic phosphate. Overall, most of the tested enzymes were highly promiscuous; 75% of the enzymes act on more than five substrates, and 23% are active for more than 41 substrates (up to 143 substrates).17 Indeed, the FCN of the HAD superfamily generated from this dataset revealed very dense connectivity between functions (substrate classes) and complete connection between all 13 functions, in which each function is connected to every other functional class (small box in Figure 6 B). We believe that the high connectivity is associated with the extensive number of experimental characterizations (>200 enzymes against 167 substrates). In addition, the set of substrates is less diverse compared to the other studies; they generally involve the same chemical reactions (P-O (98%) and P-C (2%) bond cleavage), and thus almost exclusively differ by their leaving group. To obtain a better separation and clustering of functions (substrate classes), we reanalyzed the data using a more stringent cut-off of OD650 = 0.5 (the highest OD650 observed in the assay was ~1.0), which gave rise to a clustering of the distinct functions (substrate classes). For example, we observe that all five monophosphate sugar substrates (acid, alcohol, aldose, amine and ketose sugars) cluster together, whereas other functions are separated. On the other hand, phosphonate substrates, which involve cleavage of P-C bonds instead of P-O (all other substrates), are the least connected substrate class. Thus, employing an activity cut-off, similar to the BLAST E-value cut-off in SSNs, can be a useful method to observe relationships between functions and enzymes in FCNs.

Evolutionary perspectives on function connectivity through promiscuity

12

ACS Paragon Plus Environment

Page 12 of 41

Page 13 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

There are common features and general trends that can be extracted from these functionprofiling analyses, which we discuss in the following sections. The FCNs provide intriguing insights into the function connectivity within superfamilies through enzymes with promiscuous functions. However, we would like to note some caveats in our analyses: First, each study features a different range of functions and sequences that were tested, i.e., the number and type of functions (chemical reactions, substrate classes, or substrates) and number and sequence divergence of enzymes; Second, the assays for each study were different, and had different levels of sensitivity, which can also affect how many functions per enzyme are revealed – for example, a less sensitive assay would mask activities that are below the detection limit, resulting in fewer promiscuous activities being identified.

Most functions are connected via promiscuous enzymes The FCNs reveal that most functions are connected through enzyme promiscuity (Figure 5 and 6). Although these connections may not directly relate to the historical divergence of functions within the superfamilies, the high connectivity between these functions provides experimental support for the notion that diverse functions could evolve from a common ancestor. Adaptive evolution proceeds through functional intermediates, and new functions are recruited from existing promiscuous enzymes and subsequently optimized through adaptive mutations and gene duplication events (Figure 2).

Biased promiscuity underlies indirect function connectivity Despite the overall function connectivity, many functions are only indirectly connected through ‘intermediate functions’, i.e. two functions are not connected by the promiscuity of a single enzyme, but through at least one other function and two other enzymes (Figure 5 and 6). The most prominent example is seen in the BKACE family, where cationic and anionic substrates have no direct connections, but are only connected through neutral substrates (Figure 6A). It is rational that an enzyme adapted to a cationic substrate would possess a negatively charged active site cavity that would not be appropriate for anionic substrates, and vice versa. Enzymes that have evolved toward neutral substrates, however, may be able to promiscuously act on both anionic and

13

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

cationic substrates to some extent. Similar trends are observed in the other superfamilies that we examined (Figure 5 and 6). Therefore, the emergence of a new function may only be associated with distinct functions within a superfamily, and its evolution could be restricted to particular progenitor enzymes. In other words, functional expansion from a common ancestor may be constrained to one or a few specific trajectories in order to reach functions that are chemically distant from the ancestral functions (Figure 1). In contrast, however, some functions are frequently observed as being linked by promiscuous enzymes, and thus are more likely to be connected to many other functions (Figure 5 and 6). These functions might easily evolve from various progenitor enzymes within a superfamily. Indeed, there are notable examples of convergent evolution in the literature: β-lactamase activity has arisen at least twice within the MBL superfamily.61,66 In the HAD superfamily phosphomutase activity evolved independently more than once.69 Additional cases of convergent evolution have been observed in many other superfamilies including phosphatidylinositol-phosphodiesterases6, enolases37, and Zndependent peptidases.70

The degree of promiscuity is enzyme dependent Many of the enzymes assayed across these studies were promiscuous, and some exhibited activity toward a remarkable variety of substrates (dark and black enzyme nodes in the FCNs, Figure 5 and 6). In contrast, some enzymes were specific to only one function (white enzyme nodes in the FCNs; Figure 5 and 6). Thus, the level of promiscuity is enzyme dependent and varies significantly. It is possible, however, that expanding the scope of chemical reactions and substrates that are assayed may reveal promiscuous activities in enzymes that currently appear highly specific. The range and type of promiscuity also varies substantially amongst enzymes within the same iso-functional subgroup. Different members of the same iso-functional subgroup are, generally, orthologous enzymes that play the same physiological role in different species, but that possess diverse sequences due to genetic drift.71 These sequence changes are relatively neutral in terms of the native function, but can alter promiscuous functions, and thereby result in “cryptic genetic variation”.72-76 For example, among five B1 β-lactamases from the MBL superfamily, two enzymes catalyze only their native reaction, while each of the

14

ACS Paragon Plus Environment

Page 14 of 41

Page 15 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

remaining three enzymes are capable of up to three additional reactions (Figure 5A). Additionally, members of the AMPS (Alpha-, Mu-, Pi-, and Sigma-like) subgroup of the cytGST superfamily also exhibit different levels of promiscuity; some AMPS enzymes catalyze one reaction, whereas others catalyze up to six reactions (Figure 5B). Interestingly, specificity and promiscuity levels are not correlated with particular sequence clusters within the AMPS subgroup, but are instead scattered throughout (Figure 5C). Such cryptic genetic variation has also been observed in laboratory evolution studies where enzymes were subjected to “neutral drift” (accumulation of mutations under a purifying selection pressure for the native function), which produced notable changes in their activity promiscuous functional profiles.75,76 Thus, cryptic genetic variation can result in sequences that already have higher activities towards a new substrate, and thus support a role for neutral genetic diversity in driving the innovation and evolution of new enzyme functions.72-74 This observation also indicates, however, that some sequences are less “evolvable” compared to others (in some cases, appearing entirely unevolvable) because they encode enzymes that do not exhibit enough promiscuous activity to provide a selective advantage when the environment changes to favor a new function.

Function connectivity depends on the environment and cofactor availability Function profiles and enzyme activity levels can vary depending on the environment. A simple example is that chemical engineers used “enzyme condition promiscuity” to achieve catalysis under water-limited environments with hydrolytic enzymes; such environs favour ester synthesis instead of hydrolysis.77 Similarly, other environmental changes can promote the appearance of new promiscuous functions. For example, a temperature shift can induce changes in substrate specificity in a bacterial thymidine kinase for various nucleoside analogues.78 Cofactor exchange, e.g. metal ions, flavins and hemes, can cause condition-specific promiscuity, and so lead to different activity profiles depending on the conditions under which the enzymes were assayed.56,79-83 For example, a recent study by Lapalikar et al. showed that the substitution of F420-dependent reductases with FMN instead of F420 enables them to catalyze both oxidation and reduction of the same substrate.84 Furthermore, a recent study by Reynolds et al. evolved

15

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

a cytochrome P450 enzyme to incorporate a non-proteinogenic cofactor, iron deuteroporphyrin IX, which enables the enzyme to catalyze the non-natural carbenoidmediated olefin cyclopropanation reaction, which is not observed with the native cofactor heme.85 In the MBL superfamily, most enzymes are assumed to incorporate Zn2+ in the active site, however some have been shown to prefer Fe2+, Ni2+, Mn2+ and Co2+.80 We previously characterized the effect of exchanging the active site metal ions of five enzymes from the MBL superfamily (6 different metals and 8 different reactions). This systematic analysis revealed that the function profile of these enzymes, and thus their function connectivity, varies significantly across different environments (in this case environments with different metal ion availabilities; Figure 7).80 Interestingly, individual metal ions display limited connectivity; however, when the activity profiles of all metals are superimposed, all reactions are connected (Figure 7). Although the concentration of metal ions in the cellular milieu is generally regulated, metalloenzymes do not always achieve perfect metallation with a specific, most active metal ion(s), but rather exist as various metal isoforms in the cell.86-91 Also, environmental variation in metal availability or cellular stress can lead to severe mismetallation of metalloenzymes.86-88,92 Thus, cofactor depend activity profile changes that were observed in these in vitro studies may also reflect functional variation in the natural environment.

To date, there is no direct evidence to support that environmental-dependent functional promiscuity fosters the expansion of enzyme superfamily diversity. The diversity of cofactor utilities observed in enzymes within a single enzyme superfamily, however, may imply that very evolutionary scenario.93 For example, a recent study by Goldman et al. suggests that the functional diversity of TIM barrel enzymes may stem from the incorporation of different cofactors, including FMN, NAD, NADP, and various metal ions.93 Furthermore, Ahmed et al. described new subgroups of split β-barrel fold enzymes, which display surprising cofactor diversity, including the binding and utilization of F420, FMN, FAD and heme.48 Interestingly, some of these enzymes promiscuously bind multiple cofactors with considerably high affinity in vitro.48

Structural features that determine activity profiles

16

ACS Paragon Plus Environment

Page 16 of 41

Page 17 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

In general, the members of a single enzyme superfamily share the same mechanistic features in the active site. Similarly, it has been shown that the native and promiscuous activities within a single enzyme are catalyzed in the same active site.94 Elucidating the molecular and structural basis for enzyme chemical and substrate specificity, however, remains extremely challenging. For example, in our previous study of 24 MBL enzymes, we found that many enzymes, despite large differences in active site shape, volume and hydrophobicity, could catalyze the same reactions, such as the native functions of phosphodiesterases or β-lactamases.15 The only tendency we found was that enzymes that hydrolyze charged substrates as their native function possess more hydrophilic active sites. Bastard et al. also found a structure-function correlation for enzymes of the BKACE superfamily. In particular, they found that enzymes with more hydrophobic and non-charged active sites turnover hydrophobic and non-charged polar substrates. In contrast, enzymes that have negatively or positively charged residues in the active site turnover generally positively and negatively charged substrates, respectively. Members of the HAD superfamily are structurally classified depending on the cap domain above the active site, as being either C0 (minimal cap) or C1 and C2 (large cap). Huang et al. found that C1 and C2 enzymes exhibit generally more promiscuity compared to C0 enzymes. For the cytGST superfamily, Mashiyama et al. could not identify any specific active site property or feature that is associated with the catalysis of particular reactions. Besides these rough tendencies, no study has successfully provided explicit molecular explanations for different types of enzyme functionality across a superfamily, let alone for the existence of promiscuous activities. This difficulty reflects the challenges faced when predicting and annotating the functions for new and/or uncharacterized sequences. Further efforts to characterize structural and functional details, including structural dynamics, are necessary for us to advance our understanding of the structure-function relationships that determine the catalytic function(s) that different enzymes perform.

Future perspectives Over the last decade, the integration of high-throughput sequencing, large-scale sequence and evolutionary analysis, and experimental characterizations have enabled new global views of functional diversity for many enzyme superfamilies. However, our

17

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

understanding of enzyme functions and the evolutionary processes that led to their expansion is far from complete. Many open questions remain and further efforts are required to develop a better understanding of how this remarkable functional diversity has evolved.

Continuous efforts to establish accurate superfamily classifications that are based on exhaustive sequence characterization in conjunction with experimental data, are extremely important, and the integration of metagenomic (environmental DNA) surveys is imperative, if we are to explore the unrecognized sequence clusters and potentially important groups of these superfamilies.31,42,95 The development of tools that can accurately predict enzyme function(s) is also essential to support the experimental exploration of sequence space; experimental efforts to characterize functional and structural properties are unlikely to ever meet the exponential rate at which sequence information continues to grow.96 Furthermore, there are strong biases of sequence, biochemical and structural information that need to be recognized and accounted for. The focus of many experimental approaches has been generally narrow and traditionally confined to relatively few protein folds and superfamilies, or even isolated to selected functional groups, partly due to biomedical and industrial interests and the experimental assays that are available.97 As a result, the majority of functional families and superfamilies have yet to be fully, or even partially, characterized.37 In addition, the misannotation of gene functions is common in public databases that use automated computational annotation predictions based upon homology with existing entries; a study published by Schnoes et al. observed that one third of the 37 investigated superfamilies contained misannotation levels greater than 80%.98 Sequencing bias may also hinder functional annotation and delineation of sequence clusters, as most sequences deposited to genome databases are from relatively few model organisms and isolated microbial strains.97,99 Thus experimental investigations guided by exhaustive sequence analysis are essential for the effective and comprehensive characterization of enzyme superfamilies. High-throughput experimental platforms capable of systematically characterizing a large number of sequences would also be highly beneficial.39,100,101

18

ACS Paragon Plus Environment

Page 18 of 41

Page 19 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

In the four comprehensive enzyme characterization studies discussed here, the scope of the functions and substrates that were investigated was relatively narrow compared to the large diversity of functions that can exist in a single superfamily (Figure 1). The question remains, therefore, as to whether significantly different functions, e.g. oxidoreductases, lyases and hydrolases, can also diverge from each other in the same manner via promiscuous functions. A recent study showed that, whereas functional expansion typically occurs within each of the six major E.C. classes (i.e., oxidoreductase, transferase, hydrolase, lysase, isomerase and ligase), evolutionary transitions between these classes, while also observed, are far less common.2 Highly distinct chemical reactions can be catalytically exclusive, as the chemical and structural requirements may be incompatible, and thus it might be rare to observe that a single enzyme catalyzes very distinct functions. Yet, there are a few anecdotal cases that have been described: for example, a reductive dehalogenase of the cytGST superfamily that is involved in the biodegradation pathway of xenobiotic polychlorinated compounds in Sphingomonas chlorophenolica, exhibits high levels of isomerase activity, which is suggested to be its ancestral activity.102 Also, decarboxylases from E. coli and Pseudomonas pavonaceae have been shown to promiscuously function as synthase103 or hydratases, respectively.104 Interestingly, Yew et al. also showed that the synthase activity of the E. coli decarboxylase could be improved by 260-fold with only four mutations in the active site. Despite these examples, we suggest that in some instances “hopeful monsters” may be required for the emergence of new catalytic functions, as large structural and mechanistic reorganizations, including the insertion of domains with different catalytic properties, are often observed between functionality distant enzymes.105 Indeed, laboratory evolution studies demonstrated that loop and domain insertions/deletions can cause the emergence of new catalytic functions, which can be subsequently optimized by further adaptive mutations.29,106 At the same time, a recent work by Aravind and co-workers highlights examples of the drastic structural divergence of proteins, via domain incorporation, topological rewiring, loop modifications, and mutations of catalytic residues, that can occur even when their molecular functions remain conserved.107 Such cases of dramatic structural rearrangements that are neutral for the native function lead us to speculate that proteins may, in some cases, be able to explore new regions of sequence and structural

19

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

space in which they can catalyze new promiscuous activities without necessarily changing the ancestral function or impairing organismal fitness.107

Another challenge for the functional exploration of enzyme superfamilies is to establish a link between an enzyme’s in vitro activity profile and its in vivo functions.108 For example, to what extent are promiscuous activities associated with cellular functions?109113

The existence of both native and promiscuous substrates in the same cellular

compartment may cause cross-inhibition. Thus, the kinetic balance between the two activities would affect the organism’s fitness, and may be particularly relevant during the evolution toward a new function.114,115 In addition, environmental changes could significantly affect the observed in vivo levels of native, as well as promiscuous, activities. For example, organisms consistently face environmental fluctuations such as variation of nutrition and cofactor availability89,116,117, pH (in particular periplasm and extracellular matrix)118, and temperature78, and thus, potentially exposing enzymes to many types of environmental variation. Moreover, most enzymes are tightly regulated in terms of expression (transcription and translation) and cellular localization (e.g. nucleus, intracellular, and extracellular).119 Several studies have recently shown that expression and localization were co-optimized with function during the course of evolution in order to ensure the each enzyme appears at the right time and space within the cell.26,120,121 Therefore, molecular evolutionary studies that integrate in vivo parameters, such as spatial and temporal expression or metabolite flux, are of particular interest, and will likely provide new insights into the evolutionary dynamics enabling functional diversification beyond enzyme promiscuity.115,122

Many questions regarding the molecular details of enzyme specificity and promiscuity remain unanswered. What are the molecular determinants that differentiate distinct chemical and substrate specificities? What are the molecular distinctions between specialized enzymes and promiscuous enzymes? How many mutations are generally necessary to switch one function to another? What are the molecular mechanisms that underlie functional transitions? Can all promiscuous functions act as a sufficient foundation for the evolution of efficient new enzymes? Some promiscuous activities may

20

ACS Paragon Plus Environment

Page 20 of 41

Page 21 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

not provide a selective advantage simply because their activity level is too low.123 In addition, even if the initial activity is sufficient to provide a selective advantage, all sequences may not be optimal evolutionary starting points, due to genetic and structural constrains.114 Indeed, growing evidence indicates that epistasis (non-additive interactions between mutations) is prevalent, plays a significant role in shaping fitness landscapes, and constrains evolutionary trajectories.124-126 Seemingly neutral mutations can positively and negatively affect the evolution of a new function by acting as an “epistatic ratchet”, which alter the effect of adaptive mutations and therefore the evolvability of enzymes.114,127,128 For instance, a few recent studies have shown that the same mutations have very different effects on the activity levels of orthologous enzymes, being beneficial for some and detrimental for others.127,129 Deciphering the molecular causes of “evolvability” of enzymes toward new functions is one of the most intriguing aspects of this exploratory process. By characterizing resurrected ancestral sequences and using comparative laboratory evolution experiments, we may yield further insights into the molecular details of epistasis and how it constrains the evolution of enzymes.58,130-133

In summary, we are beginning to understand the evolution of functional diversity in enzyme superfamilies. Utilizing the diverse and powerful tools now available, we can tackle long-standing questions in biology and achieve new insights into the evolutionary process that has created such vast functional diversity. Continuous efforts to explore and integrate high-resolution studies (e.g. characterizing the molecular details of evolutionary trajectories) and low-resolution analyses (e.g. superfamily-wide function-profiling) will be essential to promote our understanding of enzyme evolution and enhance our ability to design and engineer novel enzymes.

21

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Acknowledgements We thank members of the Tokuriki lab for comments on the manuscript, and in particular Dave W Anderson for editing the manuscript.

Funding Sources This work was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN 418262-12). N.T. is a CIHR new investigator and a Michael Smith Foundation of Health Research (MSFHR) career investigator.

22

ACS Paragon Plus Environment

Page 22 of 41

Page 23 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Table of Contents graphic (For Table of Contents use only) Specific enzyme

Promiscuous enzyme

Functional transition

Ancestral Function

23

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 41

Figure 1 Superfamily

# of sequences

Structural fold

HAD

91,000

(SFLD)

Rossmann

cytGST

13,000

(SFLD)

Thioredoxin

56,000

(PFAM)

Functional diversity

Oxidoreductases

Amidohydrolase

TIM β/α barrel

Transferases Hydrolases

MBL

38,000

(PFAM)

49,000

(SFLD)

Lyases

αββα (MBL)

Isomerases

Enolase

Ligases

TIM β/α barrel

Unique E.C. terms

Figure 1. Sequence and function diversity within selected enzyme superfamilies: HAD (haloacid dehalogenase), cytGST (cytosolic glutathione transferase), amidohydrolase, MBL (metallo-β-lactamase), and enolase. Sequence diversity and structural fold information was retrieved from the SFLD31 and PFAM33 databases. Function diversity was retrieved from the CATH database.13

24

ACS Paragon Plus Environment

Page 25 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Figure 2 Specific enzyme

Promiscuous enzyme

Functional transition

Ancestral Function

Figure 2. A schematic representation of the evolutionary process by which functional divergence occurs within a theoretical enzyme superfamily. Circles represent a single sequence (enzyme) and colors represent the native physiological function. The inner circle represents promiscuous activities. The functional divergence from a common ancestor (light blue) occurs via the recruitment of promiscuous activities and evolutionary optimization of these functions to generate new specialized enzymes (grey, deep blue and olive). During the adaptive process or genetic drift, a new promiscuous function may subsequently arise in a derived family and lead to further expansion of the functional repertoire in the superfamily (green, pale blue, purple and magenta).

25

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 3

Figure 3. A representative sequence similarity network (SSN) of the MBL superfamily members reproduced from Baier et al.15 33,843 protein sequences are represented by 6,233 nodes, which share 0.2 for their analysis. Note that the FCN with the OD650 = 0.5 cut-off contains only 145 enzymes, because activities of 59 enzymes were below the more stringent 0.5 cut-off.

31

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 41

Figure 7

A

B

all metals

4 PCE 4 ARS

7 BLA 7 PDE

4 ARS

5 LAC

5 PTE

5 EST

4 LAC 4 SLG

4 PCE

3 PTE 3 EST

3 BLA

4 PDE 2 ARS

Zn2+ only

2 PTE

2 LAC

7 PDE

D

Mn2+ only

7 BLA 5 EST

5 PTE

4 PCE

4 SLG

6 EST

C

Ni2+ only

1 ARS

3 BLA 3 PDE

Figure 7. FCN representation of the metal-dependent activity profiles for five MBL superfamily enzymes. (A, B, C and D) FCNs represent the function-profiling analysis of 5 enzymes against 8 enzymatic functions with 6 different metal ions (Fe2+, Zn2+, Co2+, Cd2+, Ni2+, Mn2+). Large square nodes represent functions, small nodes represent enzymes, and edges indicate the existence of enzymatic activity. Enzyme nodes are quantitatively shaded depending on the number of activities they catalyze, from white being specific for one function (connected to one function) to black being highly promiscuous (connected to ≥ 7 functions). (A) FCN of the activity profiles of all six reconstituted metal isoforms. (B) FCN of activity profiles of the Ni2+ reconstituted metal isoforms. (C) FCN of activity profiles of the Mn2+ reconstituted metal isoforms. (D) FCN of activity profiles of the Zn2+ reconstituted metal isoforms; the corresponding FCNs of Co2+, Cd2+ and Fe2+ are not shown due to space limitations. Enzymatic functions: BLA, β-lactamase; SLG, glyoxalase II; ARS, arylsulfatase; PDE, phosphodiesterase; PCE, phosphorylcholine-esterase; PTE, phosphotriesterase; LAC, lactonase; EST, esterase/lipase. The numbers within each function node indicates how many other functions are directly connected via a promiscuous enzyme.

32

ACS Paragon Plus Environment

Page 33 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

References

(1) Blow, D. (2000) So do we understand how enzymes work? Structure 8, R77–81. (2) Furnham, N., Dawson, N. L., Rahman, S. A., Thornton, J. M., and Orengo, C. A. (2016) Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies. J. Mol. Biol. 428, 253–267. (3) Gerlt, J. A., Bouvier, J. T., Davidson, D. B., Imker, H. J., Sadkhin, B., Slater, D. R., and Whalen, K. L. (2015) Enzyme Function Initiative-Enzyme Similarity Tool (EFIEST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta 1854, 1019–1037. (4) Glasner, M. E., Gerlt, J. A., and Babbitt, P. C. (2006) Evolution of enzyme superfamilies. Curr. Opin. Chem. Biol. 10, 492–497. (5) Gerlt, J. A., Babbitt, P. C., Jacobson, M. P., and Almo, S. C. (2011) Divergent Evolution in Enolase Superfamily: Strategies for Assigning Functions. J. Biol. Chem. 287, 29–34. (6) Furnham, N., Sillitoe, I., Holliday, G. L., Cuff, A. L., Laskowski, R. A., Orengo, C. A., and Thornton, J. M. (2012) Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol 8, e1002403. (7) Seibert, C. M., and Raushel, F. M. (2005) Structural and Catalytic Diversity within the Amidohydrolase Superfamily. Biochemistry 44, 6383–6391. (8) Das, S., Dawson, N. L., and Orengo, C. A. (2015) Diversity in protein domain superfamilies. Curr. Opin. Genet. Dev. 35, 40–49. (9) Todd, A. E., Orengo, C. A., and Thornton, J. M. (2001) Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307, 1113–1143. (10) Aloy, P., Oliva, B., Querol, E., Aviles, F. X., and Russell, R. B. (2002) Structural similarity to link sequence space: new potential superfamilies and implications for structural genomics. Protein Sci. 11, 1101–1116. (11) Meng, E. C., and Babbitt, P. C. (2011) Topological variation in the evolution of new reactions in functionally diverse enzyme superfamilies. Curr. Opin. Struct. Biol. 21, 391– 397. (12) Farías-Rico, J. A., Schmidt, S., and Höcker, B. (2014) Evolutionary relationship of two ancient protein superfolds. Nat Chem Biol 10, 710–715. (13) Sillitoe, I., Lewis, T. E., Cuff, A., Das, S., Ashford, P., Dawson, N. L., Furnham, N., Laskowski, R. A., Lee, D., Lees, J. G., Lehtinen, S., Studer, R. A., Thornton, J., and Orengo, C. A. (2015) CATH: comprehensive structural and functional annotations for genome sequences. Nuc. Acids Res. 43, D376–81. (14) Brown, S. D., Gerlt, J. A., Seffernick, J. L., and Babbitt, P. C. (2006) A gold standard set of mechanistically diverse enzyme superfamilies. Genome Biol 7, R8. (15) Baier, F., and Tokuriki, N. (2014) Connectivity between catalytic landscapes of the metallo-β-lactamase superfamily. J. Mol. Biol. 426, 2442–2456. (16) Mashiyama, S. T., Malabanan, M. M., Akiva, E., Bhosle, R., Branch, M. C., Hillerich, B., Jagessar, K., Kim, J., Patskovsky, Y., Seidel, R. D., Stead, M., Toro, R., Vetting, M. W., Almo, S. C., Armstrong, R. N., and Babbitt, P. C. (2014) Large-Scale Determination of Sequence, Structure, and Function Relationships in Cytosolic Glutathione Transferases across the Biosphere. PLoS Biol 12, e1001843.

33

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(17) Huang, H., Pandya, C., Liu, C., Al-Obaidi, N. F., Wang, M., Zheng, L., Toews Keating, S., Aono, M., Love, J. D., Evans, B., Seidel, R. D., Hillerich, B. S., Garforth, S. J., Almo, S. C., Mariano, P. S., Dunaway-Mariano, D., Allen, K. N., and Farelli, J. D. (2015) Panoramic view of a superfamily of phosphatases through substrate profiling. Proc. Natl. Acad. Sci. U.S.A. 112, E1974–83. (18) Bastard, K., Smith, A. A. T., Vergne-Vaxelaire, C., Perret, A., Zaparucha, A., De Melo-Minardi, R., Mariage, A., Boutard, M., Debard, A., Lechaplais, C., Pelle, C., Pellouin, V., Perchat, N., Petit, J.-L., Kreimeyer, A., Medigue, C., Weissenbach, J., Artiguenave, F., De Berardinis, V., Vallenet, D., and Salanoubat, M. (2014) Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 10, 42–49. (19) Jacob, F. (1977) Evolution and tinkering. Science 196, 1161–1166. (20) Jensen, R. A. (1976) Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30, 409–425. (21) Innan, H., and Kondrashov, F. (2010) The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11, 97–108. (22) Voordeckers, K., Brown, C. A., Vanneste, K., van der Zande, E., Voet, A., Maere, S., and Verstrepen, K. J. (2012) Reconstruction of Ancestral Metabolic Enzymes Reveals Molecular Mechanisms Underlying Evolutionary Innovation through Gene Duplication. PLoS Biol (Thornton, J. W., Ed.) 10, e1001446. (23) Noor, S., Taylor, M. C., Russell, R. J., Jermiin, L. S., Jackson, C. J., Oakeshott, J. G., and Scott, C. (2012) Intramolecular Epistasis and the Evolution of a New Enzymatic Function. PLoS ONE (Dobson, R., Ed.) 7, e39822. (24) Mohamed, M. F., and Hollfelder, F. (2013) Efficient, crosswise catalytic promiscuity among enzymes that catalyze phosphoryl transfer. Biochim. Biophys. Acta 1834, 417– 424. (25) Copley, S. D. (2009) Evolution of efficient pathways for degradation of anthropogenic chemicals. Nat Chem Biol 5, 559–566. (26) Ngaki, M. N., Louie, G. V., Philippe, R. N., Manning, G., Pojer, F., Bowman, M. E., Li, L., Larsen, E., Wurtele, E. S., and Noel, J. P. (2012) Evolution of the chalconeisomerase fold from fatty-acid binding to stereospecific catalysis. Nature 485, 530–533. (27) Huang, R., Hippauf, F., Rohrbeck, D., Haustein, M., Wenke, K., Feike, J., Sorrelle, N., Piechulla, B., and Barkman, T. J. (2012) Enzyme functional evolution through improved catalysis of ancestrally nonpreferred substrates. Proc. Natl. Acad. Sci. U.S.A. 109, 2966–2971. (28) Seffernick, J. L., de Souza, M. L., Sadowsky, M. J., and Wackett, L. P. (2001) Melamine deaminase and atrazine chlorohydrolase: 98 percent identical but functionally different. J. Bacteriol. 183, 2405–2410. (29) Afriat-Jurnou, L., Jackson, C. J., and Tawfik, D. S. (2012) Reconstructing a Missing Link in the Evolution of a Recently Diverged Phosphotriesterase by Active-Site Loop Remodeling. Biochemistry 51, 6047–6055. (30) Brown, S. D., and Babbitt, P. C. (2012) Inference of functional properties from large-scale analysis of enzyme superfamilies. J. Biol. Chem. 287, 35–42. (31) Akiva, E., Brown, S., Almonacid, D. E., Barber, A. E., Custer, A. F., Hicks, M. A., Huang, C. C., Lauck, F., Mashiyama, S. T., Meng, E. C., Mischel, D., Morris, J. H., Ojha, S., Schnoes, A. M., Stryke, D., Yunes, J. M., Ferrin, T. E., Holliday, G. L., and Babbitt, P. C. (2014) The Structure-Function Linkage Database. Nuc. Acids Res. 42, D521–30.

34

ACS Paragon Plus Environment

Page 34 of 41

Page 35 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

(32) Dunwell, J. M., Culham, A., Carter, C. E., Sosa-Aguirre, C. R., and Goodenough, P. W. (2001) Evolution of functional diversity in the cupin superfamily. Trends Biochem. Sci 26, 740–746. (33) Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang, N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L. L., Eddy, S. R., Bateman, A., and Finn, R. D. (2011) The Pfam protein families database. Nuc. Acids Res. 40, D290–D301. (34) Atkinson, H. J., Morris, J. H., Ferrin, T. E., and Babbitt, P. C. (2009) Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies. PLoS ONE (Jordan, I. K., Ed.) 4, e4345. (35) Barber, A. E., and Babbitt, P. C. (2012) Pythoscape: a framework for generation of large protein similarity networks. Bioinformatics 28, 2845–2846. (36) Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410. (37) Brown, S. D., and Babbitt, P. C. (2014) New Insights about Enzyme Evolution from Large Scale Studies of Sequence and Structure Relationships. J. Biol. Chem. 289, 30221– 30228. (38) Cravatt, B. F., Wright, A. T., and Kozarich, J. W. (2008) Activity-based protein profiling: from enzyme chemistry to proteomic chemistry. Annu. Rev. Biochem. 77, 383– 414. (39) Davids, T., Schmidt, M., Böttcher, D., and Bornscheuer, U. T. (2013) Strategies for the discovery and engineering of enzymes for biocatalysis. Curr. Opin. Chem. Biol. 17, 215–220. (40) Prosser, G. A., Larrouy-Maumus, G., and de Carvalho, L. P. S. (2014) Metabolomic strategies for the identification of new enzyme functions and metabolic pathways. EMBO Rep. 15, 657–669. (41) Carpenter, A. E., and Sabatini, D. M. (2004) Systematic genome-wide screens of gene function. Nat. Rev. Genet. 5, 11–22. (42) Gerlt, J. A., Allen, K. N., Almo, S. C., Armstrong, R. N., Babbitt, P. C., Cronan, J. E., Dunaway-Mariano, D., Imker, H. J., Jacobson, M. P., Minor, W., Poulter, C. D., Raushel, F. M., Sali, A., Shoichet, B. K., and Sweedler, J. V. (2011) The Enzyme Function Initiative. Biochemistry 50, 9950–9962. (43) Pieper, U., Chiang, R., Seffernick, J. J., Brown, S. D., Glasner, M. E., Kelly, L., Eswar, N., Sauder, J. M., Bonanno, J. B., Swaminathan, S., Burley, S. K., Zheng, X., Chance, M. R., Almo, S. C., Gerlt, J. A., Raushel, F. M., Jacobson, M. P., Babbitt, P. C., and Sali, A. (2009) Target selection and annotation for the structural genomics of the amidohydrolase and enolase superfamilies. J. Struct. Funct. Genomics 10, 107–125. (44) Lukk, T., Sakai, A., Kalyanaraman, C., Brown, S. D., Imker, H. J., Song, L., Fedorov, A. A., Fedorov, E. V., Toro, R., and Hillerich, B. (2012) Homology models guide discovery of diverse enzyme specificities among dipeptide epimerases in the enolase superfamily. Proc. Natl. Acad. Sci. USA 109, 4122–4127. (45) Korbel, J. O., Jensen, L. J., Mering, von, C., and Bork, P. (2004) Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 22, 911–917. (46) Zhao, S., Kumar, R., Sakai, A., Vetting, M. W., Wood, B. M., Brown, S., Bonanno, J. B., Hillerich, B. S., Seidel, R. D., Babbitt, P. C., Almo, S. C., Sweedler, J. V., Gerlt, J.

35

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

A., Cronan, J. E., and Jacobson, M. P. (2013) Discovery of new enzymes and metabolic pathways by using structure and genome context. Nature 502, 698–702. (47) Hasnain, G., Roje, S., Sa, N., Zallot, R., Ziemak, M. J., de Crécy-Lagard, V., Gregory, J. F., and Hanson, A. D. (2016) Bacterial and plant HAD enzymes catalyse a missing phosphatase step in thiamin diphosphate biosynthesis. Biochem. J. 473, 157–166. (48) Ahmed, F. H., Carr, P. D., Lee, B. M., Afriat-Jurnou, L., Mohamed, A. E., Hong, N.S., Flanagan, J., Taylor, M. C., Greening, C., and Jackson, C. J. (2015) SequenceStructure-Function Classification of a Catalytically Diverse Oxidoreductase Superfamily in Mycobacteria. J. Mol. Biol. 427, 3554–3571. (49) Xiang, D. F., Kolb, P., Fedorov, A. A., Xu, C., Fedorov, E. V., Narindoshivili, T., Williams, H. J., Shoichet, B. K., Almo, S. C., and Raushel, F. M. (2012) Structure-based function discovery of an enzyme for the hydrolysis of phosphorylated sugar lactones. Biochemistry 51, 1762–1773. (50) Hitchcock, D. S., Fan, H., Kim, J., Vetting, M., Hillerich, B., Seidel, R. D., Almo, S. C., Shoichet, B. K., Sali, A., and Raushel, F. M. (2013) Structure-guided discovery of new deaminase enzymes. J. Am. Chem. Soc. 135, 13927–13933. (51) Kolb, P., Ferreira, R. S., Irwin, J. J., and Shoichet, B. K. (2009) Docking and chemoinformatic screens for new ligands and targets. Curr. Opin. Biotechnol. 20, 429– 436. (52) Roodveldt, C., and Tawfik, D. S. (2005) Shared Promiscuous Activities and Evolutionary Features in Various Members of the Amidohydrolase Superfamily. Biochemistry 44, 12728–12736. (53) (null), Philippe, R. N., and Noel, J. P. (2012) The rise of chemodiversity in plants. Science 336, 1667–1670. (54) Copley, S. D. (2012) Moonlighting is mainstream: Paradigm adjustment required. Bioessays 34, 578–588. (55) Tokuriki, N., and Tawfik, D. S. (2009) Protein dynamism and evolvability. Science 324, 203–207. (56) Nobeli, I., Favia, A. D., and Thornton, J. M. (2009) Protein promiscuity and its implications for biotechnology. Nat Biotechnol 27, 157–167. (57) Babtie, A., Tokuriki, N., and Hollfelder, F. (2010) What makes an enzyme promiscuous? Curr. Opin. Chem. Biol. 14, 200–207. (58) Renata, H., Wang, Z. J., and Arnold, F. H. (2015) Expanding the enzyme universe: accessing non-natural reactions by mechanism-guided directed evolution. Angew. Chem. Int. Ed. Engl. 54, 3351–3367. (59) Romero, P. A., and Arnold, F. H. (2009) Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10, 866–876. (60) Weng, J.-K. (2014) The evolutionary paths towards complexity: a metabolic perspective. New Phytol. 201, 1141–1149. (61) Bebrone, C. (2007) Metallo-β-lactamases (classification, activity, genetic organization, structure, zinc coordination) and their superfamily. Biochem. Pharmacol. 74, 1686–1701. (62) Daiyasu, H., Osaka, K., Ishino, Y., and Toh, H. (2001) Expansion of the zinc metallo-hydrolase family of the beta-lactamase fold. FEBS Letters 503, 1–6. (63) Puehringer, S., Metlitzky, M., and Schwarzenbacher, R. (2008) The pyrroloquinoline quinone biosynthesis pathway revisited: a structural approach. BMC Biochem 9, 8.

36

ACS Paragon Plus Environment

Page 36 of 41

Page 37 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

(64) Holdorf, M. M., Owen, H. A., Lieber, S. R., Yuan, L., Adams, N., Dabney-Smith, C., and Makaroff, C. A. (2012) Arabidopsis ETHE1 encodes a sulfur dioxygenase that is essential for embryo and endosperm development. Plant Physiol. 160, 226–236. (65) Silaghi-Dumitrescu, R., Kurtz, D. M., Ljungdahl, L. G., and Lanzilotta, W. N. (2005) X-ray crystal structures of Moorella thermoacetica FprA. Novel diiron site structure and mechanistic insights into a scavenging nitric oxide reductase. Biochemistry 44, 6492–6501. (66) Aravind, L. (1999) An evolutionary classification of the metallo-beta-lactamase fold proteins. In Silico Biol. (Gedrukt) 1, 69–91. (67) Armstrong, R. N. (1997) Structure, catalytic mechanism, and evolution of the glutathione transferases. Chem. Res. Toxicol. 10, 2–18. (68) Bellinzoni, M., Bastard, K., Perret, A., Zaparucha, A., Perchat, N., Vergne, C., Wagner, T., de Melo-Minardi, R. C., Artiguenave, F., Cohen, G. N., Weissenbach, J., Salanoubat, M., and Alzari, P. M. (2011) 3-Keto-5-aminohexanoate cleavage enzyme: a common fold for an uncommon Claisen-type condensation. J. Biol. Chem. 286, 27399– 27405. (69) Burroughs, A. M., Allen, K. N., Dunaway-Mariano, D., and Aravind, L. (2006) Evolutionary genomics of the HAD superfamily: understanding the structural adaptations and catalytic diversity in a superfamily of phosphoesterases and allied enzymes. J. Mol. Biol. 361, 1003–1034. (70) Makarova, K. S., and Grishin, N. V. (1999) The Zn-peptidase superfamily: functional convergence after evolutionary divergence. J. Mol. Biol. 292, 11–17. (71) Gabaldón, T., and Koonin, E. V. (2013) Functional and evolutionary implications of gene orthology. Nat. Rev. Genet. 14, 360–366. (72) Wagner, A. (2008) Neutralism and selectionism: a network-based reconciliation. Nat. Rev. Genet. 9, 965–974. (73) Paaby, A. B., and Rockman, M. V. (2014) Cryptic genetic variation: evolution's hidden substrate. Nat. Rev. Genet. 15, 247–258. (74) Masel, J., and Trotter, M. V. (2010) Robustness and evolvability. Trends Genet. 26, 406–414. (75) Amitai, G., Gupta, R. D., and Tawfik, D. S. (2007) Latent evolutionary potentials under the neutral mutational drift of an enzyme. HFSP J. 1, 67–78. (76) Bloom, J. D., Romero, P. A., Lu, Z., and Arnold, F. H. (2007) Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution. Biol Direct 2, 17. (77) Hult, K., and Berglund, P. (2007) Enzyme promiscuity: mechanism and applications. Trends Biotechnol. 25, 231–238. (78) Lutz, S., Lichter, J., and Liu, L. (2007) Exploiting temperature-dependent substrate promiscuity for nucleoside analogue activation by thymidine kinase from Thermotoga maritima. J. Am. Chem. Soc. 129, 8714–8715. (79) Sánchez-Moreno, I., Iturrate, L., Martín-Hoyos, R., Jimeno, M. L., Mena, M., Bastida, A., and García-Junceda, E. (2009) From kinase to cyclase: an unusual example of catalytic promiscuity modulated by metal switching. Chembiochem 10, 225–229. (80) Baier, F., Chen, J., Solomonson, M., Strynadka, N. C. J., and Tokuriki, N. (2015) Distinct Metal Isoforms Underlie Promiscuous Activity Profiles of Metalloenzymes. ACS Chem. Biol. 10, 1684–1693.

37

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(81) Nielsen, M. M., Suits, M. D. L., Yang, M., Barry, C. S., Martinez-Fleites, C., Tailford, L. E., Flint, J. E., Dumon, C., Davis, B. G., Gilbert, H. J., and Davies, G. J. (2011) Substrate and metal ion promiscuity in mannosylglycerate synthase. J. Biol. Chem. 286, 15155–15164. (82) Badarau, A., and Page, M. I. (2006) The variation of catalytic efficiency of Bacillus cereus metallo-beta-lactamase with different active site metal ions. Biochemistry 45, 10654–10666. (83) Dimitrov, J. D., and Vassilev, T. L. (2009) Cofactor-mediated protein promiscuity. Nat Biotechnol 27, 892. (84) Lapalikar, G. V., Taylor, M. C., Warden, A. C., Onagi, H., Hennessy, J. E., Mulder, R. J., Scott, C., Brown, S. E., Russell, R. J., Easton, C. J., and Oakeshott, J. G. (2012) Cofactor promiscuity among F420-dependent reductases enables them to catalyse both oxidation and reduction of the same substrate. Catal. Sci. Technol. 2, 1560. (85) Reynolds, E. W., McHenry, M. W., Cannac, F., Gober, J. G., Snow, C. D., and Brustad, E. M. (2016) An evolved orthogonal enzyme/cofactor pair. J. Am. Chem. Soc. (86) Carter, E. L., Tronrud, D. E., Taber, S. R., Karplus, P. A., and Hausinger, R. P. (2011) Iron-containing urease in a pathogenic bacterium. Proc. Natl. Acad. Sci. U.S.A. 108, 13095–13099. (87) Xu, Y., Feng, L., Jeffrey, P. D., Shi, Y., and Morel, F. M. M. (2008) Structure and metal exchange in the cadmium carbonic anhydrase of marine diatoms. Nature 452, 56– 61. (88) Culotta, V. C., Yang, M., and O'Halloran, T. V. (2006) Activation of superoxide dismutases: putting the metal to the pedal. Biochim. Biophys. Acta 1763, 747–758. (89) Foster, A. W., Osman, D., and Robinson, N. J. (2014) Metal Preferences and Metallation. J. Biol. Chem. 289, 28095–28103. (90) Clugston, S., Yajima, R., and Honek, J. (2004) Investigation of metal binding and activation of Escherichia coli glyoxalase I: kinetic, thermodynamic and mutagenesis studies. Biochem. J. 377, 309–316. (91) Waldron, K. J., and Robinson, N. J. (2009) How do bacterial cells ensure that metalloproteins get the correct metal? Nat Rev Micro 7, 25–35. (92) Imlay, J. A. (2014) The mismetallation of enzymes during oxidative stress. J. Biol. Chem. 289, 28121–28128. (93) Goldman, A. D., Beatty, J. T., and Landweber, L. F. (2016) The TIM Barrel Architecture Facilitated the Early Evolution of Protein-Mediated Metabolism. J Mol Evol 82, 17–26. (94) Tokuriki, N., Jackson, C. J., Afriat-Jurnou, L., Wyganowski, K. T., Tang, R., and Tawfik, D. S. (2012) Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Comms 3, 1257. (95) Cantarel, B. L., Coutinho, P. M., Rancurel, C., Bernard, T., Lombard, V., and Henrissat, B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nuc. Acids Res. 37, D233–8. (96) Radivojac, P., Clark, W. T., Oron, T. R., Schnoes, A. M., Wittkop, T., Sokolov, A., Graim, K., Funk, C., Verspoor, K., Ben-Hur, A., Pandey, G., Yunes, J. M., Talwalkar, A. S., Repo, S., Souza, M. L., Piovesan, D., Casadio, R., Wang, Z., Cheng, J., Fang, H., Gough, J., Koskinen, P., Törönen, P., Nokso-Koivisto, J., Holm, L., Cozzetto, D., Buchan, D. W. A., Bryson, K., Jones, D. T., Limaye, B., Inamdar, H., Datta, A., Manjari,

38

ACS Paragon Plus Environment

Page 38 of 41

Page 39 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

S. K., Joshi, R., Chitale, M., Kihara, D., Lisewski, A. M., Erdin, S., Venner, E., Lichtarge, O., Rentzsch, R., Yang, H., Romero, A. E., Bhat, P., Paccanaro, A., Hamp, T., Kaßner, R., Seemayer, S., Vicedo, E., Schaefer, C., Achten, D., Auer, F., Boehm, A., Braun, T., Hecht, M., Heron, M., Hönigschmid, P., Hopf, T. A., Kaufmann, S., Kiening, M., Krompass, D., Landerer, C., Mahlich, Y., Roos, M., Björne, J., Salakoski, T., Wong, A., Shatkay, H., Gatzmann, F., Sommer, I., Wass, M. N., Sternberg, M. J. E., Škunca, N., Supek, F., Bošnjak, M., Panov, P., Džeroski, S., Šmuc, T., Kourmpetis, Y. A. I., van Dijk, A. D. J., Braak, ter, C. J. F., Zhou, Y., Gong, Q., Dong, X., Tian, W., Falda, M., Fontana, P., Lavezzo, E., Di Camillo, B., Toppo, S., Lan, L., Djuric, N., Guo, Y., Vucetic, S., Bairoch, A., Linial, M., Babbitt, P. C., Brenner, S. E., Orengo, C., Rost, B., Mooney, S. D., and Friedberg, I. (2013) A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–227. (97) Schnoes, A. M., Ream, D. C., Thorman, A. W., Babbitt, P. C., and Friedberg, I. (2013) Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space. PLoS Comput Biol (Orengo, C. A., Ed.) 9, e1003063. (98) Schnoes, A. M., Brown, S. D., Dodevski, I., and Babbitt, P. C. (2009) Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS Comput Biol 5, e1000605. (99) Hug, L. A., Baker, B. J., Anantharaman, K., and Brown, C. T. (2016) A new view of the tree of life. Nature. (100) Colin, P.-Y., Kintses, B., Gielen, F., Miton, C. M., Fischer, G., Mohamed, M. F., Hyvönen, M., Morgavi, D. P., Janssen, D. B., and Hollfelder, F. (2015) Ultrahighthroughput discovery of promiscuous enzymes by picodroplet functional metagenomics. Nat Comms 6, 10008. (101) Bachovchin, D. A., Koblan, L. W., Wu, W., Liu, Y., Li, Y., Zhao, P., Woznica, I., Shu, Y., Lai, J. H., Poplawski, S. E., Kiritsy, C. P., Healey, S. E., DiMare, M., Sanford, D. G., Munford, R. S., Bachovchin, W. W., and Golub, T. R. (2014) A high-throughput, multiplexed assay for superfamily-wide profiling of enzyme activity. Nat Chem Biol 10, 656–663. (102) Anandarajah, K., Kiefer, P. M., Donohoe, B. S., and Copley, S. D. (2000) Recruitment of a double bond isomerase to serve as a reductive dehalogenase during biodegradation of pentachlorophenol. Biochemistry 39, 5303–5311. (103) Yew, W. S., Akana, J., Wise, E. L., Rayment, I., and Gerlt, J. A. (2005) Evolution of enzymatic activities in the orotidine 5'-monophosphate decarboxylase suprafamily: enhancing the promiscuous D-arabino-hex-3-ulose 6-phosphate synthase reaction catalyzed by 3-keto-L-gulonate 6-phosphate decarboxylase. Biochemistry 44, 1807–1815. (104) Poelarends, G. J., Serrano, H., Johnson, W. H., Hoffman, D. W., and Whitman, C. P. (2004) The hydratase activity of malonate semialdehyde decarboxylase: mechanistic and evolutionary implications. J. Am. Chem. Soc. 126, 15658–15659. (105) Toth-Petroczy, A., and Tawfik, D. S. (2014) Hopeful (protein InDel) monsters? Structure 22, 803–804. (106) Park, H. S. (2006) Design and Evolution of New Catalytic Activity with an Existing Protein Scaffold. Science 311, 535–538. (107) Zhang, D., Iyer, L. M., Burroughs, A. M., and Aravind, L. (2014) Resilience of biochemical activity in protein domains in the face of structural divergence. Curr. Opin.

39

ACS Paragon Plus Environment

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Struct. Biol. 26, 92–103. (108) Davidi, D., Noor, E., Liebermeister, W., Bar-Even, A., Flamholz, A., Tummler, K., Barenholz, U., Goldenfeld, M., Shlomi, T., and Milo, R. (2016) Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U.S.A. 113, 3401–3406. (109) Oberhardt, M. A., Zarecki, R., Reshef, L., Xia, F., Duran-Frigola, M., Schreiber, R., Henry, C. S., Ben-Tal, N., Dwyer, D. J., Gophna, U., and Ruppin, E. (2016) SystemsWide Prediction of Enzyme Promiscuity Reveals a New Underground Alternative Route for Pyridoxal 5'-Phosphate Production in E. coli. PLoS Comput Biol 12, e1004705. (110) Soo, V. W. C., Hanson-Manful, P., and Patrick, W. M. (2011) Artificial gene amplification reveals an abundance of promiscuous resistance determinants in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 108, 1484–1489. (111) Piedrafita, G., Keller, M. A., and Ralser, M. (2015) The Impact of Non-Enzymatic Reactions and Enzyme Promiscuity on Cellular Metabolism during (Oxidative) Stress Conditions. Biomolecules 5, 2101–2122. (112) Guzmán, G. I., Utrilla, J., Nurk, S., Brunk, E., Monk, J. M., Ebrahim, A., Palsson, B. O., and Feist, A. M. (2015) Model-driven discovery of underground metabolic functions in Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 112, 929–934. (113) Notebaart, R. A., Szappanos, B., Kintses, B., Pál, F., Györkei, Á., Bogos, B., Lázár, V., Spohn, R., Csörgő, B., Wagner, A., Ruppin, E., Pál, C., and Papp, B. (2014) Networklevel architecture and the evolutionary potential of underground metabolism. Proc. Natl. Acad. Sci. U.S.A. 111, 11762–11767. (114) Khanal, A., Yu McLoughlin, S., Kershner, J. P., and Copley, S. D. (2015) Differential effects of a mutation on the normal and promiscuous activities of orthologs: implications for natural and directed evolution. Mol. Biol. Evol. 32, 100–108. (115) Copley, S. D. (2011) Toward a Systems Biology Perspective on Enzyme Evolution. J. Biol. Chem. 287, 3–10. (116) Cotruvo, J. A., and Stubbe, J. (2012) Metallation and mismetallation of iron and manganese proteins in vitro and in vivo: the class I ribonucleotide reductases as a case study. Metallomics 4, 1020–1036. (117) Ellington, A. D., and Bull, J. J. (2005) Evolution. Changing the cofactor diet of an enzyme. Science 310, 454–455. (118) Wilks, J. C., and Slonczewski, J. L. (2007) pH of the cytoplasm and periplasm of Escherichia coli: rapid measurement by green fluorescent protein fluorimetry. J. Bacteriol. 189, 5601–5607. (119) Dixon, D. P., Hawkins, T., Hussey, P. J., and Edwards, R. (2009) Enzyme activities and subcellular localization of members of the Arabidopsis glutathione transferase superfamily. J. Exp. Bot. 60, 1207–1218. (120) Pougach, K., Voet, A., Kondrashov, F. A., Voordeckers, K., Christiaens, J. F., Baying, B., Benes, V., Sakai, R., Aerts, J., Zhu, B., Van Dijck, P., and Verstrepen, K. J. (2014) Duplication of a promiscuous transcription factor drives the emergence of a new regulatory network. Nat Comms 5, 4868. (121) Voordeckers, K., Pougach, K., and Verstrepen, K. J. (2015) How do regulatory networks evolve and expand throughout evolution? Curr. Opin. Biotechnol. 34, 180–188. (122) Nam, H., Lewis, N. E., Lerman, J. A., Lee, D. H., Chang, R. L., Kim, D., and Palsson, B. O. (2012) Network Context and Selection in the Evolution to Enzyme

40

ACS Paragon Plus Environment

Page 40 of 41

Page 41 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

Specificity. Science 337, 1101–1104. (123) McLoughlin, S. Y., and Copley, S. D. (2008) A compromise required by gene sharing enables survival: Implications for evolution of new enzyme activities. Proc. Natl. Acad. Sci. U.S.A. 105, 13497–13502. (124) DePristo, M. A., Weinreich, D. M., and Hartl, D. L. (2005) Missense meanderings in sequence space: a biophysical view of protein evolution. Nat. Rev. Genet. 6, 678–687. (125) Kaltenbach, M., and Tokuriki, N. (2014) Dynamics and constraints of enzyme evolution. J. Exp. Zool. (Mol. Dev. Evol.) 322, 468–487. (126) Miton, C. M., and Tokuriki, N. (2016) How mutational epistasis impairs predictability in protein evolution and design. Protein Sci. 25, 1260–1272. (127) Lunzer, M., Golding, G. B., and Dean, A. M. (2010) Pervasive cryptic epistasis in molecular evolution. PLoS Genet 6, e1001162. (128) Bridgham, J. T., Ortlund, E. A., and Thornton, J. W. (2009) An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–519. (129) Parera, M., and Martinez, M. A. (2014) Strong Epistatic Interactions within a Single Protein. Mol. Biol. Evol. 31, 1546–1553. (130) Harms, M. J., and Thornton, J. W. (2010) Analyzing protein structure and function using ancestral gene reconstruction. Curr. Opin. Struct. Biol. 20, 360–366. (131) Harms, M. J., and Thornton, J. W. (2013) Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat. Rev. Genet. 14, 559–571. (132) Khersonsky, O., and Tawfik, D. S. (2010) Enzyme Promiscuity: A Mechanistic and Evolutionary Perspective. Annu. Rev. Biochem. 79, 471–505. (133) Hartl, D. L. (2014) What can we learn from fitness landscapes? Curr Microbiol 21, 51–57. (134) Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.

41

ACS Paragon Plus Environment