Letter
ACS Catal., Just Accepted Manuscript • DOI: 10.1021/acscatal.9b01464 • Publication Date (Web): 09 May 2019
ACS Catalysis
Computational Design of Multi-Substrate Enzyme Specificity
Antony D. St-Jacques, Marie-Ève C. Eyahpaise & Roberto A. Chica*
Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada, K1N 6N5
ABSTRACT: The engineering of multi-substrate enzyme specificity is highly desirable to foster the application of biocatalysis in industry. Here, we develop a multistate computational protein design methodology called multichemical state analysis (MCSA) that can optimize enzyme sequences on large structural ensembles for productive binding of multiple target substrates. Using MCSA, we redesigned E. coli branched-chain amino acid aminotransferase to accept both α-ketoglutarate and the non-native substrate L-histidine. Screening of a designed combinatorial library comprising 32 mutants for enhanced L-histidine transamination activity yielded four variants displaying up to ≈200-fold improvements to kcat/KM. MCSA opens the door to the design of broad-specificity biocatalysts and multi-substrate enzymes displaying tailored specificity.
KEYWORDS: Computational protein design; multistate design; multistate analysis; biocatalysis; enzyme engineering; aminotransferase; transaminase; amino acids
The creation of enzymes displaying desired substrate specificity is an important objective of enzyme engineering. To achieve this goal, computational protein design (CPD) algorithms can be used to identify sequences that can fulfill interactions required to productively bind a target substrate.1-3 Standard CPD protocols find optimal sequences in the context of a single chemical state, for example a unique enzyme structure with a single substrate bound at its active site.3-4 However, many industrially important enzymes5-9 undergo one-site bi bi ping-pong mechanisms, which necessitate that two substrates bind to the same active site at different points in the reaction for complete catalytic turnover.10 Thus, computational design of multi-substrate enzyme specificity requires the ability to optimize sequences in the context of multiple chemical states (i.e., multiple substrates) because mutations designed to enable the productive binding of one substrate may be detrimental to the binding of a second substrate.
Previously, multistate approaches to CPD11 have been used to redesign transcriptional regulators12 and enzymes13-15 for binding of a new target molecule. However, in these examples sequences were optimized to bind a single target small molecule, with the alternate chemical state being either the unbound state, so as to predict binding affinity,12-14 or an inhibitor-bound state, so as to identify mutations that confer inhibitor resistance while maintaining catalytic activity.15 Here, we develop and experimentally validate a multistate CPD methodology that can optimize enzyme sequences on large structural ensembles for productive binding of multiple target substrates, which we use to redesign specificity in an industrially-relevant multi-substrate enzyme.
As a test case for the development of this methodology, we aimed to redesign the active site of Escherichia coli branched-chain amino acid aminotransferase (BCAT), a pyridoxal 5’-phosphate (PLP) dependent enzyme that catalyzes the reversible transamination of α-ketoglutarate with branched-chain aliphatic L-amino acids, to enable transamination with the non-native substrate L-histidine. We selected this enzyme because it follows a one-site bi bi ping-pong mechanism (Figure S1),16-17 and because crystal structures with various substrates or analogues bound at the active site are available.18-19 L-Histidine was selected as the target donor substrate because it is a poor substrate of BCAT (< 1% activity relative to L-leucine),20 and is chemically distinct from the enzyme’s native substrates. We selected α-ketoglutarate as the target acceptor substrate because its product, L-glutamate, can be detected using a microplate-based assay that we previously developed to screen aminotransferase mutant libraries.21
The transamination reaction catalyzed by BCAT (Figure S1) follows a series of steps in which the PLP cofactor is covalently attached via a Schiff base to either the catalytic lysine residue (internal aldimine) or to one of the substrates (external aldimine). To ensure that designed sequences are active, our computational procedure must be able to identify active-site mutations that allow productive binding of both target substrates while not significantly destabilizing the enzyme fold. Therefore, sequences were optimized on the BCAT structure in the L-histidine (HIS) and L-glutamate (GLU) external aldimine forms, as well as in the substrate-free internal aldimine (PLP) state. Inclusion of the PLP state enabled us to assess the effect of mutations on enzyme fold stability by predicting whether they caused favorable or unfavorable interactions in the active site in the absence of substrate. Each of these three chemical states was modeled as a 100-member ensemble of backbone templates to approximate conformational flexibility in BCAT (Figure S2). The use of such ensembles as inputs to multistate CPD has been shown to result in improved prediction accuracy.22-24 Consequently, a total of 300 unique protein models were generated in silico (Methods) and used as input templates in our design calculations.
We analyzed the BCAT crystal structure bound to the 2-methyl-L-leucine substrate analogue (PDB ID: 1I1L)18 to identify active-site residues involved in substrate binding. These six residues (Figure 1) were allowed to sample side-chain rotamers of all proteinogenic amino acids with the exception of proline and lysine, which were omitted to avoid undesired structural changes to the backbone or Schiff base adducts with PLP, respectively. This resulted in a total search space of 3.4 × 10⁷ possible sequences (18⁶). Thus, a total of 10¹⁰ individual rotamer-optimization calculations (3 chemical states × 100 templates × 3.4 × 10⁷ sequences) would be required to comprehensively evaluate the ability of every sequence to bind both target substrates while not destabilizing the enzyme fold.
Figure 1. Active site of E. coli BCAT. The PLP cofactor and the catalytic lysine residue are colored white. The 2-methyl-L-leucine substrate analogue is colored orange. Designed residues are colored green, and their Cα atom is shown as a sphere. PDB ID: 1I1L.18 [Figure labels: Y31, V109, W126, Y129, K159, G196, A258, PLP; two views related by a 90° rotation.]
To simplify the calculation and ensure computational tractability (SI Text, Methods), we developed a design procedure for the engineering of multi-substrate enzyme specificity named multichemical state analysis (MCSA). In MCSA, the design calculation on each chemical state (represented by a 100-member backbone ensemble) is divided into sub-designs (Figure 2a) in which rotamers for all possible combinatorial sequences composed from overlapping subsets of the total designed positions (Table S1) are optimized on every backbone included in the ensemble. This is done so that the total number of sequences that are rotamer-optimized by each sub-design is reduced to a value that is manageable with limited computational resources. Following rotamer optimization, ranked lists of scored sequences are obtained for each sub-design on every chemical state (Figure 2b). Energies of identical sequences from corresponding sub-designs on individual chemical states are combined (Figure 2c) by applying a fitness function (e.g., arithmetic mean), and then converted to position energy matrices (PEMs) for every sub-design (Figure 2d). A PEM is a table of amino acid energies versus position, where each energy reflects the stabilizing or destabilizing effect that an amino acid at that position is predicted to have on the binding of a specific substrate and/or on the structure of the protein template. At this stage, sub-design PEMs for individual chemical states can also be generated by converting the energies of scored sequences into PEMs without application of the fitness function. Next, the elementwise mean (i.e., average of individual elements in the matrix with corresponding ones from other matrices, one at a time) of all sub-design PEMs is computed to construct a single global PEM (Figure 2e) for the multi-chemical state, or for each individual chemical state.
Once obtained, the global PEM is used to generate a table of amino-acid probabilities at each designed position based on the Boltzmann distribution (Figure 2f). These probabilities reflect the predicted ability of each amino-acid type at every design position to productively bind the target substrates (HIS and/or GLU) and/or stabilize the enzyme fold (PLP). In this way, MCSA can be used to design enzyme sequences that can stabilize multiple chemical states from large sequence and structural search spaces with fewer computational resources than those required to tackle an equivalent number of states using standard multistate design algorithms.25-26
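The last two MCSA steps described above (elementwise averaging of sub-design PEMs into a global PEM, then conversion to Boltzmann probabilities) can be illustrated with a short sketch. This is not the authors' implementation: the dictionary layout, the function names `global_pem` and `boltzmann_probabilities`, the toy energies, and the value kT = 0.593 kcal/mol (roughly 298 K) are all assumptions made for illustration; the parameters actually used are described in the paper's Methods and SI.

```python
import math

def global_pem(sub_pems):
    """Elementwise mean of sub-design PEMs (hypothetical data layout).

    Each PEM maps position -> {amino acid: energy}. An element present in
    several overlapping sub-designs is averaged over those sub-designs;
    an element present in only one sub-design is passed through unchanged.
    """
    sums, counts = {}, {}
    for pem in sub_pems:
        for pos, energies in pem.items():
            for aa, energy in energies.items():
                key = (pos, aa)
                sums[key] = sums.get(key, 0.0) + energy
                counts[key] = counts.get(key, 0) + 1
    merged = {}
    for (pos, aa), total in sums.items():
        merged.setdefault(pos, {})[aa] = total / counts[(pos, aa)]
    return merged

def boltzmann_probabilities(pem, kT=0.593):
    """Convert each PEM column into probabilities p = exp(-E/kT) / Z.

    kT is an assumed value. Energies are shifted by the column minimum
    before exponentiation for numerical stability; this does not change p.
    """
    probs = {}
    for pos, energies in pem.items():
        e_min = min(energies.values())
        weights = {aa: math.exp(-(e - e_min) / kT) for aa, e in energies.items()}
        z = sum(weights.values())
        probs[pos] = {aa: w / z for aa, w in weights.items()}
    return probs

# Toy input: two sub-designs over positions 1-3 that overlap at position 2.
sub_design_1 = {1: {"A": -1.0, "C": 0.0}, 2: {"A": 0.5, "C": -0.5}}
sub_design_2 = {2: {"A": 1.5, "C": -1.5}, 3: {"A": 0.0, "C": 0.0}}

pem = global_pem([sub_design_1, sub_design_2])
probs = boltzmann_probabilities(pem)
```

In this toy case only position 2 is shared by both sub-designs, so it is the only column that is actually averaged; the two amino acids at position 3 have equal energies and therefore equal probabilities.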
Figure 2. Multi-chemical state analysis (MCSA). Toy example involving the optimization of amino acids at three designed positions (1–3) on two chemical states (HIS and GLU), each one represented by a 3-member backbone ensemble. (a) For each chemical state, the design is divided into two sub-designs comprising overlapping subsets of the total designed positions to allow comprehensive search of the sequence space. (b) Following rotamer optimization of every sequence on each backbone ensemble, ranked lists of scored sequences (shown as colored lines) are obtained for each sub-design of every chemical state. Here, the Boltzmann-weighted average energy (E) across all ensemble members is used to score sequences. (c) Energies of identical sequences from corresponding sub-designs on individual chemical states are combined by applying a fitness function (e.g., arithmetic average). This generates a multi-chemical fitness score for each sequence from every sub-design. (d) The energies of scored sequences from every multi-chemical sub-design are converted to position energy matrices, which are tables of amino acid energies (Eaa) versus designed position. (e) Position energy matrices from both sub-designs are combined into a global position energy matrix by computing the elementwise mean. (f) The global position energy matrix is converted into the position probability (p) matrix based on the Boltzmann distribution.
The tables of probabilities generated for the HIS, GLU, and PLP chemical states (Tables S2–S4) are displayed as sequence logos in Figure 3. As expected, each sequence logo is unique, demonstrating that mutations predicted to stabilize one chemical state may have a detrimental or neutral effect in another chemical state. For example, in the GLU state, the N mutation is favored at position 109 (95.1%), but this mutation has a probability of 11.3% in the HIS state and < 0.1% in the PLP state. This result is unsurprising given that N109 is predicted to form stronger favorable interactions with L-glutamate than with L-histidine or the internal aldimine (Figure S3a). Similarly, H196 has a high probability in the PLP state (> 99.9%) because it fills the active-site pocket in the absence of substrate (Figure S3b), resulting in stabilizing interactions. However, H196 has low probability in the GLU and HIS states (< 0.1%) because its imidazole side chain is predicted to form weakly stabilizing interactions with the L-glutamate or L-histidine substrates. On the other hand, high-probability amino acids at some positions are similar across all states (e.g., F/Y and W at positions 31 and 126, respectively), suggesting that those residues are important for stability of the BCAT fold because they have a stabilizing effect independent of the presence, absence, or identity of the substrate.
To design BCAT variants that can catalyze the transamination of L-histidine with α-ketoglutarate, we evaluated mutations in the context of all three chemical states. This was done by first averaging the energies obtained for each BCAT sub-design sequence on all chemical states, and then using these average energies to generate a table of probabilities for the HisGluPLP state (Table S5). As expected, the multi-chemical state sequence logo (Figure 3) was unique but contained many of the high-probability amino acids from each individual chemical state (Tables S2–S4). Not all of the highest probability amino acids found at each position on every individual chemical state had probabilities > 1% in the HisGluPLP state because these may be favorable in the context of one chemical state but detrimental in the context of the others. Conversely, some mutations that have a low probability (0.1–1%) in all but one chemical state (e.g., Q109, S258) are favored in the HisGluPLP multi-chemical state. These results demonstrate that MCSA achieved a compromise in trying to stabilize all chemical states by favoring mutations that are not significantly destabilizing in any state.
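The averaging step described above, in which energies of identical sequences are combined across chemical states, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `combine_states`, the data layout, the three-letter toy sequences, and their energies are all hypothetical, and keeping only sequences scored in every state is a choice made here, not something stated in the text.

```python
from statistics import mean

def combine_states(state_energies, fitness=mean):
    """Apply a fitness function to the energies of identical sequences.

    state_energies: {chemical state: {sequence: energy}} for one sub-design.
    Returns {sequence: fitness score}, restricted (by assumption) to the
    sequences that were scored in every chemical state.
    """
    per_state = list(state_energies.values())
    shared = set(per_state[0]).intersection(*per_state[1:])
    return {seq: fitness([scores[seq] for scores in per_state])
            for seq in sorted(shared)}

# Hypothetical energies for three toy sequences on three chemical states.
toy = {
    "HIS": {"IYG": -4.0, "VYG": -1.0, "VYH": 2.0},
    "GLU": {"IYG": -2.0, "VYG": -2.0, "VYH": 1.0},
    "PLP": {"IYG": -1.0, "VYG": -3.0, "VYH": -6.0},
}

scores = combine_states(toy)              # arithmetic mean over the three states
ranked = sorted(scores, key=scores.get)   # most favorable (lowest score) first
```

In this toy data the sequence that scores best on PLP alone ends up last once the states are averaged, which mirrors the compromise described in the text. Passing another callable as `fitness` (for example `max`, to score each sequence by its least favorable state) swaps the fitness function; which function suits a given design problem is a separate question.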
We also ran control calculations on the L-leucine external aldimine state (LEU) and used the resulting energies to generate a table of probabilities for the LeuGluPLP multi-chemical state (Tables S6–S7). We hypothesized that replacement of the non-native HIS state by the native LEU state would result in enhanced probabilities for wild-type residues at every position. This is indeed what we observed, with probabilities for all wild-type residues at every position in the LeuGluPLP multi-chemical state being greater than or equal to those in HisGluPLP. Notably, the wild-type G196 residue has a much higher probability when the LEU state is included (Figure 3, Tables S6–S7). This analysis suggests that MCSA is able to identify sequences that can productively bind two target substrates without destabilizing the enzyme fold.
Figure 3. Position probability matrices generated from different chemical states. Matrices are shown as sequence logos, where the letter height corresponds to the amino acid probability at that position (see Tables S2–S8 for probability values). Amino acids with a probability < 1% are not included in the sequence logos. [Figure panels: HIS, GLU, PLP, HisGluPLP, LEU, LeuGluPLP, THR; logo positions: Y31, V109, W126, Y129, G196, A258.]
Next, we used the probability table of the HisGluPLP state (Table S5) to generate a combinatorial library that includes the highest-probability amino acids obtained by MCSA as well as the wild-type residue at each position (Methods). This yielded a 32-member combinatorial library containing I/M/Q/V, F/Y, C/G, and A/S at positions 109, 129, 196, and 258, respectively. Screening of this library for enhanced transamination activity with L-histidine (> 3-fold wild-type activity) yielded four active mutants, representing a hit rate of 12.5%. Although all variants displayed substrate inhibition by the α-ketoglutarate acceptor (Figure S4), their catalytic efficiency towards L-histidine was enhanced by at least 24-fold, with that of the most active variant (V109I/G196C) being approximately 200-fold higher than the corresponding value for the wild type (Table 1). Kinetic analyses with the L-leucine donor showed that all four mutants display equivalent or enhanced catalytic efficiency with this native substrate, demonstrating that the increase in L-histidine transamination activity is not accompanied by loss of the enzyme’s native activity. This result is in agreement with our MCSA predictions, which ranked all mutations found in these mutants (V109I, V109M, Y129F, and G196C) in the top 3 most favorable amino acids at their corresponding positions on both the HisGluPLP and LeuGluPLP multi-chemical states (Figure 3). While catalytic efficiencies of the four mutants with L-histidine remain well below that with the native substrate L-leucine, they are comparable to those of other aminotransferases that have been used to synthesize valuable chiral amines from non-native substrates.27-29
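The size of the combinatorial library and the reported hit rate follow directly from the residue choices listed above, as the short enumeration below shows. The residue sets and wild-type identities are taken from the text; the variable and function names are hypothetical.

```python
from itertools import product

# Wild-type residue and library choices at each varied position (from the text).
WILD_TYPE = {109: "V", 129: "Y", 196: "G", 258: "A"}
LIBRARY = {109: "IMQV", 129: "FY", 196: "CG", 258: "AS"}

def variant_name(choice):
    """Name a variant by its substitutions relative to wild type."""
    mutations = [f"{WILD_TYPE[pos]}{pos}{aa}"
                 for pos, aa in choice if aa != WILD_TYPE[pos]]
    return "/".join(mutations) if mutations else "wild type"

positions = sorted(LIBRARY)
variants = [variant_name(tuple(zip(positions, combo)))
            for combo in product(*(LIBRARY[p] for p in positions))]

library_size = len(variants)   # 4 x 2 x 2 x 2 combinations
hit_rate = 4 / library_size    # four improved variants were reported
```

The enumeration includes the wild-type sequence itself, since the wild-type residue is one of the choices at every position.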
Table 1. Apparent kinetic parameters of E. coli BCAT and its mutants for transamination of various amino-acid donors with α-ketoglutarate as acceptor
| Enzyme | α-ketoglutarateᵃ KM (mM) | α-ketoglutarateᵃ Ki (mM) | α-ketoglutarateᵃ kcat (s⁻¹) | α-ketoglutarateᵃ kcat/KM (M⁻¹ s⁻¹) | L-histidineᵇ KM (mM) | L-histidineᵇ kcat (s⁻¹) | L-histidineᵇ kcat/KM (M⁻¹ s⁻¹) | L-leucineᵇ KM (mM) | L-leucineᵇ kcat (s⁻¹) | L-leucineᵇ kcat/KM (M⁻¹ s⁻¹) | L-threonineᵇ KM (mM) | L-threonineᵇ kcat (s⁻¹) | L-threonineᵇ kcat/KM (M⁻¹ s⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wild type | 0.29 | 60 | 0.164 | 560 | NDᶜ | ND | 0.05 | 0.40 | 0.173 | 430 | ND | ND | 0.09 |
| V109I/G196C | 0.060 | 11 | 0.52 | 8700 | 37 | 0.36 | 9.7 | 0.151 | 0.699 | 4600 | ND | ND | 0.8 |
| V109I | 0.114 | 13 | 0.73 | 6400 | 39 | 0.266 | 6.9 | 0.129 | 0.945 | 7300 | ND | ND | 1.6 |
| V109M/G196C | 0.124 | 22 | 0.356 | 2900 | ND | ND | 1.8 | 0.130 | 0.383 | 3000 | ND | ND | 0.38 |
| V109I/Y129F | 1.13 | 30 | 0.160 | 140 | 54 | 0.064 | 1.2 | 0.29 | 0.138 | 470 | 52 | 0.0137 | 0.26 |
ᵃ Donor substrate is 2 mM L-leucine.
ᵇ Acceptor substrate is 1.25 mM (V109I/G196C, V109I, V109M/G196C) or 10 mM (wild type, V109I/Y129F) α-ketoglutarate.
ᶜ ND indicates that a value could not be determined because saturation with the donor substrate was not possible at the maximum concentration tested (86 mM).
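As a consistency check on Table 1, kcat/KM can be recomputed from the tabulated kcat and KM values, and the fold improvements quoted in the text can be derived from the L-histidine column. The sketch below uses values copied from the table; the helper name `catalytic_efficiency` is hypothetical, and small differences from the tabulated efficiencies are expected from rounding.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/KM in M^-1 s^-1, converting KM from mM to M."""
    return kcat_per_s / (km_mM * 1e-3)

# L-histidine kcat/KM values (M^-1 s^-1) copied from Table 1.
his_efficiency = {
    "Wild type": 0.05,
    "V109I/G196C": 9.7,
    "V109I": 6.9,
    "V109M/G196C": 1.8,
    "V109I/Y129F": 1.2,
}

# Fold improvement of each mutant over wild type with L-histidine.
fold_improvement = {
    name: value / his_efficiency["Wild type"]
    for name, value in his_efficiency.items()
    if name != "Wild type"
}

# Recompute one tabulated efficiency from its kcat and KM:
# V109I/G196C with L-histidine, kcat = 0.36 s^-1 and KM = 37 mM.
recomputed = catalytic_efficiency(0.36, 37)
```

The largest ratio (V109I/G196C, about 194) corresponds to the "approximately 200-fold" figure in the text, and the smallest (V109I/Y129F, about 24) to the "at least 24-fold" figure.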
200-fold while requiring the experimental screening of only 32 mutants, a library size several orders of magnitude smaller than was required to obtain similar activity enhancements in other aminotransferases with a single round of directed evolution.30-31 Compared with alternate computational methods that were developed to redesign enzyme substrate specificity, MCSA has the benefit of not requiring sequence alignments or costly molecular dynamics simulations, and of allowing sequences to be optimized for productive binding of multiple substrates simultaneously.32-34 Redesign of enzyme specificity by MCSA also compares favorably to iterative saturation mutagenesis,35 a rational design approach consisting of iterative rounds of saturation mutagenesis at user-defined sites in an enzyme structure. For example, we performed saturation mutagenesis of BCAT at positions 109, 129, and 196, and obtained only one mutant (V109I) displaying enhanced transamination activity with L-histidine (> 3-fold wild-type activity). Thus, testing of > 100 variants produced over two successive rounds of iterative saturation mutagenesis would have been required to identify all four BCAT variants displaying enhanced L-histidine transamination activity described here.
Having successfully designed four BCAT variants displaying enhanced L-histidine transamination activity, we next investigated whether the beneficial mutations predicted by MCSA were specific to the target non-native substrate. To this end, we tested the BCAT mutants with L-threonine, which is an equally poor donor substrate of BCAT (≈1% activity relative to L-leucine)20 that is chemically distinct from L-histidine, L-leucine, and α-ketoglutarate. Catalytic efficiencies obtained with L-threonine (Table 1) show that the wild-type enzyme reacts approximately 2-fold more efficiently with this substrate than with L-histidine. However, all four BCAT variants display higher catalytic efficiency with the L-histidine target substrate than with L-threonine (4–12-fold). Furthermore, we ran MCSA on the L-threonine external aldimine state (THR) to verify whether mutations that enhance activity towards L-histidine were also predicted to enhance productive binding of L-threonine. All four mutations have lower probability on the THR than on the HIS state (Figure 3, Table S8). Notably, mutations found in the most active variant (V109I/G196C) have a probability of 2.8% and