Article pubs.acs.org/accounts
Molecular Dynamics Explorations of Active Site Structure in Designed and Evolved Enzymes
Published as part of the Accounts of Chemical Research special issue “Protein Motion in Catalysis”.
Sílvia Osuna,†,‡ Gonzalo Jiménez-Osés,‡ Elizabeth L. Noey,‡ and K. N. Houk*,‡
† Institut de Química Computacional i Catàlisi and Departament de Química, Universitat de Girona, Campus Montilivi s/n, 17071 Girona, Spain
‡ Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States
CONSPECTUS: This Account describes the use of molecular dynamics (MD) simulations to reveal how mutations alter the structure and organization of enzyme active sites. As proposed by Pauling about 70 years ago and elaborated by many others since then, biocatalysis is efficient when functional groups in the active site of an enzyme are in optimal positions for transition state stabilization. Changes in mechanism and covalent interactions are often critical parts of enzyme catalysis. We describe our explorations of the dynamical preorganization of active sites using MD, studying the fluctuations between active and inactive conformations normally concealed from static crystallography. MD shows how the various arrangements of active site residues influence the free energy of the transition state and relates the populations of the catalytic conformational ensemble to the enzyme activity. This Account is organized around three case studies from our laboratory. We first describe the importance of dynamics in evaluating a series of computationally designed and experimentally evolved enzymes for the Kemp elimination, a popular subject in the enzyme design field. We find that the dynamics of the active site is influenced not only by the original sequence design and subsequent mutations but also by the nature of the ligand present in the active site. In the second example, we show how microsecond MD has been used to uncover the role of remote mutations in the active site dynamics and catalysis of a transesterase, LovD. This enzyme was evolved by Tang at UCLA and Codexis, Inc., and is a useful commercial catalyst for the production of the drug simvastatin.
X-ray analysis of inactive and active mutants did not reveal differences in the active sites, but relatively long time scale MD in solution showed that the active site of the wild-type enzyme preorganizes only upon binding of the acyl carrier protein (ACP) that delivers the natural acyl group to the active site. In the absence of bound ACP, a noncatalytic arrangement of the catalytic triad is dominant. Unnatural truncated substrates are inactive because of the lack of protein−protein interactions provided by the ACP. Directed evolution is able to gradually restore the catalytic organization of the active site by motion of the protein backbone that alters the active site geometry. In the third case, we demonstrate the key role of MD in combination with crystallography to identify the origins of substrate-dependent stereoselectivities in a number of Codexis-engineered ketoreductases, one of which is used commercially for the production of the antibiotic sulopenem. Here, mutations alter the shape of the active site as well as the accessibility of water to different regions of it. Each of these examples reveals something different about how mutations can influence enzyme activity and shows that directed evolution, like natural evolution, can increase catalytic activity in a variety of remarkable and often subtle ways.
1. INTRODUCTION
Understanding the enormous catalytic power of enzymes is a grand challenge for chemical biology. In the past decade, our group has collaborated with biologists and computational groups to design new enzymes with functions different from those evolved by natural enzymes.1 Our successes in enzyme design showed that we understand which catalytic groups will accelerate the rates of reactions and that quantum-mechanical (QM) calculations predict the correct positioning of these groups.2 We also have the computational tools, such as Baker’s Rosetta or Mayo’s Phoenix, to incorporate these groups into well-characterized stable protein scaffolds.3
In spite of these successes, our understanding is not sufficient to rationally tune the global architecture of the protein that ultimately determines catalysis in order to achieve efficiencies rivaling those of naturally evolved proteins. Instead, experimental directed evolution (DE) is often used to achieve several orders of magnitude of acceleration beyond that achieved by the designed protein.4 Furthermore, many designs fail altogether. Even in successful cases, most of the designs are not active. We need to
© 2015 American Chemical Society
Received: December 19, 2014 Published: March 4, 2015
DOI: 10.1021/ar500452q Acc. Chem. Res. 2015, 48, 1080−1089
equation17 and represents the highest possible rate at which a reaction with no free energy barrier can occur. Other quantities of interest for catalytic reactions are the time for association of substrates to catalysts and the dissociation of catalyst−product complexes. Diffusion limits how fast the association of substrate and enzyme can occur in a condensed phase. Diffusion in water occurs with a bimolecular rate constant of ∼10⁹ M⁻¹ s⁻¹. This translates to association periods of 1 s to 10⁻³ s when the enzyme concentration is micromolar and the substrate concentration is millimolar to 1 M.18 Other time scales emphasizing uncatalyzed reactions have been published.19 Dissociation usually occurs in 10⁻⁵ to 10⁻² s. The overall time scale of enzyme-catalyzed reactions is quite broad (1 to 10⁻⁸ s on the basis of measured kcat values). As discussed in many Accounts in this Special Issue, the time scale for bond breaking and formation (tens of fs) is much shorter than the time scales of collisions with the protein or solvent or of intramolecular vibrational relaxation. Vibrations of the substrate and the active site are indeed necessary to acquire the energy necessary to surmount a potential energy barrier,12 but this vibrational energy transfer occurs much more slowly than the bond changing events.14,19−21 Nevertheless, preorganization of active sites is crucial for catalysis, and dynamic motions determine the probability that the active site will be appropriately positioned to stabilize the bond-changing events in reactions.22−25 These aspects of catalysis and dynamics are the focus of this Account. In addition, protein conformational changes26 and motions involving mobile loops can act as gates for substrate access or product release27 from the active site, but we have not studied these. Much of this Account involves MD simulations aimed at understanding how remote (or at least noncatalytic) mutations influence catalysis.
DE involves multiple rounds of evolution followed by screening to accumulate beneficial mutations, which are often scattered around the whole enzyme.4 Important contributions to this field have been reported recently by Codexis, Inc.28,29 and the Hilvert,30 Mayo,31 Tawfik,32,33 and Arnold34−36 laboratories. One of the goals of our research is to understand how mutations of amino acids not directly involved in catalysis nevertheless influence the activity and to build this knowledge into the enzyme design procedure.
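To make the time-stepping picture of MD described in this Introduction concrete, the following Python sketch propagates a single particle on a harmonic spring and counts the integration steps needed for the trajectory lengths mentioned in this Account. It is an illustration only: the velocity Verlet integrator, the one-dimensional toy force, the reduced units, and the 2 fs time step are assumptions made here, not details of the simulations described, and the function names are hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle for n_steps with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces are recomputed after every move
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


def steps_needed(sim_time_fs, dt_fs=2.0):
    """Number of integration steps for a trajectory of sim_time_fs femtoseconds.

    The 2 fs default is an assumed, typical value (not taken from the text).
    """
    return round(sim_time_fs / dt_fs)


def spring(x):
    """Toy force in reduced units: unit force constant, equilibrium at x = 0."""
    return -1.0 * x


# Integrate the toy oscillator and compare with the analytic solution cos(t).
dt, n = 0.01, 1000
x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=spring, mass=1.0, dt=dt, n_steps=n)
x_exact = math.cos(dt * n)
energy = 0.5 * v_end**2 + 0.5 * x_end**2   # initial total energy was 0.5

# Step counts for trajectory lengths of the kind discussed in this Account.
for label, t_fs in [("20 ns", 20e6), ("100 ns", 100e6), ("1 us", 1e9)]:
    print(label, steps_needed(t_fs))
```

Under the assumed 2 fs step, a 20 ns trajectory corresponds to 10⁷ force evaluations and a 1 μs trajectory to 5 × 10⁸, which is why GPU-based and special-purpose hardware matter for the longer simulations.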
develop additional computational tools that provide significant acceleration and selectivity. The goal of our recent research is to learn how to substitute, at least in part, computational design for these costly DE experiments. To do this, we first need to know much more about how mutations that do not modify the catalytic groups nevertheless influence the catalytic activity to a large extent.5−8 Mutagenesis can alter not only the shape but also the flexibility of the active site. These dynamic consequences have important effects on catalysis that are often concealed from structural techniques like X-ray crystallography. All molecules possess potential and kinetic energy and thus are in constant motion, each atom moving as much as 1 Å in 10⁻¹⁴ s. MD uses classical Newtonian mechanics to determine how all of the atoms in a system move as a result of forces acting on the atoms. Since there are thousands of atoms in a protein and the associated solvent molecules and ions, parametrized force fields such as AMBER9 are used to efficiently calculate the forces on each atom that determine their movement. The forces change every time the atoms move, and they must be recalculated after very brief time steps (less than 5 fs). Millions or even billions of calculations are necessary to describe motions happening at short times such as nanoseconds or microseconds. These studies have been made more efficient by technological breakthroughs such as graphics processing unit (GPU)-based computing10 and special-purpose machines such as the Anton Supercomputer,11 both of which were used in the work described herein. In chemistry, dynamics describes the motions of atoms, molecules, and even condensed media.12 These dynamical fluctuations take place on a wide range of time scales (Figure 1).13,14 Many time scales like this have been published,13,14 and
2. MD EVALUATION OF DESIGNED ENZYMES FOR THE KEMP ELIMINATION
The “inside-out” enzyme design protocol1 that our group developed with the Baker lab involves testing the stability of the protein active sites designed through our quantum-mechanical theozyme, match, and Rosetta design strategy and their ability to bind the substrate. These tests involve short MD simulations to screen out poor designs. This was used extensively to evaluate Kemp eliminases7 that were designed by Rosetta.7,31,33 The Kemp elimination is shown in Figure 3a. MD simulations revealed that unsuccessful computational designs failed to maintain key catalytic hydrogen bonds (Figure 3b). Besides filtering out inactive designs, the time-dependent behavior of the catalytic active site residues was used to guide directed evolution.31,32 In a subsequent study,6 we performed longer MD simulations (100 ns) using ligands resembling the transition state (TS) to different extents (Figure 4a,b), and a broader range of protein folds (Figure 4c−e). The modestly active enzymes KE70 and KE59 were designed through QM calculations in our group and Rosetta in Baker’s lab37 and then evolved experimentally by
Figure 1. Time scales of different types of protein motions.
we show our own version here. These motions include bond vibrations (10−100 fs), side-chain rotations (ps to μs), local domain fluctuations (ns to ms), allosteric transitions (μs to s), and the overall folding of the structural motifs of proteins (μs to s). Each of these motions either precedes or accompanies chemical reactions. Chemical dynamics involves studies of rates and mechanisms of reactions at the molecular level, generally in a time-resolved fashion.12 Molecular dynamicists, including our group, have studied the dynamics of simple reactions in the gas phase and in solution.12,15,16 The time scales of processes involved in chemical reactions are represented in Figure 2. The time it takes to make or break bonds in a reaction (50−100 fs) is the same as that of most molecular vibrations. The quantity kBT/h (∼50 fs) is the pre-exponential factor in Eyring’s transition state theory
Figure 2. Time scales of events related to reactions.
Figure 5. (a) DFT theozyme optimized for KE70 enzymes.33 (b) Overlay of the active site structures of the crystallized KE70.5 (fifth round of DE) and KE70.6 (sixth round) mutants (in sticks) and the theozyme structure (in balls and sticks). (c, d) MD traces of the distances between the reactive N atom of His17 and (c) the C atom of benzisoxazole 1 and (d) the N atom of benzotriazole 2 for selected KE70 mutants.
Figure 3. (a) The Kemp elimination. (b) Active site preorganization analyzed for all active and inactive variants by monitoring the H-bond distance (d) and angle (θ) of the catalytic His-Glu dyad along 20 ns MD trajectories. Green circles denote active designs and red circles inactive ones. The star indicates the theozyme values of d and θ.
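The hydrogen-bond distance d and angle θ monitored in Figure 3b can be evaluated frame by frame from Cartesian coordinates. The Python sketch below shows one way to do this. It assumes that d is the hydrogen−acceptor distance and θ the donor−hydrogen−acceptor angle; the caption does not specify which atoms define them, the function names are hypothetical, and the coordinates are made-up values chosen only to exercise the code.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm1 = math.sqrt(sum(p * p for p in v1))
    norm2 = math.sqrt(sum(q * q for q in v2))
    # Clamp to [-1, 1] to guard against round-off before calling acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (d, theta): H...acceptor distance in the input length unit
    and the donor-H...acceptor angle in degrees (assumed definitions)."""
    d = math.dist(hydrogen, acceptor)
    theta = angle_deg(donor, hydrogen, acceptor)
    return d, theta


# Made-up coordinates (in angstroms) for a perfectly linear hydrogen bond.
d, theta = hbond_geometry(donor=(0.0, 0.0, 0.0),
                          hydrogen=(1.0, 0.0, 0.0),
                          acceptor=(3.0, 0.0, 0.0))
```

Applying such a function to every saved frame of a trajectory gives the (d, θ) distribution that can be compared with the theozyme values.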
disposition of the substrate with respect to the catalytic residues (Figure 5b). After DE, both enzymes experienced similar improvements of around 400-fold in catalytic efficiency (kcat/KM). The more proficient enzyme HG3 was first designed by our group and Mayo31 and was subsequently evolved experimentally by Hilvert, who achieved a 181-fold increase in catalytic activity.38 It features an Asp and originally a Lys (afterward mutated to His and finally to Gln) as the general base and the hydrogen-bond donor, respectively. A very preorganized active site close to the original computational design was characterized crystallographically in the final mutant. The less active, unevolved enzymes KE70 and KE59 showed adequate active site−ligand contacts with the actual substrate, benzisoxazole 1, in the MD simulations (Figures 5c and 6c); surprisingly, the evolved, most active mutants exhibited worse substrate binding due to migration to either the water solution or secondary binding sites not observed in the crystallographic structures. However, with the more polarized, TS-like benzotriazole 2 as a substrate, the trend was reversed (Figures 5d and 6d). Here, after the second round of DE, the ligand maintained a proper orientation in the active site of KE70, and the catalytic
Figure 4. Structures of (a) 5-nitrobenzisoxazole (1), the experimental substrate; (b) 6-nitrobenzotriazole (2), a frequently cocrystallized inhibitor; and (c−e) the three different Kemp eliminases studied through MD simulations: (c) KE70, (d) KE59, and (e) HG3.
Tawfik.32,33 These designs feature a His-Asp dyad (KE70) or Glu (KE59) as the general base (“B” in Figure 3a) and Ser as a hydrogen-bond donor (“HB” in Figure 3a; also see Figures 5a and 6a). The available X-ray structures of KE70 variants bound to different substrates and TS analogues indicated a broad distribution of catalytic contacts and often an unproductive
precision38 and Warshel’s preorganization,22,23 which is recognized as a requirement for enhanced catalysis.
Figure 6. (a) DFT theozyme optimized for KE59 enzymes.32 (b) Overlay of the active site structures of the crystallized KE59.1 (first round of DE) and KE59.13 (13th round) mutants (in sticks) and the theozyme structure (in balls and sticks). (c, d) MD traces of the distances between the reactive O atom of Glu230 and (c) the C atom of benzisoxazole 1 and (d) the N atom of benzotriazole 2 for selected KE59 mutants.
Figure 8. (a, b) DFT theozymes with Asp as the base and (a) Lys or (b) Gln as hydrogen-bond donors, optimized for HG3 enzymes. (c, d) Overlays of the theozyme structure (in balls and sticks) with the X-ray structures (in sticks) of the (c) computationally designed HG1 and HG2 enzymes31 and (d) the final evolved HG3.17 variant.38
contacts overall got closer to the optimal calculated values (Figure 5d). With HG3 enzymes, simulations with the reacting substrate 1 were again not consistent with the theozyme-fidelity paradigm (Figure 7a): the two most active mutants, HG3.14 and HG3.17,
Figure 7. Evolution of the distances between the reactive O atom of Asp127 and (a) the C atom of benzisoxazole 1 and (b) the N atom of benzotriazole 2 along the MD simulations for all of the HG3 mutants: the original design (HG3, in blue), two intermediate DE mutants (the third HG3.3b and seventh HG3.7, in teal and violet, respectively), the 14th-DE-round mutant (HG3.14, in orange), and finally the most evolved variant (HG3.17, in magenta).
The MD trajectories performed for multiple designs and evolved variants of enzymes for the Kemp elimination indicated that the geometric similarity to the computed theozymes could only be achieved in the ground state using polarized substrates that mimic the charge distribution in the TS. These observations are fully consistent with the fact that the designs are made to provide maximum stabilization of the polar transition state and not the nonpolar reactant. MD shows very clearly that the active site is dynamic and assembles very tightly around the polar transition state but not around the reactant. The systematic study of different protein scaffolds and substrates by MD simulations has determined how the average structure of active sites can change significantly upon binding of the substrate or TS with respect to the apo state; these conformational changes usually take tens or hundreds of nanoseconds and are very often undetectable by crystallography. Evolution gradually populates the catalytically competent arrangements of inherently flexible active sites. In view of these results, we propose that the systematic evaluation of the conformational flexibility/rigidity of designed active sites through MD simulations, using polarized TS models, will aid in the design of more preorganized and active enzymes.
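One simple way to quantify how far a variant populates a catalytically competent arrangement is to count the fraction of MD frames in which a monitored catalytic distance stays below a cutoff. The Python sketch below illustrates the idea. The 3.5 Å cutoff, the function name, and the two short distance traces are hypothetical and serve only as an example; they are not data from the simulations discussed above.

```python
def competent_fraction(distances, cutoff=3.5):
    """Fraction of frames whose catalytic distance (in angstroms) is <= cutoff.

    The cutoff is an assumed value; in practice it would be chosen by
    comparison with the theozyme or crystallographic geometry.
    """
    if not distances:
        raise ValueError("distance trace is empty")
    n_competent = sum(1 for d in distances if d <= cutoff)
    return n_competent / len(distances)


# Made-up traces: one in which the ligand mostly sits in an unproductive
# binding mode, and one in which the catalytic contact is held throughout.
design_trace = [9.8, 10.1, 3.2, 9.5, 2.9, 10.3, 9.9, 3.1]
evolved_trace = [2.5, 2.6, 2.4, 2.7, 2.5, 3.0, 2.6, 2.5]

design_population = competent_fraction(design_trace)
evolved_population = competent_fraction(evolved_trace)
```

Computed over full trajectories for each round of DE, such a population gives a single number per variant that can be compared with the measured activity.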
showed worse substrate binding than the less evolved variants, which displayed a more ordered arrangement of the substrate and the catalytic residues. Again, when the more polarized, cocrystallized benzotriazole 2 was used, the most active mutants maintained the best catalytic contacts (Figure 7b): the catalytic distance between the substrate N−H bond (a polarized mimic of the reactive C−H bond) and Asp127 was very distorted (∼10 Å) in the first variant as a result of substrate translocation but was restored along the DE pathway from HG3 to HG3.17 by decreasing the population of the unproductive binding modes. In the seventh round of DE, the catalytic distance stayed constant at the optimal value observed in the X-ray structure of the final mutant and in the computed theozyme (2.5 Å; see Figure 8a,b). These results agree with Hilvert and Mayo’s definition of
3. MD ANALYSIS OF THE ROLE OF REMOTE MUTATIONS IN LOVD ACTIVE SITE DYNAMICS
LovD from Aspergillus terreus converts monacolin J acid (MJA) into the cholesterol-lowering drug lovastatin (LVA, acid form) via acylation of the α-S-methylbutyrate side chain (Figure 9). In the natural pathway, LovD interacts with the acyl carrier protein (ACP) domain of its binding partner protein LovF. In this process, LovF acylates Ser76 of LovD following a ping-pong mechanism39 to deliver the α-S-methylbutyrate side chain. The latter is then transferred to MJA (Figure 9). The acylation−deacylation reactions are assisted by the catalytic residues Tyr188 and Lys79 (Figures 10 and 11a). Simvastatin (SVA, acid form), the active pharmaceutical ingredient in Zocor, differs from LVA by one methyl group.40 However, the wild-type LovD enzyme
Figure 9. (A) Natural and (B) engineered biosynthetic pathways of LovD.
Figure 11. (a) DFT-optimized catalytic Ser76-Lys79-Tyr188 triad. (b) Overlay of the X-ray structures of wild-type LovD (yellow), LovD6 (cyan), and LovD9 (green). (c, d) DFT structures of (c) all of the stationary points along the transacylation reaction pathway and (d) the rate-determining TS for Ser76 acylation.
most of them located outside the active site.5 The X-ray structures of several mutants with different activities toward SVA showed an almost identical catalytically competent arrangement of the active site, especially for the catalytic Ser76-Tyr188-Lys79 residues (Figures 10 and 11b). Similarly,