The Case of Kemp Eliminase Catalysis - ACS Publications - American

Mar 28, 2011 - One of the fundamental challenges in biotechnology and biochemistry is the ability to design effective enzymes. Despite recent progress...
0 downloads 0 Views 3MB Size
ARTICLE pubs.acs.org/biochemistry

Challenges and Advances in Validating Enzyme Design Proposals: The Case of Kemp Eliminase Catalysis Maria P. Frushicheva, Jie Cao, and Arieh Warshel* Department of Chemistry, 418 SGM Building, University of Southern California, 3620 McClintock Avenue, Los Angeles, California 90089-1062, United States

bS Supporting Information ABSTRACT: One of the fundamental challenges in biotechnology and biochemistry is the ability to design effective enzymes. Despite recent progress, most of the advances on this front have been made by placing the reacting fragments in the proper places, rather than by optimizing the preorganization of the environment, which is the key factor in enzyme catalysis. Thus, rational improvement of the preorganization would require approaches capable of evaluating reliably the actual catalytic effect. This work considers the catalytic effects in different Kemp eliminases as a benchmark for a computer-aided enzyme design. It is shown that the empirical valence bond provides a powerful screening tool, with significant advantages over current alternative strategies. The insights provided by the empirical valence bond calculations are discussed with an emphasis on the ability to analyze the difference between the linear free energy relationships obtained in solution and those found in the enzymes. We also point out the trade-off between the reliability and speed of the calculations and try to determine what it takes to realize reliable computer-aided screening.

A

dvances in rational enzyme design are expected to have great potential in industrial applications and eventually in medicine.1 In fact, the ability to design efficient enzymes can be considered as the best manifestation of a true understanding of enzyme catalysis. However, most attempts to rationally design enzymes and the resulting constructs have been much less effective than the corresponding natural enzymes.1 Moreover, despite the progress in directed evolution (e.g., ref 2), we do not have unique rationales for the resulting rate enhancements. In our view, the best way to identify the problems with the current rational design approaches (for a review, see ref 1) is to examine the actual performance of simulations of catalytic effects. Unfortunately, most current studies have not been based on simulations of the relevant activation barriers. In this respect, it has been argued3,4 that the problems in generating highly effective designer enzymes are due (at least in part) to the incomplete modeling of the transition state (TS) and to the limited awareness to the key role of the reorganization energy. Capturing the reorganization energy by simulation techniques is, in fact, routinely done by the empirical valence bond (EVB) approach, which is a semiempirical valence bond-type, quantum mechanics/molecular mechanics (QM/MM) approach (e.g., refs 4 and 5). More specifically, semiquantitative computational studies of the effect of mutations on enzyme catalysis date back to the EVB simulations of the anticatalytic effect of mutations of trypsin.6 Subsequent calculations of known and predicted mutational effects include EVB studies (e.g., refs 4, 5, 7, and 8) and more recent r 2011 American Chemical Society

molecular orbital-combined quantum mechanics/molecular mechanics (MO-QM/MM) studies (e.g., refs 912). The studies described above, and in particular the quantitative one, have established the importance of the changes in reorganization energy upon mutations.4,8 A recent study3 has explored the ability of the EVB approach to be used in quantitative screening of design proposals. More recently, we also demonstrated the ability of the EVB to reproduce the observed effect of directed evolution in refining Kemp eliminases.13 Here we will focus again on the Kemp elimination reaction,14 because the design of enzymes that catalyze this reaction has been the center of recent excitement.15,16 At this point, it may be useful to clarify that the ability to screen computationally different design proposals is not a trivial feature that is shared by all current approaches. That is, while the accomplishments of constructing artificial enzymes from scratch (e.g., ref 2) are truly impressive, it is not clear if the final screening is based on a reliable “scoring function”. Here the best way to make a clear judgment is to take cases in which we know the observed catalytic effects and see which approach can reproduce them reliably. More specifically, while mutation predictions would be very convincing, having a clear benchmark (with known mutational results) and reproducing it by a given computer program should be a clear condition for Received: January 14, 2011 Revised: March 21, 2011 Published: March 28, 2011 3849

dx.doi.org/10.1021/bi200063a | Biochemistry 2011, 50, 3849–3858

Biochemistry a proper validation. With this perspective in mind, we can start by noting that gas phase calculations (or calculations with a small cluster of very few protein residues) are very unlikely to reproduce the corresponding observed catalytic effects. Furthermore, even recent attempts to use a MO-QM/MM approach (e.g., ref 17) have not provided a reasonable estimate of the observed catalytic effect or the trend of the mutational effects in artificially designed enzymes. The same is true even for recent attempts to improve the MO-QM/MM calculations18 and for ONIOM calculations19 (see Discussion). Some of the problems with current perspectives in the field are illustrated by a recent Current Opinion,20 in which the focus has been placed on catalytic proposals that are known to be ineffective (e.g., dynamics and ground state destabilization), while overlooking computational studies that actually reproduced the observed catalytic effect. Here the issue is not the different opinions about what might be important in catalysis but which effect has been found by proper computer modeling to contribute even mildly to catalysis where the results are known (see refs 4 and 21). In other words, if we are inserted in computer-aided enzyme design (CAED), we should ask what was found in computations that actually examined all possible catalytic factors, and reproduce the observed catalysis, before we venture to consider options that were found to give a minor (or no) effect. The issue here is not what might contribute to catalysis but what does contribute, and the judgment (in particular when it comes to CAED) should be based, at least at some level, on the ability to reproduce the observed effect of the active site. This will help to eliminate effects that are not capable of leading to catalysis. With this perspective in mind, we followed the preliminary study of ref 13 but focused here on a more extensive validation of the power of the EVB in calculating the catalytic effect in Kemp eliminases, on exploring the effect of directed evolution, and on the requirement for improving the catalysis of the previously designed enzymes. We also focus here on the origin of the difference between the linear free energy relationships (LFERs) for the Kemp elimination reaction in solution and in the designed enzymes as a way to improve our understanding of the problems in obtaining effective catalysis. Furthermore, in view of the uncertainty about the binding mode of the substrate in the system used in ref 13, we put significant effort into studies of catalytic antibodies for which we have direct information about the binding mode. Overall, we demonstrate that our approach allows one to explore the effect of different mutations and in principle to predict which mutations can lead to enhanced catalysis. We also clarify the difference between our strategies and other less quantitative approaches.

’ SYSTEMS AND METHODS Systems. The enzymes chosen as a benchmark for this study catalyze the Kemp elimination reactions14 with different proton donors and proton acceptors (the reactions studied are described schematically in Figure 1). To demonstrate the considerations in determining the rate constants for the reference reactions in these systems, we quantified first the energetics of the reaction in water, with 5-nitrobenzisoxazole as a donor and a carboxylic acid as a base.4 Here one may try to use the rate constant of the uncatalyzed reaction in water (1.2  106 M1 s1 at pH 7.2516), but this reaction involves a large contribution from the hydroxide

ARTICLE

ion (OH) available at this pH.14 In this respect, we would like to clarify that we have no problem modeling the pH effect in a solution reaction (see ref 22), but the well-understood effect of OH will be different at different pHs and thus is not so useful in establishing a reasonable reference state (because the enzyme rate constant does not depend on the OH concentration). In fact, we are interested in a reaction with a well-defined base [and the effect of the base concentration has been well understood quite early by the biochemical community (see the discussion in ref 4)]. Thus, as discussed repeatedly (e.g., ref 4), it is much more useful and relevant to consider the “chemically filtered” reference reaction,4 which is defined as the solution reaction that follows exactly the same mechanism as the one in the enzyme. With this in mind, we determined the corresponding rate constant by taking the estimate23 of the rate of the reaction of 5-nitrobenzisoxazole in acetate buffer (5  105 M1 s1) and extrapolating it to 55 M. The resulting rate constant (kcage) was around 0.0025 s1, for the case when the donor and acceptor (the substrate and a carboxylic acid, respectively) are placed in the same solvent cage. Similar conclusions were made via other estimates, including ab initio calculations (see Figure S1 in the Supporting Information) where we obtained a barrier of approximately 1718 kcal/mol, which is in the same range as the observed barrier (considering the fact that we did not include entropic corrections). At any rate, the relevant activation barrier for the reference reaction is given in Table 1. Similar analysis (see also Table S3 in the Supporting Information) can also be performed with other bases or with other leaving groups, and in fact, the relevant results can be taken from ref 24 but with the 55 M cage correction. It must be emphasized that we are not talking about kcat/KM relative to the uncatalyzed reaction, where the effect of the enzyme is larger, because this rate enhancement includes the well-understood effect of binding to the active site4 and the real challenge is to optimize kcat. Interestingly, and significantly, the reference reaction in a solvent cage is not much slower than the reaction catalyzed by the originally designed enzyme [kcat = 0.020.3 s1 (see ref 16)]. This indicates that a major part of the catalytic effect is due to just placing the donor and acceptor within contact distance (see also ref 15). The enzymes studied here are catalytic antibodies 34E415 and 13G5,25 designed template structures KE70 and KE59,16 designed proteins HG2 and 1A53-2,26 and human serum albumin (HSA)27 and bovine serum albumin (BSA).28 The X-ray coordinates of the studied systems were obtained from the Protein Data Bank and are listed in Table 1. A typical system (namely the 34E4 antibody) is depicted in Figure 2. The studied enzyme systems include 5-nitrobenzisoxazole, 5,7-dinitrobenzisoxazole, unsubstituted benzisoxazole, and 6-glutaramidebenzisoxazole substrates (Figure 1b) with different catalytic bases (Glu/Asp, Lys, and His). Methods. The free energy surface of the reference solution reaction of 5-nitrobenzisoxazole with a carboxylic acid as a base was estimated by ab initio calculations, in the same way described in our previous works (e.g., ref 29), and the resulting surface is shown in Figure S1 (see Supporting Information). The ab initio effective charges of the RS, TS, and PS are given in Figure S2 (see Supporting Information). The same approach was used for the other reference reactions. The reaction in the protein was modeled by the EVB method whose features have been extensively described in many previous works (see refs 30 and 31 for recent reviews) and is not described here. We clarify, however, that the reliability of the EVB has been 3850

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

established in many cases, including the reproduction of ab initio gas phase results (see refs 3032). We also note that the EVB surfaces are calibrated by forcing them to reproduce the ab initio results for the solution reactions mentioned above. In doing so, we have to take into account the complex nature of

the reaction with a concerted proton transfer and CN bond breaking. Usually, this requires three diabatic states, but for our purposes, we decided to use two states with modified charges that reproduce the ab initio TS charges (see the Supporting

Figure 1. (a) Schematic description of the Kemp elimination reaction. (b) Substrate structures used in the study: 5-nitrobenzisoxazole (1), 5,7dinitrobenzisoxazole (2), unsubstituted benzisoxazole (3), and 6-glutaramidebenzisoxazole (4).

Figure 2. The structure of the active site of the 34E4 catalytic antibody with a 5-nitrobenzisoxazole substrate.

Table 1. Correlation between the Observed and Calculated Δgqcat Values for the Kemp Elimination Reaction with Different Catalytic Bases (acceptors) and Substrates (donors) Systema

Baseb

Donor (benzisoxazole)c

Δgqobs (kcal/mol)d

Δgqcalc (kcal/mol)

Standard Deviation

water (cage) 34E4 antibody (wild type, PDB entry 1Y0L)

Glu/Asp EH50

5-NO2 5-NO2

21.2 17.9

21.2 17.3

0.30 0.47

34E4 mutant (EH50D, PDB entry 1Y18)

DH50

5-NO2

19.9

20.2

0.66

EH50

5-NO2

19.5

19.9

0.96

HG2 mutant (S265T, from wild-type HG2)

D126

5-NO2

17.7

18.2

0.92

1A53-2 (X-ray from S. Mayo)

E178

5-NO2

20.0

20.7

1.01

water (cage)

Glu/Asp

5,7-(NO2)2

17.7

17.7

0.30

34E4 antibody (wild type, PDB entry 1Y0L)

EH50

5,7-(NO2)2

13.6

13.3

0.93

water (cage) 34E4 antibody (wild type, PDB entry 1Y0L)

Glu/Asp EH50

unsubstituted unsubstituted

23.9 23.5

24.0 23.7

0.30 0.68

water (cage)

Glu

6-glutaramide

24.2

24.2

0.30

13G5 antibody (wild type, PDB entry 3FO0)

DH35

6-glutaramide

19.7

19.7

0.37

13G5 mutant (HH95N, from wild type)

DH35

6-glutaramide

20.4

20.6

0.29

13G5 mutant (EL34Q, PDB entry 3FO2)

DH35

6-glutaramide

19.8

19.4

0.52

13G5 mutant (EL34A, PDB entry 3FO1)

DH35

6-glutaramide

18.4

18.3

0.38

water (cage)

His

5-NO2

19.8

19.8

0.30

KE70 designed variant (wild type) KE70 mutant (D44N, from wild type)

H16-D44 H16-D44N

5-NO2 5-NO2

18.5 19.1

19.3 19.5

0.67 1.09

L32

34E4 mutant (Y

K, from wild type)

water (cage)

Lys

5-NO2

15.2

15.2

0.30

HSA (wild type, PDB entry 3B9L)

K199

5-NO2

17.9

17.7

0.84

BSA (wild type, from HSA sequence)

K222

5-NO2

16.4

16.3

1.05

a

The system definition includes the name of the protein, its mutant, and the X-ray PDB entry. b The base definition includes the name of the base and the number in the protein X-ray structure. c All donors are in the family of the benzisoxazoles with different group substitutions: 5-NO2, 5-nitrobenzisoxazole; 5,7-(NO2)2, 5,7-dinitrobenzisoxazole; unsubstituted benzisoxazole; 6-glutaramide, 6-glutaramidebenzisoxazole. d The source of the experimental Δgq are given in the text. 3851

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

Information and ref 13). This treatment is justified, because we forced the EVB TS to reach the same geometry, and more importantly the same charge distribution, as the corresponding ab initio solution system, with the concerted path. Thus, the whole issue of catalysis is related to the difference between the reactant state and transition state and not the exact path between them. In other words, in this work we were looking for a fast yet accurate method, and once we realized that the standard twostate charge distribution does not give the correct TS charges, we have two options: either to use three states (which is more expensive) or to use the current strategy, which guarantees the correct TS features but requires one to introduce a correction for the energetics of moving from the transition state to the product state (which is not crucial to this study). Of course, the second option is more reasonable, considering what we are looking for. At any rate, the EVB calculations were conducted with the MOLARIS33 package using the ENZYMIX force field. The EVB activation barriers were calculated at the configurations selected by the same free energy perturbation umbrella sampling (FEP/US) approach used in all our studies (e.g., refs 4 and 5). The simulation systems were solvated by the surface-constrained all atom solvent (SCAAS) model33 using a water sphere with an 18 Å radius centered on the substrate and surrounded by a 2 Å grid of Langevin dipoles and then by a bulk solvent, while long-range electrostatic effects were treated by the local reaction field (LRF) method.33 The EVB region includes the substrate and the functional group of the proton acceptor (e.g., the carboxylic group of the glutamic acid). Validation studies were conducted within a 22 Å radius of the inner sphere, where we repeated the calculations of the activation barrier and obtained practically the same results (treating the distant ionized groups with a highdielectric macroscopic model). The FEP mapping was evaluated by 21 frames (20 ps each) for moving along the reaction coordinate using the SCAAS model. All the simulations were conducted at 300 K with a time step of 1 fs. The simulations were repeated 20 times to obtain reliable results (see Results) with different initial conditions (obtained from arbitrary points of the relaxation trajectory). This generates consistent starting points for an averaging that reflects the Boltzmann probability of the system. The average was determined by taking in each case the difference between the calculated minimum at the RS and that of the given TS. The mutant systems were generated from the native enzymes via 100 ps relaxation runs. In addition to the EVB approach described above, it is useful to have faster screening approaches. Here our initial screening has been based on evaluation of electrostatic group contributions that are defined as the effect of “mutating” all the residual charges of the given group to zero. In principle, we could perform such mutations and evaluate the PDLD/S-LRA binding energy for the given native and mutant proteins,34 but because we are dealing with charged and polar residues, it is reasonable to start with the faster screening approximation introduced and described in ref 35. This approach estimates the electrostatic contribution of different residues (“groups”) to the activation barrier by an expression that can be formally simplified by grouping together the atomic charges of each residue and written as q = 332 ΔΔgelec

∑ij qj ΔQi =rij εij

ð1Þ

where hqj is the effective “charge” of the jth residue (which may represent dipoles in the case of polar residues), the ΔQs values

Figure 3. Results of 20 EVB mapping runs for the KE70 (wild-type) designed variant.

are the changes in the substrate charges upon going from the RS to the TS, and εij is the dielectric constant for the specific interaction. Now we can explore the trend of eq 1 for any possible mutations if we artificially assign to all protein residues a charge of 1.0 and then ask what charge change will lead to the most negative ΔΔgqelec. This leads to q ðΔqj Þopt =  RDΔΔgelec =Dqj ¼  R

∑i ΔQi =rijεij

ð2Þ

where R is a proportionality and the optimal Δqhj values are proportional to the electrostatic group contributions, when all the protein groups are positively charged. In this case, if Δqhj is positive, the given residue should be negative or have its dipole pointing the same direction as the field due to ΔQs. The treatment of eq 2 can be further refined by demanding that the mutations also retain or optimize protein stability. This can be done by our recent focused dielectric approach,35 which approximates the folding energy by ΔGfold = 332

∑ij qi qj =rijεfocus

ð3Þ

where εfocus is the optimal dielectric constant 35 and the qs are the charges of the ionized groups at the given pH. We can now optimize qj under the constraint of a large folding energy, and this leads to q ðΔqj Þopt =  βDGfold =Dqj  RDΔΔgelec =Dqj

ð4Þ

The use of the combined TS stabilization and the protein stability constraints is expected to provide quite an effective way for finding effective mutations of distant ionized residues. It should be noted that the group contributions can provide only very rough hints, because they do not reflect the reorganization energy consistently. In fact, obtaining more quantitative group contributions is possible (especially for an actual given sequence) by using the linear response approximation that captures the reorganization effect.

’ RESULTS Modeling the Catalytic Effects in Different Kemp Eliminases. The most basic requirement for effective enzyme design is the ability to reproduce the observed catalytic effects of different 3852

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

Table 2. Calculated Reaction Barriers for the KE70 Designed Variant for 20 starting Conformations* τ (ps)

Δgqcalc (kcal/mol)

τ (ps)

Δgqcalc (kcal/mol)

10

18.71

110

18.97

20 30

18.61 18.9

120 130

19.45 19.07

40

18.44

140

19.67

50

19.28

150

19.82

60

20.15

160

18.99

70

18.51

170

20.11

80

20.67

180

19.08

90

18.19

190

20.08

100

19.03

200

19.78

average Δgqcalc (kcal/mol)

19.28

Δgqobs (kcal/mol)

18.51

standard deviation

0.67

Figure 5. The observed and calculated LFER for the reaction of 5nitrobenzisoxazole in water as a function of the pKa of the acceptors (pKa(A)). The specific pKa(A) are taken from ref 14.

* The given mapping starts from configurations generated from a single trajectory after the designated generation time (τ).

Figure 4. Correlation between the calculated and observed activation barriers.

design constructs and/or those of natural active sites. Thus, we started with a systematic evaluation of the activation barriers for the different systems considered in Table 1. These systems involve different proteins, different mutants, and different bases. Our typical procedure of obtaining the average activation barrier is illustrated in Figure 3 and Table 2, where we show the results of 20 EVB free energy profiles. The same averaging procedure has been applied to all the systems studied, and the results of the calculations are summarized in Figure 4. The main point that emerged from Table 1 and Figure 4 is the fact that our EVB approach provides a reliable and effective way of screening different design options. In particular, we point out that our results are much more reliable than those reported by alternative approaches (see Discussion). After establishing the reliability of the EVB approach, we turned our focus to attempts to explore possible avenues for controlling and changing the catalytic power of Kemp eliminases. As discussed in our previous work,13,36 it is extremely hard to design an effective Kemp eliminase because the change in charge

Figure 6. Experimental and calculated LFER for the reaction in water as a function of the corresponding ΔpKa. The experimental pKa(D) values were taken from ref 24, and pKa(A) was taken from ref 14 (more details can be found in Table S3 of the Supporting Information).

distribution upon moving to the TS is not large and furthermore this change is not localized. In fact, as found in our previous work,13 the main catalytic effect that has been obtained in Kemp eliminases (beyond the trivial effect of bringing the donor and acceptor to the same cage) is the destabilization of the ground state charge distribution of the proton acceptor. Although this is not the regular avenue taken by native enzymes, it is the simplest direction for improving Kemp eliminases. At any rate, because this effect is associated with the pKa of the proton acceptor, one may ask what can be gained by changing the type of donor or acceptor, and the corresponding analysis is given in the next section. Examining LFER Trends. An interesting and potentially promising aspect of our study is the ability to explore the linear free energy relationship (LFER) between the activation barrier and the nature of the donor and acceptor (the base). In solution, we have a clear LFER for 5-nitrobenzisoxazole as a function of the acceptor pKa, for a series of different catalytic bases (e.g., pyridine catalytic base with a pKaexp of 5.13 and a Δgqexp of 20.3 kcal/mol) with a β of 0.93. This correlation is easily reproduced by the EVB model (see Figure 5) by changing of the gas phase proton affinity (which is 3853

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

Figure 7. The observed and calculated LFER as a function of the acceptor pKa for the reaction in the protein. Figure 8. The LFER as a function of ΔpKa for the reaction in the protein.

reflected by the EVB gas phase shift). Furthermore, we have a similar correlation between the activation barrier and the ΔpKa of the donor and acceptor in the reaction in water (Figure 6), and this correlation is also reproduced by the EVB. It is tempting to assume that we may obtain a similar correlation in the enzyme. Unfortunately, however, in the protein, the situation is more complex. That is, while the gas phase proton affinity is identical in the protein and in solution, the local environment changes the apparent pKa in the protein (the calculated values are listed in Table S2 of the Supporting Information). This is further complicated by the fact that the environment of the proton donor is also changed by the protein. In view of this complication, we start by first exploring the correlation between the observed pKa of the proton acceptor and the activation barrier (Figure 7). As seen from the figures, the correlation is very qualitative, with large deviations. This is significant because we obtained much better correlation between the total calculated and observed activation barriers (Figure 4). Thus, we must conclude that the correlation between the apparent pKa (or ΔpKa) and the activation barrier cannot provide a simple guide improving the catalytic power of Kemp eliminases. More specifically, the correlation between the activation barriers and the pKa of the proton acceptor in the protein (Figure 7) does follow the same trend as the LFER in solution (Figure 5). Furthermore, although the LFER in the protein for the ΔpKa has a slope in the correct direction, the overall effect on the activation barrier is small because the ΔpKa in the protein is much smaller than the corresponding ΔpKa in water (Figure 8). Thus, the effect on the activation barrier is small. A significant part of the problem reflects the fact that an increase in the pKa of the acceptor (because of a decrease in the polarity of its site) is usually accompanied by an increase in the pKa of the donor (Figure 8), because of the decrease in the polarity at the donor site. Of course, one can try to generate different polarities at both sites, but because of the complex nature of the change in the charge of the donor upon formation of the TS, it is hard to obtain effective active site mutations that will stabilize the developing donor charge (see ref 13). In other words, it is much simpler, for example, to reduce the number of water molecules in the active site than to selectively stabilize the ionized form of the proton donor. With this insight in mind, we can look for a way to change the apparent pKa of the acceptor by external charges. Optimizing Long Range Electrostatic Effects. The studies described above have established our ability to conduct an effective

screening but did not address the issue of enzyme design. This issue was discussed in our previous works,3,13 and in the specific case of the Kemp elimination reaction, it has been argued that the refinement of the local environment cannot be effective, because of the small change in the charge distribution during the reaction. Thus, we focused on the small improvements in the catalytic effect by distant ionized residues. That is, our previous studies (e.g., refs 4 and 37) have indicated that the effect of ionized residues at a distance of more than 6 Å can be approximated by using a relatively large effective dielectric constant for chargecharge interaction. Here we can use the approach introduced in our previous study13 and described in Methods, where we optimize the effect of distanced ionized residues on the difference in charge distribution between the RS and TS while simultaneously retaining the protein stability. This can be done by using eq 4. Here we report a qualitative attempt to use eq 3 to refine the 13G5 system. The calculated group contributions are given in Figure 9 and Table S5 of the Supporting Information, where we focus on the predicted effect of distant charged residues. The most effective sites for distant charged residues are also depicted in Figure 10. For example, we would predict from Table S5 a rate acceleration of ∼5-fold (TS stabilization of ∼0.9 kcal/mol) due to the VH78E mutation, but this is done without exploring the effect on stability (by the use of eq 4). The stability consideration should also take into account the fact that we already have two negatively charged groups (EH6 and DH72) in the same region. It may be simpler to explore our predication by simply introducing the EH6 Q mutation (where we predict an ∼4-fold rate reduction). We focus here on predicting the effects of distant charged residues in view of the fact that the use of eq 1 for predicting the effect of active site residues (in the neighborhood of the substrate) is not expected to be particularly effective. Of course, we can use the full EVB calculations in the same way that were used in Modeling the Catalytic Effects in Different Kemp Eliminases to predict catalytic mutations; however, as discussed in ref 13, it is hard to obtain major catalytic effects in the case of the Kemp elimination reaction, and thus, it would be more effective to try to exploit the simpler strategy of using eq 1 for distant residues. In this respect, it is useful to consider the HisH95Asn mutant, where the mutation of His to Asn reduces the rate significantly. However, here it is possible that in the case of the native enzyme HisH95 is the proton acceptor and AspH35 3854

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

Figure 9. Group contributions from the all distant (top) and ionized distant (bottom) residues. The calculations were conducted for the 13G5 catalytic antibody. The group contributions reflect the effect of changing the charge at the indicated site from zero to þ1.0. Residues 2 to 215 and 216 to 431 are located at the heavy and light chains of the 13G5 catalytic antibody, respectively.

stabilizes the protonated HisH95, while in the case of the HH95N mutant AspH35 is the proton acceptor. The exact role of the Asp-His pair will be the subject of a subsequent study. Here we mainly raise this point to illustrate the difficulty in proposing modification of ionizable residues in the immediate neighborhood of the reacting system. On the other hand, the effect of distant residues is much simpler to predict, and the prediction is likely to be reliable. Examination of the Inactive Designs. One of the questions that should be addressed in enzyme design is the ability to eliminate design constructs that lead to inactive enzymes. This issue is explored here by calculating the activity of several inactive enzymes. The results are summarized in Table 3. As one can see from the table, we obtained high activation barriers for the nonactive systems examined. Obviously, a more extensive examination is needed; however, despite the interest in predicting nonactive constructs and eliminating them from design experiments, we place much more importance on the ability to improve the catalytic activity of mutants that are already slightly active. In fact, we believe that this is the direction that would be the most beneficial to the use of our quantitative methods.

ARTICLE

Figure 10. Predicted sites of the most effective mutations for catalysis by distant residues.

’ DISCUSSION The rational design of enzymes with native activity requires the ability to predict the proper TS stabilization, and this involves the challenge of capturing the overall preorganization effect. Attempts to estimate the catalytic effect by using gas phase models, or even by looking at the electrostatic interaction between different residues and the TS, are unlikely to reproduce the correct catalytic effect because it is impossible to assess the preorganization effect without including the protein and simulating its reorganization during the reaction. The challenge of evaluating the catalytic power of a given mutant is not different from that addressed in our early 1986 study of computer-aided mutations.6 At this stage, it seems to us that the potential of the EVB has been demonstrated in welldefined cases (e.g., refs 3 and 38), where it was found to reproduce the large effects of mutations that destroy the catalytic effect of evolved enzymes. Thus, our main challenge is to use this approach in improving nonefficient enzymes. It is also important to clarify that we appreciate the advances made in designing artificial enzymes (and clearly those done with catalytic antibodies), in terms of generating active sites that bind 3855

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

Table 3. Calculated Δgqcat for the Kemp Elimination Reactions of Inactive Enzymes system

a

basea

donor (benzisoxazole)

Δgqobs (kcal/mol)

Δgqcalc (kcal/mol)

water (cage)

Glu/Asp

5-nitrobenzisoxazole

21.2

21.2

KE59 designed variant

E231

5-nitrobenzisoxazole

inactive

24.3

HG2 (wild type, X-ray from S. Mayo)

D126

5-nitrobenzisoxazole

inactive

31.7

HG2 mutant (S265A, from wild type)

D126

5-nitrobenzisoxazole

inactive

26.7

The base definition indicates the base type and its position in the sequence.

and fit the given reacting system. However, we do not believe that the current steps are sufficient for generating effective catalysts, and a CAED must involve the ability to predict the catalysis in the given active site. At this point, it is useful to clarify the difference between our EVB approaches and current alternative approaches. The essential requirement of a proper screening approach is the ability to reproduce the observed catalytic effects. Obviously, this major requirement cannot be accomplished by gas phase models (including even gas phase models with the substrate and very few residues) that were used for the initial screening in some cases (e.g., refs 16 and 24). Instructive MM and related simulations19,39 can tell us about the optimal donor and acceptor geometries and help in the generation of proper scaffolds for the reacting systems but are unlikely to be able to predict the catalytic trends in properly oriented systems. More relevant and instructive would be a comparison of the EVB to current MO-QM/MM studies of enzyme design. Here it would be useful to consider several recent studies of the Kemp eliminase and related systems. The semiempirical MO-QM/MM study of Jorgensen and co-workers has provided reliable results for the water reference reactions,40 but the predicted trend in the protein17 is not encouraging. More specifically, the MO-QM/MM approach performs nicely in exploring the effect of changing the distance between the donor and acceptor (i.e., the Glu to Asp mutation41). However, the real challenge is to reproduce the effect of changing the environment (which occurs in directed evolution experiments and is usually responsible for the catalytic activity), and this challenge has not yet been met by the current MO-QM/MM studies of Kemp eliminases (which drastically underestimated the barrier in the enzyme). Interestingly, ref 18 argued that it could improve the MO-QM/MM results. However, the reported results (and the agreement with the corresponding experimental results) seem to overlook the energetics of forming the protonated water molecule that is thought to be the proton donor. Perhaps the current difficulties with the MO-QM/MM method are due to the fact that the reported studies kept the main chain fixed. Alternatively, Houk and co-workers19 have attempted to use an ONIOM truncated protein model but obtained relatively poor agreement [with a spread of ∼12 kcal/mol for experimental deviations of 2 kcal/mol (see Figure S3 of ref 19)]. This work also presented views that might lead to some confusion. First, there are problems with the argument that calculations with an error of 1.5 kcal/mol may not be useful because the observed mutational effects are around 2 kcal/mol. In fact, predictability with a 2 kcal/mol error range would be a fantastic tool in attempts to generate an enzyme with large catalytic effects. Second, and potentially more problematic, is the idea that the predicted power requires a very highlevel QM method. This suggestion is risky (in terms of its possible impact on the experimental community) and unjustified. That is, predictive approaches like the EVB method are not interested in predicting the absolute QM energy of the substrate, because what

counts is the change in this energy upon moving from water to the protein active site. Thus, the effort in developing a predictive method must be spent on having a good convergence and a proper long range treatment and not on obtaining the best basis set. Overall, we have no doubt that the MO-QM/MM approach with proper sampling (e.g., with our approach of using a reference potential42) will be able to provide a proper screening tool (in particular when used with an EVB as a reference potential). However, at present, the EVB seems to provide the most effective way to obtain reliable screening results perhaps because of its ability to allow for sufficiently extensive sampling. Some workers may see great potential for using desolvation effects in the enzyme and causing catalysis by ground state destabilization (GSD) (see the discussion in ref 4), but native enzymes do not catalyze reactions by exploiting desolvation effects.4 Furthermore, even Kemp eliminases have not been able to exploit this effect significantly. One of the problems is that even if we could create a strong RS desolvation for the base it would lead to a very large pKa, and this will not help at physiological pH. That is, if we try to destabilize the RS by destabilizing the ionized base (e.g., the ionized Asp), the base will be protonated by a bulk proton. Here the best option is to use the polar TS stabilization, but unfortunately, it is very hard to obtain for the Kemp TS (see ref 13). Perhaps a part of the reason why enzymes do not use desolvation effects is the available pH range and the available amino acids (see the discussion of ODCase in ref 43). Note that the GSD issue has been established with the reliable linear response (LRA) calculations of the ground state solvation free energy13 and by detailed comparison to the related case in dehalogenase, where a similar situation is handled by a neutrally evolved enzyme in a completely different way, with ground state stabilization and with very large transition state stabilization (see ref 13). Our conclusion about the fact that the Kemp eliminases use RSD is not related to the exact structure of the TS, namely concerted or stepwise, but to the charge distribution of the TS (which has been treated here in a rigorous way by our special procedure). Because this work invested major computational power in validating the EVB results, one may ask why we have not provided some predictions. The answer involves two points. First, we do provide several clear predictions with regard to the effects of mutating distant residue predictions. This reflects our finding (see ref 13) that it is extremely difficult to achieve strong catalysis in Kemp eliminases by simple mutations of the active site residues (this is why we turn our attention to other systems, where the changes in the substrate charges upon going to the TS are larger). Second, we already demonstrated our ability to have reasonable predictions of the Asn155 to Ala mutation in subtilisin in ref 7. Thus, this paper is more about what it takes to obtain a reliable prediction than about actual predictions. More specifically, in our previous works (e.g., ref 3), we examined the trade-off between speed and reliability in different approaches for enzyme design. At 3856

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry

ARTICLE

Table 4. Estimating the Efficiency of the EVB Screening Calculationsa

Δgqcat using EVB

computer time

no. of mutants

per mutant

per 24 h per

no. of mutants per 24 h per

(runs, processor)

200 processorsb

1000 processorsb

17 h (1, 1)

14

70

a

The calculations were conducted on the University of Southern California HPCC (High Performance Computing and Communication) Linux computer, using Dual Intel Xeon(64-bit) 3.2 GHz 2GB Memory nodes. b Twenty runs per mutant.

present, we feel that our fast strategies (like using group contributions and reorganization energies) are not sufficiently predictive and that it is important to use the extensive averaging considered here. However, with the current advances in computer power, this is not such bad news. That is, as seen from Table 4, we can screen 14 mutants (20 runs for each mutant) in 24 h on 200 nodes and 70 mutants using 1000 nodes. We also used a more parallel approach, in which the mapping is distributed on several processers, but this did not lead to a more efficient screening. At present, there are still many who believe in dynamical and other esoteric effects that are presumed to contribute to catalysis (for a review, see ref 21). In many cases, it is clearly suggested that improving such effects will be crucial for optimal enzyme design (e.g., ref 20 and 44). However, it seems to us that by far the main factor that actually contributes to catalysis is the preorganization effect, and thus, we feel that there is no rational way to improve the dynamics and related effects as these factors do not contribute to catalysis.21 Furthermore, we believe that TS stabilization by delocalization effects16 is unlikely to provide a significant catalytic factor because the same effect exists in the reference solution reaction. Thus, the possible effect of π-stacking (which was considered in ref 16) is not expected to lead to a significant rate enhancement above the simple effect of having nearby induced dipoles, which is much less effective than having a preorganized polar environment. In fact, as realized by Hilvert and co-workers,15 the corresponding dispersion or more precisely inductive effect is small. Our previous work13 attempted to refine the electrostatic environment near O1. This effort can be considered by some as an extension of the idea of placing an acid near O1 (e.g., refs 2 and 15). However, the idea that such a base is needed is reminiscent of physical organic chemistry concepts that capture some of the electrostatic effect but end up looking at factors that do not play major role in enzyme catalysis. In our view, the issue is not a charge transfer or a covalent bond to the substrate, which might be concluded from gas phase calculations, because we are not dealing here with a bifunctional reaction with two steps (unless we have a new chemistry), but with a pure electrostatic stabilization. It is true that the attempts to focus on the base lead in some cases to what we consider the correct direction, like placing Lys or His near O1,2 but this has little to do with the pKa of Lys, as one would assume from the traditional picture. It actually reflects the electrostatic interaction between the stabilizing group and the charge that moves to O1 in the TS. Obviously, our strategy can and will be improved in the future, but the main point is the ability to consider enzyme design by using energy-based concepts in a rational way. In this respect, it is useful to point out a special nontrivial advantage of our CAED approach. That is, in the case of attempts to improve a specific enzyme, there is

an acquired advantage in the accumulated experience of modeling different mutants (as was the case here in the study of the evolved mutants). Even significant errors in predicting the first set of mutants can lead to improvement of the model (e.g., in selecting an enzyme specific dielectric to compensate for convergence difficulties in the treatment of ionized residues) and to a better understanding of the specific enzymes and thus to a better predictive ability in further design rounds. Such an advantage is not expected from computational design approaches that are not based on capturing the physics of the catalytic process. Finally, we reclarify that we have demonstrated the ability to reproduce quantitatively the absolute catalytic effects and mutational effects in naturally evolved enzymes4 and in designer enzymes (this work). This clearly indicates that the catalytic power of enzyme is not due to elusive effects (e.g., conformational dynamics) but to what is by now well understood, namely electrostatic preorganization. Thus, our difficulties in improving designer enzymes are not due to overlooking misunderstood factors, but to the difficulties in optimizing well-understood factors. In other words, a method that reproduces the catalytic rate in known systems should be able to do so in any unknown sequence, and the challenge is to find the unknown optimal sequence. At any rate, it seems to us that this study provides a useful analysis of the reasons for the less than perfect performance of current designer enzymes.

’ ASSOCIATED CONTENT

bS

Supporting Information. Computational and modeling information. This material is available free of charge via the Internet at http://pubs.acs.org.

’ AUTHOR INFORMATION Corresponding Author

*Telephone: (213) 740-4114. Fax: (213) 740-2701. E-mail: [email protected]. Funding Sources

This work was supported by Grant GM024492 from the National Institutes of Health.

’ ACKNOWLEDGMENT We gratefully acknowledge stimulating discussion with Dan S. Tawfik, Donald Hilvert, Anthony J. Kirby, and Steve Mayo. We thank the University of Southern California's High Performance Computing and Communication Center (HPCC) for computer time. ’ ABBREVIATIONS TS, transition state; EVB, empirical valence bond; MO-QM/MM, molecular orbital-combined quantum mechanics/molecular mechanics; CAED, computer-aided enzyme design; LFER, linear free energy relationship; RS, reactant state; PS, product state; FEP/ US, free energy perturbation umbrella sampling; SCAAS, surfaceconstrained all atom solvent; LRF, local reaction field; PDB, Protein Data Bank. ’ REFERENCES (1) Toscano, M. D., Woycechowsky, K. J., and Hilvert, D. (2007) Minimalist active-site redesign: Teaching old enzymes new tricks. Angew. Chem., Int. Ed. 46, 4468–4470. 3857

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858

Biochemistry (2) Khersonsky, O., Rothlisberger, D., Dym, O., Albeck, S., Jackson, C. J., Baker, D., and Tawfik, D. S. (2010) Evolutionary optimization of computationally designed enzymes: Kemp eliminases of the KE07 series. J. Mol. Biol. 396, 1025–1042. (3) Roca, M., Vardi-Kilshtain, A., and Warshel, A. (2009) Toward accurate screening in computer-aided enzyme design. Biochemistry 48, 3046–3056. (4) Warshel, A., Sharma, P. K., Kato, M., Xiang, Y., Liu, H., and Olsson, M. H. (2006) Electrostatic basis for enzyme catalysis. Chem. Rev. 106, 3210–3235. (5) Warshel, A. (1991) Computer Modeling of Chemical Reactions in Enzymes and Solutions, Wiley-Interscience, New York. (6) Warshel, A., and Sussman, F. (1986) Toward computer-aided site-directed mutagenesis of enzymes. Proc. Natl. Acad. Sci. U.S.A. 83, 3806. (7) Hwang, J. K., and Warshel, A. (1987) Semiquantitative calculations of catalytic free energies in genetically modified enzymes. Biochemistry 26, 2669–2673. (8) Liu, H., and Warshel, A. (2007) The Catalytic Effect od Dihydrofolate Reductase and Its Mutants Is Determined by Reorganization Energies. Biochemistry 46, 6011–6025. (9) Grigorenko, B. L., Nemukhin, A. V., Topol, I. A., Cachau, R. E., and Burt, S. K. (2005) QM/MM Modeling the Ras-GAP Catalyzed Hydrolysis of Guanosine Triphosphate. Proteins 60, 495–503. (10) Marti, S., Andres, J., Moliner, V., Silla, E., Tunon, I., and Bertran, J. (2007) Computer-Aided Rational Design of Catalytic Antibodies: The 1F7 Case. Angew. Chem., Int. Ed. 46, 286–290. (11) Marti, S., Andres, J., Moliner, V., Silla, E., Tunon, I., and Bertran, J. (2008) Predicting an Improvement of Secondary Catalytic Activity of Promiscuos Isochorismate Pyruvate Lyase by Computational Design. J. Am. Chem. Soc. 130, 2894–2895. (12) Mulholland, A. J. (2008) Computational enzymology: Modelling the mechanisms of biological catalysts. Biochem. Soc. Trans. 36, 22–26. (13) Frushicheva, M. P., Cao, J., Chu, Z. T., and Warshel, A. (2010) Exploring challenges in rational enzyme design by simulating the catalysis in artificial kemp eliminase. Proc. Natl. Acad. Sci. U.S.A. 107, 16869–16874. (14) Kemp, D. S., and Casey, M. L. (1973) Physical organic chemistry of benzisoxazoles II. Linearity of the bronsted free energy relationship for the base-catalyzed decomposition of benzisoxazoles. J. Am. Chem. Soc. 95, 6670–6680. (15) Debler, E. W., Ito, S., Seebeck, F. P., Heine, A., Hilvert, D., and Wilson, I. A. (2005) Structural origins of efficient proton abstraction from carbon by a catalytic antibody. Proc. Natl. Acad. Sci. U.S.A. 102, 4984–4989. (16) Rothlisberger, D., Khersonsky, O., Wollacott, A. M., Jiang, J., DeChancie, J., Betker, J., Gallaher, J. L., Althoff, E. A., Zanghellini, A., Dym, O., Albeck, S., Houk, K. N., Tawfik, D. S., and Baker, D. (2008) Kemp Elimination Catalysts by Computational Enzyme Design. Nature 453, 190–195. (17) Alexandrova, A. N., Rothlisberger, D., Baker, D., and Jorgensen, W. L. (2008) Catalytic mechanism and performance of computationally designed enzymes for Kemp elimination. J. Am. Chem. Soc. 130, 15907–15915. (18) Acevedo, O. (2009) Role of water in the multifaceted catalytic antibody 4B2 for allylic isomerization and Kemp elimination reactions. J. Phys. Chem. B 113, 15372–15381. (19) Kiss, G., Rothlisberger, D., Baker, D., and Houk, K. N. (2010) Evaluation and ranking of enzyme designs. Protein Sci. 19, 1760–1773. (20) Lassila, J. K. (2010) Conformational diversity and computational enzyme design. Curr. Opin. Chem. Biol. 14, 676–682. (21) Kamerlin, S. C., and Warshel, A. (2010) At the dawn of the 21st century: Is dynamics the missing link for understanding enzyme catalysis? Proteins 78, 1339–1375. (22) Kamerlin, S. C., Williams, N. H., and Warshel, A. (2008) Dineopentyl phosphate hydrolysis: Evidence for stepwise water attack. J. Org. Chem. 73, 6960–6969.

ARTICLE

(23) Thorn, S. N., Daniels, R. G., Auditor, M. T., and Hilvert, D. (1995) Large rate accelerations in antibody catalysis by strategic use of haptenic charge. Nature 373, 228–230. (24) Hu, Y., Houk, K. N., Kikuchi, K., Hotta, K., and Hilvert, D. (2004) Nonspecific medium effects versus specific group positioning in the antibody and albumin catalysis of the base-promoted ring-opening reactions of benzisoxazoles. J. Am. Chem. Soc. 126, 8197–8205. (25) Debler, E. W., Muller, R., Hilvert, D., and Wilson, I. A. (2009) An aspartate and a water molecule mediate efficient acid-base catalysis in a tailored antibody pocket. Proc. Natl. Acad. Sci. U.S.A. 106, 18539– 18544. (26) Privett, H. K. (2009) An iterative approach to de novo computational enzyme design and the successful application to the Kemp elimination. Ph.D. Dissertation, California Institute of Technology, Pasadena, CA. (27) Zhu, L., Yang, F., Chen, L., Meehan, E. J., and Huang, M. (2008) A new drug binding subsite on human serum albumin and drugdrug interaction studied by X-ray crystallography. J. Struct. Biol. 162, 40–49. (28) Hollfelder, F., Kirby, A. J., Tawfik, D. S., Kikuchi, K., and Hilvert, D. (2000) Characterization of Proton-Transfer Catalysis by Serum Albumins. J. Am. Chem. Soc. 122, 1022–1029. (29) Kamerlin, S. C., and Warshel, A. (2009) On the energetics of ATP hydrolysis in solution. J. Phys. Chem. B 113, 15692–15698. (30) Kamerlin, S. C., and Warshel, A. (2010) The EVB as a quantitative tool for formulating simulations and analyzing biological and chemical reactions. Faraday Discuss. 145, 71–106. (31) Kamerlin, S. C., and Warshel, A. (2011) The empirical valence bond model: Theory and applications. Wiley Interdisciplinary Reviews: Computational Molecular Science 1, 30–45. (32) Kamerlin, S. C., Cao, J., Rosta, E., and Warshel, A. (2009) On unjustifiably misrepresenting the EVB approach while simultaneously adopting it. J. Phys. Chem. B 113, 10905–10915. (33) Lee, F. S., Chu, Z. T., and Warshel, A. (1993) Microscopic and Semimicroscopic Calculations of Electrostatic Energies in Proteins by the Polaris and Enzymix Programs. J. Comput. Chem. 14, 161–185. (34) Singh, N., and Warshel, A. (2010) Absolute binding free energy calculations: On the accuracy of computational scoring of protein-ligand interactions. Proteins 78, 1705–1723. (35) Vicatos, S., Roca, M., and Warshel, A. (2009) Effective approach for calculations of absolute stability of proteins using focused dielectric constants. Proteins 77, 670–684. (36) Hollfelder, F., Kirby, A. J., and Tawfik, D. S. (2001) On the magnitude and specificity of medium effects in enzyme-like catalysts for proton transfer. J. Org. Chem. 66, 5866–5874. (37) Warshel, A. (1987) What about protein polarity? Nature 330, 15–16. (38) Shurki, A., and Warshel, A. (2004) Why does the Ras switch “break” by oncogenic mutations? Proteins 55, 1–10. (39) Baker, D. (2010) An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817–1819. (40) Acevedo, O., and Jorgensen, W. L. (2004) Solvent effects and mechanism for a nucleophilic aromatic substitution from QM/MM simulations. Org. Lett. 6, 2881–2884. (41) Alexandrova, A. N., and Jorgensen, W. L. (2009) Origin of the activity drop with the E50D variant of catalytic antibody 34E4 for Kemp elimination. J. Phys. Chem. B 113, 497–504. (42) Rosta, E., Klahn, M., and Warshel, A. (2006) Towards accurate ab initio QM/MM calculations of free-energy profiles of enzymatic reactions. J. Phys. Chem. B 110, 2934–2941. (43) Warshel, A., Strajbl, M., Villa, J., and Florian, J. (2000) Remarkable rate enhancement of orotidine 50 -monophosphate decarboxylase is due to transition-state stabilization rather than to groundstate destabilization. Biochemistry 39, 14728–14738. (44) Nagel, Z. D., and Klinman, J. P. (2009) A 21st century revisionist’s view at a turning point in enzymology. Nat. Chem. Biol. 5, 543–550. 3858

dx.doi.org/10.1021/bi200063a |Biochemistry 2011, 50, 3849–3858