MM Methods for Modeling Enzyme Reactions

Aug 6, 2010 - Combined quantum mechanics/molecular mechanics (QM/MM) calculations with high levels of correlated ab initio theory can now provide ...
2 downloads 0 Views 2MB Size
J. Phys. Chem. B 2010, 114, 11303–11314

11303

Testing High-Level QM/MM Methods for Modeling Enzyme Reactions: Acetyl-CoA Deprotonation in Citrate Synthase Marc W. van der Kamp, Jolanta Z˙urek, Frederick R. Manby, Jeremy N. Harvey, and Adrian J. Mulholland* Centre for Computational Chemistry, School of Chemistry, UniVersity of Bristol, Bristol BS8 1TS, United Kingdom ReceiVed: May 4, 2010; ReVised Manuscript ReceiVed: July 2, 2010

Combined quantum mechanics/molecular mechanics (QM/MM) calculations with high levels of correlated ab initio theory can now provide benchmarks for enzyme-catalyzed reactions. Here, we use such methods to test various QM/MM methods and the sensitivity of the results to details of the models for an important enzyme reaction, proton abstraction from acetyl-coenzyme A in citrate synthase. We calculate multiple QM/ MM potential energy surfaces up to the local coupled cluster theory (LCCSD(T0)) level, with structures optimized at hybrid density functional theory and Hartree-Fock levels. The influence of QM methods, basis sets, and QM region size is shown to be significant. Correlated ab initio QM/MM calculations give barriers in agreement with experiment for formation of the acetyl-CoA enolate intermediate. In contrast, B3LYP fails to identify the enolate as an intermediate, whereas BH&HLYP does. The results indicate that QM/MM methods and setup should be tested, ideally using high-level calculations, to draw reliable mechanistic conclusions. Introduction Computational modeling can make significant contributions to the field of enzymology, for example, in resolving reaction mechanisms and analyzing the causes of catalysis.1-3 Modeling of enzyme reactions also promises to contribute in fields such as catalyst design4 and drug design.5,6 Computer simulations can be used to study unstable species in the reaction, evaluate possible alternative mechanisms, and calculate specific contributions to catalysis. In order to describe reactions in their enzymatic environment, hybrid quantum mechanical/molecular mechanical (QM/MM) simulations are particularly useful.7-9 The essence of the QM/MM approach is that a small region is treated quantum mechanically and is coupled to a simpler molecular mechanics description of the remainder of the system.10,11 In recent years, QM/MM calculations on reactions in enzymes with density functional theory (DFT)12 and ab initio molecular orbital QM/MM calculations13,14 have become increasingly popular, often resulting in successful reproduction of experimental results. More importantly, new insights into enzyme reaction mechanisms have been obtained.15-17 Highlevel local correlation methods can now be applied in QM/MM calculations on enzyme reactions, though such computations remain highly demanding.13,14,18 These studies have shown that the use of coupled cluster theory in QM/MM calculations (with local approximations19) is now possible, marking a significant step-change for QM/MM methods and promising high levels of accuracy in the calculation of enzyme reaction barriers.20 The choice of the specific setup and methods used in QM/ MM calculations is not trivial. Some quantum chemical methods widely used in computational enzymology may have significant limitations. For example, B3LYP is a popular DFT functional often used in QM/MM calculations, but although it is capable of successfully reproducing (apparent) experimental barriers for some enzymes (e.g., to within ∼5 kcal mol-1),12,13,21 it may fail * To whom correspondence should be addressed. [email protected]. Phone: +44(0)1179289097.

E-mail:

markedly for certain enzyme-catalyzed reactions.22 High-level correlated ab initio QM/MM calculations can now be performed to test DFT methods.13,18 Further, choices must be made as to which part of the enzyme-substrate system is treated quantum mechanically and which part is treated molecular mechanically and how to treat the boundary between these two regions. Such choices can affect the calculated energetics of enzymatic reactions significantly.23,24 Finally, it is important to consider a range of enzyme-substrate conformations.25,26 Such factors should be examined to test the sensitivity of conclusions about enzyme mechanisms drawn from calculations. In this study, we investigate the influence of QM methods (including basis set effects) and QM region for reaction modeling in (Si-)citrate synthase. This is an important example of enzymatic carbon-carbon bond formation and enzyme catalysis in general and has been extensively studied experimentally27-40 and computationally.15,34,41-49 It is furthermore crucial for almost all forms of life because it is the so-called pacemaker enzyme of the citric acid cycle.50,51 The overall reaction catalyzed by citrate synthase is the formation of citrate from acetyl-CoA and oxaloacetate (OAA).37,39 The reaction proceeds via deprotonation of acetyl-CoA to form a nucleophilic intermediate,40,52 which subsequently attacks the carbonyl carbon of oxaloacetate (Figure 1), which is polarized in the enzyme active site.31,33,47 Hydrolysis of the thioester bond triggers release of the products citrate and coenzyme A. Crystallography, computation, and mutagenesis have identified Asp375 as the catalytic base,27,29 with His274 also observed to interact with acetyl-CoA30 and believed to play a catalytic role28 (the numbering used is that for pig citrate synthase). Deprotonation of acetyl-CoA is the first reaction step in the enzymatic reaction and involves proton abstraction from the R-carbon of acetyl-CoA by an aspartate residue (Figure 1). After proton abstraction, an enolate intermediate is formed that is stabilized through hydrogen bonds donated by His274 and a conserved water molecule.43,48,49 The enolate attacks the OAA carbonyl carbon.46 To form the (kinetically competent) intermediate citryl-CoA,35 proton transfer to the former oxaloacetate

10.1021/jp104069t  2010 American Chemical Society Published on Web 08/06/2010

11304

J. Phys. Chem. B, Vol. 114, No. 34, 2010

Van der Kamp et al.

Figure 1. Enolate formation (acetyl-CoA deprotonation) and condensation steps leading to citryl-CoA formation in citrate synthase.15,43

carbonyl oxygen is needed. High-level QM/MM modeling studies (up to the MP2/aug-cc-pVDZ-CHARMM27 level) have recently indicated that Arg329 is the residue that provides this proton;15 this may disrupt the active site, causing a conformational change to allow hydrolysis to occur. For efficient catalysis of the overall reaction, stabilization of the enolate intermediate by the enzyme active site is crucial. It has been shown that this stabilization is mostly due to electrostatic (predominantly hydrogen bond) effects in the active site.47 Commonly, testing of QM/MM methods is performed using small (gas-phase) models,45,53 whereas their application is aimed at following reactions in large, condensed-phase systems.24,54,55 Here, we report tests involving detailed QM/MM modeling of the formation of the enolate intermediate (sometimes termed the enolization step) in citrate synthase, with two DFT functionals and several ab initio methods (up to the coupled cluster level) for the QM region. Among the challenges that this reaction poses for QM/MM modeling are the significant polarization of the reacting species by the enzyme and the high concentration of negative charge (an enolate is generated in close proximity to the dianionic OAA). The proton-transfer reaction here is, however, less complex than some other reaction steps catalyzed by enzymes. Not all results of the tests performed for this reaction may therefore be fully transferable to the modeling of more complex reactions. We assess the influence of the QM method (for both energy calculation and geometry optimization), basis set size, and the choice of the QM region. By calculating reaction profiles from different enzyme-substrate complexes, the influence of starting structure is also investigated. Such extensive and detailed testing of the application of highlevel QM/MM modeling to enzyme reactions should give

valuable insights for applications to other systems. The use of coupled cluster theory in the QM/MM calculations is of particular importance as it can predict energy barriers with an accuracy close to that of experiment; CCSD(T) (CCSD with a perturbative treatment of triple excitations) calculations of rate constants for gas-phase reactions agree closely with experiment;56 similarly, in the very few enzyme applications to date (chorismate mutase13 and p-hydroxybenzoate hydroxylase.13,14), LCCSD(T) QM/MM energy barriers agree with experiment almost within chemical accuracy (1 kcal/mol). We also test different approaches for calculating reaction pathways, comparing the typical reaction coordinate driving or adiabatic mapping approach with the more sophisticated climbing image nudged elastic band (CI-NEB) method, which does not require an a priori definition of the reaction coordinate, allowing for flexible optimization of the reaction path and transition-state structure. We calculate energy profiles for reaction and compare the results to the profiles calculated using adiabatic mapping along a simple predefined reaction coordinate. Altogether, these results should provide benchmark potential energy surfaces for an enzyme-catalyzed proton-transfer reaction. We discuss the implications of our findings for modeling other enzymes with QM/MM methods. Methods Setup and Starting Structures. A model of the enzymesubstrate system was prepared as reported previously.47 In summary, the dimeric form of the crystal structure of chicken CS cocrystallized with acetyl-CoA and R-malate (PDB entry code 4CSC30) was chosen, and malate was replaced by OAA. The system was solvated (with TIP3P57 water modified for use

Testing QM/MM Methods for Enzyme Reactions with the CHARMM22/27 force fields58) and reduced to an 18 Å sphere centered on the terminal acetyl-CoA carbon. Stochastic boundary conditions were applied with a 14 Å reaction region.44,47 QM/MM MD simulations were performed at 300 K using AM1 for the QM region and CHARMM2758,59 for the MM region. The QM region in these simulations consisted of the Asp375 side chain (from Cβ; pig CS numbering is used throughout), the methylthioester part of acetyl-CoA, and OAA. Link atoms10,60 were used to model bonds crossing the QM/MM boundary. The MM charges of groups adjacent to link atoms were excluded from interaction with the QM region (using the EXGR command in the CHARMM program61). To obtain five different starting structures around the transition state, a series of umbrella sampling molecular dynamics simulations was carried out, starting after 75-85 ps of the initial unbiased QM/ MM MD simulation. Umbrella sampling was performed along the reaction coordinate describing proton transfer from acetyl-CoA to Asp375, as reported previously.47 This reaction coordinate is defined as follows: rc ) d(CAcetyl-CoA-H) - d(OAsp375-H). Snapshots were taken after 10 ps of simulation with the reaction coordinate restrained to 0.2 Å from five different umbrella sampling series. Subsequently, the snapshots were minimized using AM1/CHARMM27, again with the reaction coordinate restrained to 0.2 Å. The value of the reaction coordinate restraint was chosen to obtain conformations close to the transition state for the proton abstraction reaction.47,49 The five different enzyme-reactant conformations thus obtained (from different enzyme-substrate conformations) were taken as starting points for the high-level QM/MM calculations. High-Level QM/MM Calculations. QM/MM optimizations with the hybrid density functionals B3LYP62 and BH&HLYP63 as well as Hartree-Fock (HF) were performed using QoMMMa, a locally developed set of routines coupling output from QM and MM software.13,15,64 The Jaguar software package65 was used for QM calculations, and the TINKER code66 using the CHARMM27 all-atom force field58 was used for MM calculations. MM atoms more than 16 Å away from the carbon of the terminal methyl group of acetyl-CoA were held fixed. Full optimization of the geometry of the MM atoms was performed at each QM step. Electronic polarization of the QM system is accounted for by including the MM atomic partial charges in the QM Hamiltonian. The steric QM/MM interactions are described using Lennard-Jones (van der Waals) parameters for the QM atoms. Lennard-Jones parameters taken from the CHARMM27 MM force field values58 for similar atoms were used as they were found to describe well the hydrogen bonding between the QM and MM regions when tested in small-model systems.22 As in the AM1/CHARMM27 QM/MM calculations, hydrogen link atoms were used to satisfy the valencies of covalent bonds at the QM/MM boundary, and charges of MM atoms of groups adjacent to link atoms were set to 0 in the QM calculations. No cutoff was applied for nonbonded interactions. Reaction profiles were calculated by following the reaction coordinate defined above, which has previously been successfully applied to describe the proton transfer from acetyl-CoA to Asp375.44,47 Initially, each starting transition-state-like structure (optimized with AM1/CHARMM27) was reoptimized, again with the reaction coordinate restrained to 0.2 Å by a harmonic potential with a force constant of 1000 kcal mol-1 Å-2. From this structure, the potential energy surface was calculated in an adiabatic mapping approach in both the forward and backward directions in a series of geometry optimizations, each optimization using the optimized structure of the im-

J. Phys. Chem. B, Vol. 114, No. 34, 2010 11305 mediately preceding reaction coordinate value as its starting point. A step size of 0.1 Å along the reaction coordinate was used. Single-point QM/MM energy calculations (on geometries optimized at lower QM/MM levels) with correlated ab initio methods were performed with the Molpro program.67 The MM protein environment was taken into account by including the MM atomic partial charges as external point charges. The MM van der Waals interaction energy between the QM and MM region was also included (as calculated by QoMMMa (using TINKER)). Density fitting was used to approximate integrals.19,68 MP2 and SCS-MP269 calculations were performed with standard canonical algorithms using density fitting to increase efficiency.68 LCCSD(T0) calculations (coupled cluster theory with local correlation treatment including single and double excitations and an approximate treatment of the conventional perturbative triples correction in which certain couplings are neglected70,71) were also performed, again using density fitting. This (T0) correction has been shown to give very similar results to the full perturbative triples corrections.14 For the local correlation treatment,19 domains were selected using the geometry of the approximate transition state (rc ) 0.3 Å) and a domain extension distance (REXT) of 2.5 Bohr. The use of these domains was first tested using (SCS-)LMP2 calculation of a complete reaction profile; these profiles were not significantly different from the profile calculated using the corresponding full correlation treatment (see Supporting Information). The use of density fitting, used throughout for the correlated molecular orbital methods, does not lead to a significant degradation in accuracy, as pointed out previously.14,19 All QM/MM geometry optimizations were performed with the 6-31+G(d) basis set. For energy calculations, a range of basis set types was tested in conjunction with B3LYP (6-31G(d), 6-31+G(d), 6-31G(d,p), 6-31G+(d,p), cc-pVDZ, and aug-ccpVDZ). For single-point QM/MM energy calculations with ab initio QM methods, the aug-cc-pVDZ basis set was used in most cases. For (SCS-)MP2/CHARMM27 energy calculations, convergence with basis set size was tested using aug-cc-pVDZ, augcc-pVTZ, and aug-cc-pVQZ, and for LCCSD(T0), convergence was tested using aug-cc-pVDZ and aug-cc-pVTZ. Climbing Image Nudged Elastic Band Reaction Pathway Calculations. The nudged elastic band (NEB) method,72 based on elastic band methods,73,74 can be used to find the minimumenergy path between two points. In contrast to the mapping of energy along a specified reaction coordinate (as described above), no distance restraints on specific atoms are applied. The reaction pathway is represented by a chain of structures (replicas or images). These images are connected with springs, forming an elastic band. In the NEB method, the forces acting on each image are projected to give a component perpendicular to the path and a component parallel to the path. Only the perpendicular component of the “true” force and the parallel component of the spring force act on the images. This ensures that the true force does not affect the spacing of the images (no sliding down of the images) and the spring force does not push the images out of the minimum-energy path (no corner cutting). This method has been implemented in the QoMMMa code (described above)75 following a similar approach to that described in ref 76. Here, we used the climbing image variant of the NEB method (CI-NEB), which prevents the sliding down of the image at the highest energy point (the approximate transition state).77 In this method, no spring force acts on the image with the highest energy, and the true force is applied with its component parallel to the path inverted. As a result, this image is pushed

11306

J. Phys. Chem. B, Vol. 114, No. 34, 2010

Van der Kamp et al.

TABLE 1: Barriers and Reaction Energies (in kcal mol-1) Obtained from QM/MM Profiles Calculated with Different QM Methods and CHARMM27 (based on B3LYP/ 6-31+G(d)-CHARMM27 optimized geometries) QM method for QM/MM energy

∆E‡

∆Ereact

∆E‡backa

B3LYP/6-31+G(d) B3LYP/aug-cc-pVDZ AM1b HF/6-31+G(d) HF/aug-cc-pVDZ BH&HLYP/6-31+G(d) BH&HLYP/aug-cc-pVDZ MP2/aug-cc-pVDZ SCS-MP2/aug-cc-pVDZ SCS-LMP2/aug-cc-pVDZ LCCSD(T0)/aug-cc-pVDZ

12.7 10.1 15.7 26.6 21.8 11.9 9.3 10.5 13.0 13.5 13.2

11.7 8.4 10.1 20.2 11.6 9.1 5.9 7.2 8.7 8.9 8.4

0.9 1.7 5.7 6.5 10.2 2.7 3.4 3.3 4.3 4.6 4.8

a Energy barrier for the reverse reaction going from the enolate intermediate to the substrate, the keto-form of acetyl-CoA. b AM1/CHARMM27 energies from AM1/CHARMM27-optimized geometries; from ref 47.

uphill along the path and should become (close to) a transitionstate structure. Images for CI-NEB calculations were selected from the potential energy profiles. For each geometry optimization method (B3LYP, BH&HLYP, and HF), 10 images were used. For the first and last images, which are held fixed during the CI-NEB optimization, the geometries of the minima on the reaction profiles were taken and subsequently optimized without a reaction coordinate restraint. Images 2-9 were taken from the energy profiles at reaction coordinate values of -0.7, -0.5, ..., 0.7 Å. After the optimization converged, QM/MM frequency calculations were performed for the first, last, and highest energy image. This allowed for the calculation of zero-point energy (ZPE) effects and confirmation of the maxima along the CINEB paths as transition-state structures (saddle points). Frequencies were computed for the QM atoms only in the fixed field of the MM atoms using the algorithm implemented in QoMMMa64 (code written by K. Senthilkumar et al.). Results and Discussion Testing QM Methods for Energy Calculations. An initial energy profile for acetyl-CoA deprotonation in CS was calcu-

lated using B3LYP/6-31+G(d)-CHARMM27 geometry optimization in an adiabatic mapping procedure. The QM region consisted of the methylthioester part of acetyl-CoA, the side chain of Asp375, and OAA. Using these B3LYP-CHARMM27optimized geometries, single-point QM/MM energy calculations were performed with a range of QM methods and basis sets (Table 1). Examples of profiles are shown in Figure 2. (For comparison, results from geometry optimization with AM1/ CHARMM27 from ref 47 are also given.) The B3LYP/6-31+G(d)-CHARMM27 profile shows a very shallow enolate minimum (Figure 2). This is significantly different for the MP2/aug-cc-pVDZ-CHARMM27 and SCSMP2/aug-cc-pVDZ-CHARMM27 profiles calculated using the B3LYP/6-31+G(d)-CHARMM27-optimized geometries. SCSMP2 in particular is expected to be quite accurate for reaction energies.69 To further assess the accuracy of the SCS-MP2CHARMM27 energies compared with other methods, coupled cluster calculations were performed (using a localized correlation treatment). It can be seen that the resulting profile at the LCCSD(T0)-CHARMM27 level is very similar to the SCSMP2-CHARMM27 profile. This gives a further indication that SCS-MP2 is the best QM method of affordable computational expense to describe the proton abstraction from acetyl-CoA. In contrast, the MP2-CHARMM27 profile seems to underestimate both the activation and reaction energy considerably, but not to the same extent; the activation energy is underestimated more compared to LCCSD(T0)-CHARMM27. Such behavior has been previously observed for QM/MM calculations of other enzyme reactions.13 It has to be noted that the double-ζ basis set used here can lead to inaccuracies in correlated ab initio energetics56 As shown below, however, calculations with larger basis sets show that the LCCSD(T0)/aug-cc-pVDZ QM/MM energies are reasonably close to the basis set limit. For certain applications of QM/MM calculations, such as QM/ MM molecular dynamics employed to obtain free-energy profiles of a reaction by calculating the potential of mean force along a reaction coordinate, ab initio or DFT methods are often still too computationally demanding. In such cases, semiempirical methods are often used. In this light, the performance of AM1 as a QM method for the reaction under investigation is encouraging; the AM1/CHARMM27 profile and energies are quite similar to the highest-level results presented here

Figure 2. QM/MM potential energy profiles for acetyl-CoA deprotonation with different levels of QM theory. In all cases, geometries were obtained using B3LYP/6-31+G(d)-CHARMM27 optimization along the reaction coordinate (adiabatic mapping), and CHARMM27 was used for the MM component. QM/MM energies in kcal mol-1 are relative to the lowest point in the profile, the substrate minimum. (A) Energies calculated with HF and DFT functionals B3LYP and BH&HLYP for the QM region, employing the 6-31+G(d) basis set. QM/MM profiles with AM1 (based on AM1/CHARMM27 geometry optimization) and SCS-MP2/aug-cc-pVDZ are shown for comparison. (B) Energies calculated with B3LYP and BH&HLYP and MP2, SCS-MP2, and LCCSD(T0), all with the aug-cc-pVDZ (aVDZ) basis set. Note that (A) and (B) have a different scale on the y (energy) axis.

Testing QM/MM Methods for Enzyme Reactions

J. Phys. Chem. B, Vol. 114, No. 34, 2010 11307

TABLE 2: Basis Set Effects on B3LYP/CHARMM27 QM/MM Energies (in kcal mol-1) basis set

∆E‡

∆Ereact

∆E‡backa

rc value substrate

rc value enolate

6-31Gd 6-31G(d,p) 6-31+Gd 6-31+G(d,p) 6-31++G(d,p) cc-pVDZ aug-cc-pVDZ

10.3 8.3 12.7 10.5 10.4 7.8 10.1

9.8 7.2 11.7 9.0 9.0 7.3 8.4

0.6 1.0 0.9 1.5 1.4 0.5 1.7

-0.8 Å -0.8 Å -0.9 Å -0.9 Å -0.9 Å -0.8 Å -0.9 Å

0.7 Å 0.7 Å 0.8 Å 0.9 Å 0.9 Å 0.7 Å 0.9 Å

a Energy barrier for the reverse reaction going from the enolate intermediate to the substrate, the keto-form of acetyl-CoA.

(LCCSD(T0)/aug-cc-pVDZ-CHARMM). As expected, AM1 overestimates the barrier for reaction somewhat, but it does predict a stable enolate intermediate. This might be due to a cancellation of errors, such as the known tendency of AM1 to favor protonated carboxylate groups too strongly, that is, it overestimates the proton affinity of carboxylate groups.46,78-80 Although the barrier to form the enolate is similar when using SCS-MP2/aug-cc-pVDZ or B3LYP/6-31+G(d) for QM/MM energies, the expected enolate minimum is almost nonexistent with B3LYP. HF/6-31+G(d)-CHARMM27 single-point energies overestimate the activation and reaction energy drastically (illustrating the known importance of including electron correlation, see, e.g., refs 13 and 48), but the enolate energy is significantly lower than that of the approximate transition state. BH&HLYP/6-31+G(d) is a hybrid DFT method that is capable of describing the reactions in citrate synthase relatively well (compared to many other (hybrid) DFT methods).22 When this method is used for single-point QM/MM calculations, a profile more similar to the (SCS-)MP2/aug-cc-pVDZ-CHARMM27 profiles is obtained, although the barrier is somewhat lower. B3LYP energies calculated with a aug-cc-pVDZ basis set instead of 6-31+G(d) show a significantly lower reaction energy with a slightly more stable enolate (see Table 1). The energy profile is overall lower than that with the 6-31+G(d) basis set, with an approximate activation energy of 10.1 kcal mol-1 instead of 12.7 kcal mol-1. More stable enolate energies and lower activation energies are also calculated for HF and BH&HLYP with aug-cc-pVDZ instead of 6-31+G(d). This prompted us to investigate further the effect of basis set choice on the B3LYPCHARMM27 energies. Single-point QM/MM calculations were carried out with the following basis sets (in addition to 6-31+G(d)andaug-cc-pVDZ):6-31G(d),6-31G(d,p),6-31+G(d,p), 6-31++G(d,p), and cc-pVDZ (Table 2). Some trends can be extracted from the results. First of all, it is evident that addition of p-polarization functions for hydrogen atoms significantly decreases the activation and reaction energy; for example, compare results with the 6-31G(d) and 6-31G(d,p) basis sets. Second, addition of diffuse functions on non-hydrogen (heavy) atoms however increases these energies (e.g., compare 6-31G(d,p) and 6-31+G(d,p) basis sets). Due to the fact that the 6-31+G(d) basis set (as used for geometry optimization) only has the energy increasing effect from the diffuse functions used on non-hydrogen atoms, the QM/MM reaction barrier is similar to the SCS-MP2/aug-cc-pVDZ-CHARMM27 barrier. Differences between the results from the similar Pople and Dunning type basis sets (6-31G(dp) and cc-pVDZ as well as 6-31++G(dp) and aug-cc-pVDZ) are small, as expected, but still exist. It must further be noted that only with basis sets that include polarization functions on all atoms and diffuse functions on (at least) the non-hydrogen atoms is a reasonable location

of the enolate minimum predicted. (Adding diffuse functions on hydrogen atoms does not make a significant difference.) These basis sets also give the largest difference in energy between the enolate minimum and the approximate transitionstate maximum, although this difference is still significantly smaller than values obtained from QM/MM energies calculated using the higher-level ab initio methods SCS-MP2/aug-cc-pVDZ and LCCSD(T0)/aug-cc-pVDZ. Whatever the basis set used, B3LYP consistently underestimates the depth of the enolate minimum on the energy profile and therefore also the expected lifetime of this intermediate. Effect of the Size of the QM Region. The minimal QM region in a QM/MM study of an enzyme reaction consists of the reacting species, that is, the parts of the substrate and enzyme where bond making and breaking take place. In the case of acetyl-CoA deprotonation in citrate synthase, this is the thioester part of acetyl-CoA and the Asp375 side chain, referred to as QM region 1 below. Here, the thioester part of acetyl-CoA is effectively modeled as methylthioacetate. It may be desirable to include more atoms (other substrates or active site residues) in the QM region for a more accurate description of important interactions during the reaction (and thereby more accurate reaction energetics). In previous high-level QM/MM studies of acetyl-CoA deprotonation in citrate synthase,43,48 the side chain of His274 and a conserved water molecule were included in the QM region (because they donate hydrogen bonds to the enolate oxygen). In another study, it was shown that OAA can also influence the enolate stability and can therefore also be important to include in the QM region.48 We have tested the influence of enlarging the “minimal” QM region by including either OAA (QM region 2) or the His274 side chain and/or the conserved water molecule (Wat1 in Figure 3; QM region 3) or both (QM region 4). When the dianionic OAA is included, the total charge of the QM region is -3 au (including -1 au formally contributed by Asp375). The large negative charge of the QM region might be considered a potential problem. To test this, we also tested including the side chains of three arginine residues that are positively charged and interact strongly with OAA; there were Arg329, Arg401, and Arg412′. This made the total charge for this largest QM region (QM region 5) equal to 0. Both the barrier and the reaction energy are significantly higher for the minimal QM region 1 compared to all other QM regions. Addition of OAA to the QM region (region 2), as used throughout this work, results in significantly lower activation and reaction energies along the reaction coordinate. Addition of the His274 side chain and the conserved water molecule that donate hydrogen bonds to the enolate oxygen in the minimal QM region (region 3) has approximately the same effect on the reaction profile as addition of OAA (although the reaction energy is slightly higher). Combining QM regions 2 and 3 by including both OAA and the His274 side chain and conserved water (region 4) has a further significant lowering effect on the potential energy profile, with the barrier about 1.2 kcal mol-1 lower and the reaction energy 2.1-2.6 kcal mol-1 lower. Furthermore, the highest energy point along the profile (the approximate transition state) is located at a reaction coordinate value of 0.2 Å, whereas for QM regions 1, 2, and 3, this is 0.3 Å. This means that the approximate TS is slightly earlier, as expected for a less endothermic step. The location of the enolate minimum has also shifted; for QM regions 2 and 3, this minimum is located at a reaction coordinate value of 1.0 Å, whereas this is 1.2 Å for QM region 4. These differences in the potential energy profile are related; a lower energy for the

11308

J. Phys. Chem. B, Vol. 114, No. 34, 2010

Van der Kamp et al. TABLE 3: Sizes of QM Regions and Corresponding QM/MM (in kcal mol-1) Activation and Reaction Energies at the SCS-MP2/aug-cc-pVDZ//B3LYP/ 6-31+G(d)-CHARMM27 Level region

number of QM atomsa

∆E‡

∆Ereact

1 2 3 4 5

16 (2) 27 (2) 30 (3) 41 (3) 63 (5)

16.2 13.0 13.1 11.8 13.2

13.6 8.7 9.2 6.6 9.3

a

Figure 3. (A) The active site of citrate synthase, showing the different QM regions tested here. Wavy lines indicate covalent bonds between the QM and MM regions that were modeled using link atoms. (B) QM/ MM potential energy profiles with different QM regions as shown in (A). The energy profiles shown are from single-point SCS-MP2/augcc-p-CHARMM27 calculations on B3LYP/6-31+G(d)-CHARMM27optimized geometries.

enolate intermediate coincides with an approximate transition state further away from this intermediate. A calculation with the large QM region (5) was performed to investigate whether the charge of the QM region influences the potential energy profile. It can be compared to QM region 2, the minimal QM region plus OAA, which has a charge of -3 au. As can been seen in Figure 3B and Table 3, there is little difference between the profiles of QM regions 2 and 5. The large negative charge in the QM region seems to have little effect, and inclusion of the arginine side chains to ensure neutrality of the QM region is therefore not necessary. The ideal choice of QM region in a QM/MM calculation is one in which further enlargement of the QM region at the expense of the MM region does not cause significant differences in calculated properties.13 In this study, this is not the case; the minimal QM region clearly overestimates the activation and reaction energies. QM region 2 (used throughout this work) and QM region 3 (used previously43,48) give an improvement, but further enlargement (QM region 4) affects the activation and

Number of QM/MM link atoms shown in parentheses.

reaction energies to some extent. Inclusion of both oxaloacetate and the groups that donate hydrogen bonds to the enolate oxygen (His274 and Wat1) is therefore shown to be important. The differences between the results obtained with different QM regions are likely to be largely due to the limitation of fixed MM point charges; the citrate synthase active site is highly polar, which may make this limitation particularly obvious. It can therefore be recommended to test the inclusion in the QM region of all groups that interact directly with charged species that are involved in the enzyme reaction studied, but this may not always be feasible. Also, for other, more complex, enzyme reactions, larger QM regions than those employed here may be needed for convergence of calculated properties. Although the effect of the QM region on the energies along the profile is significant, no significant structural effects were detected. All important distances, such as the hydrogen bonds donated by His274 and the conserved water molecule to the enolate oxygen, vary very little along the profile, and no trend is observed regarding the QM or MM treatment of certain residues. Reaction Profiles Calculated with Different QM Methods for Geometry Optimization. Potential energy profiles for the acetyl-CoA deprotonation in CS were calculated using QM/ MM(CHARMM27) with three different QM methods for geometry optimization (HF, B3LYP, and BH&HLYP) and five different starting conformations, obtained from QM/MM molecular dynamics simulations of the transition-state complex. The basis set used was 6-31+G(d). For all profiles, single-point energies were also calculated for the optimized structures along the pathway at the SCS-MP2/aug-cc-pVDZ-CHARMM27 level. All of the calculated energy profiles show a fairly large dependence on the initial enzyme-reactant conformation (Figure 4 and Table 4). The relative reaction energies as extracted from the three series of single-point SCS-MP2-CHARMM27 energy profiles differ in some cases by more than 3 kcal mol-1 (compare profiles 1 and 4 in Table 4). Comparing the B3LYP-CHARMM27 and SCS-MP2//B3LYPCHARMM27 results shows that the activation energies are quite similar for the two methods. The enolate energy, however, is consistently overestimated by B3LYP by about 3 kcal mol-1. This overestimation results in a very shallow enolate minimum that is shifted toward the transition state (rc ) 0.8 Å instead of rc ) 1.0 Å in the SCS-MP2//B3LYP-CHARMM27 profile). The HF-CHARMM27 profiles show a large overestimation of the activation and reaction energies (which is expected because of the lack of electron correlation in this method), although all profiles do predict an enolate that is significantly lower than the approximate transition state. SCS-MP2//HF-CHARMM27 calculations, however, do not give smooth profiles in all cases. This indicates that the geometry optimization has led to a discontinuous reaction path, as discussed below. BH&HLYPCHARMM27 profiles show a similar enolate minimum (both

Testing QM/MM Methods for Enzyme Reactions

J. Phys. Chem. B, Vol. 114, No. 34, 2010 11309

Figure 4. QM/MM potential energy profiles obtained with different QM methods, for five different enzyme/reactant complexes. Individual profiles are shown in gray, and the average profile (arithmetic mean) is shown in black, with bars indicating the standard deviation. On the left are energy profiles with the QM method used for optimization (B3LYP, HF, or BH&HLYP with a 6-31+G(d) basis). On the right are energy profiles obtained by single-point QM/MM calculations at the SCS-MP2/aug-cc-pVDZ level, using the same optimized geometries as those in the plots on the left.

TABLE 4: Barriers and Reaction Energies (in kcal mol-1) Obtained from the Profiles Shown in Figure 4 B3LYP ∆E 1 2 3 4 5 av.a B.av.b SDc a



12.7 13.1 13.9 14.8 12.5 13.3 13.0 0.98

SCS-MP2// B3LYP ‡

∆Ereact

∆E

11.7 12.7 13.5 14.7 12.4 13.0 12.4 1.10

13.0 12.9 13.7 14.4 12.2 13.2 12.9 0.78

HF ‡

∆Ereact

∆E

8.7 9.7 10.7 11.9 9.3 10.1 9.4 1.31

24.0 23.8 25.5 23.1 22.3 23.7 23.1 1.19

SCS-MP2/HF ‡

∆Ereact

∆E

15.2 16.1 18.8 15.6 15.3 16.2 15.6 1.47

13.1 15.0 14.8 12.8 12.0 13.3 12.8 1.19

BH&HLYP ‡

∆Ereact

∆E

9.0 12.4 12.0 9.7 9.5 10.4 9.6 1.52

11.9 11.8 13.7 13.5 11.0 12.3 11.7 1.17

SCS-MP2//BH&HLYP

∆Ereact

∆E‡

∆Ereact

9.4 10.3 12.3 12.7 9.7 10.9 10.0 1.50

13.4 13.0 14.8 14.8 12.4 13.6 13.1 1.12

9.0 9.9 11.7 12.2 9.4 10.4 9.6 1.43

Arithmetic mean. b Boltzmann-weighted average81 at a temperature of 300 K. Boltzmann weights are reported in the Supporting Information, Table S2. c Standard deviation.

11310

J. Phys. Chem. B, Vol. 114, No. 34, 2010

the location of the minimum and the value of the reaction energy) as the single-point SCS-MP2 calculations. This is a clear improvement over the B3LYP-CHARMM27 profiles. The reaction energy, however, is somewhat underestimated. Further, the SCS-MP2//BH&HLYP-CHARMM27 profiles are less smooth than the SCS-MP2//B3LYP-CHARMM27 profiles, and the variation between the individual profiles is greater. Both HF and BH&HLYP outperform B3LYP in terms of the location of the (approximate) transition state; it is at the same value of the reaction coordinate (0.3 Å) as the SCS-MP2 profiles, whereas for the B3LYP profiles, it is shifted further (0.5 Å). From these profiles, the following conclusions can be drawn: when full profiles with single-point SCS-MP2 energies are calculated, B3LYP is the preferred method of geometry optimization for this reaction. When only the reactant and enolate minima and the transition state are optimized (as in refs 43, 48, and 49), BH&HLYP (or even HF) should be preferred because B3LYP will get the positions of the transition state and enolate wrong (or even fail to predict an enolate minimum). Next, even when geometry optimization leads to an energy profile that is apparently smooth, it is possible that the underlying set of structures corresponds to a discontinuous series, with shifts in atoms not directly involved in the definition of the reaction coordinate. This occurrence can of course often be detected during geometry optimization but, in this case, is highlighted only when different methods are used for geometry optimization and energy calculation. At the first level of theory, the structural discontinuity does not lead to an energetic discontinuity, but at the second level, it does. On the basis of this study, it appears that this happens more frequently when using QM/MM-optimized geometries obtained with methods that do not include electron correlation (such as HF) for singlepoint calculations with correlated methods. This may be a purely accidental observation but may also be due to a greater tendency for structural discontinuity during optimization of a HF reaction path in turn due to the greater magnitude of the component of the gradient along the reaction path, with the HF method leading to a less smooth optimization. If no single-point calculations with high-level ab initio methods (e.g., MP2, coupled cluster) are performed, BH&HLYP gives the best results (of the three methods tested here) for the deprotonation of acetyl-CoA reaction in citrate synthase. Finally, the difference in energy profiles calculated using different enzyme-reactant conformations can be significant. It is therefore recommended to calculate a large number of profiles when one wants to relate potential energy barriers to experimental activation enthalpies (see also refs 13, 14, and 81). It should also be noted that for some enzyme-catalyzed reactions, for example, involving large changes in solvation, an adiabatic mapping approach may not be applicable.82 Effects of Basis Set Size for Ab Intio QM/MM Energies. To investigate the importance of basis set size, single-point SCSMP2 calculations on the key structures from the SCS-MP2// B3LYP-CHARMM27 profiles (Figure 4) were performed with the larger triple- and quadruple-ζ basis sets (aug-cc-pVTZ and aug-cc-pVQZ). Increasing the basis set size has only a small effect on QM/MM SCS-MP2-CHARMM27 energies, especially for the barrier (Table 5; the same is true for MP2 and HF). The reaction energy becomes a little lower with larger basis sets. When compared to the differences between results for different enzyme-reactant conformations, the influence of basis set size is not significant. It therefore seems better, at least for this enzyme, to invest in calculating more profiles (thereby improving statistics) than in extending the basis set at this level of

Van der Kamp et al. TABLE 5: Barriers and Reaction Energies (in kcal mol-1) at the SCS-MP2-CHARMM27 QM/MM Level with Different Basis Sets aug-cc-pVDZ ‡

profile

∆E

1 2 3 4 5 av.a B.av.b SDc

13.0 12.9 13.7 14.4 12.2 13.2 12.9 0.78

aug-cc-pVTZ ‡

∆Ereact

∆E

8.7 9.7 10.7 11.9 9.3 10.1 9.4 1.31

12.9 12.8 13.7 14.1 11.9 13.1 12.6 0.86

aug-cc-pVQZ

∆Ereact

∆E‡

∆Ereact

8.1 9.1 10.4 11.6 8.7 9.6 8.8 1.37

13.0 12.9 13.8 14.1 11.9 13.1 12.7 0.86

8.1 8.9 10.3 11.4 8.6 9.5 8.7 1.35

a Arithmetic mean. b Boltzmann-weighted average81 at a temperature of 300 K. Boltzmann weights are reported in the Supporting Information, Table S3. c Standard deviation.

Figure 5. LCCSD(T0) QM/MM potential energy profiles with different basis sets for the QM region. Geometries (optimized at the B3LYP/ 6-31+G(d)-CHARMM27 level) for these single-point calculations were the same as those for the profiles shown in Figure 2B (profile 1 in Table 4). The energies labeled LCCSD(T0)/‘CBS’ are estimates for a complete basis set, obtained by extrapolating the correlation energies from the calculations with aug-cc-pVDZ and aug-cc-pVTZ basis sets.83

QM/MM theory. The fact that sometimes quite large effects on energetics can be observed upon increasing the basis set in correlated ab initio methods is well-known.56 It is unclear why the present SCS-MP2 results show less sensitivity than the LCCSD(T0) results; this may also reflect the fact that the latter treat correlation in a local way whereas the former use canonical orbitals. In Figure 5, QM/MM profiles with LCCSD(T0) as the QM method are shown, calculated for the B3LYP-CHARMM27optimized profile 1. There is a significant effect on the energy profile when the basis set size is increased to aug-cc-pVTZ. Both the activation and reaction energies are shifted downward by 1.5-1.6 kcal mol-1. The contribution of this difference from the triples correction is significant (0.3 and 0.5 kcal mol-1 for the activation and reaction energy, respectively), indicating the importance of including the triples correction with a larger basis set also. A further increase in basis set size might lead to even lower energies, but calculations with the aug-cc-pVQZ basis set or larger were not computationally feasible. It is possible, however, to approximate the correlation energy (and thereby the total energy) in the complete basis set (CBS) limit by extrapolating the results obtained with double and triple-ζ basis sets, as proposed by Helgaker et al.83 A profile to which this approximation is applied is also shown in Figure 5 and shows that another small shift downward for the reaction energies can be expected. The activation energy extracted from this profile is 11.0 kcal mol-1, and the reaction energy is 6.8 kcal mol-1. It should also be noted that the location of the maximum (the

Testing QM/MM Methods for Enzyme Reactions

J. Phys. Chem. B, Vol. 114, No. 34, 2010 11311

Figure 6. QM/MM potential energy profiles obtained by climbing image nudged elastic band optimization from initial geometries obtained using adiabatic mapping. The energies and location along the reaction coordinate of the starting images are depicted with open diamonds and those of the final images with closed diamonds (see text for discussion).

approximate transition state) and the enolate intermediate minimum in the profiles obtained with the aug-cc-pVTZ basis set and the extrapolated results are both slightly shifted to the reactant side (maximum at rc ) 0.2 Å; minimum at rc ) 1.0 Å) compared to the profile obtained with the aug-cc-pVDZ basis (rc ) 0.3 and 1.1 Å, respectively). The large effect of increasing the basis set for coupled cluster calculations might be expected as this has also been shown for a series of small-molecule reaction energies.56 It is nevertheless important to reiterate that (activation and reaction) energies obtained using coupled cluster methods only become accurate with at least a triple-ζ basis set (including diffuse functions if necessary). Climbing Image Nudged Elastic Band Reaction Pathway Calculations. Another potential source of error is the discrepancy between the reaction coordinate representation and the true minimum-energy path. To test how well the reaction coordinate approach that is used above to model the deprotonation reaction represents the real minimum-energy path, climbing image nudged elastic band (CI-NEB)77 calculations were carried out. This method is capable of optimizing a reaction pathway between two end points and finding a saddle point between them. This saddle point should be (close to) the transition-state structure of the reaction. We performed CI-NEB QM/MM calculations using the same three QM methods for geometry optimization as above, B3LYP/6-31+G(d), HF/6-31+G(d), and BH&HLYP/6-31+G(d). For HF/6-31+G(d), the CI-NEB calculation did not converge to a minimum-energy pathway within a reasonable number of cycles. Results for the BH&HLYP/631+G(d)-CHARMM27 CI-NEB pathway are shown in Figure 6, together with single-point SCS-MP2/aug-cc-pVDZ-CHARMM27 calculations on the optimized geometries. The reaction pathway obtained from the B3LYP/6-31+G(d)CHARMM27 CI-NEB optimization, although converged in terms of gradients (according to typical criteria), was not correct. Due to the fact that B3LYP/6-31+G(d)-CHARMM27 fails to describe and locate the enolate minimum, the images along the starting profile (from adiabatic mapping along the reaction coordinate) shifted away from their starting positions completely. Several images, including the climbing image, ended up near or past the reaction coordinate values of the end points. The main reason for this is the overestimation of the enolate (or reaction) energy relative to the approximate transition-state (or activation) energy. These results therefore emphasize the point discussed above; it is crucial to have a QM/MM method that

describes maximum and minimum energies and locations along the path correctly. The BH&HLYP/6-31+G(d)-CHARMM27 CI-NEB calculation does result in a correct reaction pathway. The final converged path is very close to the profile obtained by mapping along the reaction coordinate. Several images have shifted down toward the end points in relation to their starting positions, but the overall activation barrier has not changed significantly, both for BH&HLYP/6-31+G(d)-CHARMM27 and SCS-MP2/augcc-pVDZ-CHARMM27 energies. Furthermore, the climbing image, that is, the saddle point or transition state, is located at the same reaction coordinate value as the highest point in the profile obtained by mapping along the reaction coordinate. This indicates that the approximate transition state obtained when using a reaction coordinate is very close to the “real” transition state. The well-established43,47,84 reaction coordinate for proton transfer is found to be very accurate, and following this reaction coordinate therefore gives a valid description of this protontransfer reaction. To verify the nature of the transition state found by the CI-NEB calculation, a QM/MM (BH&HLYP/631+G(d)-CHARMM) frequency calculation was carried out on the climbing image. This revealed one imaginary frequency of 1379.9 cm-1. As expected, the eigenvector corresponding to this frequency is largely composed of the proton-transfer motion. QM/MM frequency calculations were also performed on the two end points (yielding no imaginary frequencies). Using the ZPEs obtained from these calculations (scaled by 0.96; see Supporting Information), the zero-point effects (ZPEs relative to the reactant minimum) on the activation and reaction energy are -2.9 and -0.3 kcal mol-1, respectively, which are similar to the values obtained from small-model HF/6-31G(d) calculations of -2.8 and -0.2 kcal mol-1.45 When these (BH&HLYP/ 6-31+G(d)-CHARMM QM/MM) zero-point effects are added to the SCS-MP2/aug-cc-pVDZ-CHARMM27 energies obtained from the CI-NEB pathway, the corrected potential energy barrier and reaction energy are 9.7 and 8.5 kcal mol-1 respectively, that is, there is a small barrier to the reverse reaction, indicating that the acetyl-CoA enolate is an intermediate, that is, it will have a finite lifetime (see below). Consequences for the Mechanism of the Citrate Synthase Catalyzed Reaction. The high-level results obtained here for acetyl-CoA deprotonation in CS can be used to make predictions of the reaction energetics in this enzyme. Assuming that the ZPE effect is approximately the same for all levels of theory and the effect of increasing the QM region is the same for SCS-

11312

J. Phys. Chem. B, Vol. 114, No. 34, 2010

MP2 and LCCSD(T0) treatment of the QM region, we can estimate reaction and activation enthalpies for enolate formation as follows: the activation enthalpy can be approximated by taking the LCCSD(T0)/‘CBS’-CHARMM27 energy of 11.0 kcal mol-1, subtracting 3.1 kcal mol-1 as the ZPE effect and 0.1 kcal mol-1 for the enthalpic temperature effect and a further 1.2 kcal mol-1 due to QM region expansion (from QM region 2 to 4; see Table 3), and arriving at a value of 6.6 kcal mol-1. Proton tunneling, although significant in some other enzymes,84,85 is not expected to play a large role in this reaction because the significant difference in energy between the reactant and enolate reduces the tunneling probability. The reaction enthalpy for enolate formation can be estimated as 4.5 kcal mol-1, with a ZPE and enthalpic temperature effect of 0.2 kcal mol-1 and an effect of QM region expansion of 2.1 kcal mol-1. These estimates do not take into account conformational variability of the enzyme-reactant complex, but it can be expected that this effect is fairly small, especially for the relative difference between the activation and reaction enthalpy. The enthalpy of the enolate is lower than the enthalpic barrier for the proton abstraction (and entropy may increase the barrier to the reverse reaction somewhat, for example, by ∼1.9 kcal/mol47), confirming that the enolate is an intermediate in the reaction. Despite the fact that evidence for the enolate intermediate was found in several independent computational studies (published up to 10 years ago),42,43,48,49 some popular biochemistry textbooks still show the reaction with an enol intermediate.50,86 Given the relatively small (enthalpy) minimum for the enolate, its lifetime is expected to be quite short. It is therefore a transient species that will be very difficult to observe experimentally. The activation enthalpy for enolate formation (∼6.8 kcal mol-1) is low compared to the (apparent) free-energy barrier for the overall reaction from experiment (14.7 kcal mol-1).27,44 Entropic effects are estimated to contribute up to 1.9 kcal mol-1 (from comparison of AM1/CHARMM27 potential and free-energy profiles),47 giving an estimated free energy of activation of about 8.7 kcal mol-1 for acetyl-CoA deprotonation in CS. Combined with previous results, we can use the findings here to estimate the energy profile for the formation of citryl-CoA in the enzyme. Previously, we have found that at the MP2CHARMM27 level, the potential energy difference between the enolate and the transition state for carbon-carbon condensation is 6.5 kcal mol-1.15 At the SCS-MP2-CHARMM27 level, this difference is larger, 9.4 kcal mol-1.22 When the reaction energy for enolate formation estimated above is added to this barrier for condensation (4.4 kcal mol-1), this gives an overall potential energy barrier of 10.9 (MP2) or 13.8 (SCS-MP2). To estimate a free energy of activation, zero-point vibrational (and thermal) effects as well as further entropy effects have to be considered. The former is not likely to have a significant contribution in nucleophilic attack.22 The latter is likely to increase the barrier as a carbon-carbon bond is formed to arrive at the large citrylCoA intermediate. This would bring the barrier for the overall condensation reaction to a value that is in line with the freeenergy barrier calculated from experiment of 14.7 kcal mol-1. It seems therefore likely that the condensation step (carboncarbon bond formation to form citryl-CoA), and not acetyl-CoA deprotonation, has the highest-energy transition state (and is thus the potential rate-limiting step) in the overall enzymatic reaction, although the final hydrolysis step (which probably also requires a conformational change of the enzyme)15,36 may have a similar barrier for mesophilic citrate synthase.87 Extremophilic citrate synthases are believed to follow the same mechanism,41,87,88 but the rate-limiting step may be different.32

Van der Kamp et al. Comparison to Other QM/MM Studies on Acetyl-CoA Deprotonation in Citrate Synthase. Citrate synthase has become an important model enzyme, being the subject of a number of computational studies. Here, we compare results for the acetyl-CoA deprotonation from previous ab initio (and DFT) QM/MM and two-level QM (QM/QM) calculations.43,48,49 In both of these previous studies, the reaction was described by three points only, the substrate, (approximate) transition state, and enolate. It is important, however, that the stationary points identified correspond to the same conformational state on the potential energy surface, which cannot be guaranteed when only a limited number of points is considered. Further, both in the work of Yang and Drueckhammer (Y&D)49 and in the work by Mulholland et al.,43,48 the high-level QM method used for geometry optimization was HF (with either a 6-31+G(d) or 6-31G(d) basis set). Single-point calculations were carried out up to the MP2/6-31+G(d) level. Here, we have shown that a 6-31+G(d) basis is probably too limited. These differences and limitations explain the quantitative differences between the results obtained previously and the best results here. When equivalent calculations are compared, however, similar results are obtained. In the work by Van der Kamp et al.48 (which includes the results from ref 43), a reaction energy for deprotonation at the HF/6-31G(d)-MM level (using united-atom CHARMm22 parameters for the MM region) of 15.9 kcal mol-1 was obtained. The previous calculations used a QM region equivalent to QM region 3 in Figure 3A, whereas we used QM region 2 in most calculations here, but it has been shown here that these QM regions give similar energy profiles. The reaction energy that they found is therefore very similar to our average HF/6-31+G(d)-CHARMM27 reaction energy for five profiles of 16.2 kcal mol-1. It is encouraging that similar results are found despite the use of different structures and MM force fields. The enolate minimum was also found at a similar place (rc ) 1.26 Å) as in our HF/6-31+G(d)-CHARMM27 profiles (rc ) 1.2 or 1.3 Å). Barriers cannot be directly compared as the previous work48 approximated the transition state by using a reaction coordinate value of 0 Å. At this reaction coordinate value, the energies are similar (18.9 kcal mol-1 versus our average energy of 19.4 kcal mol-1). The MP2/6-31+(d)corrected QM/MM energies that they report, however, do not compare well with the results obtained here; this could be due to the fact that the MP2 correction was applied to the QM region only, the limitations of the 6-31+G(d) basis set for this reaction and/or the fact that the geometry optimization with HF may not yield optimal geometries for correlated ab initio methods. Y&D49 used an ONIOM-type (subtractive) method89 to calculate (two level) QM energies for an active site model (atoms within 10 Å of the R-carbon were included). Geometry optimization was performed with HF/6-31+G(d) for a region equivalent to region 3, and the semiempirical method PM3 was used for the rest of the model. Although the transition state found has a slightly different position along the reaction coordinate (0.16 versus ∼0.3 Å in our study), the activation and reaction energies when using HF for the high-level region (22.9 and 16.9 kcal mol-1, respectively) are similar to the average values and within the structural variation, as reported in Table 4. The subsequent B3LYP and MP2 single-point energies that they performed, however, are again significantly different, in particular for the enolate intermediate, which may indicate that HFoptimized geometries are not optimal for other, correlated, QM methods. It is also possible that the model is too small to accurately describe the stabilization of the enolate in the enzyme.

Testing QM/MM Methods for Enzyme Reactions Some of these previous calculations49 may also have suffered from problems with geometry optimization. Comparison to High-Level QM/MM Studies of Other Enzymes. To our knowledge, (local) coupled cluster QM/MM methods have thus far only been reported for two other enzymes, chorismate mutase (CM) and p-hydroxybenzoate hydroxylase (PHBH).13,14 For both enzymes, several reaction profiles were calculated (16 for CM and 10 for PHBH). The average LCCSD(T0) QM/MM activation energies (activation enthalpies and free energies; see ref 13 for details), with the aug-cc-pVTZ basis set for oxygen atoms and cc-pVTZ for all other atoms, agree with experiment with close to chemical accuracy for both enzymes. The B3LYP and LMP2 results (with the same basis set) underestimate the activation energy by 2.5-5 kcal mol-1. SCS-LMP2 energies were reported for PHBH,14 giving, on average, the same result as LCCSD(T0). This was also the case for LCCSD(T0) energies where the triples correction was calculated with a smaller double-ζ basis set. Here, B3LYP and MP2 also underestimate the barrier compared to LCCSD(T0) and SCS-MP2, although this is only true with the aug-cc-pVDZ basis set. Due to the large effect of increasing basis set on LCCSD(T0) energies, the activation energies calculated by B3LYP and MP2 (with the aug-cc-pVDZ basis) are similar to the estimated LCCSD(T0) complete basis set result (10.13 and 10.51 versus 11.00 kcal mol-1). It is likely that this is due to a cancellation of errors. In general, SCS-MP2 is likely to give results comparable to LCCSD(T0). From this work, however, it is clear that it is important to test the effect of increasing basis set for each method used as the effect can differ markedly. High-level QM/MM reaction modeling for other enzymes has mostly been performed with density functional theory, in particular, with the hybrid DFT functional B3LYP.2,8,9 This functional has been shown to work well for many enzymes,12 even when transition metals are involved (such as in cytochrome P450 enzymes81,90). It is however possible that large basis set effects can occur,81 as also shown here.

J. Phys. Chem. B, Vol. 114, No. 34, 2010 11313 to model some enzyme-catalyzed reactions, including at least the initial steps of the reaction in citrate synthase. The exact method used for geometry optimization is not necessarily very important (as long as established methods are used with reasonable basis sets). Optimization with Hartree-Fock QM/ MM methods, however, can lead to suboptimal structures for QM methods that include correlation. Hybrid DFT functionals such as B3LYP are preferable. For reaction pathway calculations applying methods such as CI-NEB, one should be very careful that the method for optimization predicts the shape of the potential energy surface (in particular, the location of the maxima and minima) reasonably well. Overall, it should be stressed that the testing of both the QM method and the basis set used in QM/MM energy calculations is important in order to get reliable results. High-level correlated ab initio QM/MM calculations will have an important role in testing more approximate methods and will be useful in highlighting potential limitations of current density functional theory based methods for some enzyme reactions. Acknowledgment. M.W.v.d.K. was funded by the BRISENZ Marie Curie Early Stage Training Centre and an IBM Ph.D. Fellowship; he and A.J.M. thank the European Commission and IBM, respectively, for this support. A.J.M., J.N.H., and F.R.M. thank EPSRC for support. A.J.M. and J.Z. also thank Vernalis plc. A.J.M. is an EPSRC Leadership Fellow. F.R.M. was a Royal Society University Research Fellow at the time that most of this research was performed. We thank Dr. Ricardo Mata for help with the LCCSD(T0) calculations. Supporting Information Available: Tests of local correlation treatment, Boltzmann weights for individual profiles in Tables 4 and 5, and results of QM/MM frequency calculations on profiles shown in Figure 4. This material is available free of charge via the Internet at http://pubs.acs.org. References and Notes

Conclusions The extensive testing of high-level QM/MM calculations on citrate synthase in the present study provides important lessons for the application of such methods to other enzymes. Different QM methods can result in significantly different energy profiles for a reaction. It is therefore important to test established methods of moderate computational expense, such as the hybrid DFT functional B3LYP, against more expensive correlated ab initio methods. Furthermore, the basis set can also have a significant influence on the reaction profile obtained. For acetylCoA deprotonation in CS, diffuse functions on heavy atoms and polarization functions on both heavy and hydrogen atoms have been shown here to be important. The magnitude of the basis set effect depends on the method used. The definition of the QM region can also have a significant influence on QM/ MM calculations. When the goal of a QM/MM study is to predict reaction energetics accurately, different QM region sizes should thus be tested. A QM region with net (positive or negative) charge does not necessarily cause a problem. As shown in this study and elsewhere,25 the starting structure of an enzyme-reactant complex can have a significant influence on the calculated energy profile. If extensive conformational sampling is not possible (e.g., due to the computational expense of the QM method), several profiles should be calculated from different starting structures (e.g., from molecular dynamics simulations13,47,81). With these caveats, adiabatic mapping (reaction coordinate driving) approaches can be used successfully

(1) Garcia-Viloca, M.; Gao, J.; Karplus, M.; Truhlar, D. G. Science 2004, 303, 186–195. (2) Lonsdale, R.; Ranaghan, K. E.; Mulholland, A. J. Chem. Commun. 2010, 46, 2354–2372. (3) Warshel, A.; Sharma, P. K.; Kato, M.; Xiang, Y.; Liu, H. B.; Olsson, M. H. M. Chem. ReV. 2006, 106, 3210–3235. (4) Alexandrova, A. N.; Rothlisberger, D.; Baker, D.; Jorgensen, W. L. J. Am. Chem. Soc. 2008, 130, 15907–15915. (5) Lodola, A.; Mor, M.; Rivara, S.; Christov, C.; Tarzia, G.; Piomelli, D.; Mulholland, A. J. Chem. Commun. 2008, 214–216. (6) Mulholland, A. J. Drug DiscoVery Today 2005, 10, 1393–1402. (7) Mulholland, A. J. Biochem. Soc. Trans. 2008, 36, 22–26. (8) Senn, H. M.; Thiel, W. Top. Curr. Chem. 2007, 268, 173. (9) Van der Kamp, M. W.; Mulholland, A. J. Nat. Prod. Rep. 2008, 25, 1001–1014. (10) Field, M. J.; Bash, P. A.; Karplus, M. J. Comput. Chem. 1990, 11, 700–733. (11) Warshel, A.; Levitt, M. J. Mol. Biol. 1976, 103, 227–249. (12) Friesner, R. A.; Guallar, V. Annu. ReV. Phys. Chem. 2005, 56, 389– 427. (13) Claeyssens, F.; Harvey, J. N.; Manby, F. R.; Mata, R. A.; Mulholland, A. J.; Ranaghan, K. E.; Schu¨tz, M.; Thiel, S.; Thiel, W.; Werner, H. J. Angew. Chem., Int. Ed. 2006, 45, 6856–6859. (14) Mata, R. A.; Werner, H. J.; Thiel, S.; Thiel, W. J. Chem. Phys. 2008, 128. (15) Van der Kamp, M. W.; Perruccio, F.; Mulholland, A. J. Chem. Commun. 2008, 1874–1876. (16) Bowman, A. L.; Grant, I. M.; Mulholland, A. J. Chem. Commun. 2008, 4425–4427. (17) Z˙urek, J.; Foloppe, N.; Harvey, J. N.; Mulholland, A. J. Org. Biomol. Chem. 2006, 4, 3931–3937. (18) Hermann, J. C.; Pradon, J.; Harvey, J. N.; Mulholland, A. J. J. Phys. Chem. A 2009, 113, 11984–11994. (19) Schu¨tz, M.; Manby, F. R. Phys. Chem. Chem. Phys. 2003, 5, 3349– 3358.

11314

J. Phys. Chem. B, Vol. 114, No. 34, 2010

(20) Mulholland, A. J. Chem. Cent. J. 2007, 1, 19. (21) Sousa, S. F.; Fernandes, P. A.; Ramos, M. J. J. Phys. Chem. A 2007, 111, 10439–10452. (22) Van der Kamp, M. W. Modelling reactions and dynamics of Claisen enzymes. Ph.D. Thesis, University of Bristol, 2008. (23) Rodriguez, A.; Oliva, C.; Gonzalez, M.; Van der Kamp, M.; Mulholland, A. J. J. Phys. Chem. B 2007, 111, 12909–12915. (24) Lennartz, C.; Schafer, A.; Terstegen, F.; Thiel, W. J. Phys. Chem. B 2002, 106, 1758–1767. (25) Lodola, A.; Mor, M.; Z˙urek, J.; Tarzia, G.; Piomelli, D.; Harvey, J. N.; Mulholland, A. J. Biophys. J. 2007, 92, L20–L22. (26) Ranaghan, K. E.; Ridder, L.; Szefczyk, B.; Sokalski, W. A.; Hermann, J. C.; Mulholland, A. J. Mol. Phys. 2003, 101, 2695–2714. (27) Alter, G. M.; Casazza, J. P.; Wang, Z.; Nemeth, P.; Srere, P. A.; Evans, C. T. Biochemistry 1990, 29, 7557–7563. (28) Evans, C. T.; Kurz, L. C.; Remington, S. J.; Srere, P. A. Biochemistry 1996, 35, 10661–10672. (29) Karpusas, M.; Branchaud, B.; Remington, S. J. Biochemistry 1990, 29, 2213–2219. (30) Karpusas, M.; Holland, D.; Remington, S. J. Biochemistry 1991, 30, 6024–6031. (31) Kurz, L. C.; Ackerman, J. J. H.; Drysdale, G. R. Biochemistry 1985, 24, 452–457. (32) Kurz, L. C.; Constantine, C. Z.; Jiang, H.; Kappock, T. J. Biochemistry 2009, 48, 7878–7891. (33) Kurz, L. C.; Drysdale, G. R. Biochemistry 1987, 26, 2623–2627. (34) Kurz, L. C.; Fite, B.; Jean, J.; Park, J.; Erpelding, T.; Callis, P. Biochemistry 2005, 44, 1394–1413. (35) Lo¨hlein-Werhahn, G.; Bayer, E.; Bauer, B.; Eggerer, H. Eur. J. Biochem. 1983, 133, 665–672. (36) Pettersson, G.; Lill, U.; Eggerer, H. Eur. J. Biochem. 1989, 182, 119–124. (37) Remington, S. J. Curr. Top. Cell. Regul. 1992, 33, 209–229. (38) Srere, P. A. J. Biol. Chem. 1966, 241, 2157–2165. (39) Srere, P. A. AdV. Enzymol. Relat. Areas Mol. Biol. 1975, 43, 57– 101. (40) Wlassics, I. D.; Anderson, V. E. Biochemistry 1989, 28, 1627– 1633. (41) Bjelic, S.; Brandsdal, B. O.; Aqvist, J. Biochemistry 2008, 47, 10049–10057. (42) Donini, O.; Darden, T.; Kollman, P. A. J. Am. Chem. Soc. 2000, 122, 12270–12280. (43) Mulholland, A. J.; Lyne, P. D.; Karplus, M. J. Am. Chem. Soc. 2000, 122, 534–535. (44) Mulholland, A. J.; Richards, W. G. Proteins: Struct., Funct., Genet. 1997, 27, 9–25. (45) Mulholland, A. J.; Richards, W. G. J. Phys. Chem. B 1998, 102, 6635–6646. (46) Mulholland, A. J.; Richards, W. G. J. Mol. Struct.: THEOCHEM 1998, 427, 175–184. (47) Van der Kamp, M. W.; Perruccio, F.; Mulholland, A. J. Proteins: Struct., Funct., Bioinf. 2007, 69, 521–535. (48) Van der Kamp, M. W.; Perruccio, F.; Mulholland, A. J. J. Mol. Graph. Model. 2007, 26, 596–601. (49) Yang, W.; Drueckhammer, D. G. J. Phys. Chem. B 2003, 107, 5986–5994. (50) Berg, J. M.; Tymoczko, J. L.; Stryer, L. Biochemistry, 6th ed.; W. H. Freeman: New York, 2006. (51) Voet, D.; Voet, J. G.; Pratt, C. W. Fundamentals of biochemistry, 1st ed.; John Wiley & Sons, Inc.: New York, 1999. (52) Clark, J. D.; O’Keefe, S. J.; Knowles, J. R. Biochemistry 1988, 27, 5961–5971. (53) Mulholland, A. J.; Richards, W. G. J. Mol. Struct.: THEOCHEM 1998, 429, 13–21. (54) Lin, H.; Truhlar, D. G. Theor. Chem. Acc. 2007, 117, 185–199. (55) Riccardi, D.; Schaefer, P.; Yang, Y.; Yu, H. B.; Ghosh, N.; PratResina, X.; Konig, P.; Li, G. H.; Xu, D. G.; Guo, H.; Elstner, M.; Cui, Q. J. Phys. Chem. B 2006, 110, 6458–6469. (56) Helgaker, T.; Ruden, T. A.; Jorgensen, P.; Olsen, J.; Klopper, W. J. Phys. Org. Chem. 2004, 17, 913–933.

Van der Kamp et al. (57) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926–935. (58) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; JosephMcCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586–3616. (59) Foloppe, N.; MacKerell, A. D. J. Comput. Chem. 2000, 21, 86– 104. (60) Reuter, N.; Dejaegere, A.; Maigret, B.; Karplus, M. J. Phys. Chem. A 2000, 104, 1720–1735. (61) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187–217. (62) Becke, A. D. J. Chem. Phys. 1993, 98, 5648–5652. (63) Becke, A. D. J. Chem. Phys. 1993, 98, 1372–1377. (64) Harvey, J. N. Faraday Disc. 2004, 127, 165–177. (65) Jaguar, 5.5 ed.; Schrodinger LLC: Portland, OR, 2005. (66) Ponder, J. W. TINKER: Software Tools for Molecular Design, v4.0; Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine: Saint Louis, MO, 2003. http://dasher. wustl.edu/tinker/. (67) Werner, H.-J.; Knowles, P. J.; Lindh, R.; Manby, F. R.; Schu¨tz, M.; Celani, P.; Korona, T.; Rauhut, G.; Amos, R. D.; Bernhardsson, A.; Berning, A.; Cooper, D. L.; Deegan, M. J. O.; Dobbyn, A. J.; Eckert, F.; Hampel, C.; Hetzer, G.; Lloyd, A. W.; McNicholas, S. J.; Meyer, W.; Mura, M. A.; Nicklass, A.; Palmieri, P.; Pitzer, R.; Schumann, U.; Stoll, H.; Stone, A. J.; Tarroni, R.; Thorsteinsson, T. MOLPRO, 2002.10 ed.; University College Cardiff Consultants Limited: Cardiff, U.K., 2006. (68) Werner, H. J.; Manby, F. R.; Knowles, P. J. J. Chem. Phys. 2003, 118, 8149–8160. (69) Grimme, S. J. Chem. Phys. 2003, 118, 9095–9102. (70) Schu¨tz, M. J. Chem. Phys. 2000, 113, 9986–10001. (71) Schu¨tz, M.; Werner, H. J. Chem. Phys. Lett. 2000, 318, 370–378. (72) Mills, G.; Jonsson, H.; Schenter, G. K. Surf. Sci. 1995, 324, 305– 337. (73) Czerminski, R.; Elber, R. J. Chem. Phys. 1990, 92, 5580–5601. (74) Elber, R.; Karplus, M. Chem. Phys. Lett. 1987, 139, 375–380. (75) Z˙urek, J. Modelling reactiVity of threonyl-tRNA synthetase and cytochrome P450 enzymes. Ph.D. Thesis, University of Bristol, 2008. (76) Henkelman, G.; Jonsson, H. J. Chem. Phys. 2000, 113, 9978–9985. (77) Henkelman, G.; Uberuaga, B. P.; Jonsson, H. J. Chem. Phys. 2000, 113, 9901–9904. (78) Dewar, M. J. S.; Dieter, K. M. J. Am. Chem. Soc. 1986, 108, 8075– 8086. (79) Schro¨der, S.; Daggett, V.; Kollman, P. J. Am. Chem. Soc. 1991, 113, 8922–8925. (80) Aakero¨y, C. B. J. Mol. Struct.: THEOCHEM 1993, 100, 259–267. (81) Lonsdale, R.; Harvey, J. N.; Mulholland, A. J. J. Phys. Chem. B 2010, 114, 1156–1162. (82) Bowman, A. L.; Ridder, L.; Rietjens, I. M. C. M.; Vervoort, J.; Mulholland, A. J. Biochemistry 2007, 46, 6353–6363. (83) Helgaker, T.; Klopper, W.; Koch, H.; Noga, J. J. Chem. Phys. 1997, 106, 9639–9646. (84) Masgrau, L.; Ranaghan, K. E.; Scrutton, N. S.; Mulholland, A. J.; Sutcliffe, M. J. J. Phys. Chem. B 2007, 111, 3032–3047. (85) Masgrau, L.; Roujeinikova, A.; Johannissen, L. O.; Hothi, P.; Basran, J.; Ranaghan, K. E.; Mulholland, A. J.; Sutcliffe, M. J.; Scrutton, N. S.; Leys, D. Science 2006, 312, 237–241. (86) Nelson, D. L.; Cox, M. M. Lehninger’s Principles of Biochemistry, 5th ed.; W. H. Freeman: New York, 2009. (87) Kurz, L. C.; Drysdale, G.; Riley, M.; Tomar, M. A.; Chen, J.; Russell, R. J. M.; Danson, M. J. Biochemistry 2000, 39, 2283–2296. (88) Russell, R. J. M.; Ferguson, J. M. C.; Hough, D. W.; Danson, M. J.; Taylor, G. L. Biochemistry 1997, 36, 9983–9994. (89) Svensson, M.; Humbel, S.; Froese, R. D. J.; Matsubara, T.; Sieber, S.; Morokuma, K. J. Phys. Chem. 1996, 100, 19357–19363. (90) Bathelt, C. M.; Z˙urek, J.; Mulholland, A. J.; Harvey, J. N. J. Am. Chem. Soc. 2005, 127, 12900–12908.

JP104069T