Improved Interaction Potentials for Charged Residues in Proteins

Jan 19, 2008 - The latter error of ∼1 kcal/mol contrasts MAEs from standard OPLS-AA of up to 13 kcal/mol for the entire series of charged residues o...
0 downloads 0 Views 256KB Size
1820

J. Phys. Chem. B 2008, 112, 1820-1827

Improved Interaction Potentials for Charged Residues in Proteins Kasper P. Jensen† Department of Chemistry, Technical UniVersity of Denmark, Building 207, Office 104, 2800 Kgs. Lyngby, DK, Denmark ReceiVed: September 25, 2007; In Final Form: NoVember 9, 2007

Electrostatic interactions dominate the structure and free energy of biomolecules. To obtain accurate free energies involving charged groups from molecular simulations, OPLS-AA parameters have been reoptimized using Monte Carlo free energy perturbation. New parameters fit a self-consistent, experimental set of hydration free energies for acetate (Asp), propionate (Glu), 4-methylimidazolium (Hip), n-butylammonium (Lys), and n-propylguanidinium (Arg), all resembling charged residue side chains, including β-carbons. It is shown that OPLS-AA free energies depend critically on the type of water model, TIP4P or TIP3P; i.e., each water model requires specific water-charged molecule interaction potentials. New models (models 1 and 3) are thus described for both water models. Uncertainties in relative free energies of charged residues are ∼2 kcal/mol with the new parameters, due to variations in system setup (MAEs of ca. 1 kcal/mol) and noise from simulations (ca. 1 kcal/mol). The latter error of ∼1 kcal/mol contrasts MAEs from standard OPLS-AA of up to 13 kcal/ mol for the entire series of charged residues or up to 5 kcal/mol for the cationic series Lys, Arg, and Hip. The new parameters can be used directly in molecular simulations with no modification of neutral residues needed and are envisioned to be particular important in simulations where charged residues change environment.

Introduction This paper describes a problem associated with modeling free energies of charged amino acid residues in explicit water and provides a solution in the form of a set of reparametrized OPLSAA charges for Asp, Glu, Arg, Lys, and Hip, also applicable to N- and C-terminii. It is shown that specific water-charged residue interaction potentials are needed for each water model, and thus, a parameter set to be used with TIP4P and one to be used with TIP3P are introduced. Empirical force fields serve a very important role in modern computational biochemistry and drug discovery.1 Modeling of almost any biomolecule requires an empirical force field because more exact treatments by quantum mechanics are highly restricted by the size of the system.2,3 To obtain realistic insight into chemical processes, account of the free energy change of the process is necessary. The most accurate way to do this is by statistical sampling of the phase space of reactants and products of the process. Free energy perturbation is one such method, traditionally performed by either Monte Carlo or molecular dynamics sampling of phase space.4 Thermodynamic cycles can be used to interconvert substances, often via unreal parallel processes, to circumvent the alternative, more complicated real processes.5,6 These procedures are in principle exact but are limited by deficiencies in statistical sampling, perturbation methodology, and physicochemical model. The first problem is straightforward and occurs mainly for larger molecules, e.g., proteins in aqueous solution, and it is solved readily by additional computing time. The second problem is reasonably straightforward and can be tested in simple ways by choosing alternative perturbation schemes and evaluating the effect on the outcome,7 e.g. reverse perturbation or the use of different intermediate states or dummy atoms. The perturbation scheme and thermodynamic cycle also †

E-mail: [email protected].

need to exactly mimic the process for which information is desired. The third problem is the most complex, and with the advent of large supercomputing facilities, now the main obstacle to accurate modeling. The physicochemical model includes the explicit atoms and molecules included in the system setup, the boundary conditions, and the energy function (force field). Of these factors, the energy function alone depends on a variety of approximations, including the inherent approximations in the force field parameters, fixed point charges on atoms, energetic treatment of conformers, treatment of long-range electrostatics and other nonbonded interactions by cutoffs, etc.8 Since these approximations are many and complex, a standard approach of a fixed point charge force field with a limited set of bonded and nonbonded parameters has won the consensus of biomodeling. Force fields to describe biomolecular structure and energies such as AMBER,9 CHARMM,10 GROMOS,11 and OPLS-AA12 resemble each other in being harmonic, diagonal force fields with preferably Lennard-Jones and Coulomb nonbonded energy functions. Together with water models such as SPC,13 TIP4P, and TIP3P,14 to account for explicit water around the biomolecules, these force fields have witnessed tremendous success over the past two decades. In particular, since the goal is usually free energies of conversion in aqueous phase, it has been recognized that the force field should be parametrized to reproduce condense phase experimental data; this approach was pioneered by Jorgensen and co-workers.12 It is known that deficiencies occur in the force field treatment of the electrostatic energy, which is the main component of relative energies in classical biomolecular processes where bonds are not broken. Thus, attempts to go beyond the simple models by including polarization have been seen.15 However, as we will point out here, there is still a lot of space for improvement in standard fixed point-charge force fields before the need for polarization in condense phase can be proven, in particular considering that condense phase already is strongly polarized:

10.1021/jp077700b CCC: $40.75 © 2008 American Chemical Society Published on Web 01/19/2008

Interaction Potentials for Charged Residues fixed point charges of OPLS-AA and TIP3P/TIP4P are more polarized than in vacuum to simulate an average, effective induced dipole moment;14 polarization becomes important only in specific cases where the polarization component of the energy changes substantially during a process, e.g., for charged groups coming into contact with molecules that in turn polarize more than they would do in their own liquid phase.16 A main problem here is that many condense phase data are taken directly from the liquid phase of a molecule, assuming that the selfpolarization found in the liquid resembles the polarization encountered in the heterogeneous environment of a biomolecular system. This leads to problems when the molecule encounters strongly polarizing entities, e.g., ions.17 Due to the dominance of the electrostatic interactions, when one models charged residues in proteins, some new issues emerge; one issue is the aforementioned polarization component of the energy. Another is however encountered and can be resolved already at the nonpolarizable level: as will be shown here, the computed interaction energies and free energies become very sensitive to the choice of water model. The major part of the energy of the simulated system will be prone to large errors if specific interaction potentials for the charged residues are not used for each particular water model. First of all, it is important to realize that condense phase data obtained for protein parameters work well in most cases, since, inside the protein, the dielectric environment resembles that of a pure organic liquid to some extent. Second, ideally, free energies of transfer from water to an organic phase or free energies of hydration should be used for parametrization, since they reflect a unique spectrum of the free energy change going from the aqueous to the low-dielectric (vacuum, protein) environment and, thus, should reflect the true free energies of the molecule. Electrostatic interactions become important when they change due to different conformations. As an example, consider a charged residue, e.g., Asp, starting out in an unfolded protein state being completely hydrated but ending up (partially) dehydrated in the folded state. Alternatively, envision the same Asp in a binding pocket bound to water in the free state but bound to a hydrophobic ligand in the bound state. The dehydrated interaction is dominated by the direct and weakly screened interaction between Asp and the protein or ligand, and this part of the interaction potential can be expected to be modeled well by the charged residues if the interaction energy is in good accord with ab initio quantum chemical interaction energies. This approach has been followed in standard procedures,18,19 leaving the hydrated free energy limit rather unexplored, and thus, the free energy of any charged residue is not well-calibrated. The main result of this work is that the correct limiting behavior can be secured by requiring the force field parameters to correctly reproduce the hydration free energies of the isolated, charged residues. These changes reflect changes in interaction energies of closelying neighbors. The important thing is to realize that this change is not well-modeled by a gas-phase quantum chemical calculation or a liquid-phase simulation alone, as has been done hitherto, but instead requires calibration toward a free energy calculation reflecting the actual change in enVironment. It is shown in this paper that errors from 5 to 13 kcal/mol for the relative free energies of two residues can occur because of this neglect. The solution is to use free energies of hydration as the target data for parametrization of OPLS-AA for charged residues, since free energies of processes reflecting changes in electrostatic environment are the most significant in most molecular simulations. Existing approaches do not take into

J. Phys. Chem. B, Vol. 112, No. 6, 2008 1821 account the free energy of hydration of charged groups and, thus, cannot be expected to be accurate in a number of cases where charged groups change environment during simulation. Earlier work in similar direction includes an attempt to improve OPLS-AA parameters for Asp and Glu on the basis of DFT calculations,20 with the absolute free energies obtained with the SPC water model. Gromos 53A6 also gives a qualitatively correct, however slightly overestimated, difference in free energy of the two acidic residues, again using SPC water.21 This was corrected ad hoc by adjusting the Lennard-Jones parameters in one case.22 A recent study computed the free energy of hydration using OPLS-AA and TIP4P as here for the neutral amino acid residues,23 which are however not the subject of this work. Methods Use of Target Data. Parameters have been taken from standard OPLS-AA and modified only in two atom types of the functional groups. Most of the original OPLS-AA parameters were fitted to HF/6-31G(d) interaction energies between water and the small charged groups and also aimed at reproducing heats of hydration for some charged groups.24 Whereas OPLSAA displays superior performance for neutral groups in general, there are several problems with the standard force fields for treating the charged residues in proteins, as also seen in, e.g., the Gromos force field:25 (i) Free energies have so far not been used as target data, and only some times, heats of hydration have been used,24 so the entropy component of the hydration free energies is missing in all cases. (ii) For Arg and Hip, no heats or free energies of hydration have been used in the target data and are thus not consistent with the energies of the other charged molecules. (iii) For those molecules where heats of hydration have been used as target data, the data are not from the same source.24 Since different research groups use different extrathermodynamic assumptions for the determination of enthalpies and free energies of hydration, it is crucial that these data stem from the same procedure. (iv) For Lys, Asp, and Glu, the studied models, NH4+, CH3NH3+, and HCOO-, are not consistent with the actual sizes of the amino acids, i.e., the number of methylene groups in the side chain. Since a methylene unit contributes approximately ∼1 kcal/mol to the hydration free energy, this effect is usually minor but systematic. The problem of incorrect relative free energies of hydration for charged amino acids was documented earlier by Dixit et al. using finite-difference Poisson-Boltzmann calculations.26 This analysis was based on a particular choice of experimental absolute free energies, which can vary, but any choice will give almost the same relative free energies. Whereas the relative free energies of hydration of Asp and Glu are correct to within 1.5 kcal/mol or less with both AMBER, CHARMM, and OPLS, the relative free energies of other residues Hip, Arg, and Lys display errors in relative free energies of hydration which sometimes exceed 10 kcal/mol. As an example, AMBER overestimates the free energy of Lys by 11.7 kcal/mol and Hip by 7.8 kcal but underestimates that of Asp by 3.3 kcal/mol. CHARMM overhydrates Lys by 13.9 kcal/mol and Hip by 6.8 kcal/mol but is close to the chosen experimental data for Asp and Glu. A very similar result is seen for OPLS.26 Fitting to water-solute dimerization energies does not reproduce the physically correct situation in condense phase, where compensating solute-solvent and solvent-solvent interactions occur: in general, for solvation of a charged solute in a protein or water, the solute-solvent interactions are favorable for the free energy, whereas the solvent-solvent interactions are unfavorable.27 Therefore, a desirable approach

1822 J. Phys. Chem. B, Vol. 112, No. 6, 2008

Jensen, K. P.

Figure 1. Snapshots of acetate in 474 water spheres of radius 15 Å, to test size consistency (left), and in 509 water cubic boxes with periodic boundary conditions (right).

TABLE 1: Test of Two Approaches To Calculate the Absolute Free Energy of Hydration, Using Optimized Parameters of Model 1 in 10-Å Radius TIP4P Water Caps molecule

expta

model 1

NB model 1

acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

-80.7 -79.1 -66.1 -69.2 -64.1

-81.2 -80.1 -65.9 -69.6 -65.0

-79.8 -76.2 -56.1 -66.0 -64.4

a

See ref 32.

is to optimize parameters and fit computational free energies to target data of free energies resembling conversion from water to low-dielectric environments, as has been achieved in this work. System Setup. To test the robustness of the approach, a variety of system setups have been attempted, with finite spheres including Born corrections and cubical cells with periodic boundary conditions as the two types of simulation system (see Figure 1). Spheres of 10-Å and 15-Å radii were tested with effectively infinite (100 Å) nonbonded cutoffs and corresponding Born corrections of 16.39 and 10.93 kcal/mol and shown to give size-consistent results (see later). These setups ensure that all competing water-water and solute-water interactions within the sphere are computed. Periodic boundary conditions (PBC) without Ewald summation are known from earlier work to give cutoff-dependent results because of the changing bias between water-water and solute-water interactions and are thus not useful for obtaining free energies when charges change environment in nonneutral systems.27 Ewald summation has not been attempted because of complications when dealing with charged systems, i.e., the need to introduce a uniform background charge although such a background charge is by no means uniform close to the ion. Computational Details. Monte Carlo sampling was performed with Boss, version 4.7.28 It was found that 10 windows using double-wide sampling were sufficient to reach the target accuracy. Typically, 5 million configurations of equilibration and 10 million configurations of averaging were necessary for each window. All computations were carried out at 298 K and 1 atm pressure corresponding to the conditions of the experimental target data. As will be clear later, for anions, the standard OPLS-AA parameters overestimate the free energy substantially in TIP4P, whereas, for cations, free energies are underestimated in TIP4P. This means that the OPLS-AA point charge models are overpolarized for the anions and underpolarized for the cations. It also implies that not even the relative free energies of hydration of the groups are well modeled: this is a problem, e.g., when forming salt bridges, where the dissociation limit of

one hydrated negative residue and one hydrated positive residue should equal the sum of the hydration free energies minus the interaction free energy of the salt bridge. It is also a problem in other cases where both positively and negatively charged residues become partly dehydrated during a chemical process. The solution to this problem is to adjust the partial charges of the functional groups -NH3+ (Lys), -COO- (Asp, Glu), -NH(C)NH+- (Hip), and -NH2(C)NH2+. Alternatively, the Lennard-Jones (LJ) parameters can be adjusted leaving the point charges fixed. However, this approach breaks part of the simplicity of the OPLS-AA force field, for which many LJ parameters can be allowed to be similar and typical for a particular element, and also remain of a consensus nature when using QM-derived charges to generate parameters for new molecules. An additional parameter set was optimized where LJ parameters were modified instead (model 2) to show that these parameters can indeed be modified to yield similar interaction energies, structures, and free energies, but this model should not be used in connection with OPLS-AA. The fragment approach of OPLS makes it rather straightforward to modify parameters and maintain the overall charge required for both model molecules and amino acid side chains. Charge is usually localized so that changes can occur only in one functional group. The one exception was Hip: the charges of the ring of protonated Hip sums to 0.885 in both the old and new OPLS force field, but to model it with the 4-methylimidazolium, a modified carbon charge of -0.065 had to be used together with three standard hydrogen atoms (type 140) with Q ) +0.06, giving the desired total charge of methyl of +0.115). When Hip itself is modeled, the corresponding charge of the β-carbon (type 505) is -0.005, which, together with two 140 hydrogen atoms, also sums to +0.115. Thus, the new parameters for Hip only include 512 and 513 atom types, whereas, for 4-methylimidazolium, they also include a methyl carbon with Q ) -0.065. For all other residues studied, the charges of fragment molecules and amino acids sum up to the desired total charge by modifying only two atom types within each functional group. While the total charges of the fragments were kept in this way, charges of the functional groups were varied and their free energies of hydration simulated from free energy perturbation. Other properties, in particular structure and interaction energies with water, were required to be physically reasonable as well. Free energies were computed by annihilating in one calculation the charge, keeping LJ parameters unaltered, and in a second perturbation annihilating LJ parameters of the neutral molecules. Since LJ parameters were not reparametrized, one calculation for each molecule sufficed for the LJ annihilation, whereas the charge annihilation was repeated with different partial charges until computed and experimental free energies were in agreement. The LJ annihilation gave a typical free energy of 2-4 kcal/mol for all molecules, smallest for acetate, in good agreement with the hydration free energy of neutral molecules. Several perturbation schemes were applied, including various parameters for midpoint dummies, annihilation of protons in one perturbation, and the number of windows. Longer runs were attempted to ensure statistical convergence. These tests confirmed the robustness of the procedure, with estimated average errors from statistical sampling and perturbation scheme (∼1 kcal/mol) as well as system type (1 kcal/mol) of approximately 2 kcal/mol. This number has therefore been the target accuracy of this work. It resembles a large number for relative free energies of neutral residues but is ambitious when dealing with free energies of charged residues.

Interaction Potentials for Charged Residues

J. Phys. Chem. B, Vol. 112, No. 6, 2008 1823

TABLE 2: Two Sets of Modified Nonbonded OPLS Parameters for Charged Molecules in TIP4P model 1 (Q optimized) molecule acetate and propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

OPLS atom O272 C271 N300 H301 N287 H290 512N 513H 59Ca

model 2 (σ,  optimized)

Q

σ



Q

σ



-0.59 0.28 -1.06 0.59 -0.72 0.47 -0.72 0.64 -0.065

2.96 3.75 3.25 0.00 3.25 0.00 3.25 0.00 3.50

0.21 0.105 0.17 0.00 0.17 0.00 0.17 0.00 0.066

-0.80 0.70 -0.80 0.46 -0.30 0.33 -0.54 0.46 -0.065

3.30 3.75 2.92 0.00 3.12 0.00 2.87 0.00 3.50

0.20 0.10 0.20 0.00 0.10 0.00 0.10 0.00 0.066

a 4-Methyl carbon modified from methyl of 1-methylimidazole (-0.199) to be used for modeling only 4-methylimidazolium and not Hip. It ensures a total charge of +1 for the fragment. In Hip, this atom type is 505 (β-carbon).

TABLE 3: Test of Sensitivity of Model 1 to System Setup: TIP4P Water Spheres with Radii of 10 and 15 Å and Infinite Cutoffs and Cubic Boxes of 508 and 746 TIP4P Molecules with Periodic Boundary Conditions (PBC) and 10 Å and 12 Å Nonbonded Cutoffs, Respectively (with Born Correction)

a

molecule

expta

CAP rad 10

CAP rad 15

PBC 508/10

PBC 746/12

acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

-80.7 -79.1 -66.1 -69.2 -64.1

-81.2 -80.1 -65.9 -69.6 -65.0

-80.2 -81.0 -66.0 -70.0 -64.3

-82.2 -79.2 -86.1 -89.3 -79.8

-79.9 -77.7 -86.8 -87.9 -79.2

See ref 30.

Table 1 compares two approaches to computing the absolute free energy of hydration for the molecular ions with explicit solvation. In the first case (model 1), which is applied throughout this work, the free energy of hydration ∆Ghyd was computed as29

∆Ghyd ) ∆G*(aq) - ∆G*(g)

(1)

where ∆G* is the free energy of annihilating all nonbonded interactions involving the solute in water (aq) and gas (g), respectively. The correction for the gas phase is due to the fact that intrasolute nonbonded energy terms remain in the gas phase, however, with different population of conformers. In the second case (NB model 1), ∆Ghyd was computed from one free energy perturbation in water only, where all nonbonded parameters are perturbed to zero, as in ∆G*(aq) of eq 1, but without evaluating internal nonbonded energies during perturbation, i.e., assuming that internal nonbonded energies contribute similarly in aqueous and gas phase. In practice this was done by setting the intrasolute nonbonded cutoff to 0.001 Å in all calculations. As seen from Table 1, this approximation is good for the small and rigid molecules (acetate and 4-methylimidazolium) but not for the molecules with flexible hydrophobic chains, which are expected to occupy different conformations in gas and aqueous phases. These molecules tend to be less solvated by ∼1 kcal/mol per methylene group in this model, similar to the approximate hydration free energy of such a group. Because of this situation, only the approach of eq 1 has been followed in this work. Results and Discussion Interaction Potentials To Be Used with TIP4P. Table 2 shows the optimized nonbonded parameters for the two models where charges have been modified (model 1) and where LJ parameters have been modified (model 2). Model 2 is only displayed to show that all three parameters can in principle be modified and that substantial redundancy is present, which is one reason many OPLS LJ parameters are quite similar and kept similar for reasons of simplicity; it makes parametrization

of new compounds much more straightforward. The standard OPLS-AA parameters can be deduced from columns 2 and 3 of model 1 (σ, ) and column 1 of model 2 (Q). Table 3 shows the free energy results of various system setups using the new parameters of model 1. First, the experimental target data are taken from the same source30 and are subject to the same extrathermodynamic assumptions, necessary to put the positively and negatively charged residues on the same free energy scale. The next two columns show the performance of the sphere model with differently sized spheres of 10-Å and 15-Å radii. The largest difference between absolute free energies of the two spheres is 1.4 kcal/mol, and both agree with experiment within 1 kcal/mol (except propionate, which in the large system had a slightly larger error of 1.9 kcal/mol). Thus, the model used for parametrization is size consistent and treats all water-water and ion-water interactions consistently, which is the secret to obtaining accurate free energies in these cases.27 The success of the sphere models indicates that differential microscopic solvation of cations and anions takes place within the first few hydration spheres of the (molecular) ion and subsequently more or less cancels to render a dielectric description sufficient. However, overall errors are 1-2 kcal/ mol due to model system type, and this is also an inherent error in all computed relative free energies of hydration presented in this work. It is unrealistic to expect a better accuracy than this for absolute and relative free energies of hydration of molecular ions. The last two columns of Table 3 show the performance of the new parameters using cubic simulation cells with periodic boundary conditions and truncated nonbonded interactions, in one case a box of 508 TIP4P molecules with 10-Å cutoffs for all interactions and in the other case a slightly larger box with 746 water molecules and 12-Å cutoffs. Although the absolute free energies are very different from those obtained with the sphere models and experiment for the cations, and thus not reliable, it can be seen that PBC still works reasonably well for obtaining relative free energies within either series of positively or negatively charged residues but not for all residues as a whole. The hydration of cations and anions is physically

1824 J. Phys. Chem. B, Vol. 112, No. 6, 2008

Jensen, K. P.

TABLE 4: Absolute Hydration Free Energies (kcal/mol) of Charged Residues from OPLS-AA, OPLS-AA with CM1P Charges, and Two New Parameter Sets in TIP4P Water molecule

expta

OPLS-AA

OPLS-AA CM1P

OPLS-AA new model 1

OPLS-AA new model 2

acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium MAE

-80.7 -79.1 -66.1 -69.2 -64.1

-91.7 -88.0 -58.9 -66.5 -57.9 7.2

-84.7 -82.3 -61.8 -66.4 -57.1 4.3

-81.2 -80.1 -65.9 -69.6 -65.0 0.6

-80.3 -79.5 -66.9 -69.7 -64.3 0.5

a

See ref 30.

TABLE 5: Comparison of Relative Hydration Free Energies (kcal/mol) of Charged Residues from OPLS-AA, OPLS-AA with CM1P Charges, and Two New Parameter Sets in TIP4P Water (Acetate ) 0 kcal/mol) molecule

expta

OPLS-AA

OPLS-AA CM1P

OPLS-AA new model 1

OPLS-AA new model 2

propionate n-propylguanidinium n-butylammonium 4-methylimidazolium MAE

-1.6 -14.6 -11.5 -16.6

-3.7 -32.8 -25.2 -33.8 12.8

-2.4 -22.9 -18.3 -27.6 6.7

-1.1 -15.3 -11.6 -16.2 0.4

-0.8 -13.4 -10.6 -16.0 0.9

a

See ref 30.

different and causes waters to align oppositely close to the solute, and thus, simulations using PBC suffer from the unbalanced treatment of water-water and water-solute interactions. With Ewald summation of a neutral system including both types of residues, this error should be remedied. Comparison of New and Old Parameters. Table 4 shows a comparison of the absolute free energies of hydration obtained with standard OPLS-AA, OPLS-AA with CM1P charges derived from a semiempirical PM3 calculation,30 and the new charges of model 1 and 2 for use with TIP4P. The first observation is that OPLS-AA/TIP4P overhydrates the negatively charged residues and underhydrates the positively charged residues. This means that also relative free energies of negatively and positively charged residues have large errors associated with them. As can be seen, the quantum-mechanical CM1P charges directly resolve a large part of this problem, by lowering hydration free energies of acetate and propionate and increasing free energies for the cationic molecules. The two new models have been optimized to fit these free energies within 1 kcal/mol, to give consistent relative energies for all of the molecules. Table 5 shows the relative free energies of hydration, when acetate is put at 0 kcal/mol. In a comparison of the two carboxylic acid residues, it becomes apparent that the old OPLS set slightly overestimates the differential hydration free energy of the two acids, mainly due to the neglect of entropy of hydration. Importantly, all models (except the quantum-mechanical charges) have the same nonbonded parameters for acetate and propionate and can succeed in giving reasonable relative free energies of hydration in this way, because the major difference in hydration free energy between propionate and acetate is due to the extra methylene group of propionate, which corresponds to about 1 kcal/mol of less hydration free energy. Hence, all four models predict the qualitatively correct answer, with the CM1P charges being quite good. However, for cationic molecules, things are much worse, and the chemical diversity is now larger than just the additional 1 kcal/mol of a carbon group. Due to the problem of absolute free energies, the overall MAE of standard OPLS-AA is 12.8 kcal/mol, although significantly smaller within each group of charged molecules. Interestingly, CM1P charges succeed in lowering this MAE to 6.7 kcal/mol for the total set, which is impressive considering that these charges are merely from gas phase and not expected to handle the polarized bulk situation

very well. There is a lot of arbitrariness in the QM charges due to the charge deviation scheme and also due to the dependence of charges on molecular conformations, but part of the discrepancy of OPLS-AA is resolved due to more physically reasonable charges. Finally, the new OPLS parameters bring the MAE for the whole set down to 0.4 kcal/mol for model 1. Focusing only on the cationic molecules illustrates the inconsistency of the standard force field also for the relative free energies of this group of similarly charged molecules. Although all trends are correct, which is important, the quantitative free energies are not satisfactory, as seen in Figure 2. The differential free energy of the Lys and Hip models should be 3 kcal/mol but is computed to be almost 8 kcal/mol, and that of Lys and Arg should be 5 kcal/mol but is almost 9 kcal/ mol. CM1P charges improve the first case but not the last. The new models by design provide relative free energies within 1 kcal/mol. Importantly, the errors found here are very similar to the errors computed by Dixit et al. using finite-difference Poisson-Boltzmann calculations,26 as mentioned earlier, and thus this problem is well-documented with two rigorous and widely different approaches. Monohydrate Energies and Structures. After having demonstrated the improved performance of model 1 with TIP4P as compared to standard OPLS-AA, we now look at some other

Figure 2. Free energies of hydration for cationic side chain models relative to n-butylammonium.

Interaction Potentials for Charged Residues

J. Phys. Chem. B, Vol. 112, No. 6, 2008 1825

TABLE 6: Interaction Energies (kcal/mol) of Molecule-TIP4P Complexes method

EMP2 - EOPLS

molecule

OPLS-AA

OPLS-Mod.1

B3LYP/TZVPP

MP2/TZVPPa

OPLS-AA

OPLS-Mod.1

acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

-19.5 -19.7 -13.9 -17.1 -15.4

-17.1 -17.5 -15.6 -19.1 -19.7

-19.4 -19.3 -15.7 -17.5 -16.2

-20.9 -20.8 -17.1 -18.2 -17.4

1.4 1.1 3.2 1.1 2.0

3.8 3.3 1.5 -0.9 -2.3

a

MP2/TZVPP single point energy at B3LYP/TZVPP optimized geometry.

properties that are also important for the correct description of the molecular systems. A first test is whether the obtained force field remains physically reasonable, giving monohydrate geometries and interaction energies close to what could be expected, in particular to avoid spurious optimal parameter sets which could in principle occur from the parametrization procedure. Table 6 shows the interaction energies of monohydrates of the five molecules obtained with OPLS-AA, the new model 1 with modified charges on functional groups, B3LYP/ TZVPP, and MP2/TZVPP. All interaction energies are stronger with MP2 than with B3LYP, indicating the contribution of ca. 1 kcal/mol of dispersion energy to the interaction energy. Standard OPLSAA gives results very close to those of B3LYP but consistently a bit weaker than MP2. The MAE for all five complexes is slightly larger with model 1 (2.4 kcal/mol) than with OPLSAA (1.8 kcal/mol), a small difference considering that some of the old OPLS parameters were optimized toward this type of vacuum interaction energies. It is expected that interaction energies of a good condense-phase force field will differ somewhat from quantum chemical energies because the force field will be designed to describe an average bulk situation with polarization from other solvent molecules as well. However, due to the dominating electrostatic component of the interaction energy, these numbers should not deviate substantially from ab initio results to be physically reasonable. All of the interaction energies in Table 6 have physically reasonable vacuum interaction energies, confirming the qualitatively correct nature of the ion-water interaction. The TIP4P water molecule has a dipole moment substantially larger (2.18 D) than it should have in vacuum (1.85 D). This will increase the interaction energy relative to that expected for a hypothetic nonpolarizing ion. However, ionic molecules can be expected to be at least as polarizing as water, so in this particular case, a TIP4P-ion interaction energy is not expected to be very different from ab initio results. For other systems where polarization has not been included in parametrization, e.g., for hydrophobic residues parametrized to fit liquid-phase data, there will be larger errors when interacting with ions (e.g., cation-π interactions).16 The largest difference between old and new OPLS parameters is seen for Hip, with a 4.3 kcal/mol stronger binding using the new parameters. This stronger vacuum binding also reflects the more negative free energy of hydration of the new parameters. Table 7 shows the hydrogen bond distances obtained for the monohydrate complexes of the charged fragment molecules. The lowest energy conformation was found using the simulated annealing procedure of Boss, and subsequent geometry optimization was done with OPLS-AA, model 1, and B3LYP/ TZVPP. The old and new force field parameters give almost identical geometries upon gas-phase geometry optimization of the monohydrate complex. For the two residues where simple directional hydrogen bonds are encountered, the force fields agree well with B3LYP. However, for the cases where more

TABLE 7: Hydrogen Bond Distances (Å) of Molecule-TIP4P Complexes method molecule acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

OPLS-AA

OPLS-Mod.1

B3LYP/TZVPP

1.73 2.30 1.79 2.22 1.69 1.71 1.67

1.74 2.30 1.79 2.22 1.69 1.71 1.67

1.91 2.15 1.99 2.04 2.06 1.72 1.74

hydrogen bond sites interfere, the OPLS models show their bias toward the situation where more water molecules compete for the available hydrogen bonds and, thus, give quite different geometries, differing up to 0.37 Å for n-propylguanidinium. As can be seen, the aVerage distances between the hydrogen atoms of water and the oxygen atoms of the acidic groups are similar with OPLS-AA and B3LYP, i.e., ca. 2.01-2.03 Å, but the geometry is somewhat more symmetric with B3LYP. Importantly, no change in geometric preferences of the new force field is seen, compared to standard OPLS-AA, so the new parameters retain the geometric characteristics of standard OPLS-AA. Bulk Structure. We now look at the average bulk structure obtained from Monte Carlo simulations of the ions in TIP4P water, in particular the averaged radial distribution functions of the hydrogen bonds, seen in Figure 3. Importantly, although the monohydrate geometries are essentially identical for old and new OPLS parameters, the statistically averaged structures differ because the interaction energies, and thus the sampling preference of configurations, change between the models. This is also reflected in the intensity of peaks and in the shell structure. The main effect of model 1 is to bring water hydrogens further away from the carboxylic acids on average but to bring the water oxygen closer to the hydrogen atoms of cationic molecules. This is seen from the position of the first peak of the radial distribution functions in Figure 3. This change in bulk structure directly reflects Table 4, where the new parameters reduce the free energy of hydration for the carboxylic acids and increase it for the cationic molecules, relative to standard OPLS-AA. Thus, both the statistically averaged free energies and bulk structures consistently document the correction of OPLS-AA’s overhydration of Asp and Glu vs underhydration of Hip, Arg, and Lys. Effect of Water Model on Simulation Results: TIP3P. We now turn to the problem of what happens when another water model than TIP4P is used with a given protein force field. Table 8 shows the absolute free energies of hydration computed with different OPLS force fields in TIP3P water. As is obvious from comparing Table 8 with Table 4, free energies for a given protein force field depend critically on the choice of water model. OPLS-AA gives widely different free energies with TIP3P and TIP4P, with the TIP3P-OPLS-AA potentials being in much better agreement with experimental free energies than the TIP4P-OPLS-AA potentials. It is therefore apparent that any

1826 J. Phys. Chem. B, Vol. 112, No. 6, 2008

Jensen, K. P.

Figure 3. Radial distribution functions for (N)H-O(water) (His, Arg, Lys) and O-H(water) (Asp, Glu), computed from Monte Carlo simulation with new and old OPLS parameters in TIP4P water spheres of 15-Å radii.

TABLE 8: Absolute Hydration Free Energies (kcal/mol) of Charged Residues from New and Old OPLS-AA Parameters in TIP3P Water

a

molecule

expta

OPLS-AA

OPLS-AA CM1P

OPLS-AA model 1

OPLS-AA model 3

acetate propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

-80.7 -79.1 -66.1 -69.2 -64.1

-84.9 -82.6 -65.4 -69.9 -60.5

-79.6 -76.4 -63.7 -69.6 -60.8

-77.2 -77.2 -75.2 -75.7 -71.5

-79.9 -77.7 -64.6 -70.7 -64.0

See ref 30.

TABLE 9: Comparison of Relative Hydration Free Energies (kcal/mol) of Charged Residues from New and Old OPLS-AA Parameters (Acetate ) 0 kcal/mol) in TIP3P Water

a

molecule

expta

OPLS-AA

OPLS-AA CM1P

OPLS-AA model 1

OPLS-AA model 3

propionate n-propylguanidinium n-butylammonium 4-methylimidazolium MAE

-1.6 -14.6 -11.5 -16.6

-2.3 -19.5 -15.0 -24.4 4.2

-3.2 -15.9 -10.0 -18.8 1.7

0.0 -2.0 -1.5 -5.7 8.8

-2.2 -15.3 -9.2 -15.9 1.1

See ref 30.

set of charged residue parameters should be optimized and used in conjunction with only one water model. Because of this situation, we decided to derive new parameters to be used also with the OPLS-AA/TIP3P setup. This TIP3P-consistent model is referred to here as model 3. Table 9 gives an overview of the relative free energies obtained with the force fields in TIP3P water. OPLS-AA performs much better with TIP3P, and the PM3-derived charges perform astonishingly well. However, the latter charges cannot simply be used because they are not derived in a consistent way but only from OPLS-AA geometry optimized structures. More importantly, the PM3-CM1P atomic charges are not consistent with the fragment approach of the entire OPLS force field, with no unique way of truncating to simple fragment charges. However, the computations suggest that the QM-based methods for predicting charges depend on water models as well but are usually at least qualitatively correct. As seen in Tables 8 and 9, the newly optimized model 1 fails to predict even relative free energies of the residue models when TIP3P is used

as water model (MAE of 8.8 kcal/mol). Model 3 by design obtains a MAE of about 1 kcal/mol for the relative free energies in TIP3P water. Because of the aboVe results, it is Very important to realize that molecular simulations using different water models require entirely different protein force field parameters to mimic correctly the water-molecule interaction potentials. This is mainly important for modeling the energies of interactions with charged residues, as seen here. The dipole moment of water models is permanent in the sense that they are not polarizable. They mimic the normal permanent dipole plus the average induced dipole to give a permanent effective average dipole. TIP3P has a larger dipole moment (2.35 D) than TIP4P (2.18 D) and can be expected to have stronger interaction with ions, as seen recently.27 Since these errors, as illustrated in Tables 3-6, can exceed 5 kcal/mol per residue, it is very important to use optimized and physically realistic interaction potentials depending on the water model used. The strong dependency of water-amino acid interactions on water model may also

Interaction Potentials for Charged Residues

J. Phys. Chem. B, Vol. 112, No. 6, 2008 1827 any molecular mechanics code; the OPLS-AA force field remains valid for modeling all other interactions, also with these new parameters included.

TABLE 10: Modified Nonbonded OPLS Parameters for Charged Molecules in TIP3P (Model 3) params molecule acetate and propionate n-propylguanidinium n-butylammonium 4-methylimidazolium

OPLS atom O272 C271 N300 H301 N287 512N 513H 59Ca

Q

σ



-0.67 0.44 -0.96 0.54 -0.48 -0.63 0.55 -0.065

2.96 3.75 3.25 0.00 3.25 3.25 0.00 3.50

0.21 0.105 0.17 0.00 0.17 0.17 0.00 0.066

illustrate that both models are distinct from a more physically correct model, which includes, e.g., polarization. OPLS has been developed mainly for TIP4P, but many groups continue to use TIP3P for reasons of computational simplicity (in MD studies), even though there are clear pitfalls related to this. However as we have seen, for the charged amino acid fragments, TIP3P in fact works much better but still not optimally. Because of this, it has been crucial for us now to correct the OPLS-AA charges for charged residues to reflect better the free energies of interactions with both TIP4P and TIP3P water. The models to be used are models 1 and 3, the parameters of which are given in Table 10. The performance of model 3 is seen in the last column of Tables 8 and 9. Conclusion In this paper, a self-consistent literature set30 of hydration free energies for comparable amino acid models, including β-carbons, has been used to reparametrize OPLS-AA charges for charged amino acid residues, solving a serious problem in biomolecular modeling. It has been shown that different water models result in very different free energies of hydration for a particular force field, in this case OPLS-AA, and thus, the water models require specific and different interaction potentials for charged amino acids to be well-modeled. Such an effect has been observed also for neutral amino acids where the effect is however less significant.31 We have parametrized two such new parameter sets by modifying the atomic point charges of the atom types of the functional groups only, keeping the remaining fragments as well as LJ parameters unaltered in the spirit of OPLS-AA. Model 1 in Table 2 and model 3 in Table 10 should be used with TIP4P and TIP3P, respectively. The MAEs of individual free energies of residues have been decreased from a MAE of 13 kcal/mol in case of OPLS-AA/ TIP4P and 4 kcal/mol with OPLS-AA/TIP3P to ca. 1 kcal/mol with both parameter sets. The actual uncertainty is estimated to be ca. 2 kcal/mol, owing to approximations in the simulation procedures. Since these errors are per residue, large improvement in the description of free energies of proteins can be envisioned. The important improvement obtained here is that the overall relative free energy between any states of the protein, in particular those involving changes in the environment of one or more charged residues, is substantially more meaningful and accurate. For simulations of, e.g., binding of charged peptides to proteins, protein-protein interactions, and protein folding, all of which involve substantial changes in hydration of individual charged residues, large improvement in accuracy of relative free energies is expected. Further tests of such situations are underway. The changes introduced only affect charged residues and encompass only 8 new protein parameters, easily changed in

Acknowledgment. This work was made possible by the Velux foundation through a Villum-Kann-Rasmussen fellowship and by a grant from the Danish Center for Scientific Computing (DCSC). References and Notes (1) Jorgensen, W. L. Science 2004, 303, 1813-1818. (2) Jensen, F. Introduction to Computational Chemistry, 2nd ed.; John Wiley & Sons: New York, 2007. (3) Cramer, C. J. Essentials of Computational Chemistry; John Wiley & Sons: New York, 2004. (4) Allen, M. P.; Tildesley, D. Computer Simulation of Liquids; Clarendon Press: Oxford, U.K., 1987. (5) Tembe, B. L.; McCammon, J. A. Comput. Chem. 1984, 8, 281283. (6) Jorgensen, W. L.; Ravimohan, C. J. Chem. Phys. 1985, 83, 30503054. (7) Jorgensen, W. L. Acc. Chem. Res. 1989, 22, 184-189. (8) Hunenberger, P. H.; McCammon, J. A. J. Chem. Phys. 1999, 110, 1856-1872. (9) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179-5197. (10) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; JosephMcCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998, 102, 3586-3616. (11) Hermans, J.; Berendsen, H. J. C.; Van Gunsteren, W. F.; Postma, J. P. M. Biopolymers 1984, 23, 1513-1518. (12) Jorgensen, W. L.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657-1666. (13) Berendsen, H. J. C.; Postma, J. P. M.; Van Gunsteren, W. F.; Hermans, J. In Intermolecular Forces; Pullman, B., Ed.; Reidel: Dordrecht, The Netherlands, 1981. (14) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926-935. (15) Kaminski, G. A.; Stern, H. A.; Berne, B. J.; Friesner, R. A.; Cao, Y. X.; Murphy, R. B.; Zhou, R.; Halgren, T. A. J. Comput. Chem. 2002, 23, 1515-1531. (16) Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 4177-4178. (17) Jorgensen, W. L.; Jensen, K. P.; Alexandrova, A. J. Chem. Theory Comput. 2007, 3, 1987-1992. (18) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. J. Phys. Chem. B 2001, 105, 6474-6487. (19) Jorgensen, W. L.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657-1666. (20) Xu, Z.; Luo, H. H.; Thieleman, P. J. Comput. Chem. 2006, 28, 689-697. (21) Geerke, D. P.; Van, Gunsteren, W. F. Chem. Phys. Chem. 2006, 7, 671-678. (22) Åqvist, J.; Fothergill, M.; Warshel, A. J. Am. Chem. Soc. 1993, 115, 631-635. (23) Chang, J.; Lenhoff, A. M.; Sandler, S. I. J. Phys. Chem. B 2007, 111, 2098-2106. (24) Jorgensen, W. L.; Gao, J. J. Phys. Chem. 1986, 90, 2174-2182. (25) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. J. Comput. Chem. 2004, 25, 1656-1676. (26) Dixit, S. B.; Bhasin, R.; Rajasekaran, E.; Jayaram, B. J. Chem. Soc., Faraday Trans. 1997, 93, 1105-1113. (27) Jensen, K. P.; Jorgensen, W. L. J. Chem. Theory Comput. 2006, 2, 1499-1509. (28) Jorgensen, W. L.; Tirado-Rives, J. J. Comput. Chem. 2005, 26, 1689-1700. (29) Jorgensen, W. L.; Tirado-Rives, J. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6665-6670. (30) Sitkoff, D.; Sharp, K. A.; Honig, B. J. Phys. Chem. 1994, 98, 19781988. (31) Storer, J. W.; Giesen, D. J.; Cramer, C. J.; Truhlar, D. G. J. Comput.Aided Mol. Des. 1995, 9, 87-110. (32) Hess, B.; Van der Vegt, N. F. A. J. Phys. Chem. B 2006, 110, 17616-17626.