Optimized Parameters for Continuum Solvation Calculations with

Apr 3, 2008 - carbohydrates in conjunction with the Carbohydrate Solution Force Field are ... A good example is the use of continuum solvent models,...
2 downloads 0 Views 698KB Size
5238

J. Phys. Chem. B 2008, 112, 5238-5249

Optimized Parameters for Continuum Solvation Calculations with Carbohydrates David F. Green† Department of Applied Mathematics and Statistics, and Graduate Program in Biochemistry and Structural Biology, Stony Brook UniVersity, Stony Brook, New York 11794-3600 ReceiVed: October 4, 2007; In Final Form: January 18, 2008

Continuum solvent models have been shown to be an efficient method for the calculation of the energetics of biomolecules in solution. However, for these methods to produce accurate results, an appropriate set of atomic radii or volumes is needed. While these have been developed for proteins and nucleic acids, the same is not true of carbohydrates. Here, a set of optimized parameters for continuum solvation calculations of carbohydrates in conjunction with the Carbohydrate Solution Force Field are presented. Explicit solvent freeenergy perturbation simulations were performed on a set of hexapyranose sugars and used to fit atomic radii for Poisson-Boltzmann and generalized-Born calculations, and to fit atomic volumes for use with the analytical continuum electrostatics model. The solvation energetics computed with the optimized radii and a PoissonBoltzmann model show remarkable agreement with explicit solvent simulation, with a root-mean-square error of 1.19 kcal/mol over a large test set of sugars in many conformations. The generalized-Born model gives slightly poorer agreement, but still correlates very strongly, with an error of 1.69 kcal/mol. The analytical continuum electrostatics model correlates well with the explicit solvent results, but gives a larger error of 4.71 kcal/mol. The remarkable agreement between the solvation free energies computed in explicit and implicit solvent provides strong motivation for the use of implicit solvent models in the simulation of carbohydratecontaining systems.

1. Introduction Carbohydrates play an essential role in a large number of biological systems. In addition to their key role as the primary external energy source for many organisms, large polysaccharides play structural roles in prokaryotes (peptidoglycans provide rigidity to the bacterial cell wall), plants (the plant cell wall is composed of cellulose, hemicelluose, and pectin) and animals (glycosaminoglycans form a key component of cartilage). Furthermore, oligosaccharides are attached to the surface of many proteins of medical relevance. For example, the various types of the ABO blood-group result from differences in carbohydrate antigens on the surface of red blood cells. Additionally, many viruses, including both influenza and HIV, express human carbohydrates on their surface to help in immune system avoidance. Despite their many important functions, historically, the study of carbohydrates has lagged that of the other biological polymers (proteins and nucleic acids). This has largely been a result of the difficulties in experimental characterization of carbohydrates, especially in mixed protein-carbohydrate systems. In particular, natural glycoproteins and proteoglycans tend to be difficult to characterize structurally (either by X-ray crystallography or NMR spectroscopy). This is partly due to the intrinsic properties of the carbohydrates, but is additionally hampered by the difficulty in obtaining large quantities of pure oligosaccharides and glycoconjugates. One reason for this is the tendency of glycans to be heterogeneously glycosylated. Furthermore, when eukaryotic glycoproteins are expressed in bacterial systems (one of most common methods for obtaining large quantities of a † Address: Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, New York 11794-3600; phone: (631) 632-9344; fax: (631) 632-8490; e-mail: [email protected].

protein of interest), the glycosylation is completely absent. Finally, the nonbiological synthesis of complex carbohydrates in high yield and purity is very challenging. Some of these hurdles are being overcome through improvements in experimental techniques, in particular, novel synthetic methods for complex carbohydrates and glycosylated peptides and techniques for the ligation of biologically expressed proteins and synthetic constructs.1-4 A complementary approach to advances in experiment is the simulation of carbohydrate-containing systems. Computational methods can provide predictions of the structures of difficult to characterize systems, insight into the dynamics of macromolecules, and can be used to dissect the energetics of diverse processes in a manner inaccessible to experiment. The coupling of experimental and computational methods has been shown to be a particularly powerful tool, and thus, as experimental glycobiology moves forward, concurrent advances in computational methods for carbohydrates are particularly desirable. The computational study of carbohydrates is not new. The conformational equilibria of mono- and oligosaccharides have been investigated with both quantum mechanics and empirical force fields, in the gas phase and in explicitly represented aqueous solvent;5-9 molecular dynamics and simulated annealing have been used to in conjunction with X-ray crystallography and NMR spectroscopy in the determination of glycoconjugate structures;10-14 and a number of carbohydrate/protein systems have been studied with molecular dynamics simulations.15-17 However, the range of computational methods applied to carbohydrate systems has not kept pace with the more diverse set of approaches used in protein and nucleic acid simulation. A good example is the use of continuum solvent models, which have been shown to be highly useful for the efficient calculation of the energetics of biological macromolecules in

10.1021/jp709725b CCC: $40.75 © 2008 American Chemical Society Published on Web 04/03/2008

Continuum Solvation Calculations with Carbohydrates

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5239

aqueous solution. These models have been shown to provide detailed insight into macromolecular dynamics,18,19 protein stability,20-23 and the energetics of association of macromolecules, both with each other and with small molecules.24-29 Furthermore, continuum models have been shown to be particularly useful in drug screening30,31 and in the design of optimized ligands for protein targets.32-36 However, while these models have been parametrized to give accurate results for proteins and nucleic acids, continuum solvent models have yet to be extended in a robust way to carbohydrates. Here we present a detailed parametrization of the Carbohydrate Solution Force Field37 (CSFF) for use in three continuum solvent models: Poisson-Boltzmann38 (PB) and two implementations of the generalized-Born (GB) model.39-41 2. Methods Setup of Explicit Solvent Calculations. All molecular mechanics and dynamics calculations were performed using the CHARMM software package (version 32B1)42 and the CSFF.37 Solutes were built in default geometry, and then minimized to convergence (using a constant dielectric of 1.0 for electrostatic terms). Each solute was centered in a 25.0 Å radius sphere of TIP3P water molecules. Waters with oxygens within 2.50 Å of any solute heavy atom were removed, and the remaining waters were subjected to 500 steps of adapted-basis Newton-Rhapson minimization. All waters beyond 15 Å of the sphere center were then removed, and a single atom of the solute was fixed in position (C5 for all monosaccharides). The system was then equilibrated with 10.0 ps of Langevin dynamics at a constant temperature of 300 K, with a friction coefficient yielding a relaxation time of 5 ps-1. A time step of 2 fs was used, and the SHAKE algorithm was used to fix hydrogen bond lengths at their equilibrium values; no cutoffs were applied to nonbonded interactions. The spherical solvent boundary potential (SSBP) approach was used to approximate the effect of bulk water outside the sphere of explicit solvent.43 Computation of Solute-Solvent Radial Distribution Functions (RDFs). Equilibrated solute-solvent systems were simulated for 200 ps, using the same conditions as in equilibration. Coordinates were saved every 0.1 ps, and these trajectories were used with the RDFSOL module of CHARMM to compute RDFs of solvent oxygen and hydrogens around each solute atom. Atomic RDFs were converted to distribution of solvent charge-density by

gx,e(r) ) dwat[2qHgx,H(r) - qOgx,O(r)]

(1)

where gx,y(r) is the “x-y” atomic pair correlation function, qy is the partial atomic charge of atom “y”, and dwat is the density of water (computed from the number of waters in an equilibrated, solute-free sphere of radius 15 Å). Computation of Explicit Solvent Electrostatic Solvation Free Energies. Electrostatic solvation free energies were computed from explicit solvent simulations starting from equilibrated solute-solvent systems, and using the same conditions as in equilibration, following the protocol of Roux and co-workers.44,45 A standard free-energy perturbation (FEP) approach was taken, using the PERT module of CHARMM.46 All solute partial atomic charges were incrementally reduced in magnitude toward zero, using a scaling of (1 - λ)q; 11 evenly spaced values of λ between 0.0 and 1.0 (inclusive) were used. For each value of λ, the system was equilibrated for 5 ps, and then data was collected for 35 ps of dynamics; an additional 20 ps of equilibration was performed before the λ ) 0.0 and after the λ ) 1.0 simulations. The same procedure was then repeated

in reverse, with decreasing λ values from 1.0 to 0.0. The total simulation time for each FEP calculation was thus 960 ps. In all cases, all solute atoms were fixed throughout the simulation, either in an energy-minimized geometry or in a conformation sampled from the dynamic trajectory described above. The results from each simulation window were analyzed by the weighted-histogram approach using the WHAM module of CHARMM.47 Computation of PB Continuum Electrostatic Solvation Free Energies. The electrostatic solvation free energies were computed as the difference between two states: the solute immersed in water (a low-dielectric region with partial atomic charges at atom centers, surrounded by a high-dielectric region), and the solute in vacuum (the same low-dielectric region with point charges at atom centers, surrounded by a region of unit dielectric). Energies of each state were computed from the electrostatic potentials obtained by solution of the PB equation,48,49 using a multigrid finite-difference solver distributed with the integrated continuum electrostatics (ICE) software package (Tidor and co-workers).50,51 Charges were taken from the CSFF force-field,37 and various radii were used as defined in this manuscript. For consistency with the explicit solvent calculations, a solute dielectric constant of 1.0 was used, and a dielectric constant of 80.0 was used to approximate water; the dielectric boundary was defined by the molecular surface defined using a 1.4 Å radius probe. Although the PB equation was used, the ionic strength was set to 0.0 M, making the PB equation equivalent to the Poisson equation. A 65 × 65 × 65 unit grid was used (for a final grid spacing of, at most, 0.25 Å for all solutes). Focusing boundary conditions were used: an initial calculation, in which the longest dimension of the molecule occupies 23% of one edge of the grid, was performed using Coulombic potentials (calculated with a uniform external dielectric constant) at the boundary; the potentials from this solution were then used at the boundary of a second calculation, in which the longest dimension of the molecule occupies 92% of one edge of the grid. Results for each state were averaged over four translations of the solute with respect to the grid. Single-point calculations were performed on the minimized solute conformations; conformationally averaged energies were obtained using 20 conformations of each solute, taken by even sampling (at 10 ps intervals) of the 200 ps explicit solvent trajectory used for determining RDFs. As these conformations were sampled at regular time intervals from a constant temperature simulation, Boltzmann averaging was unnecessary, and the simple arithmetic average of all computed solvation energies was taken. Computation of GB Solvation Free Energies. Implicit solvation free energies using a GB model were calculated with two methods: the generalized-Born with simple smoothing (GBSW) and the analytical continuum electrostatics (ACE) implementations. For consistency with the other calculations, an internal solute dielectric of 1.0 was used, along with a solvent dielectric constant of 80.0. GBSW solvation energies were computed using the GBSW routine within the CHARMM computer program.41 The molecular surface boundary option was used, with a half smoothing length of 0.2 Å. The look-up table grid spacing was set to 1.5 Å; coefficients for the Coulomb field approximation and for the correction term were 1.2045 and 0.1866, respectively. A single calculation on a given structure was performed to obtain the GBSW electrostatic solvation free energy. ACE solvation energies were computed using the ACE2 routine within the CHARMM computer program.40 The smoothing

5240 J. Phys. Chem. B, Vol. 112, No. 16, 2008 parameter, R, was set to 1.3 (for consistency with standard PARAM22 protein parameters), and Born radii were limited to a maximum value of 14.0 Å. No cutoffs were used in the evaluation of any interactions. A single calculation on a given structure was performed; the sum of the ACE self and screening terms gives the electrostatic solvation free energy. Optimization of PB Atomic Radii. Atomic radii were optimized to give the lowest root-mean-square (rms) error between the electrostatic solvation energies computed by FEP and by the PB approach. As suggested by Roux and co-workers, an initial set of radii were chosen from the radial distribution of water molecules around each atom type.44,45 The minimum radius with a nonzero solvent charge density for any of the simulations was chosen as the initial radius (all sugars gave near identical results). As all atoms of a given type displayed near identical distributions, only a few standard atom types were considered: hydroxyl oxygens and hydrogens, the hemiacetal oxygen (O5), carbons (in any position), and the attached hydrogens. An alternate initial set of radii was selected as the CSFF minimum energy van der Waals (VDW) radii (Rmin). These radii were then optimized using two independent algorithms. In the first, all radii were first scaled by a constant factor, with intervals of 1.0%, until a minimum in the rms error was found. Using the minimum from this initial search, each variable radius was independently varied by 0.01 Å increments over a range of (0.05 Å. The radii set with the lowest rms error was accepted, and the procedure was repeated until all variables were at a minimum. In the second approach, the initial optimization consisted of a greedy, downhill search, in which each variable radius was independently altered by (0.05 Å, and the single change leading to the greatest reduction in rms error was accepted. This procedure was repeated while the rms error was monotonically decreasing. Subsequently, the same refinement procedure as in the first approach was used. Neither of these approaches is guaranteed to give the global optimum, and there are multiple minima on the error surface. Rather, these methods both are designed to give locally optimal solutions relatively near the starting conditions. Optimization of GBSW atomic radii. Atomic radii were similarly optimized to give the lowest rms error between the electrostatic solvation energies computed by FEP and by the GBSW approach. Initially, the CSFF minimum energy VDW radii (Rmin) were scaled by a constant factor, with intervals of 1.0%, until a minimum in the rms error was found. These radii were then further optimized using a Monte Carlo (MC) algorithm with the Metropolis acceptance criterion. The target function used in the optimization was the rms error, and an effective temperature of 0.15 was used. Each move consisted of changing the radius of a randomly selected atom type by a random value ranging between -0.1 and +0.1 Å; 50 000 moves were taken, and the set of volumes giving the lowest rms error was selected as the optimal. Convergence of the algorithm was assessed by repeating the MC procedure for an additional 50 000 steps. Optimization of ACE Atomic Volumes. The primary variable parameters in the ACE calculations are a set of atomic volumes that are used to derive effective Born radii. Initial volumes were taken as the Voronoi volumes for the ribose atomtypes of the PARAM27 all-atom nucleic acid force field.52 These volumes were then uniformly-scaled by constant factors between 0.90 and 1.10 (in 0.01 intervals) to minimize the rms error between the ACE and FEP energies of the training set. These volumes were then further optimized using an MC algorithm with the Metropolis acceptance criterion. The target function

Green used in the optimization was the rms error, and an effective temperature of 0.15 was used. Each move consisted of changing the volume of a randomly selected atom type by a random value ranging between -1.0 and +1.0 Å3; 50 000 moves were taken, and the set of volumes giving the lowest rms error was selected as the optimal. Convergence of the algorithm was assessed by repeating the MC procedure for an additional 50 000 steps. 3. Results and Discussion Continuum solvation models fall under the general class of classical, empirical energetic models. As with all such a models, a theoretical framework taken from classical macroscopic physics is applied to a microscopic system, with the model parameters tuned to give optimal performance. The parametrization can be through one of two methods (or a combination of both). First, parameters may be optimized to reproduce experimental data; small-molecule X-ray crystal structures, vibrational spectra, and the properties of bulk liquids have all been used in the parametrization of empirical force fields based on the laws of Newtonian mechanics. When inadequate experimental data is available, higher-level simulations may be used in their place; the torsional potentials in molecular-mechanics force fields are typically fit (at least partially) to the results of quantummechanical simulations.42,53-57 In the parametrization of continuum solvent models, one experimental observable of key interest is the free energy of solvation. These data have been used to develop the PARSE parameters for the 20 standard amino acids,58 and in the evaluation of ab initio charge determination methods.59,60 The key limitation to this approach, however, is the availability of experimental data; free energies of hydration have been determined for several hundred small organic molecules, but all tend to contain few chemical functionalities (with most being monofunctional).61 Roux and colleagues have demonstrated that explicit-solvent molecular dynamics simulations can be an efficient tool for the parametrization of continuum solvation models, without a requirement for experimental solvation data.44 In this approach, electrostatic hydration free energies are computed with FEP methods, and these energies are then used to train a set of radii for use in a continuum solvent model. This method, which has the additional benefit of yielding parameters consistent with the underlying force-field, has been used to parametrize both amino acids44,62 and nucleic acids.45 We have chosen a similar approach to the parametrization of carbohydrates. Solvent Structure around Hexopyranose Monosaccharides. All eight hexoaldose sugars were modeled in their pyranose forms (4C1 chair conformation), as both R- and β-anomers. The pyranose form is predominant in aqueous solution for all these sugars, and the 4C1 conformation is the prevalent state for all but R-idose (in which case it is still significantly populated).63 The D chiral configuration was used in all cases (as water is achiral, the solvation free energies are independent of chirality). The average radial distribution of water atoms around each atom of the sugars was obtained from a 200 ps Langevin dynamics simulation in a 15 Å radius solvent sphere. The distribution around all hydroxyls was nearly identical, regardless of whether the hydroxyl was primary (at C6) or secondary (at C1-4), and independent of the axial or equatorial configuration of the secondary alcohols. Figure 1A highlights this for the distribution of water hydrogens around the hydroxyl oxygens of R-mannopyranose, which contains two axial and two equatorial secondary hydroxyls, as well as one primary alcohol. The distribution of water oxygen atoms, and

Continuum Solvation Calculations with Carbohydrates

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5241

Figure 1. Solvent atom density around R-mannopyranose. The RDF of solvent atom density around all atoms of R-mannose is displayed. (A) The hydroxyl oxygen/water hydrogen correlation function is plotted as an average over all sugar hydroxyls as well as for each individual hydroxyl. The results are near identical. (B-F) The correlation function of each sugar atom type with both solvent oxygen and hydrogen atoms are plotted. All results are taken from a 200 ps Langevin dynamics trajectory.

the distributions around the hydroxyl hydrogens show similar consistency (data not shown); the solvent distribution around all aliphatic carbons and hydrogens was also found to be invariant. For this reason, it is reasonable to group the sugar atoms into five general classes: C and H of methylene and methine groups, hydroxyl O and H, and hemiacetal O. Figure 1B-F details the solvent structure around each atomic class in R-mannopyranose. Only the hydroxyls are seen to induce significant solvent structure; the expected strong peaks of hydrogen-bonded waters in the first solvation shell are seen, both for the oxygen and hydrogen, but little structure is apparent beyond the first shell. The solvent structure around the hemiacetal oxygen (O5) shows evidence of hydrogen bond donation from the solvent (as seen by a significant hydrogen atom population at and slightly below 2.0 Å). However, there is not evidence of the same level of structure as is seen at the hydroxyls, and the hydrogen atom density increases close to monotonically until reaching bulk density. This is a result of two factors: (1) the charge on the hemiacetal oxygen is significantly smaller in magnitude than that of the hydroxyls (-0.40e as compared to -0.66e in the CSFF force field); (2) solvent is sterically occluded from around the hemiacetal oxygen, both by the directly bonded ring carbons (C1 and C5) as well as by the neighboring primary alcohol at position 6. All monosaccharides were seen to have essentially identical solvent structure around each atom type. This is highlighted in Figure 2 by the distribution of solvent charge density around the atoms of three sugars with different configurations of secondary hydroxyls (R-idose contains only axially oriented hydroxyls, β-glucose contains only equatorially oriented hydroxyls, and R-mannose contains both two axial and two equatorial groups); the charge density around each atom type in the different sugars is virtually indistinguishable. Thus, it is clear that the configuration at these positions does not influence solvent structure, at least in any manner apparent from these analyses. While this result is not surprising, as the axial and

equatorial groups are chemically the same and share identical parameters in the force field, it is not a trivial result; because of differences in accessibility presented by groups at varying positions on the ring, the solvent structure could also have been different. Electrostatic Hydration Free Energies of Monosaccharides in Explicit Solvent. The free energy of “charging” each monosaccharide in explicit solvent was computed using FEP over a 480 ps Langevin dynamics trajectory; the procedure was then repeated in the reverse direction in order to assess reproducibility of the result. These data are presented in Table 1 (under ∆GFEP). All monosaccharides have quite favorable electrostatic hydration energies, but the exact values vary over a significant range (-15 to -28 kcal/mol). These calculations were done using a single, energy-minimized conformation of each sugar, and thus these data should not be heavily interpreted. However, the magnitudes are consistent with the highly polar (but uncharged) character of the hexopyranose sugars. Optimization of Atomic Radii for Use in PB Continuum Electrostatic Calculations. While known to be highly accurate,64 the computation of solvation free energies through FEP techniques is very computationally costly. Continuum models of solvent, and in particular the PB continuum electrostatic approach, have been shown to reproduce both experimental solvation free energies58,59 and those computed by explicitsolvent simulations44,45,65 with good accuracy and greatly reduced cost. In the PB model, the solvent is represented by a region of high dielectric constant ( ) 80 for water), while the interior of the solute is represented by a region of low dielectric constant ( ) 1-8, typically). The higher values of the internal dielectric constant are generally used with large biomolecules to account for the dielectric response of, for example, the protein backbone;66 in this work, the vacuum dielectric of 1.0 is used for consistency with the underlying molecular-mechanics force field. Partial atomic charges (most typically located at atom centers) are used to describe the polarity of the solute, and the

5242 J. Phys. Chem. B, Vol. 112, No. 16, 2008

Green

Figure 2. Solvent charge density around monosaccharide atoms. The radial distribution of solvent charge is displayed for the atoms of R-idose, R-mannose, and β-glucose: (A) hydroxyl atoms; (B) hemiacetal oxygen; (C) CHx groups. Only the hydroxyl atoms are seen to induce a significant amount of solvent structure. All results are taken from a 200 ps Langevin dynamics trajectory.

the potential from the linearized PB equation, the electrostatic solvation free energy can be computed by

TABLE 1: Training Set of Electrostatic Hydration Free Energiesa molecule

∆GFEP b

∆GPB c

∆GGB c

∆GACE c

R-allose β-allose R-altose β-altose R-galactose β-galactose R-glucose β-glucose R-gulose β-gulose R-idose β-idose R-mannose β-mannose R-talose β-talose

-24.12 ( 0.30 -27.26 ( 0.25 -24.38 ( 0.08 -19.57 ( 0.22 -16.14 ( 0.03 -19.40 ( 0.25 -15.64 ( 0.05 -16.67 ( 0.22 -18.65 ( 0.15 -17.48 ( 0.21 -22.48 ( 0.27 -17.86 ( 0.17 -20.19 ( 0.39 -20.77 ( 0.39 -17.58 ( 0.27 -18.11 ( 0.02

-23.50 -26.98 -23.67 -20.14 -16.91 -19.62 -16.85 -17.91 -18.55 -17.91 -21.81 -18.00 -20.49 -20.96 -17.55 -18.31

-23.00 -26.01 -23.11 -19.70 -16.75 -19.70 -16.72 -17.73 -20.09 -20.87 -17.64 -17.17 -21.63 -18.08 -17.49 -18.54

-21.96 -27.08 -25.73 -20.75 -16.75 -18.18 -17.25 -18.48 -22.19 -19.22 -17.36 -17.85 -22.84 -17.47 -16.71 -13.92

a All energies are in kcal/mol and are based on a single energyminimized structure. b Error is standard deviation of results from forward and reverse perturbations. c Calculated with best-choice parameters.

boundary between solute and solvent is generally defined by the molecular surface generated with a solvent-sized probe sphere (in this work, of radius 1.4 Å). Given this description, the electrostatic potential, φ(r b), can be computed by solution of the PB equation:

∇‚(b)∇φ( r b) r - (b)κ r 2(b) r sinh φ(b) r ) -4πF(b) r

(2)

where (r b) describes the regions of varied dielectric constants, F(r b) describes the locations and magnitudes of the charges, and κ2(r b) ) 8πI(r b)/ekT describes the location of monovalent mobile ions (I(r b) is ionic strength). Most often, the linearized PB equation is used, in which sinh φ(r b) is approximated by φ(r b); this approximation is valid when φ(r b) is small in the regions of nonzero ionic strength, which is often the case for non-highly charged solutes (such as most proteins and carbohydrates). Given

∆G )

1 2

r b)dV r ∫V φ(b)F(

(3)

When the charge distribution is a finite set of point charges, the integral is replaced with a sum over these charges. For the nonlinear PB equation, an additional integral over the solvent is needed to account for the free-energy of the mobile ions. Efficient software for the solution of both linearized and nonlinear PB solvers has been available for two decades, and boththeoryandapplicationshavebeenextensivelyreviewed.26,38,65,67-70 The key parameters of the PB model are (1) a set of partial atomic charges, (2) a set of atomic radii, (3) a solvent probe radius for the generation of the molecular surface, and (4) an ion-exclusion radius (Stern layer). The choice of a probe radius of 1.4 Å has been well-established as a reasonable choice for water, and in the current work the choice of an ion-exclusion radius is irrelevant, as the ionic strength is zero. This leaves the atomic radii and charges as the variable parameters in the model. Either of these may be fixed at a given value, and the other optimized to give the best results with the PB equation, or both may be simultaneously varied; any of these approaches can give good results for the computation of solvation energies. The PARSE parameters for proteins were developed by assuming a simple set of radii and optimizing a set of partial atomic charges;58 various ab initio charge determination methods for small organic molecules have similarly been evaluated in the context of a fixed set of radii.59,60 However, for use with a particular molecular-mechanics force field, challenges arise if the partial atomic charges are not held fixed. The parameters of a force field are not independent, and the combination of all the parameters (charges, VDW parameters, bond and angle force constants, torsional barriers, and so on) are balanced to give the observed results; because of the codependencies between all energetic terms, it is inconsistent to substitute these partial atomic charges for any others. The radii, however, are not as well defined, and it has been shown that simply using the VDW

Continuum Solvation Calculations with Carbohydrates

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5243

TABLE 2: Optimization of Continuum Electrostatic Parameters PB Radii (Å) RDFa

Rminb

OOH O5 C1-5 C6 HOH HCHx

1.55 1.60 2.00 2.00 1.40 1.40

1.77 1.77 2.28 2.28 0.22 1.32

rms errorf max. errorf R2

3.22 5.46 0.89

5.56 6.69 0.97 GB Radii (Å)

1.20‚RDFa

1.05‚Rminb

opt.c

1.86 1.86 2.39 2.39 0.24 1.39

1.86 1.94 2.40 2.40 0.00 0.00

Atomic Parameters 1.86 1.92 2.40 2.40 0.00 0.00 Performance on Training Set 0.58 1.09 0.98

1.05‚Rminb

1.02‚Rminb

OOH O5 C1-5 C6 HOH HCHx

1.86 1.86 2.39 2.39 0.24 1.39

1.80 1.80 2.31 2.31 0.23 1.34

rms errorf max. errorf R2

4.20 8.31 0.67

Performance on Training Set 1.84 1.72 4.73 4.12 0.67 0.71

opt.d Atomic Parameters 1.40 2.25 2.73 2.73 0.00 0.00

0.60 0.58 1.24 1.07 0.98 0.98 ACE Volumes (Å3) Vvore

0.93‚Vvore

opt.d

20.837 16.468 12.710 20.205 0.0 0.0

19.378 15.315 11.820 18.791 0.0 0.0

2.885 0.113 37.508 5.971 0.0 0.0

3.19 7.15 0.54

2.42 5.12 0.54

2.25 4.83 0.56

a Radii from first nonzero point of RDF. b Rmin from the all-atom CSFF parameter set. c Best result of downhill (greedy) optimization from multiple starting points (including RDF and 1.05‚Rmin). d Best result from 50 000 steps of MC search. e Voronoi volumes for ribose from the allatom PARAM27 parameter set for nucleic acids. f Errors in kcal/mol.

radii leads to significant errors in computed solvation energies.62,71 Thus, for use of the PB model to replace explicit solvent in the context of a particular force field, it is best to fix the partial atomic charges and consider the atomic radii as the sole tunable parameter. Initial sets of radii were chosen by two methods and subsequently optimized; the first rows of Table 2 contains a summary of these data. As the boundary of interest is that between solvent and solute, the solvent RDFs about each solute atom (discussed above) were considered, and the first point of nonzero solvent density was selected as the atomic radius. This procedure is similar to that employed by Roux and co-workers in their development of radii for the use of PB electrostatics with the PARAM22 protein and PARAM27 nucleic acid parameter sets,44,45 and by Swanson et al. in the parametrization of radii for the AMBER force-field.62 These radii consistently overestimated the electrostatic solvation free energy, with an rms error of 3.2 kcal/mol. However, the correlation between the PB- and FEP-computed energies was strong, with an R2 of 0.89. With hydrogen radii set to zero, the overestimation by the PB energies was much greater (29 kcal/mol rms error), as a result of the solvent approaching overly close to the hydroxyl protons; with an oxygen radius of 1.55, the high dielectric approaches as close a 0.5 Å from the proton charge, leading to an overly strong dielectric response. However, despite this error, a strong correlation was still seen (R2 of 0.86). A second set of radii was selected as the minimum energy distance (Rmin) from the CSFF VDW parameters. These radii also resulted in PBcomputed solvation free energies that were consistently higher than the FEP values (rms error of 5.6 kcal/mol/mol), but the correlation was excellent (R2 ) 0.97). Both sets of radii were then scaled uniformly in order to minimize the rms error between the PB and FEP results. The optimal scaling was 1.04 for the RDF-derived radii (1.1 kcal/mol rms error), 1.20 for the RDF-radii with zero-radius hydrogens (0.6 kcal/mol rms error),

and 1.05 for the VDW radii (0.6 kcal/mol rms error). The last two scalings gave very similar heavy-atom radii. The VDW hydrogen radii are nonzero, but the hydroxyl hydrogens have a radius 1.6 Å less than that of the hydroxyl oxygens, and the aliphatic hydrogens have a radius 1.0 Å less than that of the aliphatic carbons. For hydrogen atomic radii to have an effect, the difference in radius must be no larger than the hydrogenheavy-atom bond length, which is at most 1.1 Å. Further optimization of these radii, by minimization of the rms error with respect to each individual radius, only yielded a marginal improvement, as clearly seen in Figure 3 (left-most panel). The simplicity of the scaled VDW radii makes it an attractive choice, and the agreement with the explicit solvent results is remarkable; we suggest this set at the preferred choice for PB calculations with the CSFF force field. The PB equation contains a term describing the effect of mobile ions through a Debye-Hu¨ckel-like model; when the ionic strength is set to zero (as is the case here), the PB equation reduces to the Poisson equation. It has been shown that sets of charges and radii derived to optimize solvation energies in a Poisson model are able to reasonably reproduce ionic-strengthdependent properties when used in the PB model, without additional parametrization.72 Thus, it may be expected that the results presented here will be equally effective in the context of the full PB continuum electrostatic model, at least for uncharged sugars. Elcock and co-workers have also recently described a protocol by which ionic-strength dependent properties may be determined through molecular dynamics simulations with subsequent comparisons to PB-computed values.73 This procedure may be useful for further considering the parametrization of ionic-strength-dependent terms of the continuum models developed here, but is beyond the scope of this work. Optimization of Parameters for GB Implicit Solvent Calculations. While the PB model provides a relatively fast and highly accurate replacement for fully explicit solvent

5244 J. Phys. Chem. B, Vol. 112, No. 16, 2008

Green

Figure 3. Optimized correlation of PB, GBSW, and ACE solvation free energies with FEP results (training set). Dotted and dashed lines correspond to least-squares fits; solid line is y ) x. Black points correspond to scaled physical values (radii or volumes); gray points correspond to parameters with the minimum rms error.

simulations, faster solutions are desirable for many applications. One commonly used alternative is the GB model.39 While the basic physical model of the GB approach is the same as that in the PB model (solvent is represented as a continuum of high dielectric, and solutes as regions of low dielectric constant with partial atomic charges at atomic centers), the details are significantly different. In the GB model, each solute atom is treated as a sphere of low dielectric with an effective Born radius; the solvation free energy of that individual atom is then given by the Born equation:

∆Gself )i

(

)

2 1 1 qi 1 4π0 int ext 2bi

(4)

where int is the solute dielectric constant, ext is the solvent dielectric constant, qi is the partial atomic charge, and bi is the Born radius. The solvent-screening of interactions between atoms are given by screen ∆Gi,j )-

(

1 1 1 4π0 int ext

)x

qiqj -ri,j2/4bibj

(5)

ri,j + bibje 2

This is an arbitrary function that approaches the correct analytical forms from classical electrostatics at the extremes of small and large interatomic distances, ri,j. The total solvation energy of a system (of N atoms) is then given by N

∆Gsolv )

N

N

screen ∆Gself + ∑ ∑ ∆Gi,j ∑ i i)1 i)1 j)i+1

(6)

screen Note that ∆Gself ) (1/2)∆Gi,i , since ri,i ) 0; as a result, the i solvation energy is often written as

N

∆Gsolv ) (1/2)

N

screen ∆Gi,j ∑ ∑ i)1 j)1

(7)

As for the PB model, the theory and applications of the GB approach have been discussed extensively in the literature.39,40,74-78 Essential to the GB method is the definition of the effective Born radii for all atoms (bi). In the original derivation by Still et al., eq 4 was used with atomic solvation energies computed with the Poisson model to define bi for each atom;39 it has been shown that the GB method performs well with these “exact” Born radii.76 However, as this approach is computationally costly, alternative approaches have been developed to approximate this calculation in a much more efficient man-

ner.40,74,77 We have focused on two such approaches for the work presented here; the GBSW41 and ACE40 methods are both implemented in the CHARMM computer program. Parameters for each of these models were optimized in a similar manner as those for the PB calculations: by minimizing the error of electrostatic solvation energies computed by the approximate model and by FEP, using the set of eight hexopyranose sugars in both anomeric forms. The data are summarized in the latter rows of Table 2. The GBSW method was implemented with an intent to work with the same radii as used in PB calculations,41 and thus the radii derived for PB calculations (scaled VDW radii, 1.05‚Rmin) provide a useful starting set. While giving reasonable correlation (R2 ) 0.67), the rms error from this set was rather large as a result of the GB-computed energies consistently underestimating the solvation energies computed with FEP. This suggested that the radii were uniformly too large, and thus alternate scalings of the VDW radii were considered. An optimal scaling of 1.02‚ Rmin gave the same correlation, but much reduced rms error. GBSW used with this set of radii gave good performance (see Figure 3, center panel), although not nearly as impressive as that of the PB model. While the PB model was able to give near-optimal performance with scaled VDW radii, it is possible that an alternate radii set is needed for optimal GBSW performance. The 1.02‚Rmin radii were thus further optimized, allowing the radii of all unique atom types (as defined in the parent force-field, CSFF) to vary. This optimization only produced modest improvements in the statistical measures (R2 and rms error), and visually (Figure 3) does not make any significant improvement to the overall fit of GBSW to FEPcomputed energies. These optimized radii deviate significantly from the scaled VDW radii, and thus the small degree of improved performance raises the possibility that these radii are over-fit to the training set. Thus, in keeping with the principle of maximizing the simplicity of our approach, the 1.02‚Rmin scaled VDW radii are suggested as the optimal set for GBSW calculations with the CSFF force-field. It is not surprising that the optimal radii for GBSW differ somewhat from those for PB calculations; while the GBSW model is meant to approximate PB results, it has a different theoretical basis, and thus should be expected to have a different dependency on the model parameters. That the optimal radii for each method are similar, however, is an indication that both models are capturing the same fundamental physics, as desired. The ACE method takes as input a set of atomic volumes, rather than radii;40 these volumes are meant to capture the effective volume of the atom in the context of a molecule (i.e.,

Continuum Solvation Calculations with Carbohydrates

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5245

Figure 4. Optimized correlation of PB, GBSW, and ACE solvation free energies with FEP results (testing set). Dashed lines correspond to the best linear fit; solid line is y ) x. Gray points are distinct conformations of the molecules used in fitting; black points are conformations of molecules that were not used in fitting. The bottom row is scaled to show only non-charged molecules.

bonded to neighboring atoms) and thus cannot be directly related in any simple way to the atomic radius. A set of such volumes for amino acids and for nucleic acids has been derived from analysis of protein, DNA, and RNA crystal structures, defining each atomic volume by the molecular surface (for the portion of atoms exposed on the surface) and a Voronoi polyhedron partitioning of the interior volume.52 While a similar approach could be taken to define effective volumes for carbohydrates, the resulting volumes would likely have to be further optimized; there is no reason to expect the Voronoi volumes to give optimal performance in ACE-computed solvation free energies. However, it is useful to initialize the optimization at a physically reasonable set of parameters, as a model grounded in physical principles is more likely to be robustly transferable between systems. As the bonded environment of the atoms of the hexopyranose sugars is very similar to that of the atoms of the pentofuranose sugars, the Voronoi volumes for the atoms of ribose (from Schaefer et al.) were taken as an approximation to the expected Voronoi volumes for the atoms of the pyranose sugars; each CSFF atom type was assigned the volume of the equivalent ribose atom. When used with ACE to compute solvation free energies, this set of volumes gives results that correlate reasonably (R2 ) 0.54) with those computed by FEP. However, the ACE energies were seen to be systematically lower in magnitude than those computed by FEP, resulting in significant errors in absolute energies. Thus, an initial correction was considered by uniformly scaling these volumes, as was done with the radii optimized for both PB and GBSW calculations. A scaling of 0.93 was found to give the best performance, significantly improving the error. While these volumes gave improved results as compared to the initial volumes, the errors were still fairly large in magnitude. Thus, these volumes were further optimized, allowing the volume of each atom type to vary freely. Again,

this optimization provided only minimal statistical improvement; as seen in Figure 3 (right-most panel), the optimized parameters give results nearly indistinguishable from the scaled Voronoi volumes. The optimized volumes differ greatly from the Voronoi volumes and take on non-physical values: oxygen volumes are near zero, and the volumes of the ring carbons are 3 times their Voronoi values. As a result, it seems very likely that these are greatly over-fit to the training data, and the scaled Voronoi volumes (0.93‚Vvor) are suggested as the best choice for ACE calculations with CSFF. Electrostatic Solvation Free Energies of Multiple Conformations and Modified Sugars. In any parameter optimization there is danger of over-fitting to the data used in training; it is essential to evaluate the transferability of a fit model to additional systems. To address this, a large number of additional calculations were performed, computing the electrostatic solvation free energy using FEP, PB, GB, and ACE (with the latter three using the best-choice parameters). Two particular questions of transferability were of particular interest. First, as only a single conformation of each sugar was used in training, the performance of the optimized parameters on distinct conformations of the training molecules was evaluated. To do this, 20 snapshots were taken (by regular sampling) from a molecular dynamics simulation from each of the 16 training molecules. The results of these studies (Figure 4, gray points) are clear: the solvation free energies calculated by all continuum-based methods correlate with the FEP-computed energies with essentially the same accuracy as the training data. Table 4 gives the electrostatic solvation free energy of each sugar, averaged over these snapshots, and demonstrates the importance of conformational sampling in computing the energetic properties of sugars. The average energies span a fairly narrow range (-27.5 to -21.7 kcal/mol), consistent for a set of molecules that are all diastereomers. However, the range of

5246 J. Phys. Chem. B, Vol. 112, No. 16, 2008

Green

TABLE 3: Overall Performance on Testing Set PB rms errord maximum absolute errord rms error from LSQ lined,e slope of LSQ linee

a

1.19 3.04 0.75 0.994

b

c

GB

ACE

1.69 5.51 1.02 0.988

4.71 22.7 1.74 0.805

a 1.05‚Rmin from the all-atom CSFF parameters, with protein radii of Roux. b 1.02‚Rmin from the all-atom CSFF parameters, with protein radii of Roux. c 0.93‚Vvor volumes for ribose, with 0.67‚Vvor protein volumes. d Errors in kcal/mol. e LSQ line: linear least-squares fit.

energies for individual structures of a single molecule is just as large; the standard deviation of the energies is about 3 kcal/ mol for most of the sugars. This can lead to significant mispredictions in the relative energetics of two sugars if a single structure is used, as is seen in the solvation energies in Table 1, where a range of -27.3 to -15.6 kcal/mol for the solvation energies of the 16 sugars was observed. This variation in energies is particularly notable given the relative rigidity of the sugar molecules. The ring of all sugars remains essentially fixed in the starting chair conformation (4C1) for the duration of all the simulations, and thus the only torsional degrees of freedom that are sampled are the rotation about the C5-C6 bond and the rotation of each hydroxyl proton. That such minor structural variations can lead to energetic difference of well over 5 kcal/mol has important ramifications for the calculation of carbohydrate energetics. In addition, the performance the optimized parameters on additional sugars, not present in the training set, is of interest. Thus, five additional molecules were considered: the R- and β- anomers of both fucose and N-acetyl-2-glucosamine (GlcNAc), and sialic acid (N-acetyl-neuraminic acid); these were chosen as representatives of common sugars (other than unmodified hexopyranoses) found in the carbohydrate structures of glycoproteins. Fucose is simply the 6-deoxy form of galactose, and thus is a slightly less polar molecule than those used in training the model; N-acetyl-2-glucosamine contains an amide in place of one hydroxyl, and thus is of similar (or slightly higher) polarity as the training molecules; sialic acid contains a charged carboxylate moiety as well as amide substitution at one hydroxyl and a three-carbon chain extending from the ring. Thus, sialic acid represents the greatest departure from the training set and the greatest challenge to the methods. Both GlcNAc and sialic acid contain chemical functionalities that were not included in the training set, and that were not part of the original CSFF force-field. Parameters (including partial atomic charges) for these functional groups were adapted from the PARAM22 protein parameters (the amide parameters from the acetylated Nterminus and the carboxylic acid parameters from a charged C-terminus). For PB and GBSW calculations, the optimized radii of Nina et al. were used for these atoms.44 The resulting data (Figure 4, black points) are striking s both the PB- and GBcomputed energetics show strong correlation (with near unity slope) through the full range of energies. That the radii optimized by Nina et al. and those optimized here work well together is additional evidence that neither set has been over-fit to the original data used in training these models. The ACE volumes for protein atoms have not been parametrized for reproduction of solvation free energies in the same manner as radii for PB and GB calculations; rather, Schaefer et al. have derived a set of volumes based on an analysis on protein structures.52 In order to make comparisons between the various methods more reasonable, these Voronoi volumes were refined so as to optimally reproduce published solvation free energies computed with explicit solvent methods.44 The data for this

procedure are presented in the Supporting Information, but the results are quite simple. Solvation energies computed with ACE and the Voronoi volumes gave good correlation with those computed by FEP, but were under-predicted in magnitude for all amino acids with the exception of the basic residues (Lys and Arg), while those for Lys and Arg were significantly overpredicted. To correct this, two changes were made. First, all volumes were scaled by 0.67 (giving a significantly improved rms error over all non-basic residues. Second, the volumes of the protons attached to formally charged nitrogens (atom type HC) were increased from zero to an optimal volume of 9.79 Å3. This corresponds to a sphere of radius 1.32, and thus is physically reasonable. In the analysis of Schaefer et al., all hydrogen atoms had their volumes set to zero. For most protons, this approximation does not have a significant effect. However, for the highly charged protons of charged basic groups, this leads to small effective Born radii and thus overly large (in magnitude) solvation free energies. With these adaptations, the error in ACE-computed energies for amino acids (relative to the FEP-computed energies) is similar to those for the sugar training set. However, when combined, the solvation energies for sialic acid are underestimated by ACE by a significant amount; for non-charged molecules, this systematic error is not present. The mean error relative to the least-squares line is somewhat larger than that for GBSW, consistent with the larger rms error in the training set. However, the overall correlation remains reasonable. Comparative Performance of Three Solvation Models: Ramifications for Hierarchical Procedures. As is often the case, for any particular application it is necessary to balance accuracy with computational expense, and thus understanding the cost of each method is essential; the following results on costs are all based on the use of an AMD Opteron 250 processor. Explicit-solvent FEP calculations are very computationally demanding: each calculation took roughly 24 h. The PB-based solvation calculations are much faster: the evaluation of each conformation took roughly 6 s. To compute GB and ACE energies for all 20 snapshots took roughly 3.1 and 1.2 s, respectively. However, a significant fraction of this time was spent in setup, and the evaluations of 2000 frames of the same trajectory (a 100-fold increase in the number of calculations) required only 60 s for ACE (a 50-fold increase in cost) and 200 s for GBSW (a 65-fold increase in cost). Thus, for these small solutes, GBSW provides a speed-up of about 40- to 60fold relative to PB, while a speed-up of 100- to 200-fold is seen for ACE. The ACE model is faster, by a factor of about 3, than the GBSW model; the GBSW model, on the other hand, gives more accurate predictions of the FEP-computed solvation energy (and of the highly correlated PB-computed energy). As a result, each method may be preferred in different circumstances. As a replacement for explicit solvent or for PB-based continuum solvent, the increased accuracy of GBSW may make it preferable. However, another use of GB-type models is as part of a hierarchical procedure, where a large set of initial states are successively screened through increasingly more accurate (and more expensive) procedures; such approaches are common in both ligand and protein design. In this case, the faster performance of ACE may prove particularly useful, as it would allow for the screening of more states with the same amount of expense. Important Future Directions. It is important to note that the analysis here has focused on the energies of individual structures. While many applications require precisely this sort

Continuum Solvation Calculations with Carbohydrates

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5247

TABLE 4: Monosaccharide Electrostatic Hydration Free Energiesa R-allose β-allose R-altose β-altose R-galactose β-galactose R-glucose β-glucose R-gulose β-gulose R-idose β-idose R-mannose β-mannose R-talose β-talose R-fucose β-fucose R-GlcNAc β-GlcNAc Sialic acid

∆GFEP

∆GPB

∆GGBSW

∆GACE

-25.35 (3.29) -24.64 (3.57) -26.05 (3.14) -24.19 (2.92) -23.15 (2.74) -27.24 (4.81) -22.54 (2.87) -24.23 (2.71) -23.33 (3.22) -24.29 (2.87) -22.14 (2.67) -22.20 (3.41) -22.93 (3.13) -24.85 (2.51) -21.72 (3.43) -27.54 (3.27) -19.89 (3.11) -23.93 (3.99) -26.68 (2.05) -32.96 (3.10) -120.17 (6.03)

-24.15 (2.76) -23.49 (2.85) -24.85 (2.66) -23.38 (2.37) -22.44 (2.24) -26.22 (4.06) -21.98 (2.22) -23.59 (2.18) -22.26 (2.82) -23.62 (2.35) -21.40 (2.21) -21.47 (2.76) -22.40 (2.55) -24.02 (2.05) -21.27 (2.75) -26.24 (2.81) -18.73 (2.49) -22.44 (3.40) -26.11 (1.66) -31.60 (2.82) -118.94 (6.11)

-21.82 (2.52) -20.58 (2.62) -22.46 (2.60) -20.48 (2.02) -19.76 (2.18) -23.19 (3.91) -19.22 (1.99) -20.66 (1.97) -19.85 (2.13) -20.72 (2.23) -18.85 (1.75) -19.08 (2.52) -19.58 (2.37) -21.30 (1.82) -18.91 (2.15) -23.35 (2.82) -16.42 (2.24) -19.70 (3.20) -23.50 (1.47) -28.57 (2.43) -115.53 (6.27)

-23.21 (3.13) -23.83 (3.34) -26.91 (3.41) -24.36 (3.21) -23.17 (2.35) -26.02 (4.38) -23.05 (2.15) -24.32 (2.49) -21.56 (3.78) -25.13 (2.55) -22.05 (2.86) -21.47 (2.76) -24.11 (3.27) -22.67 (2.35) -21.00 (3.22) -22.84 (3.50) -18.58 (2.62) -21.85 (3.56) -25.28 (2.27) -28.87 (2.72) -93.02 (6.48)

a All energies (in kcal/mol) are given as averages over 20 conformations sampled from an explicit-solvent dynamic trajectory, with the standard deviation in parentheses.

of calculation, molecular dynamics simulations additionally require the accurate calculation of forces. It has been shown that optimizing a GB model without a focus on forces still yields stable and accurate molecular dynamics simulations for proteins and nucleic acids,76 and thus it is likely that the same will be true for these carbohydrate parameters. Liu and co-workers have taken a combined approachsconsidering both the reproduction of PB-calculated solvation free energies and the stability of dynamic trajectories with respect to those using explicit solventsin parametrizing a GB model for proteins.79 However, it is difficult to directly compare the results of this parametrization to those that do not explicitly consider dynamics. Detailed comparisons of carbohydrate simulations in both explicit and implicit solvent will be required to validate the use of the parameters presented here in a molecular dynamics simulation. Additionally, the interactions between distinct molecules are of great interest to many problems in biology, while the focus of this work was on the solvation energies of isolated molecules. While a thermodynamic cycle may be constructed that decomposes the energetics of intermolecular interactions in solution into a gas-phase interaction energy and the solvation energies of the bound and unbound components, it is entirely possible that the optimal parameters for these calculations will differ from those for single-molecule solvation energies. There are a number of ways in which the performance of these models can be used for addressing molecular association. For one, comparisons of computed to experimental binding free energies could be made. However, this would require a large enough data set of experimentally determined binding free energies of carbohydrates to various macromolecules. Alternatively, an approach similar to that presented here could be taken: detailed explicit-solvent simulations could be used to derive the optimal parametrization for implicit solvent models. Brooks and coworkers have recently described one such approach, involving the matching of the potentials of mean force for interacting molecules between distinct computational methods.78 Both these issues are important in the modeling of mixed carbohydrate/ protein systems, and thus must be considered as the computational models of carbohydrates mature. The optimized models presented here provide a foundation on which these extensions may be made.

Finally, a major problem in glycobiology is the prediction of the three-dimensional structures of oligosaccharides.12-14 As is the case for proteins, the conformations of oligosaccharides are defined by coupled degrees of freedom that operate on different structural scales: movements along the glycosidicbond torsions lead to global rearrangements, while rotations of the dihedrals of the functional groups on each monomer lead to local variations. The free energy of a given configurational state is determined partially by the local preferences of each torsion, but also by nonbonded interactions between distinct monomers, by solute-solvent interactions, and by entropic considerations. The role of solvent cannot be ignored; it has been shown that solvent can play an important role in determining both the conformational preferences of monosaccharides7 and the minimum free-energy structure of oligosaccharides.80,81 These configurations can be explored by a number of computational techniques: molecular dynamics,15,17 Monte Carlo,8,82 systematic search,80,81,83 and genetic algorithms84 have all been used. However in all applications, a balance must be chosen between a thorough search of conformations and accurate energy evaluation. Explicit-solvent molecular dynamics is potentially the most accurate approach, but is prohibitively expensive for complete sampling of configurational space. Other techniques provide much more thorough sampling by taking large steps through conformational space, but these steps are largely incompatible with an explicit-solvent model. Accurate continuum solvent models would provide increased accuracy to these methods. Of course, an open question is how well implicitsolvent models can capture the details of solvent structure and its effect on solute energetics; for example, can the stabilization of a water-bridged interaction be well-described by a continuum model? Further studies carefully comparing explicit and implicit models for larger systems will be necessary to answer these questions in the context of oligosaccharide structure. It is likely that these comparisons will reveal deficiencies in the initial models presented here, which may then be improved by further optimization. Recently, Margulis and co-workers have suggested an elegant combination of search methods: first, a discrete conformational search is used with a fast, approximate ranking to generate a set of low-energy structures; these are then subjected to explicit-

5248 J. Phys. Chem. B, Vol. 112, No. 16, 2008 solvent molecular dynamics for a fairly short period of time to refine the structure and assess its stability.80,81 For small systems, these simulations can be extended to compute relative conformational free energies by population analysis, but limitations in sampling make this approach infeasible for larger oligosaccharides. However, the results presented here suggest an extension to these methods s free energies may be computed for each dynamic ensemble using the popular molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA) model,85 with the optimized parameters presented here. This may provide an ideal system in which to assess the overall performance of continuum solvent models for oligosaccharides. 4. Conclusions A set of optimized parameters for the use of implicit solvent models in conjunction with the CHARMM CSFF has been developed through detailed comparisons with explicit solvent FEP simulations. For carbohydrates, the optimal parameters for each model are simple physical constants (VDW radii or Voronoi volumes), scaled by a constant factor. The PB model gives remarkable performance with the optimized radii, giving an rms error relative to the explicit solvent results of only 1.2 kcal/mol over a data set including modified sugars that were not included in the training set. Two GB models were also parametrized. The GBSW model gives only slightly poorer performance than the PB model (rms error of 1.7 kcal/mol), while the ACE model performs somewhat worse (rms error of 4.7 kcal/mol). However, both models are strongly correlated with the explicit solvent results, and the ACE model is about 3 times faster than GBSW model. The optimal radii for both PB and GBSW give very close agreement to explicit solvent results for both N-acetyl-glucosamine and sialic acid when the amide and acid functionalities were modeled with previously optimized parameters for proteins. This strongly suggests that these parameter sets are compatible, and may be reasonably combined in the modeling of mixed carbohydrate-protein systems. Acknowledgment. The author thanks Bruce Tidor for making available the MultigridPBE software. This work was supported by start-up funding awarded to D.F.G. by the State University of New York, the College of Engineering and Applied Sciences and the Department of Applied Mathematics and Statistics. Supporting Information Available: The details of the optimization of protein Voronoi volumes for ACE calculations. This material is available free of charge via the Internet at http:// pubs.acs.org. References and Notes (1) Ho¨lemann, A.; Seeberger, P. H. Curr. Opin. Biotechnol. 2004, 15, 615-622. (2) Wu, B.; Tan, Z.-P.; Chen, G.; Chen, J.-H.; Hua, Z.-H.; Wan, Q.; Ranganathan, K.; Danishefsky, S. J. Tetrahedron Lett. 2006, 47, 80098011. (3) Chen, J.-H.; Chen, G.; Wu, B.; Wan, Q.; Tan, Z.-P.; Hua, Z.-H.; Danishefsky, S. J. Tetrahedron Lett. 2006, 47, 8013-8016. (4) Prescher, J. A.; Bertozzi, C. R. Cell 2006, 126, 851-854. (5) van Eijck, B. P.; Hooft, R. W. W.; Kroon, J. J. Phys. Chem. 1993, 97, 12093-12099. (6) French, A. D.; Kelterer, A.-M.; Cramer, C. J.; Johnson, G. P.; Dowd, M. K. Carbohydr. Res. 2000, 326, 305-322. (7) Kirschner, K. N.; Woods, R. J. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 10541-10545. (8) Ho¨o¨g, C.; Rotondo, A.; Johnston, B. D.; Pinto, B. M. Carbohydr. Res. 2002, 227, 2023-2036. (9) Muslim, A. M.; Bryce, R. A. Chem. Phys. Lett. 2004, 388, 473478.

Green (10) Kessler, H.; Matter, H.; Gemmecker, G.; Kling, A.; Kottenhahn, M. J. Am. Chem. Soc. 1991, 113, 7550-7563. (11) Miller, K. E.; Mukhopadhyay, C.; Cagas, P.; Bush, C. A. Biochemistry 1992, 31, 6703-6709. (12) Woods, R. J. Curr. Opin. Struct. Biol. 1995, 5, 591-598. (13) Wormald, M. R.; Petrescu, A. J.; Pao, Y. L.; Glithero, A. T. E.; Dwek, R. A. Chem. ReV. 2002, 102, 371-386. (14) Johnson, M. A.; Pinto, B. M. Carbohydr. Res. 2004, 339, 907928. (15) Mukhopadhyay, C. Biopolymers 1998, 45, 177-190. (16) Zuegg, J.; Gready, J. E. Glycobiology 2000, 10, 959-974. (17) Veluraja, K.; Margulis, C. J. J. Biomol. Struct. Dyn. 2005, 23, 101111. (18) Sorin, E. J.; Engelhardt, M. A.; Herschlag, D.; Pande, V. S. J. Mol. Biol. 2002, 317, 493-506. (19) Hornak, V.; Okur, A.; Rizzo, R. C.; Simmerling, C. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 915-920. (20) Gilson, M. K.; Honig, B. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 1524-1528. (21) Hendsch, Z. S.; Tidor, B. Protein Sci. 1994, 3, 211-226. (22) Hendsch, Z. S.; Jonsson, T.; Sauer, R. T.; Tidor, B. Biochemistry 1996, 35, 7621-7625. (23) Lee, M. R.; Duan, Y.; Kollman, P. A. Proteins: Struct., Funct., Genet. 2000, 39, 309-316. (24) Zacharias, M.; Luty, B. A.; Davis, M. E.; McCammon, J. A. Biophys. J. 1992, 63, 1280-1285. (25) Hendsch, Z. S.; Tidor, B. Protein Sci. 1999, 8, 1381-1392. (26) Sheinerman, F. B.; Norel, R.; Honig, B. Curr. Opin. Struct. Biol. 2000, 10, 153-159. (27) Green, D. F.; Tidor, B. J. Mol. Biol. 2004, 342, 435-452. (28) Joughin, B. A.; Green, D. F.; Tidor, B. Protein Sci. 2005, 14, 13631369. (29) Strockbine, B.; Rizzo, R. C. Proteins: Struct., Funct., Bioinf. 2007, 67, 630-642. (30) Liu, H.-Y.; Kuntz, I. D.; Zou, X. J. Phys. Chem. B 2004, 108, 5453-5462. (31) Lee, M. R.; Sun, Y. J. Chem. Theory Comput. 2007, 3, 11061119. (32) Kangas, E.; Tidor, B. J. Phys. Chem. B 2001, 105, 880-888. (33) Mandal, A.; Hilvert, D. J. Am. Chem. Soc. 2003, 125, 5598-5599. (34) Sims, P. A.; Wong, C. F.; McCammon, J. A. J. Comput. Chem. 2004, 25, 1416-1429. (35) Green, D. F.; Tidor, B. Proteins: Struct., Funct., Bioinf. 2005, 60, 644-657. (36) Chellappan, S.; Reddy, G. S. K. K.; Ali, A.; Nalam, M. N. L.; Anjum, S. G.; Cao, H.; Kairys, V.; Fernandes, M. X.; Altman, M. D.; Tidor, B.; Rana, T. M.; Schiffer, C. A.; Gilson, M. K. Chem. Biol. Drug Des. 2007, 69, 298-313. (37) Kuttel, M.; Brady, J. W.; Naidoo, K. J. J. Comput. Chem. 2002, 23, 1236-1243. (38) Gilson, M. K.; Honig, B. H. Nature 1987, 330, 84-86. (39) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc. 1990, 112, 6127-6129. (40) Schaefer, M.; Karplus, M. J. Phys. Chem. 1996, 100, 1578-1599. (41) Im, W.; Lee, M. S.; Brooks, C. L., III. J. Comput. Chem. 2003, 24, 1691-1702. (42) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187-217. (43) Beglov, D.; Roux, B. J. Chem. Phys. 1994, 100, 9050-9063. (44) Nina, M.; Beglov, D.; Roux, B. J. Phys. Chem. B 1997, 101, 52395248. (45) Banavali, N. K.; Roux, B. J. Phys. Chem. B 2002, 106, 1102611035. (46) Straatsma, T. P.; McCammon, J. A. Annu. ReV. Phys. Chem. 1992, 43, 407-435. (47) Kumar, S.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A.; Rosenberg, J. M. J. Comput. Chem. 1992, 13, 1011-1021. (48) Gilson, M. K.; Sharp, K. A.; Honig, B. H. J. Comput. Chem. 1988, 9, 327-335. (49) Gilson, M. K.; Honig, B. Proteins: Struct., Funct., Genet. 1988, 4, 7-18. (50) Green, D. F.; Tidor, B. In Current Protocols in Bioinformatics; Petsko, G. E., Ed.; John Wiley & Sons, Inc.: New York, 2003; Chapter 8.3. (51) Altman, M. D.; Tidor, B. MultigridPBEsSoftware for Computation and Display of Electrostatic Potentials; MIT: Cambridge, MA, unpublished. (52) Schaefer, M.; Bartels, C.; LeClerc, F.; Karplus, M. J. Comput. Chem. 2001, 22, 1857-1879. (53) Weiner, S. J.; Kollman, P. A.; Case, D. A.; Singh, U. C.; Ghio, C.; Alagona, G.; Profeta, S., Jr.; Weiner, P. J. Am. Chem. Soc. 1984, 106, 765784. (54) Jorgensen, W. L.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657-1666.

Continuum Solvation Calculations with Carbohydrates (55) MacKerell, A. D.; Wiorkiewicz-Kuczera, J.; M. Karplus, M. J. Am. Chem. Soc. 1995, 117, 11946-11975. (56) Cornell, W. D.; Cieplek, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179-5197. (57) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. J. Am. Chem. Soc. 1996, 118, 11225-11236. (58) Sitkoff, D.; Sharp, K. A.; Honig, B. J. Phys. Chem. 1994, 98, 19781988. (59) Green, D. F.; Tidor, B. J. Phys. Chem. B 2003, 107, 1026110273. (60) Rizzo, R. C.; Aynechi, T.; Case, D. A.; Kuntz, I. D. J. Chem. Theory Comput. 2006, 2, 128-139. (61) Cabani, S.; Gianni, P.; Mollica, V.; Lepori, L. J. Solution Chem. 1981, 10, 563-595. (62) Swanson, J. M. J.; Adcock, S. A.; McCammon, J. A. J. Chem. Theory Comput. 2005, 1, 484-493. (63) Angyal, S. J. Angew. Chem., Int. Ed. 1969, 8, 157-166. (64) Carlson, H. A.; Nguyen, T. B.; Orozco, M.; Jorgensen, W. L. J. Comput. Chem. 1993, 14, 1240-1249. (65) Mohan, V.; Davis, M. E.; McCammon, J. A.; Pettitt, B. M. J. Phys. Chem. 1992, 96, 6428-6431. (66) Gilson, M. K.; Honig, B. H. Biopolymers 1986, 25, 20972199. (67) Sharp, K. A.; Honig, B. Annu. ReV. Biophys. Biophys. Chem. 1990, 19, 301-332. (68) Sharp, K. A.; Honig, B. J. Phys. Chem. 1990, 94, 7684-7692. (69) Davis, M. E.; McCammon, J. A. Chem. ReV. 1990, 90, 509521. (70) Honig, B.; Sharp, K.; Yang, A.-S. J. Phys. Chem. 1993, 97, 11011109.

J. Phys. Chem. B, Vol. 112, No. 16, 2008 5249 (71) Dixit, S. B.; Bhasin, R.; Rajasekaran, E.; Jayaram, B. J. Chem. Soc., Faraday Trans. 1997, 93, 1105-1113. (72) Huyghues-Despointes, B. M. P.; Thurlkill, R. L.; Daily, M. D.; Schell, D.; Briggs, J. M.; Antosiewicz, J. M.; Pace, C. N.; Scholtz, J. M. J. Mol. Biol. 2003, 325, 1093-1105. (73) Thomas, A. S.; Elcock, A. H. J. Am. Chem. Soc. 2006, 128, 77967806. (74) Qui, D.; Shenkin, P. S.; Hollinger, F. P.; Still, W. C. J. Phys. Chem. A 1997, 101, 3005-3014. (75) Reddy, M. R.; Erion, M. D.; Agarwal, A.; Viswanadhan, V. N.; McDonald, D. Q.; Still, W. C. J. Comput. Chem. 1998, 19, 769-780. (76) Onufriev, A.; Case, D. A.; Bashford, D. J. Comput. Chem. 2002, 23, 1297-1304. (77) Lee, M. S.; Salsbury, F. R., Jr.; Brooks, Charles L. I. J. Chem. Phys. 2002, 116, 10606-10614. (78) Chen, J.-H.; Im, W.; Brooks, C. L., III. J. Am. Chem. Soc. 2006, 128, 3728-3736. (79) Zhu, J.; Shi, Y.; Liu, H. J. Phys. Chem. B 2002, 106, 4844-4853. (80) Xia, J.; Daly, R. P.; Chuang, F.-C.; Parker, L.; Jensen, J. H.; Margulis, C. J. J. Chem. Theory Comput. 2007, 3, 1620-1628. (81) Xia, J.; Daly, R. P.; Chuang, F.-C.; Parker, L.; Jensen, J. H.; Margulis, C. J. J. Chem. Theory Comput. 2007, 3, 1629-1643. (82) Peters, T.; Meyer, B.; Stuike-Prill, R.; Somorjai, R.; Brisson, J.-R. Carbohydr. Res. 1993, 238, 49-73. (83) Rosen, J.; Robobi, A.; Nyholm, P.-G. Carbohydr. Res. 2004, 339, 961-966. (84) Nahmany, A.; Strino, F.; Rosen, J.; Kemp, G. J. L.; Nyholm, P.G. Carbohydr. Res. 2005, 340, 1059-1064. (85) Srinivasan, J.; Cheatham, T. E.; Cieplak, P.; Kollman, P. A.; Case, D. A. J. Am. Chem. Soc. 1998, 120, 9401-9409.