Multiscale Coarse-Graining of Monosaccharides - The Journal of

The MS-CG procedure described in Section 2 was applied to the saved ..... Allocations of computer time from the Pittsburgh Supercomputing Center (PSC)...
0 downloads 0 Views 383KB Size
11566

J. Phys. Chem. B 2007, 111, 11566-11575

Multiscale Coarse-Graining of Monosaccharides Pu Liu, Sergei Izvekov, and Gregory. A. Voth* Center for Biophysical Modeling and Simulation and Department of Chemistry, UniVersity of Utah, Salt Lake City, Utah 84112-0850 ReceiVed: March 17, 2007; In Final Form: July 23, 2007

A systematic multiscale coarse-graining (MS-CG) algorithm is applied to build coarse-grained models for monosaccharides in aqueous solution. The methodology is demonstrated for the example of R-D-glucopyranose. The nonbonded interactions are directly derived from the force-matching approach, whereas the bonded interactions are obtained through Boltzmann statistical analyses of the underlying atomistic trajectory. The MS-CG model is shown to reproduce many structural and thermodynamic properties in the constant NPT ensemble. Although the model is derived at a single temperature, pressure, and concentration, it is shown to be reasonably transferable to other thermodynamic states. In this model, long-range interactions are effectively mapped into short-range forces with a moderate cutoff and are evaluated by table look-up. As a result, molecular dynamics employing the MS-CG model is ∼3 orders of magnitude more efficient than its atomistic counterpart. Consequently, the model is particularly suitable for simulating carbohydrate systems at large length and long time scales. Results for an R-(1f4)-D-glucan with 14 glucose units are also presented, demonstrating that the MS-CG algorithm is also applicable to the coarse-graining of other saccharide systems.

I. Introduction Carbohydrates (saccharides) are the most abundant product of photosynthesis. They act as structural materials in living organisms as well as for energy storage. Moreover, they are directly involved in vital biological processes,1 including cellular recognition and antibody immune response. Saccharides are also important components of emerging biomaterials, including artificial tissues. Solutions of saccharides (or their hydrolysis derivatives) are widely used in the pharmaceutical and food industries. In particular, understanding the behavior of solutions with very high saccharide concentrations is crucial for food preservation and manufacture of pharmaceutical tablets. Because concentrated saccharide and water mixtures have very high viscosity and usually form glasses at normal temperatures,2 atomistic simulation studies of their thermodynamic properties and dynamics is a very challenging task. Coarsegrained (CG) modeling3-36 provides a promising way to study long time scale behavior at a reasonable computational cost. Ideally, CG models can capture the most fundamental physical and chemical properties after averaging out less important detailed atomistic information both spatially and temporally. This goal is usually achieved by grouping atoms into coarser sites intuitively and, thus, significantly reducing the complexity of the system considered. The key challenge is the parametrization of effective interactions between coarse-grained sites, which is typically carried out by an iterative adjustment of parameters in a set of preselected analytical functions. However, due to the complexity of the original atomistic models, the preselected functions may be far from the optimal functions that can decribe the interaction between CG sites accurately. The CG M3B model has been developed by Molinero and Goddard21-23 for malto-oligosaccharides and their water mixtures, in which each glucose monomer is represented by three * Author to whom correspondence should be addressed. E-mail: [email protected].

Figure 1. (a) Atomistic structures for R-D-glucopyranose and water molecules. (b) Coarse-grained R-D-glucopyranose and water structures.

beads, and water, by a single site. The valence interactions between two CG sites are expressed as conventional bond, angle, and torsion interactions, depending on the connectivity between CG site pairs. The nonbonding interaction is described with a two-body Morse potential. Since no charges or hydrogenbonding terms are included in their method, their model is extremely efficient: simulations using their model are ∼7000 times more efficient than their atomistic counterpart. The M3B model can reproduce several properties, such as excluded volume, distribution of torsion angle, and even the glass transition temperatures. However, the Morse function is not necessarily the optimal function to describe the interaction between CG sites. Instead of being derived directly from the atomistic simulations, the parameters are fitted from some specific thermodynamic properties. Therefore, their model does not accurately predict relevant structural properties, such as radial distribution functions (RDF). Recently, a new method, multiscale coarse-graining (MSCG),14-16 has been developed in our group for the systematic construction of a CG force field from the underlying explicit

10.1021/jp0721494 CCC: $37.00 © 2007 American Chemical Society Published on Web 09/13/2007

Multiscale Coarse-Graining of Monosaccharides

J. Phys. Chem. B, Vol. 111, No. 39, 2007 11567

Figure 2. Representative effective force (top panels; unit: kcal/mol/Å) and energy (bottom panels; unit: kcal/mol) curves between CG site pairs, A-W (left column) and A-C (right column), employed in the MS-CG simulations. In the top-left subplot, the dashed line is the force curve directly from FM procedure, and the solid line is the force curve with a constant extrapolation into the core region.

atomistic trajectories via a statistical mechanical extension of force-matching (FM).37 The interactions at the CG level are also no longer restricted to intuitively selected analytical forms, so the optimal interactions for a given system can be computed. The method has demonstrated its robustness and accuracy for a variety of systems, including water,37 methanol,15 and lipids,14,16,28 as well as R-helix and β-hairpin peptides.35 Molecular dynamics simulation with the MS-CG model can reproduce structural properties that are comparable to those observed in atomistic simulations using a significantly simplified CG representation. In this work, the MS-CG approach is applied to develop a coarse-grained model of R-D-glucopyranose solution. The outline of the article is as follows: In Section 2, the CG scheme is introduced, and the MS-CG method is briefly described, followed by a short description of the resulting MS-CG force fields. The thermodynamic properties from the resulting MSCG model are presented in Section 3, along with a discussion of its transferability to other thermodynamic conditions. Section 4 contains conclusions.

extended to polysaccharides, because polysaccharides usually contain 1-1, 1-4, or 1-6 (R- or β-) linkages between monomers. B. Multiscale Coarse-Grained Force Field. A new coarsegraining method, statistically derived from the least-squares FM approach,37,38 is the basis of the MS-CG approach. This method can systematically build a CG pairwise effective force field from given atomistic trajectories using force data, regardless of the origins of this information. A detailed description of the MSCG method is given elsewhere.37 Essentially, a set of CG force field parameters {gm} are optimized by minimizing a residual 2 function, ∑l,i|F B ref F CG F ref i,l - B i,l ({gm})| , where B i,l is a sum of total forces acting on atoms within CG site i in the lth frame of the reference trajectory, and B F CG i,l ({gm}) is the CG force calculated from the corresponding CG trajectory given the parameter set {gm}. If the force field to be fit is expressed as a linear function of fitting parameters, such as those that can often be achieved via an appropriate (e.g., cubic spline) interpolation, the aforementioned optimization problem can be written as the following set of overdetermined linear equations:

II. Methods A. Coarse-Graining Scheme. Most CG schemes usually derive from the resolution level of interest and physical or chemical intuition. The CG model employed in this article is similar to the M3B model by Molinero and Goddard.22 In this model, three CG sites (A, B and C) are employed to represent an R-D-glucopyranose molecule. Their positions coincide with the C1, C4, and C6 atoms, respectively, in the atomistic model, as shown in Figure 1. This CG scheme has a resolution comparable to the CG water molecules, which may enhance its capacity to capture accurate structural information. The preferred conformation of R-D-glucopyranose is a chair conformation and, thus, far from flat. However, due to the rigidity of this ring structure, a normal vector can be defined to characterize the molecular orientation. With this CG scheme, the orientational information of molecules can be determined. These properties make the mapping between atomistic and CG models more straightforward. Most importantly, such a CG representation maintains the connectivity of the atomistic model, even when

-

∑ ∑ ∑

γ)nb,b β)1,K j)1,Nβ

(

f(rRil,βjl, {fRβ,γ,k}, {f ″Rβ,γ,k}) + QRβ r

2

Ril,βjl

)

CG δγ,nb rˆRil,βjl ) B FRil (1)

with respect to a set of force parameters {fRβ,k, f′′ Rβ,k, QRβ}, where R ) 1, ‚‚‚, K; i ) 1, ‚‚‚, NR; k ) 1, ‚‚‚, NRβ,mesh. In eq 1, {Ril} labels the ith atoms of kind R in the lth configurations; rRil,βjl is the distance between atoms {Ri} and {βi} in the lth configuration; QRβ ) QRQβ with QR being the partial charge of the CG sites of kind R; Nβ, K, and NRβ,mesh are, respectively, the number of CG sites of kind β, the total number of types of CG sites in the system, and the number of mesh points for the interaction between type R and type β; and δγ,nb is Kronecker’s delta function to distinguish bond and nonbonded interactions. In some MS-CG models, QRβ is set to zero so that only short-

11568 J. Phys. Chem. B, Vol. 111, No. 39, 2007

Liu et al.

Vb )

1

1

kr(r - r0)2 + ∑ kθ(θ - θ0)2 + ∑ 2 bonds angles 2 ∑ Cn(cos(nø - γn) + 1) dihedrals

(4)

where Kr, kθ, and Cn are force constants; r0, θ0, and γ are equilibrium positions for bonds, angles, and dihedral angles, respectively; and n is an integer number. For the R-Dglucopyranose solution, only the bonded interactions exist in the MS-CG potentials. III. Results and Discussions

Figure 3. Bond distributions from atomistic (solid line) and CG simulation (circles) for the bonds A-B, B-C, and A-C.

TABLE 1: Bonded Parameters for r-D-Glucopyranose (eq 4) bond type A-C A-B B-C

r0 (Å)

kr (kcal mol-1 Å-2)

3.680 2.910 2.579

146.20 173.00 161.65

ranged CG forces are considered in the model. This is the case in the present study. Because the coarse-graining of a system does not necessarily preserve virial terms, in some cases, the MS-CG models derived from the aforementioned procedure fail to maintain the proper internal pressure in constant NPT simulations. As a result, the system density deviates from atomistic results. Fortunately, because the internal virial depends linearly on atomistic forces, the FM force field can be constrained to reproduce the correct pressure by one additional constraint,

-







γ)nb,b R)1,K;β)1,K i)l, NR;j)l,Nβ

{f′′Rβ,γ,k})rRil,βjl +

QRβ rRil,βjl

(

f(rRil,βjl, {fRβ,γ,k},

)

δγ,nb ) 3W atm + 2(E kin,atm l l E kin,CG ) (2) l

kin,atm where W atm , and E kin,CG are the instantaneous atoml , El l istic virial, the instantaneous atomistic kinetic energy, and CG kinetic energy in the lth configuration, respectively. The interactions between CG sites can be expressed as an empirical force field potential, V, with a bonded part, Vb, and a nonbonded part, Vnb, i.e.,

V ) Vb + Vnb

(3)

The FM method allows one to explicitly separate the bonded and nonbonded forces. Thus, both the bonded potential, Vb, and nonbonded potential, Vnb, can be directly obtained from the FM algorithm described above. For the sake of simplicity, to implement the MS-CG model into popular molecular dynamics (MD) packages, a systematic and accurate way is adapted to retrieve effective bonded force field parameters from MD trajectories, as shown in the MS-CG application on ionic liquid.33 The term Vb generally has bond, angle, and dihedral angle contributions among CG sites,

In this section, the MS-CG method is applied to construct CG force fields for R-D-glucopyranose/water mixtures of different concentrations at two different temperatures, T ) 300 and 353 K, and pressures, p ) 1 and 500 atm. All the atomistic simulations were carried out with the OPLS/AA39 force field for the R-D-glucopyranose and SPC/E40 model for explicit water molecules with a cutoff of 12.17 Å using the MD code GROMACS.41 The long-range electrostatic interactions were treated with the PME method. The internal geometries of water molecules were constrained using SETTLE,42 and all bond lengths were fixed by LINCS,43 which allowed the use of a 2 fs time step to propagate the dynamics. The Berendsen thermostat and barostat44 were employed to control the temperature and pressure, respectively. The CG force field was used for the MD simulations in the DL_POLY program.45 No electrostatic interaction evaluations or bond constraints are required for the CG systems in the present application. A. MS-CG Modeling of a r-D-Glucopyranose Solution with 9.1% Glucose Molar Fraction. An atomistic R-Dglucopyranose and water mixture with 50 R-D-glucopyranose and 500 water molecules under periodic boundary conditions was first equilibrated in the constant NPT ensemble for 400 ps at a temperature of T ) 300 K and a pressure of p )1 atm. This specific system size was chosen because it is small enough to be manipulated in the MS-CG procedure while large enough to reduce the finite size effects and allow pairwise effective interactions to be well sampled up to the cutoff length. At the chosen concentration, the temperature 300 K is higher than the glass transition temperature of the system. Thus, the system will not be trapped in the glassy state and is more likely to be well sampled during regular MD simulations. The equilibrated system was then propagated for 6 ns in the constant NVT ensemble at T ) 300 K in a cubic box with a side length of 29.56 Å. During the simulation, a total of 6000 configurations with atomic positions, velocities, forces, and system virial were recorded every 2 ps. To avoid the computationally expensive long-range electrostatic calculations, the partial charges of CG sites are set to zero on the basis of the fact that there is no net charge for R-Dglucopyranose molecules. The MS-CG procedure described in Section 2 was applied to the saved configurations to obtain nonbonded interactions between CG sites. Representative force and energy curves for the pairwise interaction among the CG sites are plotted in Figure 2. As seen from the gray line of Figure 2, there is a small region, r ∈ [0, rcore], that cannot be sampled sufficiently due to the large repulsion force once two CG particles are very close to each other. Therefore, the force curves in the core region need to be somehow extrapolated. In the present work, a constant force is chosen, as shown in the black line of Figure 2. The specific choice is expected to have little impact on system properties in ambient conditions because there

Multiscale Coarse-Graining of Monosaccharides

J. Phys. Chem. B, Vol. 111, No. 39, 2007 11569

Figure 4. Radial distribution functions (RDFs) between CG sites for a system consisting of 50 R-D-glucopyranose and 500 water molecules from atomistic (solid line) and CG (dashed line) simulations at T ) 300 K and p ) 1 atm.

is only a remote probability for two CG sites to be so close. However, this constant force might be softer than it should be if the pressure or temperature is too high, thus making the density higher than that in atomistic simulations. In our simulation, the force curves are tabulated without any smoothing procedure. From the trajectories, the probability distribution, which was fit to the Boltzmann distribution, was obtained for the three bonds in CG a-D-glucopyranose molecules:

P(r) ) C exp(-Vb(r)/kBT)

(5)

where Vb ) 1/2kr(r - r0)2, C is a constant, kB is the Boltzmann’s constant, and T is the system temperature. If the bond considered is an isolated bond, the procedure is an exact solution. However, this treatment is incomplete, since the bonds in the condensed phase are no longer isolated. The iterative procedure employed by Wang et. al.33 was adapted for the R-D-glucopyranose/water mixture system, in which the force constant and equilibrium bond length are adjusted by the following transformation, in 0 n,out ) (k n,in kr0)/k n,out r n+1, ) r n,in k n+1,in r r r 0 0 + r0 - r 0

(6)

where n is the number of iterations, and in and out denote whether the parameters are the input or output parameters for CG simulations. The parameters usually converged after two iterations. The bond distributions from the second-round CG simulations are almost identical to those from atomistic simulations, as shown in Figure 3. The corresponding bonding parameters are summarized in Table 1. These results suggest that the iterative Boltzmann analysis is an accurate way to systematically derive CG bonding parameters for molecules in condensed phases. B. Comparison of Structural Properties. The radial distribution functions (RDFs) g(r) were calculated according the following formula,

g(r) )

V Ns2

δ(r - rij)〉 ∑i ∑ j*i



(7)

where V is the system volume, Ns is the total number of sites, δ is the Dirac delta function, and rij is the radial distance between site i and site j. The RDFs of atomistic simulations and CG simulations at T ) 300 K are shown in Figure 4. Generally speaking, the MS-CG model can reproduce structural information quite accurately. However, the radial anisotropy of each CG site cannot be retained in CG simulations, although the MSCG model partially carries the orientational information of R-Dglucopyranose molecules, which explains the differences in the RDFs. For example, there are less bulky groups in some directions around C1 in atomistic models, whereas the site A is treated as an isotropic sphere in the CG model (as shown in Figure 1). Thus, the probability for CG sites A from two R-Dglucopyranose molecules to be at a separation of 5 Å is higher in atomistic models than in CG models, resulting in a notable difference in the atomistic and MS-CG RDFs of an A-A pair. The rigid planar structure and the large dipole of a water molecule make it very difficult to be coarse-grained into a single site. As expected, the RDFs for water sites (site W) from atomistic and MS-CG simulations are significantly different for the simple SPC/E model, as seen in Figure 5a. Because of the strong hydrogen bond and pronounced tetrahedral order among water molecules in atomistic models, there is usually a higher peak and more significant valleys and peaks for the RDF from atomistic simulations, as compared with the one from CG simulations. In Figure 5b, the numbers of neighbors are plotted as functions of the simulation time. Surprisingly, the time evolution of a neighbor number from CG simulation is very similar to that from atomistic simulation. The distributions of neighbor numbers are shown in Figure 5c, where the similarity among the outlines of distributions from both simulations can be easily identified. The distribution from a CG simulation is more spread out because it is less structured. These results show the MS-CG water model can still reproduce the correct coordination numbers, even though the detailed tetrahedral network cannot be rebuilt with a single site representation. The relative orientation of R-D-glucopyranose molecules is explored in Figure 6. The cosine of the relative angle between the normal vectors of a pair of molecules, cos(θ), is used as the

11570 J. Phys. Chem. B, Vol. 111, No. 39, 2007 x-axis. The normal vector of an R-D-glucopyranose molecule is defined as

rC1-C4 × rC1-C6/||rC1-C4 × rC1-C6|| where rC1-C4 and rC1-C6 is the vector between atoms C1 (site A) and C4 (site B), C1 (site A) and C6 (site C), respectively. The distance r between the centers of two molecules (the center is defined as the geometrical center of C1, C4, and C6) is monitored as the y-axis. To capture the relative orientational information between two close R-D-glucopyranose molecules, the histogram of (cos(θ), r) for each pair of molecules is normalized by r2 and color-coded from blue (low density) to red (high density). The similarity between the outline of the top panel (CG model) and bottom panel (atomistic panel) demonstrates that the present CG model preserves not only the average density of molecules at a distance r from a given molecule but also the relative orientation between two molecules. For example, in both the CG and atomistic model, a pair of R-D-glucopyranose molecules separated by a distance r ) 6.5 Å prefers perpendicular orientation. However, as seen in Figure 6, the CG model does not capture the asymmetric behavior in the relative orientation. In particular, there is a pronounced peak at cos(θ) ) 1 and r ) 4.5 Å, indicating that two molecules prefer parallel orientation to antiparallel orientation when r ) 4.5 Å. This phenomenon is expected, since the CG sites C1, C4, and C6 are defined to be spherical in CG modeling. This suggests that more CG sites (i.e., a higher resolution MS-CG model) will be needed to describe the system in more detail. C. Thermodynamic and Dynamical Properties. The MSCG simulations were performed in the constant NPT ensemble with the pressure p ) 1 atm and temperature T ) 300 K. When compared with the density from atomistic simulations, the relative error of the density from CG simulations is 1 order of magnitude larger than the values from atomistic simulations, (0.077 ( 0.006) × 10-5 cm2/s (R-D-glucopyranose) and (0.678 ( 0.009) × 10-5 cm2/s (water). The discrepancy is due to the smoother energy landscape and the reduced number of degrees of freedom in the MS-CG model. More realistic dynamics can be recovered when appropriate friction is applied to the MS-CG sites, as shown by Izvekov and Voth.48 However, if only the equilibrated structures are of interest, the faster dynamics is an additional advantage for the MS-CG model. D. Transferability over Different Thermodynamic Conditions. Good transferability to various thermodynamic conditions is one appealing property of a general force field. In principle, one might reproduce some properties of a particular training set with oversimplified functional forms, in which inaccuracies in various energy components happen to cancel out. However, the imbalance of inaccuracies will lead to very poor performance when the force field is applied to other systems or other

Liu et al. thermodynamic conditions beyond the training set. In CG modeling, the transferability becomes more problematic because of the anisotropic geometry of CG groups. The MS-CG modeling reduces the system complexity by averaging over the degrees of freedom absent from the CG system. This averaging process causes the resulting force field to be training-setdependent, that is, the CG model from one system may give poor results in another system or other thermodynamic conditions. To examine the thermodynamic transferability of the MSCG R-D-glucopyranose force field, the system with 50 R-Dglucopyranose and 500 water molecules was simulated at different temperatures and pressures. The MS-CG force field employed in the studies was the one originally developed at T ) 300 K and p ) 1 atm for the aforementioned system. The constant NPT simulations were performed at T ) 353 K and p ) 1 atm for 4 ns. The trajectories were recorded every 2 ps for the analysis. The temperature T ) 353 K was chosen because the water will evaporate at very high temperature. The resulting RDFs from CG and atomistic simulations are shown in Figure 7. When compared with the RDFs from T ) 300 K, the first maximum is usually broader and weaker, which is consistent with the fact that molecules rotate and translate more rapidly at higher temperatures. The similarity between MS-CG sitesite RDFs from CG and atomistic simulations can be clearly seen from Figure 7. Similarly, independent CG/atomistic simulations were conducted at two thermodynamics points, (T ) 353 K, p ) 500 bar) and (T ) 300 K, p ) 100 bar). Only the results at T ) 353 K and p ) 500 bar are plotted in Figure 8 because the agreement is similar for the simulation at T ) 300 K and p ) 100 bar. We also extended the applications of the MS-CG force field to different concentrations. Four other compositions, 25 R-Dglucopyranose/900 water, 100 R-D-glucopyranose/500 water, 125 R-D-glucopyranose/150 water, and 125 R-D-glucopyranose molecules, were simulated with CG and atomistic force fields. For each concentration, three different thermodynamic points (T ) 300 K, p ) 1 bar; T ) 353 K, p ) 1 bar; and T ) 353 K, p ) 500 bar) were chosen for 4 ns constant NPT simulations for both CG and atomistic simulations. The RDFs for pure R-Dglucopyranose solutions at T ) 353 K (this temperature is

Figure 10. Radial distribution functions between CG sites for a system consisting of 150 R-D-glucopyranose and 125 water molecules from atomistic (solid line) and CG (dashed line) simulations at T ) 353 K and p ) 1 atm.

Multiscale Coarse-Graining of Monosaccharides

J. Phys. Chem. B, Vol. 111, No. 39, 2007 11573

Figure 11. Radial distribution functions between CG sites for a system consisting of 25 R-D-glucopyranose and 900 water molecules from atomistic (solid line) and CG (dashed line) simulations at T ) 353 K and p ) 1 atm.

chosen to alleviate a sampling problem for this glassy system) and p ) 1 atm from CG/atomistic simulations are plotted in Figure 9. It can be seen that the present model is able to capture the structural information with significant accuracy. A small inconsistency in the RDFs of A-A and B-B might come from the reduced detail in the MS-CG model. The higher first peak in the C-C RDF from MS-CG is a result of the anisotropic atomic geometry for site C. Similarly, the RDFs for the system with 125 R-D-glucopyranose and 150 water molecules are shown in Figure 10. When the R-D-glucopyranose solution is more diluted, the deviations of CG results from the atomistic results are larger because the water molecules become more and more important, and a single site CG water model has some difficulty in reproducing the accurate structure. As seen in Figure 11, the difference of RDFssin particular, the A-A RDFssbetween the MS-CG and atomistic simulations for the system with 25 R-D-glucopyranose and 900 water molecules is more pronounced, although the overall quality is still acceptable. These results show that the MS-CG model can give satisfactory structural properties. In particular, the model yields surprisingly good properties for high-concentration R-D-glucopyranose solution at different temperatures and pressures, although the MS-CG force field was derived from one thermodynamic condition and a single concentration. Because the present CG model is derived solely from R-Dglucopyranose solution at one specific concentration, it will likely give poor results when applied to other saccharide systems without any modification. However, it should be emphasized that the MS-CG method is general and can be readily applied to difference and even much more complex systems. To demonstrate this point, a MS-CG model for a different system, R-(1f4)-D-glucan with 14 glucose units, is developed here using a different atomistic force field, the carbohydrate solution force field (CSFF),49 for R-D-glucopyranose and TIP3P39 for water. Five saccharide molecules were solvated in 4653 water molecules, resulting in a system size of 70.2199 × 0.1775 × 5.5030 Å. The system was first equilibrated for 600 ps at p ) 1 atm and T ) 300 K, and then it was simulated in the constant NVT ensemble for 4.5 ns with conformations recorded every 1 ps. The same CG scheme is used for this system, that is, the positions of CG sites A, B, and C coincide with C1, C4, and C6 atoms in the atomistic model, respectively. The MS-CG

Figure 12. Results for the MS-CG model of R-(1f4)-D-glucan with 14 glucose units compared to atomistic MD results. Panel a: bonding distributions from atomistic (solid line) and MS-CG simulation (circles) for bonds between atoms A and B (intermolecular), B and C, A and B (intramolecular), and A and C. Panel b: radial distribution functions between CG sites A-B and B-W.

procedure described in Section II was applied to the saved trajectory to obtain the bonded and nonbonded interactions. In a manner similar to the MS-CG modeling of R-D-glucopyranose solution, the force curves in the core region were extrapolated to be constant. To validate the derived MS-CG force field, the CG system of 4637 water molecules and one polysaccharide molecule was simulated in the constant NVT ensemble for 1.5 ns. For comparison, the atomistic simulation was performed for

11574 J. Phys. Chem. B, Vol. 111, No. 39, 2007

Liu et al.

Figure 14. Time evolution of the number of the tagged R-Dglucopyranose molecules in the upper (black line) and lower half (gray line) of the simulation box from atomistic MD (solid line) and MSCG (dashed line) simulations.

Figure 13. Additional results for the MS-CG model of R-(1f4)-Dglucan with 14 glucose units as compared to atomistic MD results. Panel a: Joint distribution of two torsion angles, 41′4′1′′ (the torsion angle between sites B of the nth residue, A and B of the (n + 1)th residue, and A of the (n + 2)th residue) and 141′4′ (the torsion angle between sites A and B of the nth residue, and A and B of the (n + 1)th residue) for atomistic model. The density of gray represents the population of each state. Panel b: Joint distribution of the two torsion angles, 41′4′1′′ and 141′4′, for the MS-CG model.

1.5 ns for the same system. The results are shown in Figure 12. The bonding distributions from atomistic and CG simulations are almost identical, as shown in Figure 12a. The representative RDFs are shown in Figure 12b, which demonstrates that the CG model can capture the structural information accurately. To characterize the configurational space of R-(1f4)-Dglucan, the joint distribution of two torsion angles, 41′4′1′′ (the torsion angle between sites B of the nth residue, A and B of the (n +1)th residue, and A of the (n + 2)th residue) and 141′4′ (the torsion angle between sites A and B of the nth residue, and A and B of the (n + 1)th residue), was monitored for the atomistic model (Figure 13a) and MS-CG model (Figure 13b). The density of gray represents the population of each state. It is seen that the MS-CG model can capture the major configuration space, although the distribution from the atomistic simulation is more spread out. Both models predict that the lefthanded helices (corresponding to the negative 141′4′ torsion angle) are the predominant structures, which is consistent with experimental observations. E. Efficiency. In the present MS-CG model, there is no explicit charge in CG sites for either R-D-glucopyranose or the water molecules. Therefore, no computationally demanding long-range electrostatic interactions need to be evaluated. The complexity of the system is also significantly reduced because there are only three sites and one site for R-D-glucopyranose and water, respectively. Moreover, the CG sites are heavier than the atomistic sites, and the bonding force constants are smaller than any force constant in the atomistic simulation. All of these

factors make a time step as large as 10 fs suitable for the MSCG MD simulations. The nonbonded interactions and forces are tabulated in the MS-CG model; thus, only a table look-up is needed for the force evaluations. To test the efficiency of the present MS-CG models, a system with 250 R-D-glucopyranoses was simulated in a constant NPT ensemble at p ) 1 atm and T ) 353 K. The results show that the MS-CG model is ∼3 orders of magnitude more efficient than its atomistic counterpart. To examine the mixing efficiency, 125 CG molecules initially in one-half of the box were tagged. The time evolution of the number of tagged R-D-glucopyranose molecules within the two halves of the box was monitored, as shown in Figure 14. The R-D-glucopyranose molecules from the two halves of the box are thoroughly mixed within 1.5 nanoseconds, whereas no significant mixing process is observed for the atomistic MD simulation. This result is consistent with the observation that the dynamical processes evolve on a significantly shorter time scale for CG models, thus demonstrating that the MS-CG model is particularly suitable if only the equilibrium properties are of interest. IV. Conclusions A MS-CG model has been developed for R-D-glucopyranose and water mixture systems from atomistic simulations. Unlike the usual CG force-field development, the MS-CG force field can be systematically derived without complex iterations for parameter optimization (only a simple iteration might be needed for bonding interaction parameters). Compared with preselected analytical functions, the tabulated nonbonded interactions in the MS-CG model have the flexibility to describe the interactions between CG sites more accurately. Because of the simplified CG representation of molecules and the tabulated interactions, the simulations with the model are ∼3 orders of magnitude more efficient, as compared to atomistic simulations. Microsecond simulation can thus be accessible on a single desktop computer for a CG R-D-glucopyranose and water system equivalent to an atomistic system having thousands of atoms. Without any explicit hydrogen bonding or even partial charges, the MS-CG model can still reproduce the accurate structural properties and some of the important thermodynamic properties. However, the dynamics is usually faster in the MSCG simulation due to the reduction of degrees of freedom and the smoother energy landscape in our MS-CG model. The correct dynamics can be recovered, if desired, by introducing

Multiscale Coarse-Graining of Monosaccharides appropriate frictions to the CG sites with their parameters derived systematically from atomistic trajectories.48 However, if only equilibrium properties are of interest, the enhanced dynamics is an additional advantage of the MS-CG model over the atomistic model. The present MS-CG model was originally derived from an atomistic trajectory at one specific concentration, temperature, and pressure. It was then applied to different temperatures (T ) 300 K and T ) 350 K), different pressures (p ) 1 atm and p ) 500 atm), and different molar concentrations of R-Dglucopyranose (c ) 2.7, 9.1, 16.7, 45.5, and 100%). The structural and thermodynamic properties were found to be comparable to the results from atomistic MD simulations in all the cases studied. Thus, it appears possible to derive an MSCG model that is transferable to other thermodynamic conditions for certain systems. For the R-D-glucopyranose and water mixture, the hydrogen bonds between glucose/glucose, water/ water, and glucose/water molecules contribute significantly to the system properties. The ubiquitous existence of hydrogen bonds might be partially responsible for this degree of thermodynamic transferability. The present CG model is derived solely from R-D-glucopyranose solution at one specific concentration, so it may likely give erroneous information when applied to other saccharide systems without any modification. However, it should be appreciated that the MS-CG method is general and can be readily applied to other systems. To demonstrate this point, an MS-CG model for a different system, R-(1f4)-D-glucan with 14 glucose units, was developed for a different underlying atomistic force field. The results demonstrate that the resulting MS-CG model can again produce reasonable properties. With the current CG scheme for R-D-glucopyranose molecules, it may be straightforward to apply our systematic MSCG algorithm to oligosaccharide and polysaccharide systems with long straight or branched chains. Polysaccharides are indispensable components of bacterial capsules, extracellular matrix, and other systems. They play a pivotal role in numerous crucial processes, including cell-cell adhesion and antibody recognition. The resulting MS-CG force field might eventually be used to study these important systems and processes. These and other issues will be explored in future research. Acknowledgment. This research was supported by the National Science Foundation (CHE-0628257). We thank Prof. Valeria Molinero for her help in providing several atomistic trajectories and stimulating discussions and Dr. Ian Thorpe for a critical reading of the manuscript. Allocations of computer time from the Pittsburgh Supercomputing Center (PSC) are gratefully acknowledged. References and Notes (1) Dwek, R. A. Chem. ReV. 1996, 96, 683. (2) Slade, L.; Levine, H. Crit. ReV. Food Sci. Nutr. 1991, 30, 115. (3) Bond, P. J.; Sansom, M. S. P. J. Am. Chem. Soc. 2006, 128, 2697. (4) Brown, S.; Fawzi, N. J.; Head-Gordon, T. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 10712. (5) Buchete, N.-V.; Straub, J. E.; Thirumalai, D. J. Chem. Phys. 2003, 118, 7658. (6) Chu, J. W.; Izvekov, S.; Voth, G. A. Mol. Simul. 2006, 32, 211. (7) Ding, F.; Dokholyan, N. V.; Buldyrev, S. V.; Stanley, H. E.; Shakhnovich, E. I. Biophys. J. 2002, 83, 3525. (8) Friedel, M.; Sheeler, D. J.; Shea, J. E. 2003, 118, 8106.

J. Phys. Chem. B, Vol. 111, No. 39, 2007 11575 (9) Fujitsuka, Y.; Takada, S.; Luthey-Schulten, Z. A.; Wolynes, P. G. Proteins 2004, 54, 88. (10) Go, N.; Abe, H. Biopolymers 1981, 20, 991. (11) Goetz, R.; Lipowsky, R. J. Chem. Phys. 1998, 108, 7397. (12) Head-Gordon, T.; Brown, S. Curr. Opin. Struct. Biol. 2003, 13, 160. (13) Hoang, T. X.; Cieplak, M. J. Chem. Phys. 2000, 113, 8319. (14) Izvekov, S.; Voth, G. A. J. Phys. Chem. B 2005, 109, 2469. (15) Izvekov, S.; Voth, G. A. J. Chem. Phys. 2005, 123, 134105. (16) Izvekov, S.; Voth, G. A. J. Chem. Theory Comput. 2006, 2, 637. (17) Lazaridis, T.; Karplus, M. Curr. Opin. Struct. Biol. 2000, 10, 139. (18) Liwo, A.; Odziej, S.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S.; Scheraga, H. A. J. Comput. Chem. 1997, 18, 849. (19) Lopez, C. F.; Nielsen, S. O.; Moore, P. B.; Klein, M. L. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 4431. (20) Marrink, S. J.; de Vries, A. H.; Mark, A. E. J. Phys. Chem. B 2004, 108, 750. (21) Molinero, V.; Cagin, T.; Goddard, W. A., III J. Phys. Chem. A 2004, 108, 3699. (22) Molinero, V.; Goddard, W. A., III J. Phys. Chem. B 2004, 108, 1414. (23) Molinero, V.; Goddard, W. A., III Phys. ReV. Lett. 2005, 95, 045701. (24) Murtola, T.; Falck, E.; Patra, M.; Karttunen, M.; Vattulainen, I. J. Chem. Phys. 2004, 121, 9156. (25) Neri, M.; Anselmi, C.; Cascella, M.; Maritan, A.; Carloni, P. Phys. ReV. Lett. 2005, 95, 218102. (26) Nguyen, H. D.; Hall, C. K. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16180. (27) Onuchic, J. N.; Wolynes, P. G. Curr. Opin. Struct. Biol. 2004, 14, 70. (28) Shi, Q.; Izvekov, S.; Voth, G. A. J. Phys. Chem. B 2006, 110, 15045. (29) Shih, A. Y.; Arkhipov, A.; Freddolino, P. L.; Schulten, K. J. Phys. Chem. B 2006, 110, 3674. (30) Smith, A. V.; Hall, C. K. Proteins 2001, 44, 344. (31) Stevens, M. J. J. Chem. Phys. 2004, 121, 11942. (32) Tozzini, V. Curr. Opin. Struct. Biol. 2005, 15, 144. (33) Wang, Y.; Izvekov, S.; Yan, T.; Voth, G. A. J. Phys. Chem. B 2006, 110, 3564. (34) Zacharias, M. Protein Sci. 2003, 12, 1271. (35) Zhou, J.; Thorpe, I. F.; Izvekov, S.; Voth, G. A. Biophys. J. 2007, 92, 4289. (36) Zhou, Y.; Karplus, M. Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 14429. (37) Izvekov, S.; Parrinello, M.; Burnham, C. J.; Voth, G. A. J. Chem. Phys. 2004, 120, 10896. (38) Ercolessi, F.; Adams, J. B. Europhys. Lett. 1994, 26, 583. (39) Jorgensen, W.; Maxwell, D.; Tirado-Rives, J. J. Am. Chem. Soc. 1996, 118, 11225. (40) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. J. Phys. Chem. 1987, 91, 6269. (41) Lindahl, E.; Hess, B.; van der Spoel, D. J. Mol. Mod. 2001, 7, 306. (42) Miyamoto, S.; Kollman, P. A. J. Comput. Chem. 1992, 13, 952. (43) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. J. Comput. Chem. 1997, 18, 1463. (44) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. Intermolecular Forces; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1981. (45) Forester, T. R.; Smith, W. DL_POLY user manual; CCLRC, Daresbury Laboratory: Daresbury, Warrington, UK, 1995. (46) Motakabbir, K. A.; Berkowitz, M. J. Phys. Chem. 1990, 94, 8359. (47) Tironi, I. G.; van Gunsteren, W. F. Mol. Phys. 1994, 83, 381. (48) Izvekov, S.; Voth, G. A. J. Chem. Phys. 2006, 125, 151101. (49) Kuttel, M.; Brady, J. W.; Naidoo, K. J. J. Comput. Chem. 2002, 23, 1236.