Computational Study of the Conformational Structures of Saccharides

Publication Date (Web): October 1, 2009
Biomacromolecules 2009, 10, 3081–3088

Computational Study of the Conformational Structures of Saccharides in Solution Based on J Couplings and the “Fast Sugar Structure Prediction Software”

Junchao Xia† and Claudio J. Margulis*

Department of Chemistry, University of Iowa, Iowa City, Iowa 52242

Received July 6, 2009; Revised Manuscript Received September 11, 2009

This article reports on the implementation of J coupling calculations in our recently developed Fast Sugar Structure Prediction Software (FSPS). The FSPS combines a smart and exhaustive algorithm to search through conformational space with the calculation of different experimental nuclear magnetic resonance (NMR) observables to establish the conformation of saccharides in solution. Using our algorithm in combination with NMR data, we investigate the solution structure of three simple disaccharides (methyl α-sophoroside, methyl α-laminarabioside, and methyl α-cellobioside) and one complex bacterial polysaccharide (Shigella flexneri 5a).

Introduction

Carbohydrate-based drugs1,2 have recently elicited significant attention due to the unique role that these molecules play in various recognition phenomena.3-6 To tackle the complex problem of carbohydrate recognition, one must first be able to determine the three-dimensional structure of complex carbohydrates. As an example, capsular polysaccharides (CPS) and lipopolysaccharides (LPS) are virulence factors7 on the cell surface of Gram-negative bacteria, which are commonly targeted as potential vaccine candidates.1,2,7

Nuclear magnetic resonance (NMR) techniques are the most prevalent experimental tools used to investigate the three-dimensional structures of carbohydrates in solution.8,9 Traditionally, 1H-1H nuclear Overhauser effects (NOEs) are used to derive distance information across glycosidic linkages, but their values only provide information that is limited to short distance ranges.10 Residual dipolar coupling (RDC) measurements11,12 have been used to derive the angle subtended between internuclear vectors and the alignment tensor in an aqueous dilute liquid crystalline medium. Such long-range information is ideally complementary to the short-range information obtainable from NOEs.

An alternative technique for deriving carbohydrate structural information is the measurement of J spin-spin coupling constants, or J scalar couplings (JSCs).13,14 Unlike RDCs, which describe the direct couplings between nuclear magnetic dipole moments, JSCs describe the indirect coupling between nuclear dipoles mediated by the surrounding electrons.13 It is well-known that three-bond vicinal spin couplings (3J) obey a Karplus-type relationship;15-17 namely, the values of the J couplings depend on torsional angles. In the particular case of glycosidic linkages, the values of inter-residue 3J are determined by the glycosidic dihedrals φ or ψ. Vicinal 3J couplings have not been widely used so far because of the lack of inter-residue 1H-1H 3JHH values.
* To whom correspondence should be addressed. E-mail: claudio-margulis@uiowa.edu.
† Current address: BioMaPS Institute, Rutgers University, 610 Taylor Road, Piscataway, NJ 08854.

Fortunately, recent developments have made measurements of 13C-1H and 13C-13C couplings possible and available.

From a computational perspective, a successful determination of the 3D structure of sugars in aqueous solution depends on several factors such as the reliability of solvation models, the accuracy of force fields, and the availability of fast sampling algorithms.8,9 Commonly used implicit solvent models18 treat the solvent in a mean-field way. Several studies, including some from our group, have shown19-28 that, because of the particular polyalcohol structure of sugars, the detailed structure of the surrounding water does matter in making accurate structural predictions. For example, sugars are known to form water-mediated hydrogen bonds that favor particular inter-residue conformations. From this perspective, explicit solvent simulations appear to be unavoidable to determine the 3D conformation of sugars in solution (see, for example, ref 29). Unfortunately, it is often the case that one cannot sample the full conformational space of a complex oligo- or polysaccharide on a time scale accessible by molecular dynamics (MD) simulations in explicit solvent. Simulations are often trapped in local basins, and transitions between conformations are rare events, particularly when crowding and branching are present.

Several force fields such as MM3,30,31 AMBER,32,33 CHARMM,34 OPLS,35-37 and GLYCAM38-40 are available for studying saccharides. An important recent development is GLYCAM 06.41 These potential parametrizations face the significant challenge of dealing with the subtle balance between inter- and intramolecular hydrogen bonding and with the wide range of possible monosaccharides, functional groups, and glycosidic linkages, while also needing to account for the gauche, anomeric, and exoanomeric effects.32,42 For complex oligosaccharides, validation tests against NMR observables remain challenging (see, for example, the review articles in refs 9 and 42).

To tackle the sampling problem and as a possible aid for the analysis of NMR data, we have recently developed a fast structural prediction software for oligosaccharides and polysaccharides in solution.27,28,43 The program consists of a set of modules.
10.1021/bm900756q CCC: $40.75 © 2009 American Chemical Society. Published on Web 10/01/2009. Biomacromolecules, Vol. 10, No. 11, 2009.

The first module deals with carbohydrate ring perception and automatic detection of rotatable dihedrals. Other modules perform exhaustive systematic searches for clashes in dihedral space. An important part of the program involves a substructure matching algorithm for querying allowed conformations of previously studied substructures of saccharides stored in a database of conformers. Optimization of sterically allowed structures in the gas phase or in implicit solvent is done through an interface to external molecular modeling packages. The
algorithm computes different NMR observables for all sterically allowed and energy minimized structures and compares values with available experimental data. A ranking of structures is finally provided based on this comparison. Explicit solvent MD simulations are carried out for final refinement and evaluation of thermal averages around the global free energy minimum basin and adjacent accessible phase space regions. It is important to emphasize that the FSPS is neither a molecular builder44-47 nor a tool to determine NMR restraints. The FSPS is an exhaustive tool for the prediction of families of energy minimized structures that are consistent with the observed NOEs, RDCs or J couplings. In principle, if available force fields in combination with implicit solvents were accurate enough, the structures generated by the FSPS could be used to generate energies for computing partition functions and true Boltzmann averages that would yield purely computational ensemble averaged NMR observable predictions. As we have demonstrated in previous articles,27,28 this is often not the case. Instead, the most common use of the FSPS is not as a de novo tool for the computational prediction of structure but instead in combination with experimental data. From each member of the exhaustive set of allowed and minimized conformational families generated by the FSPS, different NMR observables are computed and compared with experimental values to find close matches. This methodology is only useful when a unique or small set of unique free energy minima are relevant. The approach is expected to fail when the free energy landscape is shallow and flat. It is often the case, however, that molecules important for biological recognition have particular local or global free energy minima, which are recognized by different agents and therefore their free energy landscape is not flat. Validation of the methodology is carried out in two different forms. 
For small saccharides, results can be directly checked against long MD simulations in explicit solvent. These are very time-consuming when compared to an FSPS run, but a full free energy profile can be derived from them. In the case of larger molecules for which a full conformational sampling using MD in explicit solvent is simply not possible, we have used in the past a set of NMR observables such as NOEs to generate a ranking of most likely conformational families in solution and tested our results against an independent orthogonal method such as RDC measurements. If only the comparison of computational NOEs and experimental data is used to rank the FSPS structures, and the highest-ranked structure has computed RDCs that match the experimental values, then we conclude that the structure is determined. In this article we present results in which computationally derived JSCs for conformers produced by the FSPS are ranked in comparison to experiments. To check the validity of this approach, we compared these predictions to long explicit solvent simulations or to independent experimental NOE data.

The disaccharides investigated in this article are methyl α-sophoroside (β-D-Glc-(1→2)-α-D-Glc-1-OMe), methyl α-laminarabioside (β-D-Glc-(1→3)-α-D-Glc-1-OMe), and methyl α-cellobioside (β-D-Glc-(1→4)-α-D-Glc-1-OMe), as shown in Figure 1. In spite of their relatively simple chemical sequences, complete descriptions of the 3D conformational structures in solution of these molecules are still missing, and recent attempts have been made to derive them via JSC measurements.48

Shigella flexneri is a Gram-negative bacillus responsible for the endemic form of shigellosis, a dysenteric syndrome.49 In this article, we also study the O-specific polysaccharide moiety of Shigella flexneri 5a, a repeating unit of which is shown in Figure 2. This polysaccharide is the major target of the protective immune response.50,51 Although the folding patterns of polysaccharides in the crystal state are dominantly helical,52 there exists large diversity in the solution state. One can find rigid helices, extended helices, and completely random coil structures.51,53-59 For example, the linear homopolysaccharides of glucose (glucans) have a wide spectrum of equilibrium conformations with different flexibility: stiff or extended (1→4)-β-linked cellulose,60 the (1→4)-α-linked random coil chain of amylose,61 as well as the disordered helical motif of (1→3)-β-linked Curdlan.62,63 The total or partial loss of high correlation between neighboring repeating units due to the environment increases the flexibility and results in the diversity of equilibrium solution conformations. It is important to emphasize that, as opposed to other systems studied in the past using our algorithm,27,28,43 the case of extended linear polymeric sugars is the most challenging for the FSPS. These are often floppy molecules with rather flat free energy landscapes. It is, therefore, very interesting to investigate how the FSPS would handle such a case and how its predictions compare to previously derived structural data.

Figure 1. Schematic representation of three disaccharides: (a) methyl α-sophoroside, (b) methyl α-laminarabioside, and (c) methyl α-cellobioside. The dashed lines show the hydrogen bonds present in crystal structures.

Figure 2. Chemical sequence of one repeating unit of Shigella flexneri 5a, and the definition of residues and linkages used throughout this article. An additional A residue is added at the right end of this sequence for the purpose of our conformational search.
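
As an illustration of the ranking idea described above (computing JSCs for each conformer and comparing them with experiment), the following is a minimal sketch and not the FSPS implementation. It assumes a generic three-term Karplus form, 3J(θ) = A cos²θ + B cos θ + C, with placeholder coefficients chosen only for illustration; real parametrizations depend on the coupling pathway. The function names, the conformer labels, and the choice of a root-mean-square deviation as the score are all assumptions of this sketch.

```python
import math

# Placeholder Karplus coefficients in Hz (assumed for illustration only;
# published parametrizations differ for each coupling pathway).
A, B, C = 6.0, -1.0, 0.5


def karplus(theta_deg, a=A, b=B, c=C):
    """Generic Karplus form: 3J = a*cos^2(theta) + b*cos(theta) + c."""
    t = math.radians(theta_deg)
    return a * math.cos(t) ** 2 + b * math.cos(t) + c


def rank_conformers(conformers, experimental):
    """Return conformer names sorted by RMSD (Hz) between computed and
    experimental couplings, best match first.

    conformers:   {name: {coupling_label: torsion angle in degrees}}
    experimental: {coupling_label: measured 3J in Hz}
    """
    scored = []
    for name, torsions in conformers.items():
        sq_errors = [
            (karplus(torsions[label]) - j_exp) ** 2
            for label, j_exp in experimental.items()
        ]
        rmsd = math.sqrt(sum(sq_errors) / len(sq_errors))
        scored.append((rmsd, name))
    return [name for _, name in sorted(scored)]


if __name__ == "__main__":
    # Hypothetical torsions and "measured" couplings, not data from the article.
    conformers = {
        "trial_A": {"phi": 0.0, "psi": 180.0},
        "trial_B": {"phi": 90.0, "psi": 90.0},
    }
    experimental = {"phi": 5.5, "psi": 7.5}
    print(rank_conformers(conformers, experimental))  # ['trial_A', 'trial_B']
```

In this toy input, trial_A reproduces both couplings exactly under the placeholder coefficients and is therefore ranked first. A single-structure score of this kind is only meaningful when one or a few free energy minima dominate, as noted in the text.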

Methods

For all systems studied, we performed conformational searches using the methodology outlined in refs 27, 28, and 43. For each φ-ψ glycosidic linkage (where φ = H1′-C1′-On-Cn, ψ = C1′-On-Cn-Hn, and n is the linkage carbon number), the scanning increments were 10°. In the case of the three disaccharides, the first dihedral angle of the methyl or hydroxymethyl groups was also rotated with increments of 120°.
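
The size of such a systematic scan can be made concrete with a short sketch. It only enumerates grid points and is not the FSPS search itself: the steric clash filter is omitted, and the starting angle of -180°, the function names, and the choice of a single side-group dihedral in the example are assumptions.

```python
from itertools import product


def dihedral_grid(step_deg, start=-180):
    """Angles covering one full rotation in steps of step_deg.
    The start value of -180 degrees is an assumed convention."""
    return [start + i * step_deg for i in range(360 // step_deg)]


def scan_points(n_linkages=1, linkage_step=10, n_side=1, side_step=120):
    """Iterate over every grid combination: (phi, psi) for each glycosidic
    linkage, followed by one angle per rotated side-group dihedral."""
    link = dihedral_grid(linkage_step)
    side = dihedral_grid(side_step)
    axes = [link, link] * n_linkages + [side] * n_side
    return product(*axes)


if __name__ == "__main__":
    # One linkage scanned at 10 degrees plus one side group at 120 degrees.
    n_points = sum(1 for _ in scan_points())
    print(n_points)  # 36 * 36 * 3 = 3888
```

Each additional linkage multiplies the number of grid points by 36 × 36 = 1296 at a 10° increment, which is why an exhaustive scan of larger saccharides needs some reduction of the grid before energy minimization.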

Several thousand sterically allowed conformations were found in the case of the disaccharides. In the case of the SF5a polysaccharide, we constructed a six-residue system involving one repeating unit and an additional A residue, as shown in Figure 2. Coarse graining was applied as in refs 27, 28, and 43 to make the number of conformers studied tractable. Specifically, dihedral space was coarse grained for each linkage so that four adjacent sterically allowed points in φ and four adjacent points in ψ were converted into a single geometry-averaged structure. Only the first dihedral angle of the longest side chain (NAc group on residue D) was included in the search. A total of 85760 sterically allowed conformations of the SF5a repeating unit were obtained from the search. The AMBER 9 molecular simulation package64-66 in conjunction with the GLYCAM 06 force field41 was used for energy minimizations carried out in implicit solvent using the GBSA model67 (the key input options for AMBER 9 with GLYCAM are “IGB = 1 and GBSA = 1”). The 85760 energy minimizations in implicit solvent were carried out in parallel using 10 CPUs with a wall-clock run time of approximately 5 days. As previously described in refs 27, 28, and 43, minimized conformers were grouped into families belonging to the same conformational basin. We considered that two conformations belong to the same family if the energy difference between them is ∆E < 5.0 kcal/mol and the difference in each glycosidic dihedral angle is