Simulation of 2D NMR Spectra of Carbohydrates Using GODESS Software

Application Note
pubs.acs.org/jcim

Roman R. Kapaev*,† and Philip V. Toukach*,‡

† Higher Chemical College of the Russian Academy of Sciences, Miusskaya sq. 9, Moscow 125047, Russia
‡ N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prosp. 47, Moscow 119991, Russia




ABSTRACT: Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) is a web service that has recently been shown to be one of the most accurate tools for simulation of 1H and 13C 1D NMR spectra of natural carbohydrates and their derivatives. The new version of GODESS supports visualization of the simulated 1H and 13C chemical shifts in the form of most 2D spin correlation spectra commonly used in carbohydrate research, such as 1H−1H TOCSY, COSY/COSY-DQF/COSY-RCT, and 1H−13C edHSQC, HSQC−COSY, HSQC−TOCSY, and HMBC. Peaks in the simulated 2D spectra are color-coded and labeled according to the signal assignment and can be exported in JCAMP-DX format. Peak widths are estimated empirically from the structural features. GODESS is available free of charge via the Internet at the platform of the Carbohydrate Structure Database project (http://csdb.glycoscience.ru).

Received: February 15, 2016
Published: May 26, 2016
DOI: 10.1021/acs.jcim.6b00083. J. Chem. Inf. Model. 2016, 56, 1100−1104
© 2016 American Chemical Society

Carbohydrates are widely applied for disease treatment1−6 and biofuel development.7,8 However, this major class of biomolecules still suffers from a shortage of structural data.9 NMR spectra, which play a key role in the unraveling of carbohydrate structures, are often difficult to interpret.9 This obstacle has prompted the development of new software intended to simplify the research. In particular, NMR spectrum simulation tools are helpful for confirming structural hypotheses and for automated elucidation of a structure from experimental NMR data.9−11 However, structure determination based solely on 1D NMR spectra cannot be exact and comprehensive because of the signal overlap typical of 13C and especially 1H NMR spectra of carbohydrates,9 whereas 2D NMR simulators, which are vital for solving the problem, are in short supply. CASPER11 supports HSQC spectrum visualization; however, the number of structural features supported by the service is limited.12 Until now, there has been no glyco-tuned software for routine 2D NMR spectrum simulation except for HSQC.11 In this article, we report an update of the Glycan Optimized Dual Empirical Spectrum Simulation (GODESS) service that supports the visualization of 1H−1H TOCSY, COSY/COSY-DQF/COSY-RCT and 1H−13C edHSQC, HSQC−COSY, HSQC−TOCSY, and HMBC 2D NMR spectra of natural carbohydrates and their derivatives, including those containing uncommon and/or noncarbohydrate constituents.

To simulate 2D NMR cross-peaks, 1H NMR (and 13C NMR, if required) chemical shifts of an input structure are predicted. For proton NMR simulations, the Carbohydrate Structure Generalization Scheme (CSGS)12 is used; for carbon NMR simulations, the user can choose from the CSGS, an incremental approach based on BIOPSEL,13 or a hybrid approach,14 which attempts to select the best values from both approaches based on the trustworthiness reported for every signal. After the chemical shifts are obtained, each of them is given a peak width (PW) expressed in hertz. PW along the proton axis is estimated from a rough prediction of proton coupling constants empirically derived from structural features, such as stereo configurations, location in the cycle or in the aliphatic chain, and carbon hybridization state (for further details, see section 1 in the Supporting Information). After that, each atom (1H or 13C) is correlated with the other(s) (1H) according to the following rules (illustrated in Figure 1):

• COSY: each proton is correlated with itself (diagonal peaks) and with its geminal and vicinal neighboring protons.
• COSY-DQF: each proton is correlated with its geminal and vicinal neighboring protons (COSY with double quantum filter).
• COSY-RCT: the same as COSY, except that the protons are also correlated with the protons located four C−H or C−C bonds away, if they are located in the same spin system (COSY with 1-step relayed coherence transfer).
• TOCSY: each proton is correlated with all protons within the spin system it belongs to.
• HSQC: each carbon is correlated with the proton(s) to which it is bonded directly (heteronuclear single quantum coherence).
• HSQC−COSY: each carbon is correlated with the protons that are attached to it and with their vicinal neighbors (e.g., C1−H2 correlation in Glcp).
• HSQC−TOCSY: each carbon is correlated with all protons within the spin system to which the attached protons belong, and vice versa.
• HMBC: each carbon is correlated with all protons located two or three bonds away, including those through heteroatoms, e.g., in inter-residue linkage (heteronuclear multiple bond correlation).

Figure 1. Spin correlations revealed by 2D NMR experiments. H3 and C3 atoms in β-D-Fruf are used as an example; red arrows show the correlations.

Each pair of correlated atoms constitutes a cross-peak, where the chemical shifts of the first (1H or 13C) and the second (1H) atoms are Y and X coordinates, respectively, and their PWs are the peak height and width. To plot the peaks, PWs are converted from hertz to parts per million using the spectrometer frequency specified by the user.

The service is available by clicking NMR simulation in the left menu of the CSDB15 web site (http://csdb.glycoscience.ru/database/ for merged CSDB, http://csdb.glycoscience.ru/bacterial/ for Bacterial CSDB, http://csdb.glycoscience.ru/plant_fungal/ for Plant&Fungal CSDB), by clicking Predict NMR under Useful tools at the CSDB homepage, or directly at http://csdb.glycoscience.ru/database/core/nmrsim.html. The input structure can be built in the CSDB Structure Wizard, drawn in the Glycan Builder,16 chosen from the library, converted from the GlycoCT code,17 or typed in the CSDB linear format18 (Figure S1, (1) and (2)). Users must choose a nucleus to be predicted (1H, 13C, or both) with the Nucleus selector (Figure S1, (3)). By default, the 1H chemical shift prediction is visualized as COSY and TOCSY plots, and the combined 1H and 13C prediction is visualized as COSY, TOCSY, edHSQC, and HMBC plots. Simulation of more NMR experiments is available via the advanced option block (by clicking the More parameters checkbox as shown in Figure S1).
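The bond-counting correlation rules listed earlier in this section can be illustrated with a short Python sketch. It is not the GODESS implementation: the toy fragment, the atom labels, the rule that an atom's element is the first letter of its label, and all function names are assumptions made only for illustration. The sketch covers COSY, COSY-DQF, HSQC, and HMBC; the spin-system-based experiments (TOCSY and its derivatives) are left out.

```python
from collections import deque

# Hypothetical toy fragment (not a real sugar): C1-C2-C3 chain, with C1 linked
# through oxygen O1 to C4, mimicking an inter-residue linkage.
BONDS = [
    ("C1", "H1"), ("C1", "C2"), ("C2", "H2"), ("C2", "C3"),
    ("C3", "H3a"), ("C3", "H3b"), ("C1", "O1"), ("O1", "C4"), ("C4", "H4"),
]


def build_graph(bonds):
    """Turn a list of bonds into an undirected adjacency mapping."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bond_distances(graph, start, max_bonds):
    """Breadth-first search: bond count from start to atoms within max_bonds."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_bonds:
            continue  # do not expand beyond the requested radius
        for neighbor in graph[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def protons_at(graph, start, allowed):
    """Protons whose bond distance from start is in the allowed set."""
    dist = bond_distances(graph, start, max(allowed))
    return sorted(a for a, n in dist.items() if a.startswith("H") and n in allowed)


def atoms_of(graph, element):
    """Atoms of one element; the element is assumed to be the label's first letter."""
    return sorted(a for a in graph if a.startswith(element))


def cosy_dqf(graph):
    """Geminal (2 bonds) and vicinal (3 bonds) proton partners, no diagonal."""
    return {h: protons_at(graph, h, {2, 3}) for h in atoms_of(graph, "H")}


def cosy(graph):
    """COSY-DQF partners plus the diagonal peak of each proton."""
    return {h: sorted([h] + partners) for h, partners in cosy_dqf(graph).items()}


def hsqc(graph):
    """Protons directly bonded to each carbon."""
    return {c: protons_at(graph, c, {1}) for c in atoms_of(graph, "C")}


def hmbc(graph):
    """Protons two or three bonds from each carbon, heteroatoms included."""
    return {c: protons_at(graph, c, {2, 3}) for c in atoms_of(graph, "C")}


GRAPH = build_graph(BONDS)
```

In this toy fragment, `hmbc(GRAPH)["C4"]` contains `H1` because the three-bond path C4−O1−C1−H1 runs through the oxygen, which is the situation the HMBC rule describes for inter-residue linkages. Each returned pair would then become one cross-peak, with the chemical shift of the first atom on the Y axis and that of the second atom on the X axis.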
Other simulation parameters are the solvent, the approach to predict carbon chemical shifts (statistical CSGS,12 empirical BIOPSEL-based,13 or hybrid14), quality mode14 (for CSGS simulations), temperature and pH ranges, and NMR instrument frequency (Figure S1, (3)). 2D NMR spectra are output as 400 pixel × 400 pixel images (Figure 2b); 1000 pixel × 1000 pixel images are also available by clicking Hi-res image (Figure 2b, at the bottom). Each peak is color-coded according either to its assignment to a certain residue (see Figure 2a and b and Figure S2a) or to its trustworthiness metric12 (see Figure S2b), which is calculated as the mean value of trustworthiness along the two axes. Every peak is displayed as an ellipse, except for negative peaks in the edHSQC spectra, which are displayed as rectangles (see Figure S3). Each nondiagonal peak is labeled. In the assignment color mode, the label normally conforms to the A/B format, where A and B are the position numbers of the interacting atoms in the residue along the X and Y axes, respectively. For edHSQC spectra, the notation is reduced to the position number for the carbon−proton pair (e.g., “6′” stands for a C6/H6′ cross-peak). In the trustworthiness color mode, the label is a trustworthiness value. Clicking No peak labels hides all labels for clarity (Figure 2b, at the bottom). Inter-residue HMBC cross-peaks in the residue assignment mode and their labels are plotted in two colors according to both residue color codes (see Figure S4).

Figure 2. Features of the output interface: a part of the assignment table showing the residue linkages and color codes (a) and 2D NMR spectra (COSY is shown as an example) in the residue assignment visualization mode (b) for the structure schematically displayed in c.

The simulated spectra can be exported for external processing as JCAMP-DX19 files by hovering the mouse over JDX (Figure 2b, bottom-right corner) and then selecting an option (Figure S5). 2D NMR spectra in this format can be viewed, zoomed, panned, superimposed with experimental spectra, and saved as vector graphics by multiple NMR software packages, such as Mestrelab MestReNova, ACD/Labs NMR viewer, Bruker TopSpin, and others (see Figures S6 and S7 as examples). The generated files can be opened with the cheminfo.org online 2D NMR viewer by hovering the mouse over JDX and clicking Live view NMR (Figures S5 and S6).

To examine the prediction accuracy and the peak width estimation reliability, we selected three spectra (1H−1H COSY and 1H−13C HSQC) of natural polysaccharides with various structural features, such as deoxy sugars, uronic acids, higher sugars (neuraminic acid), peptide chains (lysine), and bisubstitution at neighboring positions, which usually hampers accurate empirical simulation. These three structures are listed in Table 1.

Table 1. Structures Used for Peak Width Estimation Reliability Test

Structure(a) (1): →2)-[α-Abep-(1→3)]-α-D-Manp-(1→4)-α-L-Rhap-(1→3)-α-D-Galp-(1→
Source organism: Citrobacter freundii O22 (strain PCM 1555)20

Structure(a) (2): →3)-β-D-GalpNAc-(1→6)-[L-Lys-(2→6)-α-D-GalpA-(1→4)]-β-D-GalpNAc-(1→4)-β-D-GlcpA-(1→
Source organism: Proteus mirabilis G1 (O3)21

Structure(a) (3): →4)-[α-NeupAc-(2→6)-β-D-Galp-(1→4)-β-D-GlcpNAc-(1→3)]-β-D-Galp-(1→4)-[α-D-Galp-(1→3)]-β-L-Rhap-(1→4)-β-D-Glcp-(1→
Source organism: Streptococcus suis (serotype 2)22

(a) Abe, 3,6-dideoxy-D-xylo-hexose (abequose); D-Man, D-mannose; L-Rha, L-rhamnose; D-Gal, D-galactose; D-GalNAc, 2-acetamido-2-deoxy-D-galactose; Lys, lysine; D-GalA, D-galacturonic acid; D-GlcA, D-glucuronic acid; NeuAc, 5-acetamido-3,5-dideoxy-D-glycero-D-galacto-non-2-ulosonic acid (N-acetylated neuraminic acid); D-GlcNAc, 2-acetamido-2-deoxy-D-glucose; D-Glc, D-glucose; p, pyranose.

Both 1H and 13C NMR chemical shifts required for 2D NMR spectra were simulated using the statistical (CSGS) approach in the accurate mode12 with water (H2O or D2O) as solvent and unrestricted pH and temperature; the program was forbidden to use the NMR data from the referenced publications to avoid prediction bias. The simulated cross-peaks are narrower than the experimental ones, because the experimental lines are broadened by relaxation effects and nonideal spectrometer resolution tuning. Therefore, the frequency for the simulated 2D NMR spectra was decreased to 300 MHz to make the cross-peaks clearly visible. Superimposed images of the experimental and simulated spectra are shown in Figures 3−5. Separately plotted spectra are shown in Figures S8−S10.

Figure 3. Superposition of experimental (black) and simulated (red) COSY spectra of O-polysaccharide from Citrobacter freundii O22 strain PCM 1555 (structure 1). The experimental spectrum was recorded using a 600 MHz spectrometer in D2O at 303 K.20 The diagonal peak at 4.75 ppm refers to water at 303 K.

Figure 4. Superposition of experimental (black) and simulated (red) HSQC spectra of O-polysaccharide from Proteus mirabilis G1 (structure 2). The experimental spectrum was recorded using a 500 MHz spectrometer in D2O at 318 K.21

Figure 5. Superposition of experimental (black) and simulated (red) TOCSY spectra portions of Streptococcus suis serotype 2 capsular polysaccharide (structure 3). The experimental spectrum was recorded using a 600 MHz spectrometer in D2O at 348 K.22
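Lowering the simulation frequency widens the plotted peaks because a peak width given in hertz is divided by the observe frequency in megahertz to obtain its width in parts per million. The following Python sketch shows this arithmetic; the function name and the 15 Hz example width are hypothetical and are not taken from GODESS, and the sketch makes no claim about how the service handles the 13C axis.

```python
def hz_to_ppm(width_hz, observe_frequency_mhz):
    """Convert a width in hertz to ppm: 1 ppm equals observe_frequency_mhz hertz."""
    if observe_frequency_mhz <= 0:
        raise ValueError("observe frequency must be positive")
    return width_hz / observe_frequency_mhz


# Hypothetical proton peak width, chosen only to illustrate the scaling.
EXAMPLE_WIDTH_HZ = 15.0

width_at_600 = hz_to_ppm(EXAMPLE_WIDTH_HZ, 600.0)  # width on a 600 MHz plot
width_at_300 = hz_to_ppm(EXAMPLE_WIDTH_HZ, 300.0)  # width on a 300 MHz plot
```

Under these assumptions, halving the frequency from 600 to 300 MHz doubles the width of the same peak on the ppm scale, which is why the simulated cross-peaks become easier to see.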

Cross-peak shapes in the simulated and experimental spectra were observed to be in good agreement. A more precise and well-founded peak width estimation algorithm requires a dedicated coupling constant database, which is a subject of further study. The precision of the 2D NMR spin-correlation simulation depends primarily on the accuracy of the 1D NMR chemical shift prediction. In multiple GODESS simulations, mean absolute errors were 0.07 and 0.86 ppm per 1H and 13C resonance, respectively.12 The most noticeable discrepancies between the cross-peak coordinates were found in the HSQC spectra of structure 2: lysine H2/C2 and GlcpA H5/C5 along the proton axis (these chemical shifts are pH-dependent), and 4,6-bisubstituted GalpNAc H6/C6 (due to the unusual steric surroundings). Nevertheless, the RMS deviations between the experimental and simulated spectra of structure 2 were 0.09 ppm (1H) and 0.52 ppm (13C), and the linear correlation coefficients were 0.9954 (1H) and 0.9999 (13C). For structure 3, the experimental TOCSY spectrum does not contain weak correlations through coupling constants below 4 Hz.
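The agreement metrics quoted above (mean absolute error, RMS deviation, and linear correlation) can be computed from paired lists of experimental and simulated chemical shifts. The Python sketch below uses the standard textbook definitions; the function names and the sample shifts are hypothetical, and the sketch is not the authors' evaluation code.

```python
import math


def mean_absolute_error(experimental, simulated):
    """Mean of |experimental - simulated| over paired shifts, in ppm."""
    return sum(abs(e - s) for e, s in zip(experimental, simulated)) / len(experimental)


def rms_deviation(experimental, simulated):
    """Root-mean-square deviation between paired shifts, in ppm."""
    squared = [(e - s) ** 2 for e, s in zip(experimental, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(experimental, simulated):
    """Pearson linear correlation coefficient of paired shifts."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_s = sum(simulated) / n
    covariance = sum((e - mean_e) * (s - mean_s) for e, s in zip(experimental, simulated))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return covariance / math.sqrt(var_e * var_s)


# Hypothetical 1H shifts in ppm, invented only to exercise the functions.
SAMPLE_EXPERIMENTAL = [1.0, 2.0, 3.0]
SAMPLE_SIMULATED = [1.1, 1.9, 3.1]
```

A small RMS deviation together with a correlation coefficient close to 1 indicates that the simulated cross-peak coordinates follow the experimental ones both in absolute position and in trend.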

In conclusion, a glyco-tuned 2D NMR spectrum simulator was implemented as a part of the GODESS service, which is available at the CSDB15 web site (http://csdb.glycoscience.ru). This new feature helps to corroborate proposed structures and to assign their experimental 2D NMR spectra. Furthermore, it opens up an opportunity to develop an algorithm for automated structure elucidation based on 1D and 2D NMR user data.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00083. Peak width estimation algorithm (section 1); illustrations of GODESS interface (Figures S1−S4); JCAMP-DX export option selection (Figure S5); simulated NMR spectrum exported in JCAMP-DX format and opened in cheminfo.org online 2D NMR viewer (Figure S6) and in MestReNova (Figure S7); experimental vs simulated 2D NMR spectra plotted separately (Figures S8−S10) (PDF)

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected] (R.R.K.).
*E-mail: [email protected] (P.V.T.).

Notes

The 2D NMR data obtained with the reported tool can be used free of charge with reference to this paper. The authors declare no competing financial interest.

ACKNOWLEDGMENTS

Research and development of the 2D NMR prediction engine were funded by the Russian Foundation for Basic Research, grant 15-04-01065. Visualization of 2D NMR data and the web service were funded by the Russian Science Foundation, grant 14-50-00126. The authors thank Luc Patiny and Andres Mauricio for help with the integration of GODESS and the cheminfo.org online NMR viewer.

ABBREVIATIONS

Residue Names

Abe, 3,6-dideoxy-D-xylo-hexose (abequose); D-Fru, D-fructose; D-Gal, D-galactose; D-GalNAc, 2-acetamido-2-deoxy-D-galactose; D-GalA, D-galacturonic acid; D-Glc, D-glucose; D-GlcNAc, 2-acetamido-2-deoxy-D-glucose; D-GlcA, D-glucuronic acid; Lys, lysine; D-Man, D-mannose; NeuAc, 5-acetamido-3,5-dideoxy-D-glycero-D-galacto-non-2-ulosonic acid (N-acetylated neuraminic acid); L-Rha, L-rhamnose; p, pyranose; f, furanose

Other Abbreviations

BIOPSEL, biopolymer structure elucidation; CASPER, computerized approach to structure determination of polysaccharides; COSY, correlation spectroscopy; COSY-DQF, COSY with double quantum filter; COSY-RCT, COSY with 1-step relayed coherence transfer; CSDB, Carbohydrate Structure Database; CSGS, Carbohydrate Structure Generalization Scheme; HSQC, heteronuclear single quantum coherence; edHSQC, distortionless enhancement by polarization transfer (DEPT) edited HSQC; GODESS, Glycan Optimized Dual Empirical Spectrum Simulation; HMBC, heteronuclear multiple bond correlation; JCAMP, Joint Committee on Atomic and Molecular Physical Data; NMR, nuclear magnetic resonance; PW, peak width; RMS, root-mean square; TOCSY, total correlation spectroscopy

REFERENCES

(1) Gaidzik, N.; Westerlind, U.; Kunz, H. The Development of Synthetic Antitumour Vaccines from Mucin Glycopeptide Antigens. Chem. Soc. Rev. 2013, 42, 4421−4442.
(2) Astronomo, R. D.; Burton, D. R. Carbohydrate Vaccines: Developing Sweet Solutions to Sticky Situations? Nat. Rev. Drug Discovery 2010, 9, 308−324.
(3) Boltje, T. J.; Buskas, T.; Boons, G.-J. Opportunities and Challenges in Synthetic Oligosaccharide and Glycoconjugate Research. Nat. Chem. 2009, 1, 611−622.
(4) Johnson, M. A.; Bundle, D. R. Designing a New Antifungal Glycoconjugate Vaccine. Chem. Soc. Rev. 2013, 42, 4327−4344.
(5) Alper, J. Searching for Medicine’s Sweet Spot. Science 2001, 291, 2338−2343.
(6) Kristian, S. A.; Hwang, J. H.; Hall, B.; Leire, E.; Iacomini, J.; Old, R.; Galili, U.; Roberts, C.; Mullis, K. B.; Westby, M.; Nizet, V. Retargeting Pre-Existing Human Antibodies to a Bacterial Pathogen with an AlphaGal Conjugated Aptamer. J. Mol. Med. 2015, 93, 619−631.
(7) Schmidt, L. D.; Dauenhauer, P. J. Chemical Engineering: Hybrid Routes to Biofuels. Nature 2007, 447, 914−915.
(8) Watt, G. A New Future for Carbohydrate Fuel Cells. Renewable Energy 2014, 72, 99−104.
(9) Toukach, F. V.; Ananikov, V. P. Recent Advances in Computational Predictions of NMR Parameters for the Structure Elucidation of Carbohydrates: Methods and Limitations. Chem. Soc. Rev. 2013, 42, 8376−8415.
(10) Rudd, T.; Yates, E.; Hricovíni, M. Spectroscopic and Theoretical Approaches for the Determination of Heparin Saccharide Structure and the Study of Protein-Glycosaminoglycan Complexes in Solution. Curr. Med. Chem. 2009, 16, 4750−4766.
(11) Lundborg, M.; Widmalm, G. Structural Analysis of Glycans by NMR Chemical Shift Prediction. Anal. Chem. 2011, 83, 1514−1517.
(12) Kapaev, R. R.; Toukach, P. V. Improved Carbohydrate Structure Generalization Scheme for 1H and 13C NMR Simulations. Anal. Chem. 2015, 87, 7006−7010.
(13) Toukach, F. V.; Shashkov, A. S. Computer-Assisted Structural Analysis of Regular Glycopolymers on the Basis of 13C NMR Data. Carbohydr. Res. 2001, 335, 101−114.
(14) Kapaev, R. R.; Egorova, K. S.; Toukach, P. V. Carbohydrate Structure Generalization Scheme for Database-Driven Simulation of Experimental Observables, Such as NMR Chemical Shifts. J. Chem. Inf. Model. 2014, 54, 2594−2611.
(15) Toukach, P. V.; Egorova, K. S. Carbohydrate Structure Database Merged from Bacterial, Archaeal, Plant and Fungal Parts. Nucleic Acids Res. 2016, 44, D1229−D1236.
(16) Damerell, D.; Ceroni, A.; Maass, K.; Ranzinger, R.; Dell, A.; Haslam, S. M. The GlycanBuilder and GlycoWorkbench Glycoinformatics Tools: Updates and New Developments. Biol. Chem. 2012, 393, 1357−1362.
(17) Herget, S.; Ranzinger, R.; Maass, K.; Lieth, C.-W. GlycoCT - a Unifying Sequence Format for Carbohydrates. Carbohydr. Res. 2008, 343, 2162−2171.
(18) Toukach, P. V. Bacterial Carbohydrate Structure Database 3: Principles and Realization. J. Chem. Inf. Model. 2011, 51, 159−170.
(19) Davies, A. N.; Lampen, P. JCAMP-DX for NMR. Appl. Spectrosc. 1993, 47, 1093−1099.
(20) Katzenellenbogen, E.; Kocharova, N. A.; Toukach, P. V.; Górska, S.; Korzeniowska-Kowal, A.; Bogulska, M.; Gamian, A.; Knirel, Y. A. Structure of an Abequose-Containing O-Polysaccharide from Citrobacter freundii O22 Strain PCM 1555. Carbohydr. Res. 2009, 344, 1724−1728.
(21) Sidorczyk, Z.; Zych, K.; Toukach, F. V.; Arbatsky, N. P.; Shashkov, A. S.; Knirel, Y. A. Structure of the O-Polysaccharide and Classification of Proteus mirabilis Strain G1 in Proteus Serogroup O3. Eur. J. Biochem. 2002, 269, 1406−1412.
(22) Van Calsteren, M. R.; Gagnon, F.; Lacouture, S.; Fittipaldi, N.; Gottschalk, M. Structure Determination of Streptococcus suis Serotype 2 Capsular Polysaccharide. Biochem. Cell Biol. 2010, 88, 513−525.