Anal. Chem. 2008, 80, 9204–9212
Software Tool for the Structural Determination of Glycosaminoglycans by Mass Spectrometry

Bérangère Tissot,† Alessio Ceroni,† Andrew K. Powell,‡ Howard R. Morris,†,| Edwin A. Yates,‡ Jeremy E. Turnbull,‡ John T. Gallagher,§ Anne Dell,† and Stuart M. Haslam*,†

Division of Molecular Biosciences, Imperial College, London, London, SW7 2AZ, United Kingdom, Molecular Glycobiology Laboratory, School of Biological Sciences, University of Liverpool, L69 7ZB, United Kingdom, Glyco-Oncology Group, School of Cancer and Imaging Sciences, Paterson Institute for Cancer Research, University of Manchester, Christie Hospital, Manchester, M20 4BX, United Kingdom, and M-SCAN Ltd., Wokingham, Berks, RG41 2TZ, United Kingdom

Structural elucidation of glycosaminoglycans (GAGs) is one of the major challenges in biochemical analysis. This is mainly because of the diversity of GAG sulfation and N-acetylation patterns and variations in uronate isomers. ESI-MS and recently MALDI-MS methodologies are important strategies for investigating the molecular structure of GAGs. However, the interpretation of MS data produced by these strategies must take into account a large number of variables (including the number of monosaccharide residues, acetylations, sulfate groups, multiple charges, and exchanges between different cations). We have developed a bioinformatics tool to assist this complex interpretation task. The software is based on GlycoWorkbench, a tool for semiautomatic interpretation of glycan MS data. The tool generates the sugar backbones in all their variants (GAG family, composition, acetylation positions, and number of sulfates) and automatically matches them with the selected MS peaks. The backbones corresponding to a given peak are validated against the selected MS/MS peaks by generating all possible fragmentations.
Native chondroitin sulfate and heparin oligosaccharides as well as chemically modified heparin oligomers have been successfully analyzed by MALDI- and ESI-MS and MS/MS, and the results of the semiautomated annotation of these mass spectra are presented here.

The structural analysis of glycosaminoglycans is one of the greatest challenges of glycobiology. Glycosaminoglycans (GAGs) are believed to act as regulatory molecules in many fundamental processes such as development, axonal growth, cancer progression, and angiogenesis.1,2 However, despite their involvement in these processes, the mechanisms by which they exert their functions still remain largely undetermined, mainly because of the difficulties associated with fully characterizing the active
* To whom correspondence should be addressed. E-mail: s.haslam@imperial.ac.uk. † Imperial College. ‡ University of Liverpool. § University of Manchester, Christie Hospital. | M-SCAN Ltd.
(1) Bishop, J. R.; Schuksz, M.; Esko, J. D. Nature 2007, 446, 1030–1037. (2) Hacker, U.; Nybakken, K.; Perrimon, N. Nat. Rev. Mol. Cell. Biol. 2005, 6, 530–541.
Analytical Chemistry, Vol. 80, No. 23, December 1, 2008
Table 1. Four Main Classes of GAGs Distinguished by the Different Composition of the Disaccharide Units and the Possible Positions of the N-Acetyl and Sulfate Substituents(a)

family of GAGs                                      disaccharide motif            constituents of motif
heparan sulfate (HS) including heparin (Hep)        A(2X) B(NX or Ac,3X,6X)       A = (α/β1,4) glucuronic or iduronic acid; B = (α1,4) glucosamine
chondroitin sulfate (CS) including dermatan (DS)    A(2X) B(NAc,4X,6X)            A = (α/β1,3) glucuronic or iduronic acid; B = (β1,4) galactosamine
hyaluronic acid (HA)                                A B(NAc)                      A = (β1,3) glucuronic acid; B = (β1,4) glucosamine
keratan sulfate (KS)                                A(6X) B(NAc,6X)               A = (β1,4) galactose; B = (β1,3) glucosamine

(a) The modifications of the disaccharide units, given in parentheses after each residue, are represented as follows: NX or Ac denotes possible sulfation or acetylation of the amino group, 3X possible sulfation of the hydroxyl group of the C3 carbon, 4X possible sulfation of the hydroxyl group of the C4 carbon, and 6X possible sulfation of the hydroxyl group of the C6 carbon. NAc denotes definite acetylation of the amino group.
oligosaccharide sequences. Indeed, even with the rapid progress of glycomics3,4 there are only a limited number of strategies described in the literature adapted to the characterization of these particular glycans. This lack of systematic methodology mainly arises from the structural complexity of GAGs, which are extended linear glycans typically composed of 10-100 repeating disaccharide units.5 There are four main classes of GAGs distinguished by the different composition of the disaccharide units and the possible positions of the acetyl and sulfate substituents (see Table 1). Heparan sulfates (HS) are among the most complicated GAGs to analyze as the different combination of N-acetylation and N- or O-sulfation gives rise to 48 different disaccharide units, though not all the possibilities are represented in vivo. Moreover, HS chains do not present a random organization as it has been shown that they are arranged into subdomains where modifications are more (3) Haslam, S. M.; North, S. J.; Dell, A. Curr. Opin. Struct. Biol. 2006, 16, 584–591. (4) Turnbull, J. E.; Field, R. A. Nat. Chem. Biol. 2007, 3, 74–77. (5) Rabenstein, D. L. Nat. Prod. Rep. 2002, 19, 312–331. 10.1021/ac8013753 CCC: $40.75 2008 American Chemical Society Published on Web 10/29/2008
concentrated.6 These subdomains are important for some of the main biological functions cited above.7 The analysis of GAGs extracted from cells or organs is often restricted to the determination by liquid chromatography,8,9 capillary electrophoresis,10,11 or mass spectrometry11-15 of the disaccharide unit composition,16 which does not in itself provide details about the nature and arrangement of the subdomains. There has been tremendous effort to develop more sophisticated methodologies in order to analyze GAG domain structures and to sequence GAG oligosaccharides,17-21 mainly using ESI-MS instrumentation. However, there are significant technical challenges associated with these sequencing strategies which still prevent them from being widely used. Prior to any type of detailed sequencing, determining composition is a usual first step in any structural characterization. Indeed, proteomics analysis starts with a fingerprinting of digested peptide products from a protein mixture, and glycomics analysis starts with a profiling of all the extracted N-, O-, or glycolipid glycan structures present in a sample. Both of these steps can be performed using either MALDI-TOF MS or ESI-MS instrumentation.22-24 The advantages of MALDI-TOF MS over ESI-MS methodologies are its applicability to high-throughput work, its robustness, and its relative resistance to high salt concentrations. MALDI can efficiently be applied to complex mixtures, and its ability to create only singly charged species, as opposed to the multiply charged ions produced by ESI-MS, greatly simplifies the interpretation of the data. Despite the existing work demonstrating
(6) Gallagher, J. T. J. Clin. Invest. 2001, 108, 357–361. (7) Kreuger, J.; Spillmann, D.; Li, J.-p.; Lindahl, U. J. Cell Biol. 2006, 174, 323–327. (8) Lamanna, W. C.; Baldwin, R. J.; Padva, M.; Kalus, I.; Ten Dam, G.; van Kuppevelt, T. H.; Gallagher, J. T.; von Figura, K.; Dierks, T.; Merry, C. L. Biochem. J. 2006, 400, 63–73.
(9) van den Born, J.; Pisa, B.; Bakker, M. A.; Celie, J. W.; Straatman, C.; Thomas, S.; Viberti, G. C.; Kjellen, L.; Berden, J. H. J. Biol. Chem. 2006, 281, 29606– 29613. (10) Gunay, N. S.; Linhardt, R. J. J. Chromatogr., A 2003, 1014, 225–233. (11) Zamfir, A.; Seidler, D. G.; Schonherr, E.; Kresse, H.; Peter-Katalinic, J. Electrophoresis 2004, 25, 2010–2016. (12) Pope, R. M.; Raska, C. S.; Thorp, S. C.; Liu, J. Glycobiology 2001, 11, 505– 513. (13) Thanawiroon, C.; Rice, K. G.; Toida, T.; Linhardt, R. J. J. Biol. Chem. 2004, 279, 2608–2615. (14) Miller, M. J.; Costello, C. E.; Malmstrom, A.; Zaia, J. Glycobiology 2006, 16, 502–513. (15) Naimy, H.; Leymarie, N.; Bowman, M. J.; Zaia, J. Biochemistry 2008, 47, 3155–3161. (16) Saad, O. M.; Leary, J. A. Anal. Chem. 2003, 75, 2985–2995. (17) Lawrence, R.; Kuberan, B.; Lech, M.; Beeler, D. L.; Rosenberg, R. D. Glycobiology 2004, 14, 467–479. (18) Merry, C. L.; Lyon, M.; Deakin, J. A.; Hopwood, J. J.; Gallagher, J. T. J. Biol. Chem. 1999, 274, 18455–18462. (19) Turnbull, J. E.; Gallagher, J. T. Biochem. J. 1990, 265, 715–724. (20) Turnbull, J. E.; Hopwood, J. J.; Gallagher, J. T. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 2698–2703. (21) Venkataraman, G.; Shriver, Z.; Raman, R.; Sasisekharan, R. Science 1999, 286, 537–542. (22) Jang-Lee, J.; North, S. J.; Sutton-Smith, M.; Goldberg, D.; Panico, M.; Morris, H.; Haslam, S.; Dell, A. Methods Enzymol. 2006, 415, 59–86. (23) Karlsson, N. G.; Wilson, N. L.; Wirth, H.-J.; Dawes, P.; Joshi, H.; Packer, N. H. Rapid Commun. Mass Spectrom. 2004, 18, 2282–2292. (24) Wada, Y.; Azadi, P.; Costello, C. E.; Dell, A.; Dwek, R. A.; Geyer, H.; Geyer, R.; Kakehi, K.; Karlsson, N. G.; Kato, K.; Kawasaki, N.; Khoo, K.-H.; Kim, S.; Kondo, A.; Lattova, E.; Mechref, Y.; Miyoshi, E.; Nakamura, K.; Narimatsu, H.; Novotny, M. V.; Packer, N. H.; Perreault, H.; Peter-Katalinic, J.; Pohlentz, G.; Reinhold, V. N.; Rudd, P. M.; Suzuki, A.; Taniguchi, N. Glycobiology 2007, 17, 411–422.
the possibility of profiling GAG-related disaccharides,25,26 there was no real MS profiling methodology designed for larger GAG-related molecules until recently. Major advances in the field associated with the enhanced ionization of heavily charged molecules have allowed the use of MALDI for the study of GAGs.27,28 We recently published a strategy that uses this approach to enable the profiling of highly sulfated GAGs.29 This MALDI methodology can provide information such as length, number and position of the N-acetyl (NAc) groups, and number of sulfate groups on oligosaccharides whose sizes range from disaccharide to dodecasaccharide. The observation of GAG oligosaccharides by MALDI-MS has also been achieved via the use of basic peptides;30 however, this methodology does not allow MALDI-TOF/TOF MS/MS experiments to be carried out, which restricts the amount of structural information obtained. The data produced by both ESI and MALDI techniques are extremely complex, due to cation exchanges, the multiple possible arrangements of N-acetyl, N-sulfate, and O-sulfate groups, and the multiply charged ions. Analysis of these data therefore requires a considerable effort that severely limits the possibility of carrying out large-scale studies of these molecules. A bioinformatics tool which enhances the interpretation and annotation of MS and MS/MS data obtained from GAG oligosaccharides is fundamental to reduce the time needed for handling experimental data and improve the throughput of these techniques in glycomics projects. The entire glycomics field has a paucity of bioinformatics tools for the analysis of MS data, and even fewer resources are available for the study of GAGs. To our knowledge only two other bioinformatics approaches have been proposed.21,31 Each of these addressed a selected analytical need but did not have generic applicability or a user interface designed for ease of use.
In the first approach,21 a hexadecimal notation has been devised to represent all the possible monosaccharide components of heparin glycans. With the use of this notation, a complex GAG polysaccharide can be represented by a sequence of characters. The mass of each possible sequence is then computed and matched with the MALDI-MS data. This software tool does not seem to be publicly available. In the second approach,31 a Microsoft Excel based application has been developed to assist the interpretation of ESI-MS and ESI-MS/MS data collected after enzymatic digestion of heparin glycans. The tool generates both intact and fragmented heparin polysaccharides, computes their masses, and matches them with the MS data. We recently developed a semiautomatic software tool called GlycoWorkbench to aid the interpretation of glycomics data from N- and O-glycans.32 GlycoWorkbench was mainly designed to assist the annotation of MS/MS data by allowing the user to quickly
(25) Desaire, H.; Leary, J. A. J. Am. Soc. Mass Spectrom. 2000, 11, 916–920. (26) Zaia, J.; Costello, C. E. Anal. Chem. 2001, 73, 233–239. (27) Laremore, T. N.; Murugesan, S.; Park, T. J.; Avci, F. Y.; Zagorevski, D. V.; Linhardt, R. J. Anal. Chem. 2006, 78, 1774–1779. (28) Laremore, T. N.; Zhang, F.; Linhardt, R. J. Anal. Chem. 2007, 79, 1604–1610. (29) Tissot, B.; Gasiunas, N.; Powell, A. K.; Ahmed, Y.; Zhi, Z. L.; Haslam, S. M.; Morris, H. R.; Turnbull, J. E.; Gallagher, J. T.; Dell, A. Glycobiology 2007, 17, 972–982. (30) Juhasz, P.; Biemann, K. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 4333–4337. (31) Saad, O. M.; Leary, J. A. Anal. Chem. 2005, 77, 5902–5911. (32) Ceroni, A.; Maass, K.; Geyer, H.; Geyer, R.; Dell, A.; Haslam, S. M. J. Proteome Res. 2008, 7, 1650–1659.
define a candidate structure and test its fragmentation against a peak list. We have now developed additional components that are specific for the profiling and sequencing of GAG molecules and designed the tool with sufficient versatility to accommodate MALDI as well as ESI data. To demonstrate the tool’s unique capabilities for both profiling and partial sequencing we present here results obtained on heparin and chondroitin sulfate oligosaccharides.

MATERIALS AND METHODS

Preparation of the Chondroitin Sulfate Sample. Shark cartilage CS-C (90% CS-C, 10% CS-A; Sigma, St. Louis) was partially digested in 100 mM Tris-acetate pH 8 at 37 °C using chondroitinase ABC from Proteus vulgaris (Seikagaku, Tokyo, Japan). Partial digest aliquots were pooled and fractionated according to hydrodynamic volume using gel filtration chromatography on a Superdex 30 prep grade column (3 cm × 200 cm) run at 0.5 mL/min in 0.5 M ammonium hydrogen carbonate using an AKTA purifier 10 FPLC system (GE Healthcare, Little Chalfont, Buckinghamshire, U.K.). Fractions of 1 mL were collected, and the elution profile was monitored at λabs = 232 nm. Fractions across the peak were pooled and desalted using HiPrep 26/10 desalting columns (GE Healthcare), eluting with water and monitoring elution at λabs = 232 nm prior to lyophilization. All samples were resuspended in deionized water for long-term storage at -20 °C.

Preparation of the Heparin Decasaccharide and Octasaccharide Samples. The octasaccharide (or degree of polymerization 8, dp8) was prepared by high-resolution gel filtration of a partial heparinase digest, as described in Goger et al.33 The decasaccharide (dp10) analyzed in this study was prepared according to a previously published protocol.34

MALDI and ESI-MS and MS/MS. Preparation of the Matrixes. The preparation of the ionic liquid as well as the preparation of the norharmane matrix have been detailed in a previous publication.29

MALDI-TOF and MALDI-TOF/TOF Analysis of Heparin Oligosaccharides.
Samples were dissolved and diluted with water prior to being mixed with the matrixes (1:1 volume ratio). The amount of oligosaccharides mixed with the matrixes was ∼10-80 pmol. The samples were dried under vacuum for 30 min or more. MALDI-TOF MS and MALDI-TOF/TOF MS/MS analyses were performed using a 4800 MALDI TOF/TOF analyzer (Applied Biosystems, Foster City). The instrument is equipped with an Nd:YAG laser (operating at 335 nm and 200 Hz). MS experiments were acquired in the negative ionization mode using reflectron settings. MS/MS data were obtained using the 1 kV mode with argon or air as the collision gas (CID cell gas pressure 3.5 × 10⁻⁶ Torr). Between 2000 and 6000 shots were averaged per spectrum, depending on the amount of material spotted.

ESI-MS and MS/MS Analysis of Heparin Oligosaccharides. ESI-MS and MS/MS spectra were acquired using a quadrupole-TOF (Micromass, Manchester, U.K.) instrument. The heparin fractions were dissolved in methanol/water (35:65 volume ratio) to a concentration of 10 µM before loading into a
(33) Goger, B.; Halden, Y.; Rek, A.; Mosl, R.; Pye, D.; Gallagher, J.; Kungl, A. J. Biochemistry 2002, 41, 1640–1646. (34) Shriver, Z.; Raman, R.; Venkataraman, G.; Drummond, K.; Turnbull, J.; Toida, T.; Linhardt, R.; Biemann, K.; Sasisekharan, R. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 10359–10364.
nanospray capillary coated with a thin layer of gold/palladium, inner diameter 2 mm (Proxeon, Odense, Denmark). A potential of 2.0 kV was applied to the nanoflow tip. The drying gas used was N2 and, when necessary, the collision gas was argon (with the collision gas pressure maintained at 10⁻⁴ mbar). Collision energies varied depending on the size of the oligosaccharide, typically between 5 and 20 eV.

Software. GlycoWorkbench is written entirely in Java and can be run on any operating system running a Java Virtual Machine version 1.5 or newer. The raw spectra files are read using the Proteome Common IO library,35 which may need additional software for specific instrumentation.

RESULTS

Application of GlycoWorkbench to GAG Oligosaccharides. The GlycoWorkbench tool provides an integrated environment to assist the determination of glycan structures from MS data. GlycoWorkbench incorporates a visual editor of glycan structures, the GlycanBuilder,36 that enables a rapid assembly of structure models. The in-silico fragmentation engine computes a complete list of theoretical fragments including multiple glycosidic cleavages and all the possible cross-ring fragments. The annotation engine automatically matches the theoretical list of fragment masses with the manually defined experimental peak list. The proposed annotations are presented in comprehensive and easily understandable reports that allow the annotations from the different structure candidates to be compared. The general design of the software as well as symbols, graphics, abbreviations, and nomenclature has been previously described in detail for the annotation of N- and O-glycan data.32 The process of profiling glycans from MS data involves an initial determination of the possible candidates for a given mass signal that are subsequently tested via MS/MS experiments for final identification.
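The matching step performed by the annotation engine can be illustrated with a minimal sketch. This is not GlycoWorkbench code (the tool is written in Java and its internals are not described here); the function names `annotate` and `coverage`, the fragment labels, and the peak values are hypothetical, and a fixed absolute tolerance in Da is assumed.

```python
from bisect import bisect_left, bisect_right


def annotate(theoretical, peaks, tol=0.6):
    """Map each experimental peak to the labels of theoretical ions within +/- tol (Da).

    theoretical: list of (label, m/z) pairs; peaks: list of experimental m/z values.
    """
    ordered = sorted(theoretical, key=lambda item: item[1])
    mzs = [mz for _, mz in ordered]
    report = {}
    for peak in peaks:
        lo = bisect_left(mzs, peak - tol)   # first theoretical value >= peak - tol
        hi = bisect_right(mzs, peak + tol)  # one past the last value <= peak + tol
        report[peak] = [ordered[i][0] for i in range(lo, hi)]
    return report


def coverage(report):
    """Fraction of experimental peaks that received at least one assignment."""
    if not report:
        return 0.0
    return sum(1 for labels in report.values() if labels) / len(report)


# Hypothetical fragment list and peak list, for illustration only.
fragments = [("frag_a", 500.10), ("frag_b", 500.55), ("frag_c", 742.30)]
peaks = [500.2, 742.0, 900.0]
report = annotate(fragments, peaks)
```

A peak can receive several assignments, as for the first peak in this example; comparing the coverage obtained for each candidate structure is the kind of information the annotation reports expose.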
Each GAG molecule is made of a backbone formed by repeating disaccharide units modified with acetyl and sulfate substituents, with the different families having specific disaccharide units and substitution positions. The GAG profiling engine uses the information about a GAG family to compute all the possible backbones with a given number of disaccharide units and acetylation pattern. For each backbone all possible numbers of sulfated residues are then added, but the actual positions of the sulfate substituents remain unspecified unless the maximum number of sulfates is reached and all possible positions are filled. During profiling, the user can specify the list of GAG families that should be tested, the minimum and maximum number of disaccharide units, acetyl and sulfate groups, and other structural features such as reduction of the first monosaccharide, unsaturation of the nonreducing end monosaccharide, or derivatization of the complete structure. Additional chemical/enzymatic modifications, such as desulfation, reacetylation, or loss of the first residue (reducing end), can be further selected. The generated structures are then tested against the experimentally derived peak list by using the annotation engine, which computes all possible mass-to-charge values arising from the different combinations of charges and cation exchanges. The matching structures are shown using
(35) Falkner, J. A.; Falkner, J. W.; Andrews, P. C. Bioinformatics 2007, 23, 262–263. (36) Ceroni, A.; Dell, A.; Haslam, S. M. Source Code Biol. Med. 2007, 2, 3.
Figure 1. Detailed view of the annotated list of peaks selected from a mass spectrum of heparin dp10 acquired in norharmane by MALDI-TOF MS. The main peak observed in the norharmane spectrum at m/z 1727.0 (m/z 1726.9849) corresponds to a dp10 carrying one NAc group and no sulfate group. The (S) symbols represent the possible positions for sulfate groups on the heparin backbone. The matching structures are displayed for each peak together with type, accuracy, mass-to-charge value, and associated ions. Keys are shown in the inset.
the annotation reports where the matches from each family can be compared. For each match the report shows the structure, the number of sulfates, the deviation of experimental and computed mass-to-charge values, the ion adducts, and the cation exchanges where present. A single mass-to-charge value can be tested for matches instead of a complete peak list, and the tool allows the user to load an MS file to select the peak to be tested directly from the displayed spectrum. The list of possible candidate structures selected from the set of matches proposed by the GAG profiling engine can then be copied into the drawing canvas to be used for identification by fragmentation experiments. The user can create and maintain multiple sets of candidate structures, peak-lists, mass spectra, and annotated peak lists in a single workspace so that all the information generated in an experiment can be organized and stored in a file. Structures, peaks and annotations can be copied across the workspace and exported to other documents and graphic editors for presentation purposes. Even though GlycoWorkbench was designed for annotation of MS/MS data, the fragmentation of GAG molecules in a mass spectrometer has specific features that required additional developments. First, the positions of the sulfates along the backbone are not specified after profiling. Second, the sulfate substituents
have a propensity to drop from the backbone in a rather unpredictable manner. Third, signals corresponding to small fragments formed by a part of a monosaccharide plus a sulfate can be detected, given the presence of a charge. Therefore we extended the in-silico fragmentation engine to allow for all these cases. The fragmentation engine takes into account all the possible numbers of sulfates given the ones present in the structure and the available positions when computing a fragment of a given structure. All the theoretically possible fragments, including small fragments, with all the possible sulfation patterns are thus produced. The user can restrict the type of cleavages and the maximum number of fragments to be computed. The list of fragments is then matched with the list of experimentally derived peaks by the annotation engine, which uses the charges and exchanges present in the parent ion to limit the search for possible matches. The results of the annotation are shown in the corresponding reports where the matches resulting from the different candidate structures are shown next to each other. Therefore the user can assess the candidates and select the most probable given the coverage and accuracy of the assignments. The following sections provide hands-on examples of the usage of the tool for profiling, partial sequencing and annotation of data derived by either MALDI-MS or ESI-MS.
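The sulfate bookkeeping described above can be sketched as a small counting problem. The following is an illustrative reading of the text, not the GlycoWorkbench implementation: the function names are hypothetical, the `allow_loss` switch is an assumption used to model sulfates dropping from the backbone, and 79.95682 Da is the standard monoisotopic mass of SO3.

```python
SO3 = 79.95682  # monoisotopic mass of SO3 in Da


def fragment_sulfate_counts(total_sulfates, total_sites, fragment_sites, allow_loss=True):
    """Possible numbers of sulfates on a fragment whose sulfate positions are unspecified.

    total_sulfates: sulfates on the intact structure
    total_sites:    sulfation positions on the intact structure
    fragment_sites: sulfation positions retained by the fragment
    allow_loss:     if True, sulfates may also be lost, so the lower bound is 0
    """
    if not 0 <= total_sulfates <= total_sites:
        raise ValueError("total_sulfates must lie between 0 and total_sites")
    if not 0 <= fragment_sites <= total_sites:
        raise ValueError("fragment_sites must lie between 0 and total_sites")
    outside = total_sites - fragment_sites
    # Without loss, sulfates that cannot fit outside the fragment must sit on it.
    low = 0 if allow_loss else max(0, total_sulfates - outside)
    high = min(fragment_sites, total_sulfates)
    return list(range(low, high + 1))


def fragment_masses(backbone_mass, counts):
    """Neutral mass of the fragment for each admissible sulfate count."""
    return {k: backbone_mass + k * SO3 for k in counts}
```

For instance, a fragment keeping 6 of 15 positions from a structure carrying 13 sulfates must carry 4 to 6 sulfates if none is lost, and 0 to 6 once sulfate loss is allowed, which is why the number of theoretical fragments grows quickly for highly sulfated species.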
Figure 2. (a) Statistics view and (b) truncated summary view of the annotated list of peaks selected from a MALDI-TOF/TOF MS/MS spectrum of heparin dp10 parent ion at m/z 1727.0 (m/z 1726.9849). The five possible positions for the NAc group are tested. The complete annotated list can be found in the Supporting Information (Figure S-3). For keys see Figure 1 inset.
Validation of the Tool on a Previously Sequenced dp10 Oligosaccharide of Heparin. To validate our methodology, we used a fraction of a highly purified decasaccharide of heparin whose sequence has previously been reported based on integral glycan sequencing and MALDI-MS using basic peptide complexes.34 We analyzed 10-40 pmol of dp10 by MALDI-TOF MS and MALDI-TOF/TOF MS/MS in norharmane and in ionic liquid (for details see Tissot et al.29). As a first step, the data obtained by MALDI-TOF MS in norharmane provided information on the length of the chains present in the samples as well as the number of NAc groups on the backbone.29 The spectrum obtained in this matrix was uploaded to GlycoWorkbench, and a peak list was manually selected.
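The profiling step applied to such a peak list, as described earlier for the GAG profiling engine, can be sketched as a brute-force enumeration. Everything below is an assumption made for illustration: the function names are hypothetical, the residue masses are standard monoisotopic values (whether the tool reports monoisotopic or average masses is not stated in this section), an unsaturated non-reducing end is assumed, the sulfate site count follows the HS row of Table 1, and the number of sodium exchanges is not capped by the number of acidic groups.

```python
HEXA = 176.03209       # uronic acid residue, C6H8O6
GLCN = 161.06881       # glucosamine residue with free amine, C6H11NO4
ACETYL = 42.01057      # acetyl substitution, C2H2O
SO3 = 79.95682         # sulfate substitution
H2O = 18.01056
PROTON = 1.00728
NA_MINUS_H = 21.98194  # replacing one H by one Na


def neutral_mass(n_disacc, n_acetyl, n_sulfate, unsaturated=True):
    """Neutral mass of a heparin/HS-like backbone; a saturated chain carries one more H2O."""
    mass = n_disacc * (HEXA + GLCN) + n_acetyl * ACETYL + n_sulfate * SO3
    return mass if unsaturated else mass + H2O


def mz_negative(mass, charge=1, na_exchanges=0):
    """m/z of the deprotonated ion [M + n(Na - H) - zH]^z- in negative mode."""
    return (mass + na_exchanges * NA_MINUS_H - charge * PROTON) / charge


def profile(peak, disacc_range, tol=0.6, max_na=0, charge=1):
    """List (disaccharides, NAc, sulfates, Na exchanges, m/z) matching peak within tol."""
    hits = []
    for n in disacc_range:
        for n_ac in range(n + 1):
            # Four sulfation sites per disaccharide (2-O, N, 3-O, 6-O);
            # an acetylated amine cannot be sulfated.
            for n_s in range(4 * n - n_ac + 1):
                mass = neutral_mass(n, n_ac, n_s)
                for n_na in range(max_na + 1):
                    mz = mz_negative(mass, charge, n_na)
                    if abs(mz - peak) <= tol:
                        hits.append((n, n_ac, n_s, n_na, round(mz, 4)))
    return hits
```

Calling `profile(1726.5, range(3, 7))` scans dp6 to dp12 backbones; with these monoisotopic masses the only composition near that value is five disaccharides with one NAc and no sulfate. The sketch is not meant to reproduce the accuracy values displayed in the figures.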
Heparin and heparan sulfate structures were considered, with sizes ranging from dp6 to dp12. Cation exchanges were included (sodium ions only). The tolerance was set to 0.6 Da maximum. On the basis of this set of information, the software generates all the possible backbones and matches them with the list of peaks. The most probable structures are manually selected based on knowledge of the biological origin of the sample or any other parameters. In the case of the dp10 studied here, the main peak observed in the norharmane spectrum at m/z 1727.0 could only correspond to a dp10 carrying one NAc group, as shown in Figure 1. Another dp10 backbone is also identified for the previously detected29,34 minor species observed at m/z 1684.7. Because of the single charge carried by the molecular ions, the
Figure 3. Summary view of the annotated list of peaks from a mass spectrum of a heparin dp10 saccharide acquired in ionic liquid by MALDI-TOF MS. The peak at m/z 3141.6 (m/z 3141.6640) can only be matched by a monoacetylated dp10 carrying 13 sulfate groups, which corresponds to the sequence previously published (see text). For keys, see Figure 1 inset. Some structures have an S symbol on each residue. This representation indicates the only possible position of these sulfate groups. When there are several putative positions for the sulfate substituents, the S letter appears in parentheses. In this case, the total number of sulfate groups present on the backbone is given in the brace on the left side of the backbone representation.
possible intervals between the masses of two similar structures are not at the decimal level. Therefore, although GlycoWorkbench computes values without limitation on the precision, such accuracy is not necessary for the first step of the MALDI-MS profiling. In a second step, MALDI-TOF/TOF MS/MS data in norharmane matrix are acquired, on the same spot of sample, in order to precisely position the NAc groups (if any) and/or to confirm the nature of the backbone if several structures match the same m/z value. Again, the MS/MS spectrum acquired from the parent ion, determined in the preceding phase, is annotated using GlycoWorkbench (for spectra see the reference Tissot et al.,29 Figure 6A,B). Given the list of possible precursors (for example, the five possible positions for the NAc group of the parent ion at m/z 1727.0), the tool computes their fragments and matches them with the labeled peaks. The statistics summarize the number of peaks matched by each structure, and a detailed view of the assignments is proposed (parts a and b of Figure 2, respectively; for the full list of annotations see Figure 3-S in the Supporting Information). In the case of the dp10, a majority of peaks support the position of the acetyl group on the second glucosamine (from the reducing end, see Figure 6A, reference Tissot et al.29). These findings corroborate the published sequence, which places the NAc group at the same position.34 However, the tool also found other possible isomers with lower percentages of coverage, indicating that there might be other arrangements. In addition, the MS/MS data
confirmed the presence of a dp10 with no NAc group as suggested by the MS profile acquired in norharmane (Figure 6B reference Tissot et al.29). The second phase of the profiling process consists of the determination of the level of sulfation of each of the species detected in norharmane. An MS spectrum is thus acquired using an ionic liquid as matrix, where there is a limited loss of the labile groups upon MALDI ionization and the fully sulfated species can be observed.29 The spectra acquired in ionic liquid present a high rate of sodium-proton and/or potassium-proton exchanges, which seriously complicates the interpretation of the data. Options allow the user to decide which types of cation might be present in the sample (depending on the type of chromatography, the elution buffers, and the purification steps taken during the sample preparation). In the case of the dp10, all backbones matching peaks in the MS spectrum (in norharmane) are tested. The backbones dp8 and dp6 do not match any major peak in the MS spectrum acquired in ionic liquid, confirming that the major species is a dp10 carrying one NAc group (Figure 3). The peak at m/z 3141.6 can only be matched by a monoacetylated dp10 carrying 13 sulfate groups, which corresponds to the sequence previously published.34 MALDI-TOF/TOF MS/MS data on these highly sulfated species do not provide useful information as the major fragmentation corresponds to the loss of sulfate groups from the
Figure 4. (a) Summary view of the annotated list of peaks obtained from MALDI-TOF/TOF MS/MS of the chondroitin sulfate parent ion at m/z 1771 (m/z 1771.4275, fraction 2, in the text). The spectrum was acquired in norharmane. (b) Statistic view of the MALDI-TOF/TOF MS/MS annotated peak list for parent ion at m/z 1771. (c) Statistic view of the annotated peak list obtained by MALDI-TOF MS analysis in an ionic liquid of chondroitin sulfate fraction 2. Most of the peaks observed are attributable to dp10 and dp9 structures, confirming the results obtained in norharmane (see text). For keys see Figure 1 inset.
backbone. However, this profiling methodology provides an important advance for the first steps of structural characterization of GAG oligosaccharides. With an initial MALDI-TOF MS analysis followed by subsequent MALDI-TOF/TOF MS/MS experiments, information such as dp, the number of N-acetylated groups and their position, and the number of sulfate groups is obtained in just one or two analyses. Because MALDI requires a lower level of purification than ESI, these data could also be obtained on relatively crude fractions which have not undergone multiple chromatographic steps. These advantages can be crucial when profiling oligosaccharides extracted from cells or tissues, which would be available in minute amounts. Overall, these first steps are paving the way
for sequencing strategies by reducing the amounts of analyte required and the time required for data interpretation.

Profiling of Chromatographically Purified Chondroitin Sulfate Fractions. Two fractions of partially digested chondroitin sulfate (CS) have been profiled according to the strategy described above. The samples constitute two consecutive size-exclusion chromatography fractions (labeled here as fractions 1 and 2). The two spectra obtained in norharmane show clear differences in the size of the backbones (parts a and b of Figure 1-S in the Supporting Information, fractions 1 and 2, respectively). Oligosaccharide structures of CS (A, B, C, and D types), ranging from dp6 to dp12, were considered. Again, the size exclusion chromatogram was used to provide a broad estimation of the size of the chains.

Figure 5. Annotated peak list from an ESI-MS spectrum of heparin dp8. The raw spectrum is given in the Supporting Information (Figure 2-S). A portion of the list of annotations generated is presented here. The two major ions (see second column for intensity values) shown at m/z 383.6 and 370.3 are annotated as an octasaccharide carrying 12 and 11 sulfate groups, respectively. As expected for an extensively desalted material (see Materials and Methods), the two major ions are free of any sodium or potassium atoms (see the neutral exchange column). For keys, see Figure 1 inset.

Figure 6. Annotated list of peaks of the MS/MS spectrum obtained upon fragmentation of the parent ion at m/z 383.6268 corresponding to a dp8 carrying 12 sulfate groups. Two structures were considered, one with the 12 sulfates on 2-O-, 6-O-, and N-positions only (as shown in the first line of the fourth column) and the other one allowing 3-O-sulfation (fifth column). As shown here, no ion is characteristic of one particular structure. Each of the m/z values given can correspond to several fragment ions, as exemplified by the 12 possible structures corresponding to the fragment at m/z 139.1527 (6 for each structure). For keys, see Figure 1 inset.

The most probable structures proposed by the software are a dp8 carrying one sulfate group (m/z 1595) for fraction 1 and a dp10 carrying one sulfate group (m/z 1974) for fraction 2 (data not shown). Surprisingly, both fractions exhibit a poor percentage of assigned peaks (less than 20%) and intense unidentified signals (such as m/z 1392 in fraction 1 and m/z 1771 in fraction 2). These m/z values correspond to a loss of 203 mass units from the main peaks, which could match the loss of an N-acetylated hexosamine residue. The software allows users to take into account this possible loss. When this option is selected,
the same peak lists are then resubmitted for automated annotation taking into account the possible loss of the residue at the reducing end. This results in the percentage of annotated peaks being almost tripled. In order to confirm the possible structure of the species at m/z 1771, MALDI-MS/MS data were obtained and, following the same procedure, 85% of the peaks produced could be annotated as fragments of the proposed dp9, confirming the absence of the expected N-acetylated galactosamine residue on the reducing end (parts a and b of Figure 4). Oligosaccharides with an odd dp number have previously been observed in MALDI-MS analyses.28 The two abundant
species detected (dp7 and dp9) are not likely to be produced by in-source fragmentation only. Indeed, among the dozens of fractions analyzed using this method, none showed a percentage of in-source fragmentation greater than 10% when compared with ESI experiments (data not shown). It is therefore possible that these species are produced during the sample preparation and purification. From the profiles of the two fractions obtained in ionic liquid (parts c and d of Figure 1-S in the Supporting Information), two other peak lists were created, and sulfation of the backbones selected from the previous step was computed and checked against the selected m/z values. Most of the peaks detected in ionic liquids seem to correspond to sulfated dp7 and dp8 for fraction 1 and sulfated dp9 and dp10 for fraction 2 (Figure 4c, fraction 2).
Profiling of Chromatographically Purified Heparin Oligosaccharide Mixtures after Desulfation. A crude heparin sample was subjected to desulfation and re-N-acetylation followed by digestion with heparinases and size exclusion chromatography (see Supporting Information and Materials and Methods). Selected fractions were analyzed by MALDI-TOF MS in the two matrixes, and two examples of annotation are given in the Supporting Information, describing various methods to annotate chemically modified oligosaccharides.
Profiling of a Heparin Oligosaccharide Using ESI-MS. As the majority of analyses of GAG oligosaccharides are performed using ESI-MS methodologies, it was important to ensure that the tool could annotate multiply charged ions as well. We thus analyzed a heparin dp8 sample by ESI-MS and MS/MS in the negative ionization mode. Because of the susceptibility of ESI to salts, we chose a fraction which had undergone resolved chromatography and extensive salt removal. The data obtained on this fraction are complex, as shown in Figure 2-S in the Supporting Information.
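The search over charge states and cation exchanges that such an annotation involves can be illustrated with a minimal sketch. The Python below is not the GlycoWorkbench implementation; the function names (`mz_negative`, `ppm_error`, `match`), the candidate labels, and the rounded monoisotopic constants are assumptions introduced for illustration only. The neutral mass of 2307.8 and the observed m/z values are those quoted in this article for the heparin dp8 ions, the mass of the 11-sulfate species is obtained here simply by subtracting one SO3, and the default limits (up to 7 charges, 200 ppm, at most one Na and one K) mirror the settings reported for this analysis.

```python
# Illustrative sketch only: brute-force matching of an observed ESI peak
# against candidate neutral masses, charge states, and Na/K exchanges.

PROTON = 1.007276   # mass of a proton, Da
H = 1.007825        # monoisotopic mass of hydrogen, Da
NA = 22.989770      # monoisotopic mass of sodium, Da
K = 38.963707       # monoisotopic mass of potassium, Da
SO3 = 79.956816     # monoisotopic mass of one SO3 group, Da


def mz_negative(neutral_mass, z, n_na=0, n_k=0):
    """m/z of [M - zH]^z- after n_na H->Na and n_k H->K neutral exchanges."""
    mass = neutral_mass + n_na * (NA - H) + n_k * (K - H)
    return (mass - z * PROTON) / z


def ppm_error(observed, theoretical):
    """Signed deviation of the observed m/z from the theoretical m/z, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match(observed, candidates, max_charge=7, tol_ppm=200.0, max_na=1, max_k=1):
    """Return every (label, z, n_na, n_k, theoretical m/z, ppm) within tolerance."""
    hits = []
    for label, mass in candidates.items():
        for z in range(1, max_charge + 1):
            for n_na in range(max_na + 1):
                for n_k in range(max_k + 1):
                    theo = mz_negative(mass, z, n_na, n_k)
                    err = ppm_error(observed, theo)
                    if abs(err) <= tol_ppm:
                        hits.append((label, z, n_na, n_k, round(theo, 4), round(err, 1)))
    return hits


# Candidate neutral masses (hypothetical labels; 2307.8 is the value quoted in the text).
candidates = {
    "dp8, 12 sulfates": 2307.8,
    "dp8, 11 sulfates": 2307.8 - SO3,
}

hits_12 = match(383.6268, candidates)  # parent ion selected for MS/MS
hits_11 = match(370.3, candidates)     # second major ion in the peak list

for hit in hits_12 + hits_11:
    print(hit)
```

With these inputs, each of the two major ions receives a single candidate assignment (z = 6, no cation exchange), which is consistent with the annotations described for Figure 5.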
In this case, the possibility of having multiply charged ions drastically increases the number of combinations to calculate. Therefore, some of the parameters are restricted to well-defined values (such as the length of the chain, dp8, and the number of allowed N-acetyl groups, 0 in this case). Because of the level of purity of this fraction, neutral exchanges are considered but only one atom of sodium and one atom of potassium are allowed. The maximum number of charges and the mass accuracy also needed to be restricted according to the data obtained. With the possibility of having multiply charged ions, the difference in m/z between two different structures of similar molecular weight may be extremely small. Although not critical with MALDI-MS data, the mass accuracy can be a determining factor for the annotation of ESI-MS data. Therefore, under our conditions, the maximum number of charges detected was 7 and the m/z tolerance was set to 200 ppm. A portion of the list of annotations generated is presented in Figure 5. The main species are oligosaccharides carrying 12 and 11 sulfate groups, with minor peaks potentially demonstrating the presence of chains carrying 13 sulfate groups (one 3-O-sulfation). The MS/MS data obtained on the fragmentation of the parent ion at m/z 383.6 (z = 6, Mw = 2307.8 g/mol), corresponding to a dp8 carrying 12 sulfate groups, have been
annotated following the procedure described in the first example. Two structures were considered, one with the 12 sulfates on 2-O-, 6-O-, and N-positions only, and the other one allowing 3-O-sulfation. None of the fragments produced could help to distinguish between the two patterns of sulfation (Figure 6), but almost 77% of the peaks selected could be annotated if 2 glycosidic bond cleavages and 1 cross-ring cleavage were allowed.
CONCLUSIONS
The examples presented here demonstrate the effective application of an enhanced software tool for the analysis, interpretation, and annotation of MS and MS/MS data from fractions of heparin and CS. The GlycoWorkbench tool has already been adopted by many in the glycomics community as an essential tool for data interpretation. With the addition of the GAG component, GlycoWorkbench constitutes a unique and flexible resource for the determination of GAG structures. We have described here the first version of this bioinformatics tool; further development is ongoing. First, the use of this software tool greatly reduces the amount of time needed for the interpretation of complex MS-derived data. The graphical interface provides a comprehensive and easily understood way of representing a large amount of information, which can help the user to annotate the data. Moreover, the use of a well-tested computational tool decreases the probability of making the mistakes inherent in a manual interpretation process. Second, the tool can generate ions with multiple charges and neutral cation exchanges and can compute fragments resulting from several types of cleavage events characteristic of different experimental approaches. The results of the computation are not dependent on specific assumptions, and the final decision is always in the hands of the expert user who possesses task-specific knowledge.
Finally, the GlycoWorkbench tool with the GAG extension is publicly available and can be freely downloaded from http://www.eurocarbdb.org/applications/ms-tools.
ACKNOWLEDGMENT
B.T. and A.C. contributed equally to this work. We acknowledge the support of an RCUK Basic Technology Grant GR/S79268 (to A.D. and J.E.T.), Cancer Research UK (to J.G.), the Medical Research Council (MRC), and the Human Frontier Science Program (to J.E.T.), a BBSRC Professorial Fellowship (to A.D.), and an MRC Senior Research Fellowship (to J.E.T.). B.T. and A.K.P. are supported by Research Councils UK and the Biotechnology and Biological Sciences Research Council (BBSRC). A.C. is supported by the sixth European Union Research Framework Programme (EUROCarbDB RIDS Contract Number 011952).
SUPPORTING INFORMATION AVAILABLE
Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
Received for review July 3, 2008. Accepted September 29, 2008. AC8013753