Anal. Chem. 2005, 77, 5902-5911

Heparin Sequencing Using Enzymatic Digestion and ESI-MSn with HOST: A Heparin/HS Oligosaccharide Sequencing Tool

Ola M. Saad and Julie A. Leary*

Genome Center, and Departments of Chemistry/Molecular and Cellular Biology, University of California, Davis, California 95616

Mass spectrometry, and specifically sequential stages of mass spectrometry (MSn), is an established tool for the analysis of carbohydrates, proteins, and more recently glycosaminoglycans. As this trend continues, the development of algorithms for the rapid and automatic interpretation of mass spectra to identify glycan structure is also expected to grow as an active field of research. The methodology described herein utilizes a combination of enzymatic digestion, ESI-MS, and MSn for the sequencing of small heparin oligosaccharides. The heparin oligosaccharide sequencing tool (HOST) is a basic software application that was developed to aid in the integration and analysis of the data generated from these experiments, facilitating the process involved in arriving at sequence information. The sequences of several heparin oligosaccharides were determined using this method to illustrate proof of principle. Tandem MS is a very rapid and efficient tool for oligosaccharide analysis when limited amounts of material are available. Having a means, such as HOST, for automating the interpretation of the MSn data generated from glycosaminoglycans, provides a practical methodology for the future analysis of heparin/HS oligosaccharides of unknown structure. Glycosaminoglycans (GAGs), proteins, and DNA are all important linear biopolymers. The sequencing of proteins and DNA to determine their primary structure has been facilitated by advancements in expression and amplification techniques. In contrast, GAG structural elucidation, like that of other oligosaccharides, remains hampered by their complexity, heterogeneity, and the limitations of the analytical methods used to study them. Heparin and heparan sulfate (HS) glycosaminoglycans are some of the most highly negatively charged biomacromolecules found in multicellular organisms and are important players in many physiological and pathophysiological conditions, making their structural characterization of particular interest. 
* To whom correspondence should be addressed. E-mail: [email protected]. Phone: (530) 754-4987. Fax: (530) 754-9658.

Unlike other complex carbohydrates, in which a branching structure is common, the primary structure of these sulfated polysaccharides resembles that of DNA and proteins in its linearity. In only a few cases, however, have specific sequences been identified as the structural motifs essential for modulation of various biological

5902 Analytical Chemistry, Vol. 77, No. 18, September 15, 2005

processes.1-3 The classic example is that of a specific heparin pentasaccharide sequence identified as responsible for binding to antithrombin III, which plays an important role in hemostasis.3 The repeating unit for heparin/HS consists of the variably sulfated disaccharide [-4HexA(1,4)-GlcNAcα1-]n,4 which may be modified by N- and O-sulfation (6-O and 3-O-sulfation of the glucosamine and 2-O-sulfation of the uronic acid). Complete depolymerization of heparin/HS, using a combination of the bacterially derived enzymes, heparin lyases I, II, and III, from Flavobacterium heparinum, yields C4-C5 unsaturated uronic acid-glucosamine disaccharides. Correspondingly, controlled, partial digestion of heparin/HS (i.e., in which only one heparin lyase is employed or a shorter digestion time is used) results in a mixture of oligosaccharide products, disaccharides and larger, known to have primarily an even number of saccharide residues and an unsaturated uronic acid at the nonreducing end.5,6

Established methods for carbohydrate sequencing rely on using sequential digestions with exoglycosidases, in conjunction with radioactive or fluorescent labeling and high-resolution polyacrylamide gel electrophoresis or LC/CE separation.7,8 Such methods, comparable to using Edman degradation for peptide sequencing, have proven to be very useful, but can be quite laborious and time-consuming with longer oligosaccharides, and may require large amounts of sample depending on the sensitivity of the system and losses sustained by sample handling. Similar techniques have also been successfully applied to the sequencing of heparin/HS oligosaccharides.9,10 With the high detection sensitivity and molecular specificity of mass spectrometry, as well as MSn capabilities, this technique is now becoming an established tool for the analysis of glycosaminoglycans.11-15 As the size and complexity of the carbohydrates increase, the development of algorithms for the rapid and automatic interpretation of mass spectra to identify glycan structure is also expected to grow as an active field of research.16 For heparin oligosaccharides, the disaccharide composition obtained can also be used toward its complete sequence analysis as shown previously in work by Venkataraman et al.17 In their study, sequential, selective heparinase digestions were also performed, and the saccharide products were detected as noncovalent complexes of the oligosaccharides with basic peptides by MALDI-MS.17 The information gathered could then be used collectively to determine the structure of the heparin oligosaccharides.

One of our laboratory’s interests lies in the development of methodologies for determining the sequence of biologically important heparin/HS oligosaccharides by using ESI-MS. Previous work has shown that a combination of ESI-MS and MS/MS can be used for the differentiation and quantification of constituent disaccharides upon enzymatic depolymerization of heparin/HS samples.18 Upon exhaustive depolymerization to disaccharides, the information regarding uronic acid epimerization is lost, but sites of sulfation remain. Piecing together the sequence of the disaccharide building blocks provides a primary representation of the oligosaccharide structure, specifically with regard to the pattern of sulfation sites.

(1) Capila, I.; Linhardt, R. J. Angew. Chem., Int. Ed. 2002, 41, 391-412.
(2) Wu, Z. L. L.; Zhang, L. J.; Yabe, T.; Kuberan, B.; Beeler, D. L.; Love, A.; Rosenberg, R. D. Glycobiology 2002, 12, 685-685.
(3) Casu, B.; Oreste, P.; Torri, G.; Zoppetti, G.; Choay, J.; Lormeau, J. C.; Petitou, M.; Sinay, P. Biochem. J. 1981, 197, 599-609.
(4) Varki, A., Cummings, R., Esko, J., Freeze, H., Hart, G., Marth, J., Eds. Essentials of Glycobiology; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY, 1999.
(5) Pervin, A.; Gallo, C.; Jandik, K. A.; Han, X. J.; Linhardt, R. J. Glycobiology 1995, 5, 83-95.
(6) Thanawiroon, C.; Rice, K. G.; Toida, T.; Linhardt, R. J. J. Biol. Chem. 2004, 279, 2608-2615.
(7) Jackson, P. Mol. Biotechnol. 1996, 5, 101-123.
(8) Rudd, P. M.; Dwek, R. A. Curr. Opin. Biotechnol. 1997, 8, 488-497.
(9) Vives, R. R.; Pye, D. A.; Salmivirta, M.; Hopwood, J. J.; Lindahl, U.; Gallagher, J. T. Biochem. J. 1999, 339, 767-773.
(10) Merry, C. L. R.; Lyon, M.; Deakin, J. A.; Hopwood, J. J.; Gallagher, J. T. J. Biol. Chem. 1999, 274, 18455-18462.

10.1021/ac050793d CCC: $30.25

© 2005 American Chemical Society Published on Web 08/09/2005
This information, in turn, has been shown to be quite important with regard to the specificity and function of many glycosaminoglycan-binding proteins, the majority of which have been shown to interact with HS or heparin GAGs.4 Additional studies carried out in our laboratory also involve identifying such GAG-binding specificities that may differentiate particular chemokines involved in the inflammatory response, as well as probing the substrate specificity of a novel human endosulfatase implicated in the progression of certain cancers.19

This paper reports on the combined technology of utilizing enzymatic digestion, ESI-MS, MSn, and a new software tool for the rapid sequencing of small heparin oligosaccharides. The heparin oligosaccharide sequencing tool (HOST) we describe is a basic software application that was developed to aid in the integration and analysis of the data generated, facilitating the process involved in arriving at sequence information. The goal of this computational tool is to aid the user by accepting and merging the data acquired from two sets of experiments: the disaccharide composition obtained upon enzymatic digestion and topological information obtained from a set of MSn experiments applied to the intact heparin oligosaccharide.

(11) Desaire, H.; Leary, J. A. J. Am. Soc. Mass Spectrom. 2000, 11, 916-920.
(12) Zaia, J.; Costello, C. E. Anal. Chem. 2003, 75, 2445-2455.
(13) Zamfir, A.; Seidler, D. G.; Kresse, H.; Peter-Katalinic, J. Rapid Commun. Mass Spectrom. 2003, 17, 265-265.
(14) Saad, O. M.; Leary, J. A. J. Am. Soc. Mass Spectrom. 2004, 15, 1274-1286.
(15) Zaia, J. Mass Spectrom. Rev. 2004, 23, 161-227.
(16) von der Lieth, C. W.; Bohne-Lang, A.; Lohmann, K. K.; Frank, M. Briefings Bioinform. 2004, 5, 164-178.
(17) Venkataraman, G.; Shriver, Z.; Raman, R.; Sasisekharan, R. Science 1999, 286, 537-542.
(18) Saad, O. M.; Leary, J. A. Anal. Chem. 2003, 75, 2985-2995.
(19) Saad, O. M.; Ebel, H.; Uchimura, K.; Rosen, S. D.; Bertozzi, C. R.; Leary, J. A. Glycobiology. In press.

After information such as disaccharide composition, sample history (i.e., whether the oligosaccharide was isolated from a heparin lyase I or heparin lyase III digestion), and product ion masses have been entered, a list

of all possible sequences is generated and evaluated against the MSn data. While there are algorithms available for determining probable amino acid sequences for a peptide given MS/MS data,20 this technology has only recently been investigated for the analysis of oligosaccharides.21-23 Herein, we demonstrate the feasibility of our approach and describe the use of HOST to assist in the sequencing of heparin/HS oligosaccharides.

EXPERIMENTAL SECTION

General Materials and Methods. Heparin lyases I (EC 4.2.2.7), II (no EC number), and III (EC 4.2.2.8), as well as all heparin disaccharides used, were obtained from Sigma Chemical Co. (St. Louis, MO) or Calbiochem (La Jolla, CA). Each enzyme lot was tested to ensure no contaminating sulfatase activity was present, by overnight reaction with the trisulfated heparin disaccharide, ∆UA2S-GlcNS6S (IS). Two heparin hexasaccharides, Hexa 1 and Hexa 2, utilized in this study were provided by Dr. Zachary Shriver and Dr. Ganesh Venkataraman (Momenta Pharmaceuticals, Cambridge, MA). The saccharides were generated from heparin lyase III digestions of HS, and isolated by gel permeation chromatography, followed by strong anion-exchange high performance liquid chromatography (SAX-HPLC), and their structures determined independently by a combination of CE and MALDI-MS methodologies.24 Unless otherwise specified, all other reagents were obtained from Sigma (St. Louis, MO) and solvents used were of HPLC grade and purchased from Fisher (Santa Clara, CA).

Tetrasaccharide Separation. A mixture of unsaturated tetrasaccharides produced by heparin lyase I cleavage, followed by gel permeation chromatography, was purchased from Dextra Laboratories. From this mixture, several variously sulfated heparin tetrasaccharide substrates were isolated by analytical SAX-HPLC (Phenomenex SAX-HPLC column, Torrance, CA). A linear gradient of 0-1 M NaCl at pH 3.5 over 180 min was used at a flow rate of 1 mL/min.
Separation was monitored by UV absorption at 232 nm and fractions collected were desalted using 1-kDa DispoBiodialyzer (The Nest Group, Inc., Southborough, MA).

Compositional Analysis. Compositional analysis of oligosaccharides was accomplished by exhaustive enzymatic digestion using heparin lyases I-III, followed by ESI-MS/MS as described previously.19 Briefly, digestions of 10-µg samples of heparin oligosaccharides were carried out in 75 µL of 20 mM ammonium acetate buffer, pH 7.5, containing 2 mM Ca(OAc)2 and 0.01 unit each of heparin lyases I, II, and III and incubated at 37 °C for 16 h. One international unit (1 U) produces 1 µmol of unsaturated uronic acid per minute at 37 °C. The enzymatic digestion was quenched by adding 200 µL of MeOH, followed by 20 µL of an aqueous solution (0.2 M) of ammonium hydroxide and 105 µL of water to make the solution 1:1 MeOH/H2O.

(20) Dass, C. In Principles and Practice of Biological Mass Spectrometry; John Wiley & Sons: New York, 2001; pp 217-242.
(21) Gaucher, S. P.; Morrow, J.; Leary, J. A. Anal. Chem. 2000, 72, 2331-2336.
(22) Lohmann, K. K.; von der Lieth, C. W. Proteomics 2003, 3, 2028-2035.
(23) Ethier, M.; Saba, J. A.; Spearman, M.; Krokhin, O.; Butler, M.; Ens, W.; Standing, K. G.; Perreault, H. Rapid Commun. Mass Spectrom. 2003, 17, 2713-2720.
(24) Rhomberg, A. J.; Ernst, S.; Sasisekharan, R.; Biemann, K. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 4176-4181.

A 10-µL aliquot of the sample was diluted further 10-fold into a final solution of 1:1 MeOH/H2O, 10 mM ammonium hydroxide and containing 5 µM


concentration of the internal standard, ∆UA2S-GlcNCOEt6S (I-P), for a total concentration of 50 µM heparin disaccharides. Samples were analyzed by ESI-MSn without further purification.

Mass Spectrometry. All mass spectra were obtained using an electrospray ionization source on a quadrupole ion trap instrument (ThermoFinnigan LCQ, San Jose, CA). The data acquisition software used was Xcalibur, version 1.2. Spectra were obtained in negative ion mode using a spray voltage of 3.8 kV and a capillary temperature of 200 °C for all experiments. The automatic gain control was set to 1 × 10^7 counts for full-scan MS and to 2 × 10^7 for MSn experiments. Each mass spectrum obtained consists of an average of 10-20 scans. Heparin oligosaccharide digest samples were introduced by flow injection analysis using a 5-µL injection loop and a flow rate of 20 µL/min using a Harvard syringe pump and 1:1 MeOH/H2O as the solvent. For MS2 experiments on the disaccharides obtained from the digest, selection of each precursor ion was achieved using an isolation width of 3 Da, the ion was activated at 0.6-0.8 V (29% normalized collision energy) for 100 ms, and the qz value was maintained at 0.25. Intact heparin oligosaccharides were sprayed at a concentration of 20 pmol/µL, from a 1:1 MeOH/H2O solution with 10 mM NH4OH at a flow rate of 5 µL/min. For MSn experiments for the heparin tetrasaccharides and hexasaccharides, selection of each precursor ion was achieved using an isolation width of 3 Da, the ion was activated (20-22% normalized collision energy) for 100 ms, and the qz value was maintained at 0.25. The slower, zoom scan function was performed on all product ions with a 10-Da mass range to determine the charge state of the ion.

HOST Software. Generating All Possible Sequence Permutations. The HOST algorithm first generates N possible permutations of the arrangement of the disaccharide building blocks specified by the user.
The number of unique sequences evaluated (N) is determined as $N = \frac{d!}{r!\,(d - r)!}$, where d is the degree of polymerization of the saccharide (i.e., the number of disaccharide units comprising the oligosaccharide) and r is the number of distinct, nonredundant disaccharides present.

Sample History. The HOST application can also incorporate valuable information from the heparin oligosaccharide’s history, such as from which specific heparin lyase digestion the saccharide was generated. This information is typically available to the user and can be applied in order to eliminate possible sequence permutations that do not conform to the enzymes’ specificity. For example, the substrate specificity recognized for heparin lyase III indicates that it will cleave via a β-elimination mechanism at the nonreducing end of nonsulfated glucuronic acid moieties (i.e., at the cleavage site, GlcNY6X-GlcA, where X may be SO3, and Y may be SO3, Ac, or H).25-27 Larger oligosaccharides generated from such a heparin lyase III digestion, therefore, would also be expected to have a nonsulfated uronic acid moiety at the nonreducing end. Hence, theoretical sequences generated by HOST that contain a disaccharide with a sulfated uronic acid at the nonreducing end can be disregarded as unlikely, if the user chooses to take into account the biological origins of the sample and the substrate specificity of heparin lyase III.

(25) Desai, U. R.; Wang, H.; Linhardt, R. J. Arch. Biochem. Biophys. 1993, 306, 461-468.
(26) Lohse, D. L.; Linhardt, R. J. J. Biol. Chem. 1992, 267, 24347-24355.
(27) Casu, B.; Lindahl, U. Adv. Carbohydr. Chem. Biochem. 2001, 57, 159-206.

Heparin lyase I, similarly, has its particular sequence specificity and is found to


cleave only between glucosamine residues where the 2-amino position is sulfated and a sulfated hexuronic acid moiety (GlcNS6X-IdoA2S). Again, this information can be used to narrow down possible sequences by removing those that have neither a sulfated uronic acid at the nonreducing end nor a sulfated amino group at the reducing end. For heparin oligosaccharides generated by heparin lyase II digestion, no sequences can be confidently eliminated at this time considering the enzyme’s broad substrate specificity. If no enzyme information is provided, HOST will simply continue to execute the analysis using all possible sequence permutations.

Generating in Silico MSn Fragmentations. For each oligosaccharide sequence generated, HOST then calculates theoretical fragmentations, considering possible dissociation product ions that would be generated from tandem mass spectrometry experiments as predicted from previous MSn experiments and mechanistic studies performed on smaller heparin saccharides.14 Included in the theoretical fragmentation database generated by HOST are product ions that correspond to B and Y glycosidic cleavages, as per the Domon-Costello nomenclature.28 This series of ions provides an “oligosaccharide ladder sequence” and also addresses ions that could be generated by multiple glycosidic bond cleavages, as may be observed in successive stages of tandem mass spectrometry experiments. In addition, a specific and characteristic cross-ring cleavage that has been observed to occur from the reducing end of the heparin molecule (0,2A-ion) was also incorporated into the theoretical list of product ions generated by the HOST algorithm. For a heparin oligosaccharide containing up to six disaccharide units (12-mer), a list of all possible unique saccharide sequences and the generation of all theoretically possible MS relevant fragments for each sequence can be generated within a few minutes (Pentium M 1.4 GHz, 760-MB RAM).
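The permutation and sample-history steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HOST workbook's own code: the `Disaccharide` record, the function names, and the convention of writing a sequence from the nonreducing end (index 0) to the reducing end (last index) are all hypothetical. The heparin lyase I rule is implemented literally as worded above (a sequence is removed only if it has neither feature), and distinct arrangements are obtained by de-duplicating ordinary permutations rather than from the closed-form expression for N.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Disaccharide:
    """Hypothetical record for one disaccharide building block."""
    name: str    # label used in the text, e.g. "IS"
    ua_2s: bool  # True if the uronic acid carries a 2-O-sulfate
    n_sub: str   # glucosamine N-substituent: "S", "Ac", or "H"


# Structures as given in the text:
# IS = dUA2S-GlcNS6S, IIS = dUA-GlcNS6S, IH = dUA2S-GlcN6S
IS = Disaccharide("IS", True, "S")
IIS = Disaccharide("IIS", False, "S")
IH = Disaccharide("IH", True, "H")


def unique_sequences(units):
    """All distinct orderings of the building blocks (nonreducing end first)."""
    distinct = set(permutations(units))
    return sorted(distinct, key=lambda seq: [u.name for u in seq])


def filter_by_lyase(seqs, lyase=None):
    """Drop sequences that conflict with the stated lyase specificity."""
    if lyase == "III":
        # Lyase III products: nonsulfated uronic acid at the nonreducing end.
        return [s for s in seqs if not s[0].ua_2s]
    if lyase == "I":
        # Literal reading of the text: remove sequences having NEITHER a
        # sulfated uronic acid at the nonreducing end NOR an N-sulfated
        # glucosamine at the reducing end.
        return [s for s in seqs if s[0].ua_2s or s[-1].n_sub == "S"]
    # Lyase II or unknown history: nothing can be eliminated.
    return list(seqs)


if __name__ == "__main__":
    for seq in filter_by_lyase(unique_sequences([IS, IIS]), "III"):
        print("-".join(u.name for u in seq))
```

For example, of the two arrangements of IS and IIS, only the one beginning with IIS (nonsulfated uronic acid at the nonreducing end) survives the heparin lyase III filter in this sketch.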
Upon generating these in silico arrays of possible product ions for each theoretical sequence, the program will then match these ions to those observed in several experimental MSn spectra as entered by the user and within a margin of mass error also defined by the user (e.g., ±0.5 Da). All ions are entered using two columns starting with the m/z value and the charge state of the ion, z, as experimentally observed and determined from the ion’s isotope distribution. These values are then converted into masses of neutral fragments so that they may be compared directly to the in silico data (i.e., $\text{neutral mass} = (m/z)\,z + 1.0078\,z$), where 1.0078 is the monoisotopic mass of H. A related score for each theoretical sequence is then generated on the basis of the number of ions that match. Every product ion entered by the user from a set of MSn spectra is evaluated against possible product ions for each theoretical sequence and classified in one of three ways: if the ion is an “exact” match within the user’s specified margin of error, it is considered a “hit” and given a score of 1; if it corresponds to a match only upon considering the additional loss of a sulfate group, it is identified as a “tentative match” and is scored at half the weight (0.5); and finally, if the ion does not match either of these criteria, it is given a score of 0.

(28) Domon, B.; Costello, C. E. Glycoconjugate J. 1988, 5, 397-409.

The partial score for a tentative match was incorporated into the scoring mechanism used by the HOST algorithm after considering observations from previous studies, in which several product ions generated from

Figure 1. Structure of heparin/HS oligosaccharides as generated upon partial depolymerization by heparin lyases (n = 0, 1, 2, ...). The repeating disaccharide unit shown in brackets consists of a uronic acid (α-L-iduronic acid or β-D-glucuronic acid) with a 1-4 linkage to α-D-glucosamine. The functional groups that can be modified by sulfation include the 2-O, 3-O, and 6-O positions (R) and the amino group that may be N-sulfated or N-acetylated (Y).

particular dissociations tend to show a loss of a labile sulfate in combination with the glycosidic cleavage (e.g., saccharides having a sulfate group on the nonreducing ring uronic acid consistently formed B1 product ions as well as B1 - SO3 ions).14 These ions, however, are given only half the scoring weight since they are not as definitive in evaluating the structure of the heparin oligosaccharide as other ions generated, in which all sulfate groups are accounted for (e.g., a pair of complementary B and Y ions). A total score is calculated for each theoretical sequence in the pool. Finally, the sequences are ranked according to those for which the in silico fragmentation best matched the product ion spectra as entered by the user, and HOST returns the list of possible sequences ranked according to their relative scores, with the highest score indicating the best match.

Calculation of Degree of Polymerization, Sulfation, and Acetylation of Heparin Oligosaccharide. Starting from the ESI-MS data of the intact oligosaccharide, the heparin oligosaccharide mass can be determined, and from this, its unique overall composition can also be deduced, i.e., number of sulfates, N-acetyl groups, and length of the oligosaccharide up to a tetradecasaccharide in size.17 This is based on the fact that the core repeating disaccharide unit of heparin/HS is known to be a uronic acid (α-L-iduronic acid or β-D-glucuronic acid) 1,4-linked to α-D-glucosamine. For the core disaccharide unit, where all substitutions at the positions denoted R2, R3, R6, and Y are hydrogen atoms (Figure 1), the elemental composition can be described as C12H19O10N, with a total monoisotopic mass of 337.10 Da. As previously described, in the heparin structure the variable decorations are either sulfation (SO3, 79.96 Da) at various hydroxyl positions as shown in Figure 1 or acetylation (COCH3, 42.01 Da) at the 2-amino position of the glucosamine.
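The matching and scoring scheme described earlier in this section (hit = 1, tentative match after an additional sulfate loss = 0.5, otherwise 0) can be sketched as follows. This is a minimal illustration, not the HOST implementation: the function names are hypothetical, each candidate sequence is assumed to come with a precomputed list of theoretical neutral fragment masses, and the fragment masses in the usage example are made-up numbers rather than real heparin fragments.

```python
H_MASS = 1.0078    # monoisotopic mass of H, as quoted in the text
SO3_MASS = 79.96   # mass of a sulfate (SO3) group, as quoted in the text


def neutral_mass(mz, z):
    """Convert an observed negative ion (m/z, charge z) to a neutral mass."""
    return mz * z + H_MASS * z


def score_sequence(observed_ions, theoretical_masses, tol=0.5):
    """Sum the per-ion scores for one candidate sequence.

    observed_ions: iterable of (m/z, z) pairs entered by the user.
    theoretical_masses: neutral masses of the in silico fragments
    (assumed to be precomputed for this candidate).
    """
    total = 0.0
    for mz, z in observed_ions:
        m = neutral_mass(mz, z)
        if any(abs(m - t) <= tol for t in theoretical_masses):
            total += 1.0   # "hit": match within the error margin
        elif any(abs(m - (t - SO3_MASS)) <= tol for t in theoretical_masses):
            total += 0.5   # "tentative match": fragment minus one SO3
        # otherwise the ion contributes 0
    return total


def rank_candidates(observed_ions, candidates, tol=0.5):
    """Return (score, name) pairs, best score first."""
    scored = [(score_sequence(observed_ions, frags, tol), name)
              for name, frags in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Hypothetical fragment masses, for illustration only.
    candidates = {"seq_A": [500.0, 700.0], "seq_B": [450.0]}
    ions = [(249.0, 2), (309.0122, 2), (100.0, 1)]
    print(rank_candidates(ions, candidates))
```

In the usage example the first ion matches a fragment of the hypothetical `seq_A` directly, the second matches only after subtracting one SO3, and the third matches nothing, so `seq_A` scores 1.5 and ranks ahead of `seq_B`.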
The smallest difference in mass that can be accounted for in a larger oligosaccharide is 4.06 Da, which is the difference between the mass of two acetate groups versus a sulfate group. This allows identification of any heparin oligosaccharide generated by enzymatic digestion (i.e., having an unsaturated uronic acid moiety at the nonreducing end) using the empirical formula (C12H19O10N)d(SO3)s(COCH3)a, where d is the degree of polymerization, or the number of disaccharide units comprising the oligosaccharide, s is the number of sulfate groups, and a is the number of acetylated amino groups.29

(29) Wu, Z. L. L.; Lech, M.; Beeler, D. L.; Rosenberg, R. D. J. Biol. Chem. 2004, 279, 1861-1866.

Given a measured mass, M, the HOST program will solve the equation,

Figure 2. Schematic overview that depicts the approach and algorithm used for sequencing of heparin oligosaccharides (dp = degree of polymerization, SO3 = sulfate group, Ac = acetate group).

$M = 337.10d + 79.96s + 42.01a$, in an iterative fashion to determine values for d, s, and a, to yield the given value of M within a small margin of error as specified by the user. The only constraints put on the solution are that $a \le d$ and $s \le 4d$.

Interface. The new tool, HOST, is a simple, downloadable, Excel-based workbook designed as a software application to aid in the integration of disaccharide compositional analysis data and tandem mass spectrometry data (see Supporting Information). The data are brought together for the purpose of obtaining sequence information for different low molecular weight heparin/HS oligosaccharides, theoretically up to a dodecamer in length, generated by heparinase depolymerization. The interface is a simple Excel spreadsheet, which utilizes macros written in Visual Basic (Microsoft VBA 6.3).

RESULTS AND DISCUSSION

Our methodology for the sequencing of heparin/HS oligosaccharides utilizes a combination of disaccharide composition analysis using an established MS method after complete enzymatic degradation by a mixture of heparinases19 and direct MSn analysis of the saccharide using an ion trap mass spectrometer. Figure 2


illustrates the approach. Each HS oligosaccharide was subjected to two experiments. Complete depolymerization of one sample was carried out by overnight digestion using a mixture of heparin lyases, and the disaccharide composition was determined, as previously shown by ESI-MS and MSn.19 The other sample was analyzed without digestion, by ESI-MS and MSn. From disaccharide compositional analysis of the heparin oligosaccharide digest, the identities of the building block constituents of the saccharide, as well as their relative ratios in forming the complete heparin molecule, could be determined. What remains to be established is their order within the molecule, i.e., their sequence. This is where the second step of our methodology is introduced; from sequential MS/MS experiments, a wealth of information can be obtained. For example, product ions formed upon collision-induced dissociation (CID) of a heparin molecule each represent smaller oligosaccharides, which are contained in the larger species. The three main types of dissociation observed in the product ion spectra were as follows: (1) neutral loss of a small molecule (SO3, H2O, or CO2), (2) glycosidic cleavage, and (3) cross-ring cleavage. Those dissociations resulting in glycosidic and cross-ring cleavages were of most interest because they generate product ions for each saccharide that provide information on sequence and identify positions of sulfation. For MSn analysis, isolation and activation of the molecular ion with a high charge state yields a spectrum consisting of product ions generated mainly through various glycosidic and cross-ring cleavages, with minimal loss of sulfate.30 The collision-induced dissociation of the oligosaccharide produces a series of B- and Y-ions, as well as distinctive 0,2A- and X-ions from the reducing and nonreducing ends, respectively. 
Similar to peptides and proteins, heparin is a linear polymer, so many aspects of our algorithm borrow directly from that currently used for analysis of peptide MS/MS data. One issue that slightly complicates the analysis of the heparin oligosaccharide data, however, is determining the charge state for each product ion that is formed. With the ion trap instrument used in a mode to acquire the full scan, there is insufficient sensitivity and resolution to acquire this information. Instead, a second step is necessary in order to perform a higher resolution zoom scan on particular regions to identify the charge state of ions within that mass range. Several examples are illustrated below, including analysis of two heparin tetrasaccharides and two HS hexasaccharides. Sequencing of the tetrasaccharides, Tetra 2 and Tetra 4, is briefly described first without the use of the HOST program to explain the sequencing process. Structures of Hexa 1 and Hexa 2 were then determined using the same process, but with the assistance of HOST.

Sequencing of Heparin Tetrasaccharides (Tetra 2 and Tetra 4). A tetrasaccharide mixture produced by partial heparin lyase I digestion of heparin was obtained from Dextra Laboratories and separated by SAX-HPLC with UV detection at 232 nm. Several tetrasaccharides were separated as shown in Figure 3A, followed by isolation and sequencing. Tetra 2 eluted at 82 min, just before the fully sulfated tetrasaccharide (containing six sulfates), and Tetra 4 eluted prior to this at 68 min (Figure 3A).

(30) Naggar, E. F.; Costello, C. E.; Zaia, J. J. Am. Soc. Mass Spectrom. 2004, 15, 1534-1544.

Shown in Figure 3B is the mass spectrum for intact Tetra 2, which was identical


to that of Tetra 4 (data not shown). From the [M - 5H]5- molecular ion observed at m/z 213.8 by ESI-MS, the saccharides were determined to be isomers with a molecular mass of 1074 Da. The structure consistent with this molecular weight is that of a tetrasaccharide with five sulfates and no acetates. Upon compositional analysis using enzymatic digestion and MS/MS, the two disaccharide constituents were identified as disaccharides IS and IIS for Tetra 2 (Figure 3C,D) and IS and IH for Tetra 4 (Figure 3C,E). The identification of the second disaccharide (IIS or IH) was based on the MS2 data obtained upon collision-induced dissociation of m/z 247.7 as shown in Figure 3D and E.14 In both cases, the positioning of the two disaccharides within the tetrasaccharide could be identified from one MS/MS experiment applied to each intact tetrasaccharide (MS2 213.8 →). Those cleavages that were observed in each case and significant in determining the sequence are noted in Figure 3F. For Tetra 2, the masses of two complementary product ions, corresponding to B2 and Y2 ions, identified the sequence as ∆UA2S-GlcNS6S-HexA-GlcNS6S. In the case of Tetra 4, it was a cross-ring cleavage (0,2A4) that was important in determining the sequence. Upon CID of the molecular ion, a 138-Da loss was observed, corresponding to a 2-sulfate-amino-1-ethenol (C2H4NO4S, 138 Da), as opposed to an unsubstituted amino-1-ethenol (C2H5NO, 59 Da). Since this common cross-ring cleavage has only been observed to occur from the reducing end of heparin saccharides,14 this places disaccharide IH at the nonreducing end of the tetrasaccharide, for a complete sequence of ∆UA2S-GlcN6S-HexA2S-GlcNS6S. Sequencing results for both tetrasaccharides were confirmed by heparin lyase digestions.
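The mass-to-composition assignment used above can be checked numerically with the composition equation and constraints given in the HOST Software section. The sketch below is an illustration only: the function names and the brute-force search are assumptions (the text says only that the equation is solved iteratively), and the upper limit of seven disaccharide units reflects the tetradecasaccharide limit mentioned earlier.

```python
H_MASS = 1.0078     # monoisotopic mass of H
UNIT_MASS = 337.10  # core disaccharide unit, C12H19O10N
SO3_MASS = 79.96    # sulfate group
AC_MASS = 42.01     # acetyl increment quoted in the text


def neutral_mass(mz, z):
    """Neutral mass from a negative ion observed at m/z with charge z."""
    return mz * z + H_MASS * z


def solve_composition(mass, tol=0.5, max_d=7):
    """Return every (d, s, a) with 337.10d + 79.96s + 42.01a within tol of mass.

    d = disaccharide units, s = sulfates, a = N-acetyl groups,
    subject to the constraints a <= d and s <= 4d.
    """
    solutions = []
    for d in range(1, max_d + 1):
        for s in range(0, 4 * d + 1):
            for a in range(0, d + 1):
                calc = UNIT_MASS * d + SO3_MASS * s + AC_MASS * a
                if abs(calc - mass) <= tol:
                    solutions.append((d, s, a))
    return solutions


if __name__ == "__main__":
    m = neutral_mass(213.8, 5)   # [M - 5H]5- ion of Tetra 2 / Tetra 4
    print(round(m, 2))           # -> 1074.04
    print(solve_composition(m))  # -> [(2, 5, 0)]
```

With the 0.5-Da margin, the only solution for the m/z 213.8 ion is two disaccharide units, five sulfates, and no acetyl groups, in agreement with the assignment in the text.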
Incubation of Tetra 2 with heparin lyase III resulted in production of the two constituent disaccharides, and a mass spectrum similar to that shown in Figure 3C, indicating that disaccharide IIS had to be at the reducing end of the molecule so that there would be a nonsulfated uronic acid at the site of heparinase cleavage. If, instead, disaccharide IS was at the reducing end, then the saccharide would have been resistant to heparin lyase III cleavage. In contrast, Tetra 4 was found to be resistant to both heparin lyase III and heparin lyase I digestion, resulting in a mass spectrum similar to that shown in Figure 3B. This is consistent with the structure ∆UA2S-GlcN6S-HexA2S-GlcNS6S. The nonsulfated amino group on the second monosaccharide residue renders the saccharide resistant to heparin lyase I digestion, while the 2-O-sulfation of the third monosaccharide residue causes the saccharide to be resistant to heparin lyase III digestion. When all three heparin lyases were used in combination to determine disaccharide composition, the tetrasaccharide was digested into disaccharides IS and IH (Figure 3C and E). This can be explained by the saccharide’s susceptibility only to heparin lyase II digestion, since it is not sequence specific.

Sequencing of HS Hexasaccharide 1 (Hexa 1). In the case of Hexa 1, we began with a saccharide of unknown sequence, obtained from a heparin lyase III digest of HS. ESI-MS of the saccharide indicated a molecular weight of 1293.3 Da obtained from the [M - 3H]3- molecular ion identified at m/z 430.1. It was determined that this HS oligosaccharide corresponds to a hexasaccharide with three sulfates and one acetate as calculated from step 1 shown in Figure 4. In this first entry, the experimentally determined mass for the oligosaccharide is input with a 0.5-Da error margin. Using our compositional analysis procedure with

Figure 3. Structural characterization of two tetrasaccharides, Tetra 2 and Tetra 4, isolated from a partial heparin lyase I digest of heparin. (A) Analytical SAX-HPLC of a heparin tetrasaccharide mixture. (B) ESI-MS spectrum of a solution of isolated heparin tetrasaccharide, Tetra 2. Tetra 4 looked the same. (C) ESI-MS spectrum of the disaccharide products after exhaustive digestion of Tetra 2 with a combination of heparin lyases I, II, and III. Disaccharide I-S (∆UA2S-GlcNS6S) was identified by its measured m/z value of 191.5, and quantification was performed using the internal standard I-P (∆UA2S-GlcNCOEt6S). The MS1 spectrum for the compositional analysis of Tetra 4 was indistinguishable. (D) MS2 spectrum (m/z 247.7 →) of Tetra 2 digest. Product ions at m/z 168.6 and 338 identified the second disaccharide species as IIS (∆UA-GlcNS6S). (E) MS2 spectrum (m/z 247.7 →) of Tetra 4 digest. Product ion at m/z 218.1 identified the second disaccharide species as IH (∆UA2S-GlcN6S). (F) Structures of Tetra 2 and Tetra 4, with major product ions used for sequence identification indicated.

a combination of MS and MS/MS experiments, we were able to identify the three disaccharides that make up the hexasaccharide in a 1:1:1 ratio, as IVS, IIIS, and IVA (data not shown). Disaccharide IVS contains one sulfate, IIIS has two sulfates, and IVA has one acetate, consistent with the composition identified solely from the mass of the oligosaccharide. Within the HOST application each disaccharide is represented with a one-letter code as depicted in Figure 4, step 2. In this step, the disaccharide composition is entered by the user along with the identity of the heparin lyase by which the oligosaccharide was generated. We also obtained data from several stages of MS/MS of the intact hexasaccharide. Shown in Figure 5A is the MS2 spectrum obtained upon CID of the [M - 3H]3- molecular ion at m/z 430.1, and a subsequent MS3 spectrum of m/z 407.1 is shown in Figure 5B. Using the zoom scan feature on the LCQ ion trap, the charge states of each of these product ions could also be determined. Upon input of all product ions from these two MSn experiments (Figure 4, step 3)

along with the identity of the three disaccharide constituents (Figure 4, step 2), the HOST application was initiated. There are six possible arrangements of the disaccharides that would result in different sequences: ADF, AFD, FDA, FAD, DAF, and DFA. HOST generates all six permutations and determines whether any sequences can be eliminated, based on the substrate specificity of the enzyme by which the sample was generated. It then presents the in silico fragmentations for each sequence. As shown by the final list presented in Figure 4, step 4, two of the sequences, FDA and FAD, were eliminated based on the information provided in step 2 that the oligosaccharide was a product of a heparin lyase III digestion. The two arrangements in which disaccharide IIIS would be at the nonreducing end (FAD or FDA) were both eliminated, since the uronic acid in this disaccharide is sulfated. The resultant report lists the four remaining possible sequences (Figure 4, step 4), with the highest ranked one showing a normalized score of 100, and the next sequence showing a


Figure 4. Screen shot of HOST interface in Microsoft Excel. In steps 1-3, preliminary data are entered by the user. Results of ranked sequence matches are returned in step 4.

ranking one-fifth as likely, with a score of 20. Shown in Figure 6A are the in silico fragmentations for two of the possible sequences generated, DAF and DFA, and the product ions that were matched to a user-entered product ion. In Figure 6A, fragments that were “exact” matches are yellow, and those that were a “tentative match” based on the loss of SO3 are purple. The sequence DAF only showed one ion that may have been a tentative match, whereas for sequence DFA, there were six matches to the experimental MSn data entered; four exact and two tentative matches. Summing the number of matched ions for each sequence is the next step of the HOST algorithm, as depicted in the screen shot of the Excel spreadsheet shown in Figure 6B. In row 3 of the spreadsheet is the list of all experimentally observed neutral dissociation product masses from the MS2 and MS3 spectra as calculated from various product ions and their determined charge state (described above). The number of “total hits” for each of


the four theoretical sequences is shown in the rightmost column. Finally, HOST ranks all theoretical sequences based on their total score (i.e., how well they match the acquired MSn data). The final sequence returned for the hexasaccharide is DFA, that is, the structure ∆UA-GlcNS-HexA2S-GlcNS-HexA-GlcNAc. This was subsequently confirmed with our collaborators, who independently determined the structure using capillary electrophoresis, sequential enzymatic degradation steps, and MALDI-MS. Data from only two MSn experiments, in combination with the sample’s history, were used to determine the structure of Hexa 1. It is also possible to use further stages of MSn to generate more data for incorporation into the sequencing process. At this point, data from up to four stages of sequential tandem mass spectrometry (MS4) can be incorporated into HOST.

Sequencing of HS Hexasaccharide 2 (Hexa 2). The second hexasaccharide analyzed was also obtained from a heparin lyase III digest of HS. ESI-MS of the saccharide indicated a mass of

Figure 5. Tandem mass spectra for Hexa 1, [M - 3H]3-. (A) MS2 spectrum, m/z 430.1 (3-) →. (B) MS3 spectrum, m/z 430.1 (3-) → 407.1 (2-) →. (C) Structure of Hexa 1, with major product ions used for sequence identification indicated.

1373.6 Da obtained from the [M - 4H]4- molecular ion identified at m/z 342.4 (data not shown). This mass corresponds to a hexasaccharide with four sulfates and one acetate group. From compositional analysis it was determined that Hexa 2 was not a single entity. There were at least five different disaccharides present, including IS (m/z 191.4, 3-), IVA (m/z 378.1, 1-), IVS (m/z 416.1, 1-), a possible combination of IIS/IIIS, and small amounts of IH (m/z 247.7, 2-). The main disaccharide species present were disaccharide IS containing three sulfates, IVS with one sulfate group, and IVA containing one acetate group. Together these disaccharides are consistent with the composition identified solely from the mass of the oligosaccharide. Disaccharides IS, IVS, and IVA comprised 93% of the total disaccharide composition, while

Figure 6. Screen shots of the different sequencing steps of HOST. (A) Generation of all theoretically possible MS relevant fragments for sequences DAF and DFA. Masses that are matched to an experimental product ion identified and entered by the user are highlighted. (B) The number of matched fragmentations for each sequence is summed and a total score calculated for each sequence.

the additional disulfated disaccharides present made up the remaining 7% of the composition. Although it was determined that Hexa 2 was not pure, this example can serve to illustrate another useful feature of this sequencing methodology. Samples containing microheterogeneity can still be examined because the ion of interest is isolated in the gas phase prior to collision-induced dissociation. Data from three MSn spectra were entered into HOST, as well as the information that the oligosaccharide was generated from a heparin lyase III digestion. Two of the six possible arrangements were eliminated based on the sequence specificity of heparin lyase III. The remaining four sequences, ADH, AHD, DAH, and DHA were ranked based on how well their in silico fragmentations matched the product ions entered. In this case, the correct sequence was


Figure 7. Report generated by HOST for data analysis of disaccharide composition and MSn spectra from HS oligosaccharide Hexa 2.

returned as the best match, ∆UA-GlcNS-UA2S-GlcNS6S-UA-GlcNAc (Figure 7). As illustrated in the examples above, HOST is a useful and powerful tool for evaluating a set of MSn data such as those shown in Figure 5A and B. These spectra were acquired simply in a data-dependent fashion; the molecular ion was identified, followed by selection and fragmentation to produce an MS/MS spectrum. The most intense ion of the MS/MS spectrum was then selected and fragmented further; this resulted in the MS3 spectrum shown (Figure 5B). In this process, HOST provides a list of possible sequences along with their in silico fragmentations, and from this list the best matching sequence can be selected, allowing one to return and assign those product ions that were matched, as was done in Figure 5C. However, even if there are insufficient or no MSn data available, the HOST application can still facilitate the sequencing process if the disaccharide composition is known.
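The candidate-generation step described for Hexa 1 and Hexa 2 can be sketched in a few lines. This is a minimal illustration rather than the HOST code, which is an Excel application whose internals are not reproduced here. The one-letter codes (A = IVA, D = IVS, F = IIIS, H = IS) are inferred from the sequences reported in the text, and the enzyme rule is reduced to the single criterion stated there for heparin lyase III products: a disaccharide with a 2-O-sulfated uronic acid cannot occupy the nonreducing end.

```python
from itertools import permutations

# One-letter codes inferred from the text (an assumption of this sketch):
# code -> (disaccharide name, is the uronic acid 2-O-sulfated?)
DISACCHARIDES = {
    "A": ("IVA", False),   # unsaturated UA-GlcNAc
    "D": ("IVS", False),   # unsaturated UA-GlcNS
    "F": ("IIIS", True),   # unsaturated UA2S-GlcNS
    "H": ("IS", True),     # unsaturated UA2S-GlcNS6S
}


def candidate_sequences(composition, enzyme="lyase III"):
    """Return the sorted, distinct orderings of the disaccharide codes that
    remain after applying the (simplified) specificity rule for the enzyme."""
    orderings = sorted({"".join(p) for p in permutations(composition)})
    if enzyme == "lyase III":
        # Simplified rule from the text: the nonreducing-end disaccharide of a
        # heparin lyase III product must not carry a 2-O-sulfated uronic acid.
        orderings = [s for s in orderings if not DISACCHARIDES[s[0]][1]]
    return orderings


print(candidate_sequences("ADF"))  # Hexa 1 composition
print(candidate_sequences("ADH"))  # Hexa 2 composition
```

For both compositions, two of the six orderings are removed, leaving the same four candidates that the text reports for each hexasaccharide. Rules for other lyases would need to be added separately.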

HOST can provide the arrays of product ions that would be expected from complete fragmentation of each possible sequence. This information in itself can be useful in directing the user to which particular MSn experiments would be of greatest value in definitively identifying the oligosaccharide sequence and differentiating it from other likely sequence candidates. In these particular examples, we were able to obtain sufficient product ion information to unambiguously determine the sequence of the saccharide from just one MS2 and MS3 spectrum. There is also additional information available, however, from MS/MS analysis of different charge states of the same oligosaccharides, as well as further stages of MS/MS. Current exploratory efforts utilize the data-dependent acquisition features on the Thermo Electron LTQ for the generation of “ion tree” experiments, which can provide a wealth of information for incorporation into our sequencing algorithm, to even further optimize our sequencing

capability and perhaps even dispense with the need for the disaccharide composition information.

CONCLUSION

Our group and others have sought to exploit the use of MS with its sensitivity and speed by avoiding separation steps for the analysis of GAGs.11-14,18 In the work presented herein, we have demonstrated the successful use of disaccharide compositional analysis and tandem mass spectrometry in the sequencing of several isolated heparin oligosaccharides. A major advantage of HOST is that a list of all possible structures is generated quickly with a scoring system that puts the more likely structures at the top of the list and relates subsequent structural sequences to the best match of the series. This is done by giving the best matching sequence a score of 100 and normalizing subsequent scores to the first-ranked sequence. A large difference (>15%) between the score for the best matching sequence and that of the second-ranked sequence improves the probability of identifying the correct heparin oligosaccharide structure. Another benefit is that the program is entirely modifiable; each step of the process can be monitored, allowing the user to interrogate individual sequence possibilities and view which product ions were identified as hits and to which particular structure they correspond. The user decides which ions to include in the MSn data input and how many stages of MSn data to evaluate. The more MS/MS fragmentation data provided for analysis, however, the greater the likelihood that the first-ranked sequence will be a correct identification rather than a false positive. At this time, the HOST program clearly supports the sequencing process of heparin oligosaccharides, but it still requires further improvements.
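The matching and normalization scheme summarized above can be sketched as follows. This is an illustration under stated assumptions, not the HOST scoring code: the paper does not say how exact and tentative matches are weighted, so the `tentative_weight` parameter is hypothetical, as are the function names and the 0.5-Da tolerance applied to fragments. The masses in the example are made-up toy values chosen only to exercise the logic; they are not data from the paper.

```python
SO3 = 79.957  # monoisotopic mass of SO3 in Da


def rank_sequences(theoretical, observed, tol=0.5, tentative_weight=1.0):
    """Score each candidate by counting observed neutral masses that match
    its predicted fragments, then normalize so the best candidate scores 100.

    theoretical: dict mapping sequence name -> list of predicted fragment masses
    observed:    list of experimentally derived neutral product masses
    """
    raw = {}
    for seq, fragments in theoretical.items():
        hits = 0.0
        for mass in observed:
            if any(abs(mass - f) <= tol for f in fragments):
                hits += 1.0                  # "exact" match
            elif any(abs(mass + SO3 - f) <= tol for f in fragments):
                hits += tentative_weight     # "tentative" match: fragment that lost SO3
        raw[seq] = hits
    best = max(raw.values())
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return [(seq, 100.0 * hits / best if best else 0.0) for seq, hits in ranked]


def clear_winner(ranked, gap=15.0):
    """True when the top score exceeds the runner-up by more than gap points."""
    return len(ranked) < 2 or ranked[0][1] - ranked[1][1] > gap


# Toy values for illustration only (not measurements from the paper).
toy_theoretical = {"seq1": [300.0, 500.0, 700.0], "seq2": [300.0, 650.0]}
toy_observed = [300.0, 420.04, 700.0]
ranked = rank_sequences(toy_theoretical, toy_observed)
print(ranked, clear_winner(ranked))
```

In the toy case, the first candidate collects two exact matches and one tentative match and is normalized to 100, while the second collects a single exact match and scores one-third of that, so the gap comfortably exceeds the 15-point threshold.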
Limitations of the current system, and targets for future development, include the sequencing of oligosaccharides with an odd number of monosaccharide units and of those generated by other depolymerization methodologies, as well as tandem mass spectrometry methods to identify the missing uronic acid epimerization information (whether α-iduronic acid or β-glucuronic acid). We would also like to address those rare sites of 3-O-sulfation and how they influence heparin oligosaccharide fragmentation processes. At this point, we have been able to identify such sites solely by manual interpretation of the MSn data, after determining the degree of sulfation and acetylation of a particular oligosaccharide based on its mass. Due to the complexity of the tandem mass spectrometry data obtained at this time, our work has required the initial use of the compositional

analysis of the heparin oligosaccharide as a starting point, followed by ranking of the possible sequences to match MSn data obtained. Current work seeks to expand the application of HOST to incorporate an alternative starting point using only the mass of the oligosaccharide and no disaccharide composition information. In this system, the length of the saccharide, number of sulfation sites, and number of acetylation sites can be determined by an equation such as $M = (\mathrm{C_6H_8O_5})_u(\mathrm{C_6H_{11}O_4N})_g(\mathrm{SO_3})_s(\mathrm{COCH_3})_a$. From the measured mass of the oligosaccharide, HOST would then generate all possible permutations of these building blocks using only criteria inherent in the glycosaminoglycan structure, without having to know disaccharide composition. In summary, the sequences of several small heparin/HS oligosaccharides were determined by enzymatic digestion, followed by ESI-MS and MSn analysis of the intact oligosaccharide species. The information obtained was integrated using the HOST application. Collectively, the results obtained suggest that this may provide a practical methodology for the future sequencing analysis of heparin/HS oligosaccharides of unknown structure.

ACKNOWLEDGMENT

O.M.S. and J.A.L. gratefully acknowledge the NIH for funding this research (Grant GM47356). Heparin hexasaccharides, Hexa 1 and Hexa 2, were provided by Dr. Zachary Shriver and Dr. Ganesh Venkatamaren (Momenta Pharmaceuticals).

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.
Abbreviations: ESI, electrospray ionization; MS, mass spectrometry; CID, collision-induced dissociation; GAG, glycosaminoglycan; CE, capillary electrophoresis; HS, heparan sulfate; HexA, hexuronic acid; IdoA, L-iduronic acid; GlcA, D-glucuronic acid; GlcN, glucosamine; ∆UA, a 4,5-unsaturated uronic acid; 2S and 6S, 2-O-sulfation and 6-O-sulfation, respectively; NS and NAc, N-sulfation and N-acetylation of the glucosamine, respectively. Heparin disaccharides: (IVA) ∆UA-GlcNAc; (IIIA) ∆UA2S-GlcNAc; (IIA) ∆UA-GlcNAc6S; (IVS) ∆UA-GlcNS; (IA) ∆UA2S-GlcNAc6S; (IIIS) ∆UA2S-GlcNS; (IIS) ∆UA-GlcNS6S; (IS) ∆UA2S-GlcNS6S; (IH) ∆UA2S-GlcN6S; (IIH) ∆UA-GlcN6S; (IIIH) ∆UA2S-GlcN; (IVH) ∆UA-GlcN.

Received for review May 6, 2005. Accepted July 6, 2005.
AC050793D

Analytical Chemistry, Vol. 77, No. 18, September 15, 2005
