Anal. Chem. 1999, 71, 4764-4771
An Automated Interpretation of MALDI/TOF Postsource Decay Spectra of Oligosaccharides. 1. Automated Peak Assignment
Yasuko Mizuno* and Tatsuru Sasagawa
Toray Research Center, Inc., Tebiro, Kamakura, Kanagawa, 248-0036 Japan
Naoshi Dohmae and Koji Takio
The Institute of Physical and Chemical Research (RIKEN), Wako, Saitama, 351-0198 Japan
A computer program has been developed that aids the interpretation of MALDI/TOF postsource decay (PSD) spectra of N-linked oligosaccharides of a protein. The program includes routines for automated peak assignment and generation of a simulated PSD spectrum. From a raw spectrum, peaks are assigned automatically; i.e., the numbers of saccharide residues removed from the parent ion are calculated. If the structure of the oligosaccharide is known, a simulated PSD spectrum of the oligosaccharide is generated, which helps interpretation of the observed spectrum. When several candidate structures are given, one can narrow the field of plausible structures for the unknown oligosaccharide by comparing the observed spectrum with the simulated PSD spectra. Using a Pentium 233-MHz microprocessor, it takes only a few seconds to interpret a spectrum.

For the characterization of oligosaccharides, MALDI/TOF-MS has become a popular technique because of its high sensitivity and speed of analysis (1). Some problems, however, have been pointed out in the characterization of N-linked oligosaccharides by mass spectrometry using postsource decay (PSD). Although PSD data can be obtained rapidly, assignment of peaks is slow and tedious. In addition, the procedure for elucidating the structure from the observed spectrum requires time-consuming trial and error: postulating several candidate structures, predicting fragment ions, and then checking the similarity between the observed and expected spectra. The solution to these problems is to develop and implement not only algorithms to extract partial structural information but also tools to simulate fragmentation patterns for oligosaccharides.

To assign peaks, several approaches are possible. One of them is to search for peaks whose masses are smaller than that of the parent peak by the residue mass of a sugar unit.
This approach is applicable to any type of oligosaccharide as long as the masses of the constituent saccharides are registered in the program, and it requires no a priori structural information. It is even more convenient if we can generate simulated PSD spectra for possible candidate structures to compare with the PSD spectrum of the unknown. This approach serves as another method of assignment. The aim of this study is to develop and implement the above-mentioned algorithm and tools to provide unambiguous data interpretation in real time so as to fully complement the time-consuming task inherent to mass spectrometric analysis.

(1) Cancilla, M. T.; Penn, S. G.; Lebrilla, C. B. Anal. Chem. 1998, 70, 663-672.

4764 Analytical Chemistry, Vol. 71, No. 20, October 15, 1999

METHODS

Programs were all written in the C++ programming language using Microsoft Foundation Class Library 4.0. A peak detection routine was written specifically for this study to generate masses for peaks from the raw data. Our peak detection scheme seeks local maxima as defined by the local peak heights and slopes. Depending on the quality of the raw data, smoothing according to Savitzky and Golay (2) was performed.

MALDI/TOF-PSD spectra were obtained on a Bruker Reflex time-of-flight mass spectrometer equipped with a Scout multiprobe inlet and a gridless delayed extraction ion source (Bruker-Franzen, Bremen, Germany) using the acquisition program FAST. The ion acceleration voltage was 27.5 kV, and the reflector (ion mirror) voltage was decreased from 30.0 to 0.95 kV in 14 steps. For delayed ion extraction, an 8-kV potential difference between the probe and the extraction lens was applied with a time delay of 150 ns after each laser pulse using a high-voltage switch controlled by a time delay generator. As the parent ion, (M + H)+ was isolated with a pulsed ion gate. The pressure in the TOF analyzer was 5 mPa. Mass spectra were acquired as the sum of ion signals generated by irradiation of the target with 100-250 laser pulses (337-nm N2 laser). Mass spectra were calibrated using a matrix-related ion signal (309.06) and adrenocorticotropic hormone 18-39 (FW, 2465.20). The matrix used was 2,5-dihydroxybenzoic acid. Positive-ion mode was employed. Monoisotopic oligosaccharide ions were assigned.
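The peak detection routine is described above only in outline (local maxima judged by height and slope, with optional Savitzky-Golay smoothing). A minimal Python sketch of that idea follows. It is not the authors' C++ routine: the 5-point window, the function names, and the `min_height` and `min_rise` thresholds are assumptions chosen for illustration.

```python
# Illustrative sketch only: 5-point Savitzky-Golay smoothing followed by a
# simple local-maximum search. Window size and thresholds are assumptions.

SG5_WEIGHTS = (-3, 12, 17, 12, -3)  # standard 5-point quadratic smoothing weights
SG5_NORM = 35                       # sum of the weights


def smooth_sg5(intensities):
    """Return a smoothed copy; the two points at each edge are left unchanged."""
    y = list(intensities)
    out = y[:]
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return out


def detect_peaks(masses, intensities, min_height, min_rise):
    """Return (mass, height) for points that are at least min_height tall and
    exceed both neighbours by at least min_rise (a crude slope criterion)."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        rise_left = intensities[i] - intensities[i - 1]
        rise_right = intensities[i] - intensities[i + 1]
        if (intensities[i] >= min_height
                and rise_left >= min_rise
                and rise_right >= min_rise):
            peaks.append((masses[i], intensities[i]))
    return peaks


# Hypothetical raw data: one peak centred at mass 102.0.
masses = [100.0 + 0.5 * i for i in range(9)]
raw = [0, 1, 2, 10, 30, 10, 2, 1, 0]
peaks = detect_peaks(masses, smooth_sg5(raw), min_height=5, min_rise=1)
```

Note that smoothing lowers and broadens the peak in this toy series, which is consistent with the later remark that smoothing can shift observed values.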
In this study, the following seven oligosaccharides labeled with 2-aminopyridine were purchased from either Takara Co. Ltd. (Osaka, Japan) or Nakano Vinegar Co. Ltd. (Nagoya, Japan).

(2) Savitzky, A.; Golay, M. J. E. Anal. Chem. 1964, 36, 1627-1639.

10.1021/ac981108o CCC: $18.00
© 1999 American Chemical Society Published on Web 09/16/1999
Figure 1. Types of oligosaccharide fragmentation (a) and genesis of B and Y ions (b) in the positive ion mode based on Domon and Costello.

Table 1. Constituent Monosaccharides and Assigned Codes for Complex Type Oligosaccharides

reference   mass (m/z)   residue mass (m/z)
Fuc         164.0684     146.0579
Man         180.0633     162.0528
Gal         180.0633     162.0528
GlcNAc      221.0897     203.0793
GalNAc      221.0897     203.0793
NeuAc       309.1059     291.0954
NeuGc       325.1006     307.0901
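As a quick consistency check on Table 1, each residue mass should equal the mass of the free monosaccharide minus one water, because a glycosidic bond is formed with loss of H2O. The short Python check below uses the monoisotopic mass of water (18.0106), a constant that is not given in the text, and the function names are hypothetical.

```python
# Values copied from Table 1 as (mass, residue mass).
TABLE_1 = {
    "Fuc": (164.0684, 146.0579),
    "Man": (180.0633, 162.0528),
    "Gal": (180.0633, 162.0528),
    "GlcNAc": (221.0897, 203.0793),
    "GalNAc": (221.0897, 203.0793),
    "NeuAc": (309.1059, 291.0954),
    "NeuGc": (325.1006, 307.0901),
}

WATER = 18.0106  # monoisotopic mass of H2O; an added constant, not from the paper


def residue_mass(free_mass):
    """Residue mass expected for a monosaccharide of the given free mass."""
    return free_mass - WATER


def max_deviation(table=TABLE_1):
    """Largest difference between the tabulated and the computed residue mass."""
    return max(abs(residue_mass(free) - residue) for free, residue in table.values())
```

With the tabulated values, the largest deviation stays well below 0.001 mass units, so the two columns agree to within rounding.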
ALGORITHM

Assignment of Detected Peaks. The principle of this assignment protocol is as follows. As the first step, the spectrum is searched for a peak whose mass is smaller than the parent’s mass by the residue mass of a reference saccharide, Mx. If such a peak is found, i.e., condition 1 is satisfied, it is assigned as “-Mx”, and we call it a “daughter” in relation to the parent. This daughter belongs to the Y-type fragments shown in Figure 1a.
$$|MH^{+} - M_{t}(j) - M_{x}| < e \qquad (1)$$
where MH+, Mt(j), and Mx are, respectively, the mass of the parent ion, the mass of a target peak j, and the residue mass of a reference saccharide listed in Table 1, and e is a user-defined threshold value for the error. The reference table includes both mono- and disaccharides. The monosaccharide units are searched for first. If no peak is found whose mass is smaller than the parent’s mass by the residue mass of any reference monosaccharide, then peaks whose masses are smaller than the parent’s mass by the residue masses of reference disaccharides are searched for. When no such peak is found either, peak assignment is unsuccessful and execution of the routine is terminated. In the next step, the assigned daughter peaks become “seeds”.
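The first search step, condition 1 with the monosaccharide-then-disaccharide fallback, can be sketched in Python as follows. This is an illustration, not the authors' code. Isomers are merged into `Hex` and `HexNAc` because they share a residue mass, and the disaccharide references are assumed to be all pairwise sums of the monosaccharide residues, since the text does not list them.

```python
from itertools import combinations_with_replacement

# Residue masses from Table 1 (Man/Gal merged as Hex, GlcNAc/GalNAc as HexNAc).
MONO = {
    "Fuc": 146.0579,
    "Hex": 162.0528,
    "HexNAc": 203.0793,
    "NeuAc": 291.0954,
    "NeuGc": 307.0901,
}

# Assumption: disaccharide references are every pairwise sum of the residues.
DI = {
    a + "+" + b: MONO[a] + MONO[b]
    for a, b in combinations_with_replacement(sorted(MONO), 2)
}


def first_daughters(parent_mass, peak_masses, e=0.9):
    """Apply condition (1) to every peak: monosaccharide references first,
    disaccharide references only if no monosaccharide loss is found.
    Returns a list of (peak mass, reference name); empty if nothing matches."""
    for references in (MONO, DI):
        hits = [
            (mass, name)
            for mass in peak_masses
            for name, mx in references.items()
            if abs(parent_mass - mass - mx) < e
        ]
        if hits:
            return hits
    return []
```

For a hypothetical parent at 1000.0, a peak at 796.9 would be labeled as a HexNAc loss, while a lone peak at 634.9 would only be picked up by the disaccharide fallback.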
Table 1 (continued), codes: 0.01, 0.001, 0.0001, 0.00001, 0.000001; 3, 4, 5, 20, 200, 2000, 20000, 200000, 40, 400, 4000, 40000, 60, 600, 6000, 60000; 0.1, 1, 2, 10, 100, 1000, 10000, 100000; 30, 300, 3000, 30000; 50, 500, 5000, 50000; 70, 700, 7000, 70000, 90, 900, 9000, 90000; 80, 800, 8000, 80000
Based on a “seed” the same searching cycles are repeated until no peak is found. Thus, several “daughter-parent” relationships will be established. Finally, the masses of the assigned peaks are expressed by the mass of the parent and residue masses of already assigned constituent references by
$$M_{t}(j) = MH^{+} - \sum M_{x} \qquad (2)$$
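The seed cycle and the bookkeeping of equation 2 might look like the following Python sketch. It is a simplified reading of the protocol: only monosaccharide references are used, each peak keeps the first assignment found, and the `excluded` argument stands in for an option to restrict a reference. All function and variable names are hypothetical.

```python
# Residue masses from Table 1 (Man/Gal merged as Hex, GlcNAc/GalNAc as HexNAc).
RESIDUES = {
    "Fuc": 146.0579,
    "Hex": 162.0528,
    "HexNAc": 203.0793,
    "NeuAc": 291.0954,
    "NeuGc": 307.0901,
}


def assign_peaks(parent_mass, peak_masses, e=0.9, excluded=()):
    """Repeat the condition-(1) search from every newly assigned seed.

    Returns {peak mass: tuple of residues lost from the parent}, i.e. the
    terms of the sum in equation (2) for each assigned peak.
    """
    references = {k: v for k, v in RESIDUES.items() if k not in excluded}
    assigned = {}
    seeds = [(parent_mass, ())]
    while seeds:
        next_seeds = []
        for seed_mass, losses in seeds:
            for mass in peak_masses:
                if mass in assigned:
                    continue  # keep the first assignment found
                for name, mx in references.items():
                    if abs(seed_mass - mass - mx) < e:
                        assigned[mass] = losses + (name,)
                        next_seeds.append((mass, assigned[mass]))
                        break
        seeds = next_seeds  # assigned daughters become the next seeds
    return assigned


# Hypothetical peak list: 708.4 lies 291.6 below the parent, which is within
# e = 0.9 of one NeuAc but is also reachable as two successive Fuc losses.
with_neuac = assign_peaks(1000.0, [854.0, 708.4])
without_neuac = assign_peaks(1000.0, [854.0, 708.4], excluded=("NeuAc",))
```

In this toy case the unrestricted run labels 708.4 as a NeuAc loss, whereas excluding NeuAc yields two Fuc losses via the 854.0 seed, which illustrates why restricting a reference can change the assignment.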
In addition to the monosaccharide references listed in Table 1, user-defined references are allowed for the analysis of special oligosaccharides. To avoid false assignments, we have also introduced a function to restrict the use of particular monosaccharide references.

Construction of an N-Linked Oligosaccharide Structural Editor. Molds for hypothetical N-linked oligosaccharides were constructed for the purpose of editing the structures of several types of oligosaccharides, including complex, hybrid, and high-mannose types. Figure 2 shows a mold for a hypothetical complex-type pentaantennary oligosaccharide, which consists of a skeleton and six side chains. The skeleton consists of a trimannosyl core (the code is 9; constituent monosaccharides are assigned codes 1-5) and five antennas. Five groups of numbers are assigned to the monosaccharide residues of the five antennas, i.e., 10-90, 100-900, 1000-9000, 10 000-90 000, and 100 000-200 000 (Table 1), with increments of 10, 100, 1000, 10 000, and 100 000, respectively. An antenna is represented by the largest code number in the group, which is assigned to the monosaccharide residue at the nonreducing terminus. Therefore, once the structure of an oligosaccharide is defined, a six-digit number is assigned to the skeleton structure. Since each of these numbers occupies its own digit, summation of the code numbers of the five antennas gives a unique code for the skeleton. The presence of a side chain is described by six subdecimal numbers: a Fuc at the reducing end by 0.01, a bisecting GlcNAc by 0.1, and a Fuc at an antenna 10-90, 100-900, 1000-9000, or 10 000-90 000 by 0.001, 0.0001, 0.000 01, or 0.000 001, respectively. The sum of the six subdecimal numbers gives a code for the modifications on the skeleton. The combination of an integer for the skeleton and a subdecimal number for the side chains gives a unique identification code for the whole oligosaccharide. This code system is also used for the identification of Y-type fragments, as explained in the following section. Calculation shows that the mold can generate 992 352 species of oligosaccharides. Molds for other types of oligosaccharides were constructed in a similar manner. An oligosaccharide structure can be edited by adding check marks in the dialogue boxes of the oligosaccharide mold, which is displayed on the monitor of the computer. The structural information thus entered by the user is transformed to the code as described, for further calculations and drawing.

Figure 2. A mold for generating hypothetical N-linked oligosaccharide. Numbers indicate the codes assigned for the monosaccharide residues. The code for the trimannosyl core is 9. Lower half shows a mold for a branch with the codes for the branch 10-90 shown in the upper half.

Generation of Simulated PSD Spectrum. In the positive ion mode, fragments result from the protonation of a glycosidic bond. A protonated glycosidic bond is subsequently broken to yield a B oxonium ion and a smaller glycoconjugate (3). Alternatively, cleavage of the glycosidic bond could be accompanied by a proton transfer, which would yield the complementary Y ion (Figure 1b). More complex pathways involving cleavages of carbon-carbon bond(s) of the sugar ring are excluded from the present study. On the basis of the above assumptions, five types of simulated fragment structures are generated as follows:
1. Eliminate at least one monosaccharide residue from any nonreducing terminus of the constituent branches of the parent. At this step, many combinations of leaving groups are possible.
2. From these daughter fragments, new descendants are generated by the same procedure, and this procedure is repeated until no nonreducing terminal residue is found. At this stage, Y-type fragments, which retain the reducing terminus, are generated. Fragments of this type are shown as the tallest bars in the lower panels of Figures 3-9.
3. A series of ions having the complementary masses of the Y series is generated. Fragments of this set (B series ions) are shown as shorter bars.
4. By losing the reducing-terminal GlcNAc(Fuc)-PA or GlcNAc-GlcNAc(Fuc)-PA from all the Y-type fragments, internal fragments that lack both reducing and nonreducing termini are generated.
5. By cleaving a mono- or oligosaccharide from both ends of all the B-type fragments, internal fragments that complement the fragments generated at step 4 are generated. We call the fragments generated at steps 4 and 5 I-type fragments. Fragments of this type are shown as the shortest bars.
6. By subtracting 17 mass units from Y-type fragments, Z-type fragments are generated. A series of fragments (C type) having the complementary masses of the Z series is also generated.

(3) Domon, B.; Costello, C. E. Glycoconjugate J. 1988, 5, 397-409.

Using the mathematical coding shown above, the implementation of this approach in a computer program is simplified. The step of removing one monosaccharide residue at a time corresponds to reducing the assigned number of each antenna independently and, at the same time, reducing the six-decimal number. Although redundant fragments appear because of the multibranched structure, all the predicted peaks of the same type are drawn with identical heights. The program is designed to show the structure of a fragment when the mouse button is clicked with the cursor on the position of a peak.

RESULTS

The MALDI/TOF-PSD mass spectra of the seven different pyridylaminated oligosaccharides after automated peak assignment are shown together with simulated PSD spectra in Figures 3-9. Assigned ions are shown by horizontal lines with the compositions of the eliminated monosaccharide(s). These assignments serve as a highly sensitive compositional analysis for the unknown oligosaccharide. Since this is an empirical program for mass spectrometric data, isomeric saccharides, such as GlcNAc and GalNAc, are not distinguished.

Figure 3. PSD spectra of pyridylaminated oligosaccharide A (code 10100.0). Observed (a) and predicted (b) spectra.

Figure 4. PSD spectra of pyridylaminated oligosaccharide B (code 11100.0). Observed (a) and predicted (b) spectra.

Figure 5. PSD spectra of pyridylaminated oligosaccharide C (code 10100.1). Observed (a) and predicted (b) spectra.

Figure 6. PSD spectra of pyridylaminated oligosaccharide D (code 20200.0). Observed (a) and predicted (b) spectra.
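The identification codes used in the figure captions, and the remark in the ALGORITHM section that removing residues corresponds to reducing each antenna's number independently, can be illustrated with a short Python sketch. It rests on loud assumptions: each antenna is treated as a linear chain whose digit drops by one per residue lost, the units digit (core) is ignored, side-chain losses are not modeled, and all names are hypothetical. It is not the authors' implementation.

```python
from itertools import product

# Place values of the five antenna groups (10-90, 100-900, ...).
ANTENNA_PLACES = (10, 100, 1000, 10000, 100000)

# Subdecimal positions, in order 0.1, 0.01, 0.001, ... as given in the text.
SIDE_CHAINS = (
    "bisecting GlcNAc",              # 0.1
    "Fuc at reducing end",           # 0.01
    "Fuc on antenna 10-90",          # 0.001
    "Fuc on antenna 100-900",        # 0.0001
    "Fuc on antenna 1000-9000",      # 0.00001
    "Fuc on antenna 10 000-90 000",  # 0.000001
)


def decode(code):
    """Split a code string such as '20200.01' into antenna digits and side chains."""
    integer_part, _, fraction = code.partition(".")
    skeleton = int(integer_part)
    antennas = tuple((skeleton // place) % 10 for place in ANTENNA_PLACES)
    flags = fraction.ljust(len(SIDE_CHAINS), "0")
    side_chains = [name for name, digit in zip(SIDE_CHAINS, flags) if digit != "0"]
    return antennas, side_chains


def y_skeleton_codes(code):
    """Enumerate skeleton codes reachable by lowering each antenna digit
    independently (the parent itself is included), largest code first."""
    antennas, _ = decode(code)
    codes = set()
    for digits in product(*(range(d + 1) for d in antennas)):
        codes.add(sum(d * place for d, place in zip(digits, ANTENNA_PLACES)))
    return sorted(codes, reverse=True)
```

Under these assumptions, oligosaccharide A (code 10100.0) decodes to two antennas of one residue each and yields four skeleton codes, whereas oligosaccharide E (code 20200.01) decodes to two antennas of two residues each plus a Fuc at the reducing end. Using strings rather than floating-point numbers avoids rounding problems with subdecimal codes such as 0.000 001.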
Oligosaccharides B and C are a triantennary and a bisecting GlcNAc-containing biantennary oligosaccharide, respectively, with an identical molecular mass of 1597.7 (pyridylaminated). They gave similar but distinguishable spectra (Figures 4 and 5). The difference is the presence of a fragment ion at m/z 868.6 in the spectrum of oligosaccharide C, which is characteristic of bisecting GlcNAc-containing oligosaccharides. This suggests that MALDI/TOF-MS using PSD is useful for the characterization of such isomers.

Figure 7. PSD spectra of pyridylaminated oligosaccharide E (code 20200.01). Observed (a) and predicted (b) spectra.

Figure 8. PSD spectra of pyridylaminated oligosaccharide F (code 70700.0). Observed (a) and predicted (b) spectra.

Oligosaccharide E (Figure 7) is similar to oligosaccharide D (Figure 6) but has an additional Fuc at the reducing end. We can easily recognize, from the assigned spectrum, several pairs of fragments, i.e., fragments with a Fuc loss and a Hex loss. The
presence of a fragment ion at m/z 447.1 in the spectrum of oligosaccharide E (Figure 7) suggests that the Fuc residue is attached to the reducing terminus. A similar example is also seen for oligosaccharide G (Figure 9).

Figure 9. PSD spectra of pyridylaminated oligosaccharide G (code 22220.01). Observed (a) and predicted (b) spectra.

Successful assignment depends on the proper selection of the user-specified threshold value e defined by condition 1. Usually an e value of 0.9 gives a good assignment. However, smoothing of the spectra causes a considerable shift in the observed mass values. Since the residue mass of NeuAc (291.0954) is very close to that of two Fuc (146.0579 × 2 = 292.1158), one might get a false assignment. In the case of oligosaccharide E (Figure 7), the assignment was carried out by restricting the use of NeuAc as a reference. Otherwise a false assignment was obtained. Even with the false assignment (not shown), the presence of a fragment ion at m/z 447.1 and several pairs of fragments, i.e., fragments with and without a Fuc, suggest the presence of Fuc and the absence of NeuAc. An example with the presence of NeuAc and the absence of Fuc is shown by the case of oligosaccharide F (Figure 8).

Most major peaks observed were Y-type fragments, as shown in Figures 3-9. Peaks not assigned in the upper panels include those of B- and I-type ions and those that lost water from a fragment ion. Those peaks were easily recognized by comparing the observed MALDI/TOF-PSD spectra with the corresponding simulated PSD spectra (indicated by the shorter and the shortest bars in the lower panels). The peaks corresponding to Z- and C-type fragments were not observed as major peaks (not shown in the simulated PSD spectra). Some fragments give small signals or are not observed in the experimental spectra, although they are expected in the simulated PSD spectra. As a typical example, let us look at the simplest case, oligosaccharide A, which has a molecular mass of 1394.3 (Figure 3). Loss of one GlcNAc from either of the nonreducing termini of the parent gives the fragment ion at m/z 1193.3, which is observed as a major peak. Further loss of the adjacent Man in the same branch gives the fragment ion at m/z 1030.2, which is observed as another major peak. By another decomposition pathway, i.e., loss of two GlcNAc residues present in different branches, a fragment ion at m/z 989.5 is observed as a small peak. Similar phenomena (i.e., fragments resulting from a single bond cleavage generally give major peaks rather than those requiring multiple bond cleavages) are observed for oligosaccharides B (Figure 4), D (Figure 6), and F (Figure 8). Oligosaccharide G (Figure 9) is the most complicated sample in this study. The above-mentioned operations, including reading the raw data file, assigning peaks, spectral simulation, and final drawing, took only a few seconds, while manual operation would require more than 1 h.

DISCUSSION

The method described here presents a rapid way of interpreting MALDI/TOF-PSD mass spectra of oligosaccharides. We have shown two methods to assign peaks. The first approach requires no a priori structural information. This approach gives information on the monosaccharide composition, but not the arrangement. When there are several candidate structures for an unknown, it is convenient to generate simulated PSD spectra for the candidates and identify the unknown by comparing the observed spectrum with the simulated spectra. For this purpose, MALDI/TOF-PSD mass spectra of several oligosaccharides were obtained and analyzed by the tools developed in this study. Until now, there has been no extensive knowledge of mass spectral fragmentation rules in MALDI/TOF-MS. Thus, we assumed that the FAB fragmentation types Y, B, C, Z, and I also occur in MALDI/TOF. By using the present program, we found some rules for the PSD mass spectrometry of oligosaccharides: (1) The fragments cleaved at glycosidic linkages and retaining the tag (Y type) are usually observed as major peaks. (2) Other peaks include B- and I-type ions and their dehydrated ions. (3) Ions produced by a single-bond cleavage are more abundant than fragment ions resulting from multiple-bond cleavages. Thus, it looks as if the fragmentation
initiated in a branch proceeds to the end of the same branch. We are currently developing an oligosaccharide structural database. The combination of the database and a spectrum simulation routine adopting the above-mentioned fragmentation rules enables a detailed comparison of the observed PSD spectrum with simulated PSD spectra of the most likely saccharide chains and concomitant automated presumption of a structure. This will be the subject of the next paper (4).

Abbreviations used: Fuc, fucose; Man, mannose; Gal, galactose; Hex, hexose; GlcNAc, N-acetylglucosamine; GalNAc, N-acetylgalactosamine; GN, N-acetylhexosamine; NeuAc, N-acetylneuraminic acid; NeuGc, N-glycolylneuraminic acid; PA, aminopyridine; MALDI/TOF-MS, matrix-assisted laser desorption ionization time-of-flight mass spectrometry; PSD, postsource decay.

(4) Sasagawa, T.; Mizuno, Y., manuscript in preparation.

ACKNOWLEDGMENT

We thank Dr. Yoko Ohashi for valuable discussions. This work was supported in part by a grant (to K.T.) from the “Biodesign Research Program” of The Institute of Physical and Chemical Research (RIKEN).

Received for review October 6, 1998. Accepted June 4, 1999.
AC981108O