Anal. Chem. 2005, 77, 4719-4725

A Strategy for Identification of Oligosaccharide Structures Using Observational Multistage Mass Spectral Library

Akihiko Kameyama,† Norihiro Kikuchi,‡ Shuuichi Nakaya,†,§ Hiromi Ito,† Takashi Sato,† Toshihide Shikanai,†,‡ Yoriko Takahashi,‡ Katsutoshi Takahashi,| and Hisashi Narimatsu*,†

Research Center for Glycoscience, National Institute of Advanced Industrial Science and Technology (AIST), Open Space Laboratory C-2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan, Mitsui Knowledge Industry Co., Ltd., Honcho 1-Chome, Nakano-ku, Tokyo 164-8721, Japan, Shimadzu Corporation, 1, Nishinokyo-Kuwabaracho, Nakagyo-ku, Kyoto 604-8511, Japan, and Computational Biology Research Center (CBRC), AIST, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan

Glycosylation is the most widespread posttranslational modification in eukaryotes; however, the role of oligosaccharides attached to proteins has been little studied because a sensitive and easy analytical method for oligosaccharide structures has been lacking. Recently, tandem mass spectrometric techniques have revealed that oligosaccharides might have characteristic signal intensity profiles. We describe here a strategy for the rapid and accurate identification of the oligosaccharide structures on glycoproteins using only mass spectrometry. It is based on a comparison of the signal intensity profiles of multistage tandem mass (MSn) spectra between the analyte and a library of observational mass spectra acquired from structurally defined oligosaccharides prepared using glycosyltransferases. To identify the oligosaccharides released from biological materials efficiently, a computer suggests which of the fragment ions in the MS/MS spectrum should yield the most informative MS3 spectrum for distinguishing similar oligosaccharides. Using this strategy, we were able to identify the structures of N-linked oligosaccharides in immunoglobulin G as an example.

* To whom correspondence should be addressed. (phone) +81-29-861-3200; (fax) +81-29-861-3201; (e-mail) [email protected].
† National Institute of Advanced Industrial Science and Technology (AIST).
‡ Mitsui Knowledge Industry Co., Ltd.
§ Shimadzu Corp.
| Computational Biology Research Center (CBRC).
10.1021/ac048350h CCC: $30.25. Published on Web 06/21/2005. © 2005 American Chemical Society. Analytical Chemistry, Vol. 77, No. 15, August 1, 2005.

Proteomics has rapidly become a useful tool for the comprehensive analysis of protein expression profiles in very small amounts of biological sample. Posttranslational modifications, such as phosphorylation and glycosylation, play a wide range of important roles in the functions of proteins. To examine the functions of proteins, a high-throughput structural analysis that includes posttranslational modifications is desired. In fact, phosphoproteomics, the set of technologies for the comprehensive and quantitative analysis of phosphorylated proteins, has been actively studied, and similar techniques for proteins having N-linked oligosaccharides have been recently reported.1,2 However, unlike phosphorylation, which is controlled simply by on/off mechanisms, glycosylation modifies proteins with a variety of oligosaccharide structures through very complicated processes in which many enzymes are involved. Since glycoprotein oligosaccharides play vital roles in biological processes such as stability, protein conformation, intra- and intercellular signaling, and binding to and specificity for other biomolecules, analyses of the structure of oligosaccharides are essential for understanding the functions of glycoproteins at the molecular level.3

The structural analysis of oligosaccharides has been performed by combining NMR, methylation analysis by gas chromatography/mass spectrometry, and the fragmentation of permethylated oligosaccharides using tandem mass spectrometry (MS/MS). Although these methods provide highly accurate results, they are not comparable to protein identification by proteomics in sensitivity, simplicity, and rapidity because they require a large amount of sample and special techniques and knowledge.4 Empirical identification methods using exoglycosidase sequencing combined with a database of elution positions on two types of HPLC column have been used recently.5,6 However, these strategies are time-consuming because they require several steps of HPLC and exoglycosidase treatment.

Mass spectrometry can be considered a suitable tool for the analysis of oligosaccharides at high sensitivity and high throughput. Although an inability to identify the branching position, anomer, and diastereomer (Gal, Man, etc.) is intrinsic to MS, several examples have been described in MS/MS or multistage tandem mass spectrometry (MSn) in which different fragment ions, or different intensities of the same fragment ions, were observed for oligosaccharides with the same sequence, depending on the glycosidic linkage and branching structures.7-9 These results raise the possibility of determining glycan structures from MSn spectra alone. On the basis of these observations, a few attempts have been made recently to identify oligosaccharide structures using only MS with an aid such as rules of fragmentation10 or a fingerprint catalog of partial structures.11 However, these can be applied only to a limited group of oligosaccharides: human milk oligosaccharides,10 which have particular structural features, and abundant glycans from glycoproteins that can be obtained easily in large amounts, such as the egg jelly glycoproteins of Xenopus laevis,11 respectively. These technologies cannot be practically used to determine the glycan structures from human tissues and serum.

Another approach to the structural assignment of oligosaccharides is the prediction of the structure with an algorithm for the interpretation of MS/MS spectra.12-14 These methods afford the most probable structure by taking the rules of biosynthesis into account, but the results are often ambiguous because regio- and stereoisomers of oligosaccharides, which yield the same calculated fragment profiles, cannot be distinguished accurately. At present, no practical method of analyzing glycan structures has a range applicable to human glycoproteomics.

(1) Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R. Nat. Biotechnol. 2003, 21, 660-666.
(2) Kaji, H.; Saito, H.; Yamauchi, Y.; Shinkawa, T.; Taoka, M.; Hirabayashi, J.; Kasai, K.; Takahashi, N.; Isobe, T. Nat. Biotechnol. 2003, 21, 627-629.
(3) Dove, A. Nat. Biotechnol. 2001, 19, 913-917.
(4) Dell, A.; Reason, A. J.; Khoo, K.-H. In Methods in Enzymology: Guide to Techniques in Glycobiology; Lennarz, W. J., Hart, G. W., Eds.; Academic Press: San Diego, 1994; Vol. 230, pp 108-132.
(5) Royle, L.; Mattu, T. S.; Hart, E.; Langridge, J. I.; Merry, A. H.; Murphy, N.; Harvey, D. J.; Dwek, R. A.; Rudd, P. M. Anal. Biochem. 2002, 304, 70-90.
(6) Rudd, P. M.; Dwek, R. A. Nature 1997, 388, 205-207.

Here we developed a strategy for the rapid and accurate identification of the oligosaccharides of glycoproteins, including their regio- and stereochemistry, using MS alone. In the structural analysis of oligosaccharides using MS, permethylated oligosaccharides are the most important derivatives because of their higher sensitivity in MS and the abundant structural information afforded by their ring-cleaved fragment ions.15 However, we did not choose permethylated oligosaccharides as materials for two reasons.
First, permethylation requires microgram quantities of oligosaccharides,16 which is not compatible with the glycoproteomics we intend to develop. Second, permethylation is incompatible with the analysis of oligosaccharides containing partially acetylated or partially methylated neuraminic acids, which are commonly present in the body.17 We have cloned and characterized many human glycogenes in the past several years.18 Unlike previous trials by others, our strategy utilizes a large variety of structurally defined oligosaccharides that can be obtained not only from natural sources but also by rapid and straightforward enzymatic preparation using a library of glycosyltransferases encoded by the glycogenes accumulated in our laboratory. The system is fully computer-assisted and is based on a comparison of signal intensity profiles between the analyte and an observational spectral library built by acquiring MSn spectra of these oligosaccharides. Furthermore, to identify an oligosaccharide with as few MS3 acquisitions as possible, an “intelligent acquisition of MS3 spectra” method was devised.

(7) Pfenninger, A.; Karas, M.; Finke, B.; Stahl, B. J. Am. Soc. Mass Spectrom. 2002, 13, 1341-1348.
(8) Zaia, J.; Li, X. Q.; Chan, S. Y.; Costello, C. E. J. Am. Soc. Mass Spectrom. 2003, 14, 1270-1281.
(9) Yamagaki, T.; Nakanishi, H. Proteomics 2001, 1, 329-339.
(10) Pfenninger, A.; Karas, M.; Finke, B.; Stahl, B. J. Am. Soc. Mass Spectrom. 2002, 13, 1331-1340.
(11) Tseng, K.; Hedrick, J. L.; Lebrilla, C. B. Anal. Chem. 1999, 71, 3747-3754.
(12) Ethier, M.; Saba, J. A.; Ens, W.; Standing, K. G.; Perreault, H. Rapid Commun. Mass Spectrom. 2002, 16, 1743-1754.
(13) Mizuno, Y.; Sasagawa, T.; Dohmae, N.; Takio, K. Anal. Chem. 1999, 71, 4764-4771.
(14) Joshi, H. J.; Harrison, M. J.; Schulz, B. L.; Cooper, C. A.; Packer, N. H.; Karlsson, N. G. Proteomics 2004, 4, 1650-1664.
(15) Dell, A.; Morris, H. R. Science 2001, 291, 2351-2356.
(16) Zaia, J. Mass Spectrom. Rev. 2004, 23, 161-227.
(17) Zanetta, J. P.; Pons, A.; Iwersen, M.; Mariller, C.; Leroy, Y.; Timmerman, P.; Schauer, R. Glycobiology 2001, 11, 663-676.
(18) Narimatsu, H. Glycoconjugate J. 2004, 21, 17-24.

EXPERIMENTAL SECTION

Materials and Reagents. Oligosaccharides and pyridylamino-labeled oligosaccharides were purchased from Seikagaku Corp., Glyko, Dextra Laboratories, and Sigma. N-Linked oligosaccharides from human IgG (Sigma) were prepared as described.19 All glycosyltransferases, constructed and expressed as described in our previous papers,20-23 were used in a form bound to anti-FLAG M1 antibody resin (Sigma). Sugar nucleotides were purchased from Sigma. PNGase F (EC 3.5.1.52) and Arthrobacter ureafaciens neuraminidase (EC 3.2.1.18) were purchased from Prozyme (San Leandro, CA) and Marukin Bio (Kyoto, Japan), respectively.

Enzymatic Preparation of Oligosaccharide Library. The N-linked oligosaccharides having GalNAcβ1-4GlcNAc (Lac-diNAc) were prepared as described before.23 MUC1a tandem repeat peptide24 (carboxyfluorescein-labeled AHGVT*SAPDTR) having Galβ1-4GlcNAcβ1-6(Galβ1-3)GalNAc on T* was elongated through sequential glycosylation by the glycosyltransferases as follows. For the reaction with ST3GalIV, 50 mM HEPES buffer (pH 7.0) containing 0.1% Triton X-100, 500 µM CMP-Neu5Ac, 10 mM MnCl2, 25 µM starting glycopeptide, and the enzyme was used. After incubation at 37 °C for 20 h, the enzyme was inactivated at 100 °C for 5 min. Subsequently, UDP-GlcNAc (final concentration 250 µM) and β3GnT2 were added to the solution. After incubation at 37 °C for 2 h, the reaction was terminated at 100 °C for 5 min, and then GDP-Fuc (final concentration 25 µM) and FUT6 were added to the reaction mixture. After incubation at 25 °C for 30 min, the reaction was terminated at 100 °C for 5 min.
Finally, UDP-Gal (final concentration 25 µM) and β4GalT1 were added to the reaction mixture. After incubation at 25 °C for 2 h, the reaction was terminated at 100 °C for 5 min. The resulting mixture was filtered with an Ultrafree-MC column (Millipore), and then the glycopeptides were roughly purified with ZipTip C18 (Millipore).

Acquiring of MSn Spectra. Mass measurements were carried out using a matrix-assisted laser desorption/ionization (MALDI) quadrupole ion trap time-of-flight mass spectrometer (AXIMA-QIT; Shimadzu) as reported.25 For sample preparation, 0.5 µL of a ∼2 µM analyte solution was deposited on the target plate and allowed to dry. Then, 0.5 µL of 2,5-dihydroxybenzoic acid (Bruker Daltonik) solution (10 mg/mL in 20% ethanol) was applied over the dried analyte on the target plate and allowed to dry. All collision-induced dissociation (CID) spectra were obtained from Na adduct ions, and the collision energy was adjusted to reduce the intensity of the parent ion to less than 15% of the area of the base peak.

(19) Takahashi, N.; Ishii, I.; Ishihara, H.; Mori, M.; Tejima, S.; Jefferis, R.; Endo, S.; Arata, Y. Biochemistry 1987, 26, 1137-1144.
(20) Narimatsu, H.; Sinha, S.; Brew, K.; Okayama, H.; Qasba, P. K. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 4720-4724.
(21) Shiraishi, N.; Natsume, A.; Togayachi, A.; Endo, T.; Akashima, T.; Yamada, Y.; Imai, N.; Nakagawa, S.; Koizumi, S.; Sekine, S.; Narimatsu, H.; Sasaki, K. J. Biol. Chem. 2001, 276, 3498-3507.
(22) Iwai, T.; Inaba, N.; Naundorf, A.; Zhang, Y.; Gotoh, M.; Iwasaki, H.; Kudo, T.; Togayachi, A.; Ishizuka, Y.; Nakanishi, H.; Narimatsu, H. J. Biol. Chem. 2002, 277, 12802-12809.
(23) Sato, T.; Gotoh, M.; Kiyohara, K.; Kameyama, A.; Kubota, T.; Kikuchi, N.; Ishizuka, Y.; Iwasaki, H.; Togayachi, A.; Kudo, T.; Ohkura, T.; Nakanishi, H.; Narimatsu, H. J. Biol. Chem. 2003, 278, 47534-47544.
(24) Gallego, R. G.; Dudziak, G.; Kragl, U.; Wandrey, C.; Kammerling, J. P.; Vliegenthart, J. F. G. Biochimie 2003, 85, 275-286.
(25) Koy, C.; Mikkat, S.; Raptakis, E.; Sutton, C.; Resch, M.; Tanaka, K.; Glocker, M. O. Proteomics 2003, 3, 851-858.

MSn Spectral Library of Oligosaccharides. MS2 spectra of each structurally defined oligosaccharide and MS3 spectra of all the major fragment ions in the MS2 spectra were acquired three times. The signal intensity profiles of all triplicate MS2 and MS3 spectra, together with sample information such as the structure of the oligosaccharide, labeling reagent, and experimental conditions (m/z value of parent ion, acquisition mode, matrix, etc.), were stored in a relational database. Oligosaccharide structures were described in the library in an extensible markup language format that we developed.26

Method of Scoring the Difference between the MSn Spectra. Since the number of peaks normally differs between spectra, we used a two-way evaluation for scoring, as follows. (i) To simplify the spectrum, peaks in each MS2 spectrum of the analyte were merged within a given m/z tolerance (typically 0.7 Da; this value was derived from peak matching tests using several spectra) to afford k merged peaks (P1, P2, ..., Pk). The largest intensity among the peaks merged into Pi was taken as the merged peak intensity xi. (ii) The peak list of the analyte, which consists of the m/z values of Pi and their intensities xi, was converted into the vector X defined as follows:

$$\vec{X} = (x_1, x_2, \ldots, x_k) \tag{1}$$

(iii) The peak list of each oligosaccharide in the MS2 library was converted into the vector Y defined as follows:

$$\vec{Y} = (y_1, y_2, \ldots, y_k) \tag{2}$$

where yi (i = 1, 2, ..., k) is the maximum intensity among the peaks of the library spectrum that lie within the tolerance of the merged peak Pi of the analyte spectrum. (iv) The difference score S1 was given by the square of the Euclidean distance:

$$S_1 = \sum_{m=1}^{k} (x_m - y_m)^2 \tag{3}$$
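The scoring in eqs 1-3 can be illustrated with a short sketch. The following Python code is not the authors' implementation; it is a minimal reading of the description under stated assumptions: peaks are merged greedily in order of m/z, a merged analyte peak with no library peak within the tolerance is given y = 0, and the intensity scale is left to the caller. The function names (`merge_peaks`, `one_way_score`, `difference_score`) and the toy peak lists are hypothetical. The last function simply evaluates the one-way score in both directions and adds the results, which is the two-way evaluation the method relies on.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); the intensity scale is an assumption


def merge_peaks(peaks: List[Peak], tol: float = 0.7) -> List[Peak]:
    """Step (i): greedily merge peaks lying within `tol` of the current
    representative peak, keeping the most intense peak of each group."""
    merged: List[Peak] = []
    for mz, intensity in sorted(peaks):
        if merged and mz - merged[-1][0] <= tol:
            if intensity > merged[-1][1]:
                merged[-1] = (mz, intensity)
        else:
            merged.append((mz, intensity))
    return merged


def one_way_score(reference: List[Peak], other: List[Peak], tol: float = 0.7) -> float:
    """Steps (ii)-(iv): squared Euclidean distance of eq 3, with the merged
    peaks of `reference` defining the vector positions."""
    score = 0.0
    for mz, x in merge_peaks(reference, tol):
        # y is the largest intensity in `other` within tolerance of this peak;
        # using 0.0 when nothing matches is an assumption of this sketch.
        y = max((i for m, i in other if abs(m - mz) <= tol), default=0.0)
        score += (x - y) ** 2
    return score


def difference_score(spectrum_a: List[Peak], spectrum_b: List[Peak], tol: float = 0.7) -> float:
    """Two-way score: the one-way score evaluated in both directions and summed."""
    return one_way_score(spectrum_a, spectrum_b, tol) + one_way_score(spectrum_b, spectrum_a, tol)


# Hypothetical toy spectra, for illustration only.
analyte = [(100.0, 10.0), (100.3, 4.0), (200.0, 5.0)]
library_entry = [(100.1, 8.0), (300.0, 3.0)]
print(difference_score(analyte, library_entry))
```

Evaluating the score in both directions means that a peak present in only one of the two spectra is penalized regardless of which spectrum defines the vector positions.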

(v) With the analyte and the library spectrum interchanged, the difference score S2 was calculated in the same way, using the vector X derived from the oligosaccharide in the spectral library and the vector Y derived from the analyte. We defined (S1 + S2) as the difference score between the analyte and an oligosaccharide in the spectral library. The difference scores (S1 + S2) of MS3 spectra between the analyte and candidates were calculated in the manner described above.

Choice of the Next Parent Ion for MS3 Acquisition (Intelligent Selection). For all the major fragment ions in the MS2 spectra of candidate oligosaccharides, the difference scores (S1 + S2) between the MS3 spectra of the fragment ions with the same m/z value were calculated in the manner described above. The fragment ion with the largest score was chosen as the “next parent ion” for MS3 acquisition of the analyte. In the case of more than three candidate oligosaccharides, the smallest of the difference scores (S1 + S2) between the MS3 spectra of the fragment ions with the same m/z value was used as the representative score of that fragment ion for selection of the next parent ion.

(26) Kikuchi, N.; Kameyama, A.; Nakaya, S.; Ito, H.; Sato, T.; Shikanai, T.; Takahashi, Y.; Narimatsu, H. Bioinformatics 2005, 21, 1717-1718.

RESULTS AND DISCUSSION

Outlines of the Strategy. The strategy consists of a rapid and efficient protocol for the identification of target oligosaccharides using a library of the signal intensity profiles of MS2 spectra and of MS3 spectra of all the major fragments in the MS2 spectra for each structurally defined oligosaccharide (Figure 1). The structurally defined oligosaccharides include commercially available oligosaccharides as well as enzymatically synthesized oligosaccharides and glycopeptides produced with glycosyltransferases that have a strict specificity for substrates and are responsible for specific structures. The signal intensity profiles of MSn spectra are data sets of the m/z values and the relative intensities of each fragment ion in MSn spectra acquired with a MALDI quadrupole ion trap time-of-flight mass spectrometer. To avoid complications attributable to the rearrangement of fucose in an ion trap,27,28 we stored only the data from Na adduct ions in the MSn spectral library. In MALDI MS of oligosaccharides, [M + Na]+ ions are usually observed as a major peak; therefore, MALDI is a suitable ionization method for this strategy. To identify the oligosaccharides released from biological materials, (i) an MS2 spectrum of the sample is acquired, and its signal intensity profile is sent to the search server.
(ii) The computer chooses several oligosaccharides in the library as candidates based on their similarity to the signal intensity profile of the sample. (iii) The computer selects the “next parent ion” among the fragment ions in the MS2 spectra of the candidates by the “intelligent selection” described in the Experimental Section. Since structurally similar oligosaccharides generally afford similar MS3 spectra, choosing the most informative parent ion, that is, the one whose MS3 spectra differ most among the candidates, is important for rapid identification of the structure. (iv) The MS3 spectrum of the sample is acquired only for the fragment ion suggested as the “next parent ion”. (v) The computer outputs the answer based on the similarities between the acquired MS3 spectrum and the corresponding data of the candidates in the library (Figure 1b). With these procedures, named the “intelligent acquisition of MS3 spectra”, the structure of an oligosaccharide can be easily identified using a small amount of sample (∼1 pmol) and a minimal number of MSn acquisitions, without any special knowledge of oligosaccharide analysis.

(27) Harvey, D. J.; Mattu, T. S.; Wormald, M. R.; Royle, L.; Dwek, R. A.; Rudd, P. M. Anal. Chem. 2002, 74, 734-740.
(28) Brull, L. P.; Kovacik, V.; Thomas-Oates, J. E.; Heerma, W.; Haverkamp, J. Rapid Commun. Mass Spectrom. 1998, 12, 1520-1532.

Figure 1. Schematic diagram for rapid identification of oligosaccharides. (a) Strategy for constructing the MSn spectral library. MS2 spectra and MS3 spectra of all the major fragments in the MS2 spectra for each structurally defined oligosaccharide were stored in the library. (b) Rapid identification of oligosaccharides using the intelligent acquisition of MS3 spectra. Candidate structures for analytes are selected among oligosaccharides in the library on the basis of the similarity of the MS2 spectrum. The computer selects the next parent ion among the fragment ions in the MS2 spectra as the most informative ion. The structure of the analyte is identified by MS3 spectral matching with candidates.

Signal Intensity Profile as a “Fingerprint” of the Oligosaccharide Structure. To assess the feasibility of using signal intensity profiles of CID spectra as characteristic data for oligosaccharide structures, the distribution of difference scores (S1 + S2) calculated between 17 017 pairs of signal intensity profiles of MS2 spectra in the library is summarized in Figure 2. The distribution of difference scores between the triplicate acquisitions of the same structure was quite narrow and clearly separated from that between different structures with different molecular weights. This suggested that the data showed good reproducibility and that the difference score (S1 + S2) can be used to discriminate oligosaccharide structures. Meanwhile, the distribution of scores between isomers partially overlapped that between identical structures, which means that in some cases a similarity evaluation of higher stage tandem mass spectra becomes necessary for discrimination. For example, the scores between MS2 spectra of anomeric isomers of Gal (1 and 2) and of diastereomers (β-galactoside 3 and β-mannoside 4) are far from the distribution of scores for identical structures, whereas among the three regional isomers (5-7) of tetraantennary N-glycans, which differ from each other only in the galactosylation site, the score between 6 and 7 was small (Figure 3). In contrast, the scores between the MS3 spectra of m/z 1967 of these three isomers are large enough to discriminate them from each other. These results suggested that MSn spectra up to the third order could generally provide characteristic signal intensity profiles for each individual oligosaccharide even when the structures were very similar.

Figure 2. Difference score distributions for matching between the MS2 spectra in the library. Red line, scores between the same structures; blue line, scores between isomers; black line, scores between different structures with different molecular weights.

Figure 3. Difference scores for MSn spectra of oligosaccharide isomers. (a) Between MS2 spectra of anomeric isomers (α-galactoside and β-galactoside). (b) Between MS2 spectra of diastereomers (Gal and Man). (c) Between MS2 and MS3 spectra of the regional isomers based on branching structure. Asterisk (*) indicates the scores between MS3 (m/z 1967) spectra. 2-AB, 2-aminobenzamide. Symbols are as in Figure 1.

Answering Procedure Using the Difference Score. For choosing candidates from the library based on the difference score (S1 + S2) of MS2 spectra, the threshold was set at 60, which, as shown in Figure 2, is high enough not to miss the correct answer. If only one candidate has a difference score below the threshold, this candidate is output as the answer. In the case of multiple candidates, the MS3 spectrum of the next parent ion, selected among the fragment ions in the MS2 spectrum of the analyte by “intelligent selection” as described in the Experimental Section, is acquired. Since the structures of the candidates are generally very similar, the threshold for choosing the answer based on the difference score of the MS3 spectrum should be set very carefully.

Table 1. Effect of Threshold for the Score of MS3 Spectral Matching on the Performance of the Answering Procedure [a]

threshold   TP   FN   TN   FP   sensitivity (%)   specificity (%)
10          40   24   64    0   62.5              100
20          53   11   60    4   82.8              93.7
30          57    7   54   10   89.1              84.4
40          62    2   54   10   96.9              84.4
50          62    2   52   12   96.9              81.3
60          62    2   46   18   96.9              71.9
70          62    2   43   21   96.9              67.2
80          61    3   41   23   95.3              64.1
90          61    3   39   25   95.3              60.9
100         61    3   38   26   95.3              59.4

[a] TP and FP are the number of queries for which search results below the given threshold include a correct structure and incorrect structures, respectively. FN and TN are the total number of queries minus TP and FP, respectively.

Table 2. Comparison of Intelligent Selection with Conventional Selection [a]

no. of executions [b]   intelligent [c]   conventional [c]
2                       37                27
3                        6                 8
4                        0                 4
5                        0                 2
6                        0                 2

[a] The threshold of the score for MS3 spectral matching was set at 70. [b] The numbers of executions including MS2 needed to attain the answer are indicated. [c] The numbers of oligosaccharides identified by the given number of executions are indicated.

Executing queries using the library data of 64 2-aminopyridine (PA)-labeled oligosaccharides that afford multiple candidates, we evaluated the performance of the answering procedure at various thresholds of the score for MS3 spectral matching. The most suitable threshold for the score of MS3 spectra was explored by calculating the sensitivity and specificity:

$$\text{sensitivity} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{specificity} = \frac{TN}{TN + FP} \tag{5}$$
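As a worked check of eqs 4 and 5, the counts published in Table 1 can be run through the two formulas. The sketch below is illustrative only: the dictionary `TABLE_1` transcribes the TP, FN, TN, and FP columns of Table 1 (each as defined in the footnote to that table), and the function names are hypothetical. The percentages it prints should agree with the sensitivity and specificity columns of Table 1 to within rounding of the last digit.

```python
# Rows of Table 1: threshold -> (TP, FN, TN, FP), transcribed from the table.
TABLE_1 = {
    10: (40, 24, 64, 0),
    20: (53, 11, 60, 4),
    30: (57, 7, 54, 10),
    40: (62, 2, 54, 10),
    50: (62, 2, 52, 12),
    60: (62, 2, 46, 18),
    70: (62, 2, 43, 21),
    80: (61, 3, 41, 23),
    90: (61, 3, 39, 25),
    100: (61, 3, 38, 26),
}


def sensitivity(tp: int, fn: int) -> float:
    """Eq 4, expressed as a percentage."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Eq 5, expressed as a percentage."""
    return 100.0 * tn / (tn + fp)


def performance(table):
    """Map each threshold to its (sensitivity, specificity) pair."""
    return {
        threshold: (sensitivity(tp, fn), specificity(tn, fp))
        for threshold, (tp, fn, tn, fp) in table.items()
    }


for threshold, (sens, spec) in performance(TABLE_1).items():
    print(f"{threshold:>3}  sensitivity {sens:5.1f}%  specificity {spec:5.1f}%")
```

Because FN and TN are obtained by subtraction from the total number of queries, TP + FN and TN + FP both equal 64 in every row, so each percentage is simply a count divided by 64.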

Here, true positive (TP) and false positive (FP) are the numbers of queries for which the search results below the given threshold include the correct structure and incorrect structures, respectively. False negative (FN) and true negative (TN) are the total number of queries minus TP and FP, respectively. As shown in Table 1, the sensitivity is over 95% at thresholds of 40 or more, whereas the specificity decreases as the threshold increases. Therefore, an ideal value for the threshold for the difference score of MS3 spectral matching is 40. However, because this value is too strict for several oligosaccharides, the number of MS3 acquisitions needed to attain the answer increases (data not shown). To make the best use of the intelligent selection procedure, we temporarily set 70 as the threshold of the MS3 score in the current study. We plan to reoptimize this value once the amount of data in the library increases.

We also evaluated the number of executions needed to achieve identification using intelligent selection of the next parent ion, in comparison with a conventional selection of the next parent ion in order of intensity, as shown in Table 2. In the evaluation of both methods, the selection of a parent ion was repeated until MS3 spectral matching afforded a single oligosaccharide whose difference score was below the threshold of 70. Using intelligent selection, 37 of 43 oligosaccharides could be identified with two executions including MS2, and all 43 oligosaccharides could be identified within three executions. On the other hand, using conventional selection, eight cases required more than three executions, and two cases required six executions. This result supported the advantage of our intelligent selection procedure for effective identification.

Preparation of an Oligosaccharide Library. Since the identification of oligosaccharides using this strategy depends on the amount of data stored in the library, it is crucial to collect MSn spectral data for a large number of structurally defined oligosaccharides. We used not only commercially available oligosaccharides but also oligosaccharides enzymatically synthesized with human glycosyltransferases, many of which we had already cloned,18,20-23 as structurally defined oligosaccharides. Figure 4a shows an example of the enzymatic preparation of five N-linked oligosaccharides that are structurally similar and difficult to obtain from natural sources. The oligosaccharides having Lac-diNAc on either or both branches of the biantennary structure were easily prepared using β4GalNAcT3 from the asialo agalacto biantennary N-linked oligosaccharide as described before.23 However, for the two products having only one GalNAc residue, it is difficult to determine which branch carries the GalNAc. After galactosylation of these isomers, this was easily resolved by comparing the MS2 spectrum with that of the same oligosaccharides prepared from the structurally defined monoagalacto biantennary N-linked oligosaccharide, which showed characteristic fragmentation patterns in MS2 spectra (data not shown). We examined whether a library of O-linked glycopeptides, most of which are not commercially available, can be synthesized with these enzymes (Figure 4b). Glycopeptide 1 was extended by successive glycosylations with ST3Gal4, β3GnT2, FUT6, and β4GalT1. Each reaction was stopped at ∼50% yield by monitoring the time course of the reaction using MS, and the next enzyme and donor substrate were added to the mixture. Figure 4c shows the MS spectrum of the final mixture. Since the molecular weights and structures of the glycopeptides contained in the mixture were designed to show one-to-one correspondence, the oligosaccharide structure of each peak can be identified immediately.
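The one-to-one correspondence between mass and structure can be made concrete with a small sketch. The code below is an illustration, not part of the reported procedure: it assumes that each enzymatic step adds exactly one residue to a single linear series of products, it uses standard monoisotopic residue masses, and it expresses every product as a mass shift relative to the starting glycopeptide 1, whose absolute mass is not needed here. The names `RESIDUE_MASS`, `STEPS`, `mass_ladder`, and `assign`, and the 0.5 Da tolerance, are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

# Monoisotopic residue masses (Da) of the sugar added at each glycosylation step.
RESIDUE_MASS = {"Neu5Ac": 291.0954, "GlcNAc": 203.0794, "Fuc": 146.0579, "Gal": 162.0528}

# Enzyme order taken from the text; each enzyme is paired with the sugar it transfers.
STEPS = [("ST3Gal4", "Neu5Ac"), ("beta3GnT2", "GlcNAc"), ("FUT6", "Fuc"), ("beta4GalT1", "Gal")]


def mass_ladder(steps: Sequence[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (label, mass shift relative to glycopeptide 1) for every product,
    assuming one residue is added per step to a single linear series."""
    ladder = [("glycopeptide 1", 0.0)]
    shifts = accumulate(RESIDUE_MASS[sugar] for _, sugar in steps)
    label = "glycopeptide 1"
    for (_, sugar), shift in zip(steps, shifts):
        label += " + " + sugar
        ladder.append((label, round(shift, 4)))
    return ladder


def assign(delta_mass: float, ladder: Sequence[Tuple[str, float]], tol: float = 0.5) -> Optional[str]:
    """Return the product whose mass shift matches `delta_mass` within `tol`, if any."""
    for label, shift in ladder:
        if abs(delta_mass - shift) <= tol:
            return label
    return None


ladder = mass_ladder(STEPS)
for label, shift in ladder:
    print(f"{shift:9.4f}  {label}")
```

Under these assumptions every product in the series has a distinct mass shift, well separated from its neighbors, which is why a single MS spectrum of the mixture suffices to label each peak.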
The synthesized glycopeptide mixture can be converted to a reduced oligosaccharide mixture by reductive β-elimination.29 Therefore, a variety of structurally defined O-linked oligosaccharides may be rapidly and easily synthesized using similar procedures. Oligosaccharide Analysis of Glycoproteins. To demonstrate the feasibility of this strategy, N-linked oligosaccharides released from a well-characterized human immunoglobulin G (IgG) with peptide-N-glycosidase F (PNGase F; Prozyme), which were labeled with PA followed by neuraminidase treatment, were analyzed. These N-linked oligosaccharides were chromatographed as reported19 and numbered in order of elution from the reversed-phase HPLC. Table 3 summarizes the results of these experiments. Figure 5 shows the structures of identified N-linked oligosaccha(29) Hanisch, F. G.; Peter-Katalinic, J. Eur. J. Biochem. 1992, 205, 527-535.

Analytical Chemistry, Vol. 77, No. 15, August 1, 2005

4723

Table 3. Results of the Identification of N-Glycans Released from Human IgG Using the MSnSpectral Library scored

candidates no. 1 2 3 4 5 6

parent iona 1417 1579 1579 1742 1563 1725 1280/1725e

7

1725 1280/1725e

8 9 10

1888 1929 1767 1321/1767e

11

2091 1443/2091e

IDb

structurec

S1

S2

S 1 + S2

ONA-51 ONG-a5 ONG-cf ONG-47 ONA-69 ONG-cd ONG-a6 ONG-cd ONG-a6 ONG-cd ONG-a6 ONG-cd ONG-a6 ONG-48 ONG-a8 ONA-a7 ONA-ad ONA-a7 ONA-ad ONG-df ONG-ac ONG-df ONG-ac

A B C D E

5 3 5 7 9 4 25 8 129 26 5 131 5 2 4 6 12 13 62 4 4 9 65

6 3 5 7 9 4 26 8 129 25 5 132 6 2 4 6 11 14 61 4 4 9 49

11 6 10 14 18 8 51 16 258 51 10 263 11 4 8 12 23 27 123 8 8 18 114

F

G H I J

K

a Parent ion shows m/z value of parent ion of MSn spectrum used for similarity search. b Candidates are indicated by the oligosaccharide ID, which is the code for a unique structure in our MSn spectral library. c Letters represents the structure as in Figure 5. They were only put on the conclusive line. d Score indicates similarity between analyte and candidate. The smaller the value, the more similar the two are. e For example, 1280/1725 indicates that the parent ion of m/z 1280 was derived from the MS2 spectrum of m/z 1725.

Figure 4. Example of the enzymatic preparation of structurally defined oligosaccharides and glycopeptides. Symbols are as in Figure 1. (a) Reaction scheme for the N-linked oligosaccharides containing Lac-diNAc. (b) Reaction scheme for the O-linked glycopeptide library. The wavy line indicates the MUC1a peptide (AHGVT*SAPDTR), which is glycosylated at T*. (c) MS spectrum of the prepared glycopeptide library. The asterisk indicates metastable ions from the sialylated glycopeptides.

rides. A similarity search of the MSn spectral library using the signal intensity profiles obtained from the MS2 spectra of fractions 1-5, 8, and 9 returned a single candidate with a prominent score for each fraction (A, B, C, D, E, H, and I, respectively), where the threshold score (S1 + S2) for selecting candidates was set at 60. For fractions 6 and 7, the similarity search of the MS2 spectra (m/z 1725) gave two candidates (ONG-cd and ONG-a6; these are the oligosaccharide IDs in our spectral library). MS3 spectra were therefore acquired for the next parent ion (m/z 1280), chosen among the fragment ions in the MS2 spectra by “intelligent selection”. Spectral matching of the MS3 spectra of fractions 6 and 7 against the library resulted in ONG-cd (F) and ONG-a6 (G), respectively, with prominent scores below the threshold of 70 for MS3 spectral matching. In a similar manner, we assigned the N-glycans of fractions 10 and 11 as shown in Table 3. All results were consistent with reported data.19

Figure 5. Structures of N-glycans from human IgG identified by the MSn spectral library search. Symbols are as in Figure 1.

CONCLUSION

We described a strategy for the rapid and accurate identification of oligosaccharides using an MSn spectral library and showed that N-linked oligosaccharides from biological materials could be successfully identified with this method. The advantages of this strategy are manifold. First, a researcher who is not familiar with the structural analysis of oligosaccharides by MS can easily identify them by consulting the MSn spectral library, without a detailed assignment of fragment ions. Second, since this strategy relies only on mass spectrometry, without any other analytical method or sequential enzymatic processing, rapid identification is achieved using a small amount of sample. Third, MS3 experiments are performed only on fragment ions expected to afford characteristic data, so identification is achieved efficiently, saving both time and sample. Fourth, unlike previous trials by others, accurate identification that includes the regio- and stereochemistry of oligosaccharides, such as the distinction of mannoside from galactoside, and branching isomers can be achieved by matching with their observational MSn spectra, without the assistance of any other technique or knowledge. For the future comprehensive analysis of oligosaccharides, off-line LC-MALDI MS can be easily coupled to this system using the MSn library.30,31

A limitation of this strategy is that the possibility of full identification depends on the amount of data stored in the spectral library. To overcome this limitation, we are increasing the number of structurally defined oligosaccharides by enzymatic synthesis as described above. The variety of oligosaccharide structures in an organism, which is strictly controlled by glycogenes, is much smaller than that expected by mathematical calculation. Most glycogenes have already been cloned, especially human glycogenes; therefore, most of the oligosaccharides in the human body can be prepared enzymatically in the postgenomic era. As another option, we are studying the rules of fragmentation and the specific fragment patterns for partial structures using a massive amount of data on the MSn spectra of oligosaccharides, from which a de novo structural assignment based on MSn data could be developed.

In this study, oligosaccharides were identified after the removal of sialic acids because oligosaccharides containing sialic acid are labile and give low sensitivity in MALDI MS. This problem can be solved by esterification or amidation of the carboxyl group of sialic acid.32,33

The final goal of this system will be reached in three steps: (1) the accumulation of MSn data for as many human glycans as possible; (2) coupling with a method for predicting the signal intensity profiles of arbitrary oligosaccharides, which will be developed from a large amount of observational MSn data; and (3) storage of the data calculated for all the possible N- and O-glycans expected from human glycogenes. We are applying this strategy to glycomics related to human diseases and attempting to discover oligosaccharide markers.

(30) Ericson, C.; Phung, Q. T.; Horn, D. M.; Peters, E. C.; Fitchett, J. R.; Ficarro, S. B.; Salomon, A. R.; Brill, L. M.; Brock, A. Anal. Chem. 2003, 75, 2309-2315.
(31) Zhang, B.; McDonald, C.; Li, L. Anal. Chem. 2004, 76, 992-1001.
(32) Powell, A. K.; Harvey, D. J. Rapid Commun. Mass Spectrom. 1996, 10, 1027-1032.
(33) Sekiya, S.; Wada, Y.; Tanaka, K. Anal. Chem. 2004, 76, 5894-5902.

ACKNOWLEDGMENT

This work was performed as a part of the R&D Project of the Industrial Science and Technology Frontier Program (R&D for Establishment and Utilization of a Technical Infrastructure for Japanese Industry) supported by the New Energy and Industrial Technology Development Organization (NEDO).

Received for review November 8, 2004. Accepted May 13, 2005. AC048350H
