Assigning N-Glycosylation Sites of Glycoproteins Using LC/MSMS in Conjunction with Endo-M/Exoglycosidase Mixture

Zaneer M. Segu,†,+ Ahmed Hussein,‡,+,§ Milos V. Novotny,†,‡ and Yehia Mechref*,†,‡

METACyt Biochemical Analysis Center, Department of Chemistry, Indiana University, Bloomington, Indiana 47405, and National Center for Glycomics and Glycoproteomics, Department of Chemistry, Indiana University, Bloomington, Indiana 47405

Received February 10, 2010

The assignment of protein glycosylation sites and their microheterogeneities is of biological importance, yet such characterization is still considered analytically very challenging. Several approaches have recently been developed to improve the characterization of glycosylation sites of proteins, including lectin and HILIC enrichment-based methods coupled to mass spectrometry. However, unequivocal assignment of protein glycosylation remains a daunting task, prompting continuous demand for sensitive analytical approaches. Endo-β-N-acetylglucosaminidase (endo-β-GlcNAc-ase, Endo-M) is an endoglycosidase capable of hydrolyzing the N,N′-diacetylchitobiose moiety in N-linked oligosaccharides bound to the asparagine residue in various glycoproteins. An attractive feature of this enzyme is its ability to cleave the N,N′-diacetylchitobiose moiety while leaving an N-acetylglucosamine (GlcNAc) residue bound to the protein. This enzyme is also known to be inactive in the presence of a core fucose residue linked to the reducing-end GlcNAc residue. Here, we describe an approach capitalizing on these features of Endo-M to (a) determine the glycosylation sites of proteins and the occupancy of these sites, and (b) determine the attachment sites of N-glycans containing a core fucose residue. The latter is important because of its biological implications. Tryptically digested glycoproteins, which were subjected to Endo-M treatment, were analyzed by LC-MS/MS. Systematic evaluation of the activity of Endo-M toward different glycan structures indicated that the enzyme activity depends on the complexity of the glycan structures. Efficient release of N-glycans using Endo-M is achieved only through the inclusion of a battery of exoglycosidases, which reduce the complexity of the attached glycans and thereby promote an effective enzymatic release. Upon Endo-M/exoglycosidase treatment of tryptically digested glycoproteins, glycosylated sites retain a GlcNAc residue. The resulting peptides, with GlcNAc residues attached to the glycosylation sites, are easily assigned through LC-MS/MS analysis and subsequent database searching of the generated tandem mass spectra. Comparing the LC-MS/MS results of the tryptic digests of glycoproteins treated with PNGase F and with Endo-M/exoglycosidases allowed the assignment of core fucose residues to N-glycan reducing ends. The detection of glycosylation sites only in the tryptic digest of PNGase F treated samples suggested core fucosylation of the N-glycans attached to such sites. This strategy was initially validated using model glycoproteins. It also proved to be useful in determining the glycosylation sites of blood serum glycoproteins.

Keywords: Glycoprotein • Glycosylation sites • PNGase F • Endo M • Exoglycosidases

* To whom correspondence should be addressed. Tel: 812-856-5620. Fax: 812-855-8300. E-mail: [email protected].
† METACyt Biochemical Analysis Center, Indiana University.
‡ National Center for Glycomics and Glycoproteomics, Indiana University.
+ Both authors contributed equally to this work.
§ Present address: Department of Biotechnology, Institute of Graduate Studies and Research, Alexandria University, 163 Horreya Avenue, P.O. Box 832, Alexandria 21526, Egypt.

Journal of Proteome Research 2010, 9, 3598–3607. Published on Web 04/20/2010. DOI: 10.1021/pr100129n. © 2010 American Chemical Society.

Introduction

Glycosylation is a major post-translational modification of proteins. Glycans, which are N-linked to asparagine residues in proteins, constitute the most common types of glycosylation.1 N-linked glycosylation occurs at a consensus sequence of NXT, NXS, and rarely NXC, where X is any amino acid other than proline.2 Glycosylation affects a number of biological functions, many of which are associated with diseases.3 Moreover, protein glycosylation plays a significant role in various aspects of carcinogenesis. Increased levels of fucosylated glycans have been implicated in a number of pathological conditions, including inflammation and cancer.4-6

Although effective characterization of protein post-translational modifications (PTMs), in general, has been made possible through the rapid advancements in mass spectrometry,7 the identification of the glycosylation sites in proteins remains analytically challenging. This is partially due to their diverse structural modifications, ranging from a few monosaccharide residues to heavily branched oligosaccharides composed of 7-40 monomers attached to the polypeptide chain through N-, O-, and C-linkage.1 Various glycoprotein or glycopeptide enrichment steps using, for example, lectin affinity chromatography,8 periodate oxidation prior to hydrazide coupling,9 boronic acid chromatography, and affinity chromatography using glycosylation-specific antibodies are currently employed to identify glycosylation sites of proteins. The separated or purified glycoproteins or glycopeptides are commonly analyzed by tandem mass spectrometry with or without complete or partial cleavage of glycans through either chemical or enzymatic treatment.10

Peptide-N4-(acetyl-β-glucosaminyl)asparagine amidase (PNGase F) is a common endoglycosidase that is routinely used to cleave asparagine N-linked high mannose, hybrid, and complex glycans.9,11-14 This enzymatic release of N-glycans is accompanied by the deamidation of the asparagine residue to aspartic acid, which provides an indirect indication of the N-glycosylation sites of a glycoprotein. The sites are located by monitoring the shift in molecular weight (0.98 Da) associated with this conversion. However, since deamidation can occur in vivo and in vitro, false-positive assignment of glycosylation sites based on this shift in molecular weight is inevitable. This caveat is rectified by performing the enzymatic cleavage in 18O-water, which results in a larger shift in molecular weight (3 Da).13 A new strategy for the identification of N-glycosylation of proteins was recently described involving the use of endo-β-N-acetylglucosaminidases D/H.
Treatment of glycoproteins with these enzymes generates peptides with a single N-acetylglucosamine (GlcNAc) attached to the asparagine glycosylation site.15 However, the activity of these enzymes is very low, resulting in partial hydrolysis and, consequently, weak signals. Therefore, there is still a need for other endoglycosidases that offer more insight into the glycosylation of proteins and overcome the shortcomings of the endoglycosidases currently in use.

Endo-M was initially purified by Yamamoto et al.16 from the culture fluid of Mucor hiemalis isolated from soil. It is an endoglycosidase capable of hydrolyzing the N,N′-diacetylchitobiose moiety attached to the asparaginyl residue of proteins through an N-glycosidic linkage. It cleaves after the reducing-end GlcNAc residue, thus retaining a GlcNAc residue bound to the protein. This retained GlcNAc residue may be utilized in mapping the glycosylation sites of proteins using LC-MS/MS and database searching, since it imposes a characteristic molecular weight shift that is easily discernible through database searching. Moreover, a fucose residue attached to the core GlcNAc residue in some hybrid and complex type glycans renders Endo-M inactive,1,17 thus offering an insight into the occupancy of specific glycosylated sites. Therefore, treating samples separately with PNGase F and Endo-M could be employed to reveal specific information related to core fucosylation of specific glycosylation sites.

The potential of employing an Endo-M and exoglycosidase mixture in conjunction with LC-MS/MS analysis to assign the N-glycosylation sites of proteins and the core fucosylation occupancy of these sites is explored here in a gel-free proteomic scheme. The enzymes are applied to tryptically digested glycoprotein standards and human blood serum. Comparing the LC-MS/MS results of the tryptic digests of glycoproteins treated with PNGase F and with Endo-M/exoglycosidases allowed the assignment of core fucose residues to N-glycan reducing ends. The detection of glycosylation sites only in the tryptic digest of PNGase F treated samples suggested core fucosylation of the N-glycans attached to such sites. This strategy was initially validated using model glycoproteins. It also proved to be useful in determining the glycosylation sites of blood serum glycoproteins.

Materials and Methods

Materials. Haptoglobin from pooled human plasma, fetuin from fetal calf serum, α1-acid glycoprotein from human plasma, mercaptoethanol, female pooled human blood serum, PNGase F, N-acetyl-β-D-glucosaminidase (EC 3.2.1.30) from Diplococcus pneumoniae, and β-galactosidase (EC 3.2.1.23) from D. pneumoniae were obtained from Sigma (St. Louis, MO). Endo-M was obtained from TCI (Tokyo, Japan). Neuraminidase (EC 3.2.1.18) from Arthrobacter ureafaciens (5 U/mL) was acquired from Prozyme (San Leandro, CA). Acetonitrile and ammonium bicarbonate were purchased from Fisher Scientific (Fair Lawn, NJ). Sodium phosphate dibasic and sodium phosphate monobasic were purchased from Aldrich (Milwaukee, WI). Dithiothreitol (DTT) and iodoacetamide (IAA) were obtained from Bio-Rad Laboratories (Hercules, CA).

Trypsin Digestion. Glycoprotein samples were subjected to tryptic digestion according to the following procedure. After thermal denaturation at 95 °C for 10 min, samples were reduced through the addition of DTT to a final concentration of 5 mM and incubated at 60 °C for 45 min. Alkylation was achieved by adding IAA to a final concentration of 20 mM prior to incubation at room temperature for 45 min in the dark. A second aliquot of DTT was then added, increasing the final concentration of DTT to ca. 10 mM. The samples were then incubated at room temperature for 30 min to quench the alkylation reaction. Next, trypsin was added (1:30 w/w) and the solutions were incubated at 37 °C for 18 h. The enzymatic digestions were finally quenched through the addition of neat formic acid.

Release of N-Glycans from Glycoproteins. The N-glycans of standard glycoproteins (haptoglobin, fetuin, and α1-acid glycoprotein) were enzymatically released using PNGase F, Endo-M, or an Endo-M/exoglycosidase mixture (neuraminidase (EC 3.2.1.18) from A. ureafaciens, N-acetyl-β-D-glucosaminidase (EC 3.2.1.30) from D. pneumoniae, and β-galactosidase (EC 3.2.1.23) from D. pneumoniae). In the case of PNGase F, the enzymatic release was performed according to our previously published procedure.18 Briefly, standard glycoproteins were suspended in 10 mM sodium phosphate buffer, pH 7.5, containing 0.1% mercaptoethanol. The samples were then thermally denatured at 95 °C for 5 min. Next, the samples were cooled to room temperature prior to the addition of 5 mU of PNGase F. The reaction mixture was then incubated at 37 °C for 18 h. The Endo-M and Endo-M with exoglycosidases (Endo-M/exoglycosidases) enzymatic release was performed at 37 °C for 18 h using the same buffer used for PNGase F, except that the pH of the buffer was adjusted to 6.

Immunoaffinity Depletion. Human blood serum high-abundance proteins (albumin, IgG, IgA, transferrin, haptoglobin, antitrypsin, and fibrinogen) were depleted using an Agilent Multi Affinity Removal System (MARS) column (4.6 mm × 100 mm, Agilent Technologies, Santa Clara, CA) on an Akta purifier (Amersham Biosciences, NJ). Depletion was performed as suggested by the manufacturer. A 1.5-mL aliquot of the depleted sample was collected and subsequently buffer exchanged with 50 mM ammonium bicarbonate and preconcentrated to ca. 0.5 µg/µL using a 5 kDa MWCO spin concentrator. The total protein concentration of the sample was determined by Bradford protein assay (Bio-Rad, Hercules, CA).

Table 1. Glycosylation Sites of Fetuin Identified through LC-MS/MS of PNGase F, Endo-M, and Endo-M/Exoglycosidase Treated Tryptic Digests

| fetuin glycosylation site | PNGase F (a): m/z | mass accuracy (ppm) | Mascot ion score | Endo M (b): m/z | mass accuracy (ppm) | Mascot ion score | Endo M/exoglycosidases (c): m/z | mass accuracy (ppm) | Mascot ion score |
| RPTGEVYDIEIDTLETTCHVLDPTPLAN99CSVR | 1224.9211(+3) | 1.3 | 62 | 969.4666(+4) | 1.2 | 33 | 969.4662(+4) | 1.6 | 54 |
| LCPDCPLLAPLN156DSR | 871.4152(+2) | 1.5 | 55 | 972.4624(+2) | 1.8 | 42 | 972.4641(+2) | 0.03 | 42 |
| VVHAVEVALATFNAESN176GSYLQLVEISR | 1006.5248(+3) | -0.2 | 62 | ND | ND | ND | 1073.8896(+3) | 0.05 | 38 |

(a) PNGase F treatment results in the deamidation of asparagine. (b) Endo M treatment of glycopeptides generates peptides with HexNAc attached to asparagine. (c) Endo M/exoglycosidases treatment of this glycopeptide generates a peptide with HexNAc attached to asparagine.

Instrumentation. LC-MS/MS analyses of the tryptic digests were performed using a Dionex 3000 Ultimate nano-LC system (Dionex, Sunnyvale, CA) interfaced to an LTQ Orbitrap hybrid mass spectrometer (Thermo Scientific, San Jose, CA). Prior to separation, a 5-µL aliquot of the tryptic digest (1 µg protein equivalent) was loaded on a PepMap300 C18 cartridge (5 µm, 300 Å, Dionex) and eluted through a pulled-tip capillary column (150 mm × 75 µm i.d.) packed with 90 Å Jupiter C12 bonded phase (Phenomenex, Torrance, CA). Peptides originating from protein tryptic digests were separated using a reversed-phase gradient from 3 to 55% solvent B (97% acetonitrile with 0.1% formic acid) over 45 min for standard glycoproteins and 2.75 h for proteins isolated from human blood serum, at a flow rate of 300 nL/min. The mass spectrometer was operated in an automated data-dependent mode that switched between an MS scan and CID-MS. In this mode, ionized LC eluents were subjected to an initial full-spectrum MS scan from m/z 300 to 2000 in the Orbitrap at 15,000 mass resolution. Subsequently, CID-MS (at 35% normalized collision energy) was performed in the ion trap. Precursor ions were isolated in the data-dependent acquisition mode with a 2 m/z isolation width, automatically and sequentially selecting the five most intense ions (starting with the most intense) from the survey scan. The total cycle (6 scans) was continuously repeated for the entire LC-MS run under data-dependent conditions with dynamic exclusion set to 60 s. Performing MS scanning in the Orbitrap offers high mass accuracy and accurate charge state assignment of the selected precursor ions.
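The data-dependent cycle described above (one survey scan followed by CID of the five most intense ions, with a 60 s dynamic exclusion) can be illustrated with a small simulation. This is only a sketch of the selection logic, not the instrument firmware; the function name `select_precursors` and the 0.01 m/z matching tolerance are assumptions introduced for illustration.

```python
def select_precursors(survey_peaks, scan_time_s, exclusion,
                      top_n=5, exclusion_s=60.0, mz_tol=0.01):
    """Pick up to top_n of the most intense, non-excluded precursors.

    survey_peaks: list of (mz, intensity) tuples from one survey scan.
    exclusion:    dict mapping an excluded m/z to the time (s) at which it
                  was selected; it is updated in place.
    mz_tol:       assumed matching tolerance for the exclusion list.
    """
    # Forget precursors whose exclusion window has elapsed.
    expired = [mz for mz, t in exclusion.items()
               if scan_time_s - t >= exclusion_s]
    for mz in expired:
        del exclusion[mz]

    selected = []
    # Walk the peaks from most to least intense.
    for mz, _intensity in sorted(survey_peaks, key=lambda p: p[1],
                                 reverse=True):
        if any(abs(mz - excluded) <= mz_tol for excluded in exclusion):
            continue  # still on the dynamic exclusion list
        selected.append(mz)
        exclusion[mz] = scan_time_s
        if len(selected) == top_n:
            break
    return selected


if __name__ == "__main__":
    peaks = [(500.0, 10), (600.0, 50), (700.0, 30),
             (800.0, 5), (900.0, 40), (1000.0, 20)]
    excluded = {}
    for t in (0.0, 10.0, 61.0):
        print(t, select_precursors(peaks, t, excluded))
```

In the simulated run, the weakest ion is fragmented only once the five stronger ions are on the exclusion list, which is the behaviour that lets lower-abundance glycopeptides be sampled at all.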

Figure 1. Extracted ion chromatograms of a triantennary glycan chain attached to RPTGEVYDIEIDTLETTCHVLDPTPLANCSVR observed in the LC-MS/MS analysis of fetuin tryptic digest (a), fetuin tryptic digest treated with Endo-M (b), and fetuin tryptic digest treated with Endo-M/exoglycosidase (c). The inset of panel a is the ion associated with this glycopeptide, while those of panels b and c are the ions resulting from Endo M and Endo M/exoglycosidase treatment of the fetuin tryptic digest, respectively.

Database Searching. Mascot version 2.1.3 was used for all searches performed in this work. The data were searched against the Swiss-Prot database using the appropriate taxonomy. In the case of human blood serum, the search was performed using Homo sapiens taxonomy. Trypsin was selected as the enzyme and one missed cleavage was allowed. Carbamidomethyl was selected as a fixed modification of all cysteine residues, while deamidation and HexNAc were selected as variable modifications of asparagine residues for PNGase F and Endo-M treated samples, respectively. The mass tolerance of both MS and MS/MS data was set to 0.8 Da. Peptides with a mass accuracy better than 2 ppm and a Mascot ion score of 30 and above were considered positive identifications. Combining such mass accuracy and Mascot ion score resulted in a false-positive identification rate of less than 1%, as determined from database searching against a decoy database.
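The acceptance criteria above (mass accuracy within 2 ppm and a Mascot ion score of at least 30) amount to a simple post-search filter. The sketch below assumes the search results have already been exported as plain numbers; it does not call Mascot, and the function names `ppm_error` and `passes_filter` are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_filter(ppm, ion_score, max_ppm=2.0, min_score=30):
    """Apply the mass accuracy and ion score thresholds quoted in the text."""
    return abs(ppm) <= max_ppm and ion_score >= min_score


if __name__ == "__main__":
    # (peptide, mass accuracy in ppm, Mascot ion score), PNGase F column of Table 1.
    hits = [
        ("RPTGEVYDIEIDTLETTCHVLDPTPLANCSVR", 1.3, 62),
        ("LCPDCPLLAPLNDSR", 1.5, 55),
        ("VVHAVEVALATFNAESNGSYLQLVEISR", -0.2, 62),
    ]
    for peptide, ppm, score in hits:
        print(peptide, passes_filter(ppm, score))
```

The absolute value is used because the reported mass accuracies are signed; whether the boundary value of exactly 2 ppm is accepted is an assumption of this sketch.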

Results and Discussion

The potential of using PNGase F and Endo-M to characterize the glycosylation sites of proteins was initially explored using model glycoproteins, including fetuin, haptoglobin, and α1-acid glycoprotein. Fetuin A possesses three distinct N-glycosylation sites which have been previously characterized.15,19 Tryptically digested fetuin samples were separately treated with PNGase F, Endo-M, and Endo-M/exoglycosidases. PNGase F treatment commonly results in the deamidation of asparagine to aspartic acid where the sugar molecules are attached, while Endo-M cleaves the sugar after the reducing-end N-acetylglucosamine (GlcNAc) residue such that a GlcNAc residue remains bound to the peptide backbone. Therefore, PNGase F and Endo-M treatments of glycoprotein tryptic digests increase the molecular weight of the peptide backbone possessing the glycosylation sites by 0.9840 and 203.0793 Da, respectively. This molecular weight change is easily discernible by Mascot database searching, thus allowing confident assignment of protein glycosylation sites. LC-MS/MS analysis of the PNGase F treated fetuin sample allowed the identification of all the glycosylation sites previously reported for fetuin (Table 1). However, LC-MS/MS analysis of fetuin treated with Endo-M alone identified only two glycosylation sites and failed to identify the third glycosylation site at N176 (Table 1). This is believed to be due to inefficient enzymatic digestion by Endo-M, originating from the complexity of the glycan structure associated with this glycosylation site. This is in agreement with the fact that Endo-M efficiency is dependent on the glycan structure.17 However, this limitation of Endo-M is easily addressable through the use of exoglycosidases.
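The two mass shifts quoted above (0.9840 Da for deamidation and 203.0793 Da for a retained GlcNAc) make it possible to predict the m/z of the Endo-M product from the m/z of the corresponding PNGase F product. The following sketch assumes a proton mass of 1.007276 Da; the function names are hypothetical and the input values are taken from Table 1.

```python
PROTON = 1.007276     # mass of a proton in Da (assumed constant)
DEAMIDATION = 0.9840  # Asn -> Asp shift quoted in the text (Da)
HEXNAC = 203.0793     # retained GlcNAc (HexNAc) shift quoted in the text (Da)


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass from m/z and charge state."""
    return (mz - PROTON) * charge


def mz_from_mass(mass, charge):
    """m/z of a neutral mass carrying `charge` protons."""
    return mass / charge + PROTON


def deamidated_to_hexnac_mz(mz, charge, n_sites=1, new_charge=None):
    """Predict the m/z of the HexNAc-bearing peptide from its deamidated form.

    n_sites is the number of glycosylation sites on the peptide;
    new_charge allows the prediction at a different charge state.
    """
    new_charge = charge if new_charge is None else new_charge
    mass = neutral_mass(mz, charge) + n_sites * (HEXNAC - DEAMIDATION)
    return mz_from_mass(mass, new_charge)


if __name__ == "__main__":
    # LCPDCPLLAPLN156DSR: 871.4152 (+2) after PNGase F.
    print(round(deamidated_to_hexnac_mz(871.4152, 2), 4))
    # RPTGEV...N99CSVR: 1224.9211 (+3) after PNGase F, predicted at +4.
    print(round(deamidated_to_hexnac_mz(1224.9211, 3, new_charge=4), 4))
```

Both predictions can be compared with the Endo M and Endo M/exoglycosidases columns of Table 1, which list m/z values near 972.46 (+2) and 969.47 (+4) for these two peptides.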
When fetuin was simultaneously treated with both Endo-M and a mixture of different exoglycosidases, including neuraminidase, N-acetyl-β-D-glucosaminidase, and β-galactosidase, all glycosylation sites were identified (Table 1). The mixture of exoglycosidases reduced the complexity of the glycan structure, thus facilitating an efficient Endo-M cleavage of the N,N′-diacetylchitobiose moiety. Figure 1 depicts the extracted ion chromatograms of a fetuin glycopeptide (RPTGEVYDIEIDTLETTCHVLDPTPLANCSVR) observed in the LC-MS/MS analysis of the trypsin treated sample (Figure 1a), the trypsin and Endo-M treated sample (Figure 1b), and the trypsin and Endo-M/exoglycosidase treated sample (Figure 1c). Endo-M treatment of this glycopeptide did not result in the complete release of the glycan moiety (Figure 1b), thus suggesting that Endo-M alone does not possess adequate activity to completely release complex and large glycans. The lower intensity of the ion depicted in Figure 1b (intensity = 3.8 × 10^7) relative to that shown in Figure 1a (intensity = 1.0 × 10^8) suggests partial cleavage of the glycan. This was further supported by the fact that the peptide backbone of this glycopeptide with a GlcNAc residue attached was observed in the LC-MS/MS analysis, as shown in the inset of Figure 1b, and was revealed by the database searching of the LC-MS/MS data (Table 1). This clearly indicates the inefficient enzymatic activity of Endo-M when used alone. However, reducing the complexity and size of the attached N-glycans through the inclusion of a mixture of exoglycosidases allows the effective release of N-glycans, as suggested by the inability to detect the ion corresponding to the intact glycopeptide (Figure 1c). This is further supported by the increase in the intensity of the ion corresponding to a reducing-end GlcNAc moiety attached to the peptide backbone, as shown in the inset of Figure 1c. The addition of exoglycosidases reduced the complexity of the glycan structure while simultaneously enhancing the activity of Endo-M. Comparing the intensity of the ion depicted in the inset of Figure 1b, which originates from the treatment of the sample with Endo-M only, to that of the same ion depicted in Figure 1c, which originates from the treatment of the sample with Endo-M/exoglycosidase, suggests an increase of more than an order of magnitude in the activity of Endo-M. The tandem mass spectrum of the same glycopeptide deglycosylated using PNGase F is shown in Figure 2a. In this case, the asparagine to which the oligosaccharide is attached was converted to aspartic acid via deamidation. The presence of the modified ions, therefore, is indicative of glycosylation sites.
Results depicted in Figure 2a do not unequivocally confirm a glycosylation site, since spontaneous deamidation of asparagine can occur naturally. Moreover, a Mascot search often returns false-positive assignments of deamidated asparagine, as the mass difference associated with deamidation is 0.98402 Da. This difference might originate from selecting the wrong isotope on which tandem MS is performed. On the other hand, Figure 2b shows a tandem mass spectrum corresponding to a deglycosylated fetuin peptide (RPTGEVYDIEIDTLETTCHVLDPTPLANCSVR) having a GlcNAc residue attached to the asparagine residue. This peptide originates from the Endo-M treatment of its glycopeptide counterpart. In this case, false-positive assignment is less common because of the large m/z difference originating from the presence of the reducing-end GlcNAc attached to the peptide backbone.

The glycosylation sites of α1-acid glycoprotein (AGP) identified using PNGase F, Endo-M, and Endo-M/exoglycosidase are summarized in Table 2. AGP possesses seven glycosylation sites, five associated with AGP-1 and two with AGP-2.20 Six glycosylation sites for AGP were identified upon treatment with PNGase F, of which four were those of AGP-1 while the remaining two were those of AGP-2 (Table 2). Only four glycosylation sites of AGP were identified using Endo-M/exoglycosidase, of which three were those of AGP-1 while the remaining site was that of AGP-2. LC-MS/MS of the AGP sample treated with Endo-M alone did not identify any glycosylation sites. This is expected because of both the complexity of the glycan structures associated with AGP and the possibility of core fucosylation of AGP N-glycans.21

Tryptic miscleavage of AGP results in the formation of a peptide with two glycosylation sites (N93 and N103). LC-MS/MS analysis of the PNGase F treated sample identified a peptide possessing two deamidated asparagine residues, thus suggesting the presence of two glycosylation sites associated with this peptide (Table 2, Figure 3). The tandem mass spectrum of this peptide (QNQCFYN93SSYLNVQREN103GTVSR) is shown in Figure 3. This miscleaved peptide was not identified in the LC-MS/MS analysis of the Endo-M/exoglycosidase treated AGP sample. PNGase F cleaves asparagine-linked high mannose as well as hybrid and complex oligosaccharides, and Endo-M/exoglycosidase is equally active in cleaving asparagine-linked high mannose as well as hybrid and complex oligosaccharides from glycoproteins. However, Endo-M, as mentioned above, is not active in the presence of a core fucose residue. The glycosylation site at N93 was detected in both the PNGase F treated sample and the Endo-M/exoglycosidase treated sample, while the glycosylation site at N103 was detected only in the PNGase F treated sample (Table 2). Accordingly, it appears that N103 may be mainly occupied by glycan structures possessing core fucose. Hence, comparing the results obtained from samples treated with PNGase F, which cleaves all N-glycans, to those obtained using Endo-M/exoglycosidase might be a means to determine the presence or absence of core fucosylation associated with protein glycosylation sites. Glycosylation sites that are only observed when the sample is treated with PNGase F suggest the presence of core fucose, since Endo-M cannot cleave glycan structures with core fucosylation.

The same endoglycosidases and exoglycosidases were also utilized to characterize the glycosylation sites of haptoglobin. The glycosylation sites of haptoglobin identified using the above-mentioned strategy and enzymes are summarized in Table 3. Haptoglobin possesses four glycosylation sites (N126, N149, N153, N241) which have been previously characterized.22,23 LC-MS/MS of the PNGase F treated haptoglobin tryptic digest allowed the identification of all known glycosylation sites, and the same sites were also identified from LC-MS/MS analysis of the Endo-M/exoglycosidase treated haptoglobin tryptic digest (Table 3). However, only two glycosylation sites (N126 and N241) of haptoglobin were identified from LC-MS/MS analysis of the haptoglobin tryptic digest treated with Endo-M only. This is again due to the dependence of Endo-M activity on the complexity of N-glycans. The close proximity of N149 and N153 renders Endo-M alone inactive, since it limits enzyme access to N-acetylglucosamine and thus limits, if not eliminates, the ability of Endo-M to hydrolyze the N,N′-diacetylchitobiose moiety in oligosaccharides bound to the asparaginyl residue. However, the inclusion of a battery of exoglycosidases reduces the size of the glycans and eases the complexity of the glycan structure, allowing Endo-M to generate peptides with GlcNAc attached to asparagine (Table 3). This permits the identification of the glycosylated sites N149 and N153, which are not identified in samples treated only with Endo-M. Tandem mass spectra of the above-mentioned haptoglobin peptides originating from Endo-M/exoglycosidase and PNGase F treated tryptic digests are shown in Figure 4, panels a and b, respectively. Therefore, this study clearly reveals that, although Endo-M provides useful information allowing the identification of protein glycosylation sites, it has to be used in conjunction with an exoglycosidase mixture to ensure efficient digestion.

Figure 2. Tandem mass spectra of peptide ions with m/z values of 1225.5896 (a) and 1292.9584 (b) observed in the LC-MS/MS analysis of fetuin tryptic digest treated with (a) PNGase F or (b) Endo-M/exoglycosidases.

Table 2. Glycosylation Sites of AGP Identified through LC-MS/MS of PNGase F, Endo-M, and Endo-M/Exoglycosidase Treated Tryptic Digests

| isoform | AGP glycosylation site | PNGase F (a): m/z | mass accuracy (ppm) | Mascot ion score | Endo M (b): m/z | mass accuracy (ppm) | Mascot ion score | Endo M/exoglycosidases (b): m/z | mass accuracy (ppm) | Mascot ion score |
| AGP 1 | WFYIASAFRNEEYN56K | 969.9542(+2) | 1.0 | 59 | ND | ND | ND | 1071.0020(+2) | 1.0 | 34 |
| AGP 1 | SVQEIQATFFYFTPN72KTEDTIFLR | 1448.7242(+2) | 0.06 | 65 | ND | ND | ND | 1033.5185(+3) | -1.2 | 73 |
| AGP 1 | QDQCIYN93TTYLNVQR | 958.9426(+2) | 2.0 | 68 | ND | ND | ND | 1059.9937(+2) | -1.5 | 52 |
| AGP 1 | QDQCIYN93TTYLNVQREN103GTISR | 892.4159(+3) | 1.9 | 42 | ND | ND | ND | ND | ND | ND |
| AGP 2 | QNQCFYN93SSYLNVQR | 961.4283(+2) | 0.7 | 64 | ND | ND | ND | 1062.4754(+2) | 1.2 | 69 |
| AGP 2 | QNQCFYN93SSYLNVQREN103GTVSR | 889.4005(+3) | 1.7 | 49 | ND | ND | ND | ND | ND | ND |

(a) PNGase F treatment results in the deamidation of asparagine. (b) Endo M/exoglycosidases treatment of this glycopeptide generates a peptide with HexNAc attached to asparagine.

Figure 3. Tandem mass spectrum of an ion with m/z value of 889.7350 observed in the LC-MS/MS analysis of PNGase F treated tryptic digest of AGP.

Table 3. Glycosylation Sites of Haptoglobin Identified through LC-MS/MS of PNGase F, Endo-M, and Endo-M/Exoglycosidase Treated Tryptic Digests

| haptoglobin glycosylation site | PNGase F (a): m/z | mass accuracy (ppm) | Mascot ion score | Endo M (b): m/z | mass accuracy (ppm) | Mascot ion score | Endo M/exoglycosidases (c): m/z | mass accuracy (ppm) | Mascot ion score |
| MVSHHN126LTTGATLINEQWLLTTAK | 894.1295(+3) | 1.5 | 84 | 961.4942(+3) | 1.8 | 59 | 961.4940(+3) | 2.0 | 98 |
| NLFLN149HSEN153ATAK | 730.8550(+2) | 2.5 | 72 | ND | ND | ND | 932.9486(+2) | 1.8 | 33 |
| NLFLN149HSEN153ATAKDIAPTLTLYVGK | 911.4766(+3) | 0.1 | 69 | ND | ND | ND | ND | ND | ND |
| VVLHPN241YSQVDIGLIK | 898.5012(+2) | 0.6 | 57 | 666.7010(+3) | 1.9 | 37 | 666.7010(+3) | 1.9 | 36 |

(a) PNGase F treatment results in the deamidation of asparagine. (b) Endo M treatment of glycopeptides generates peptides with HexNAc attached to asparagine. (c) Endo M/exoglycosidases treatment of this glycopeptide generates a peptide with HexNAc attached to asparagine.

Figure 4. Tandem mass spectra of ions corresponding to the haptoglobin glycopeptide with two glycosylation sites (NLFLNHSENATAK) observed in the LC-MS/MS analysis of haptoglobin tryptic digest treated with Endo M/exoglycosidases (a) and PNGase F (b).

Finally, the potential of Endo-M/exoglycosidases in determining the N-glycosylation sites of proteins was explored for a depleted human blood serum (HBS) sample. LC-MS/MS analysis of the trypsin and PNGase F treated sample resulted in the identification of 984 peptides originating from 105 proteins, while 958 peptides originating from 102 proteins were identified from the LC-MS/MS analysis of the trypsin and Endo-M/exoglycosidases treated sample. This slight discrepancy is believed to be due to the MS duty cycle. The glycosylation sites identified for glycoproteins present in HBS after trypsin and PNGase F treatment and after trypsin and Endo-M/exoglycosidase treatment are summarized in Table 4. Very stringent filtering criteria were utilized here: only peptides having a Mascot ion score equal to or greater than 30 and a mass accuracy of 2.0 ppm or below were included in the table. Together, the LC-MS/MS analyses of the trypsin and PNGase F treated sample and the trypsin and Endo-M/exoglycosidase treated sample identified 44 glycosylation sites associated with 32 glycoproteins. There were 34 glycosylation sites observed in the LC-MS/MS analysis of the trypsin and PNGase F treated sample, of which 14 were unique. On the other hand, there were 33 glycosylation sites observed in the LC-MS/MS analysis of the trypsin and Endo-M/exoglycosidase treated sample, of which 14 were unique. Unique glycosylation sites observed only in the LC-MS/MS analysis of the trypsin and PNGase F treated sample are believed to potentially carry core fucosylation. However, it is also possible that some of them were observed only in the trypsin and PNGase F treated sample because of the instrument duty cycle.
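The comparison between the two treatments reduces to set operations on the lists of identified sites. The sketch below uses the AGP-1 sites of Table 2 as input; the function name `compare_sites` is hypothetical, and a site found only after PNGase F treatment should be read as a candidate for core fucosylation, not as proof, since the duty cycle caveat above still applies.

```python
def compare_sites(pngase_f_sites, endo_m_sites):
    """Split glycosylation sites by the treatment(s) in which they were seen."""
    pngase_f_sites = set(pngase_f_sites)
    endo_m_sites = set(endo_m_sites)
    return {
        # Seen with both treatments: mostly not core fucosylated.
        "both": pngase_f_sites & endo_m_sites,
        # Seen only after PNGase F: candidate core-fucosylated sites.
        "pngase_f_only": pngase_f_sites - endo_m_sites,
        # Seen only after Endo-M/exoglycosidase treatment.
        "endo_m_only": endo_m_sites - pngase_f_sites,
    }


if __name__ == "__main__":
    # AGP-1 sites identified in Table 2.
    pngase_f = {"N56", "N72", "N93", "N103"}
    endo_m_exo = {"N56", "N72", "N93"}
    result = compare_sites(pngase_f, endo_m_exo)
    for group in ("both", "pngase_f_only", "endo_m_only"):
        print(group, sorted(result[group]))
```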
The ability to determine core fucosylation using the comparison of the two LC-MS/MS analyses is very appealing, since the presence of this monosaccharide is a very specific biomarker for many diseases.24 According to the data summarized in Table 4, it appears that more information pertaining to the glycosylation sites of glycoproteins is offered through LC-MS/MS of Endo-M/exoglycosidase treated sample. This is concluded from considering the data of the model glycoproteins that are also present in HBS. As opposed to what we observed in the analysis of standard haptoglobin and AGP (Table 2, Table 3), QNQCFYNSSYLNVQR of AGP and VVLHPNYSQVDIGLIK of haptoglobin 3604

Journal of Proteome Research • Vol. 9, No. 7, 2010

were not identified by LC-MS/MS analysis of PNGase F treated tryptic digests. However, these glycosylation sites were identified by LC-MS/MS of the Endo-M/exoglycosidase treated HBS tryptic digest (Table 4). Moreover, the enzymatic activity of Endo-M upon the inclusion of exoglycosidases appears to be superior to that of Endo-D/H, which was recently reported.15 We also believe that the glycosylation sites identified only after PNGase F treatment might reflect the presence of core fucosylation. As mentioned above, Endo-M/exoglycosidase is inactive in the presence of core fucosylation. Accordingly, it appears that we can deduce the presence of core fucosylation by comparing the LC-MS/MS analyses of PNGase F treated samples to their counterparts treated with Endo-M/exoglycosidases. For example, tandem mass spectra of glycopeptides originating from trypsin digestion and from trypsin and Endo M/exoglycosidase digestion of antithrombin III are depicted in Figure 5, panels a and b, respectively. An automated database search of the LC-MS/MS analysis of the human blood serum tryptic digest treated with Endo M/exoglycosidase identified the glycosylation site depicted in Figure 5b. A manual search of the LC-MS/MS analysis of the tryptic digest of human blood serum identified this site carrying a disialylated biantennary N-glycan structure, as shown in Figure 5a. Therefore, the ability to observe this glycosylation site in both analyses suggests that the N-glycan structures attached to this site mainly do not possess core fucosylation. A glycosylation site observed in the LC-MS/MS analysis of the PNGase F treated tryptic digest of human blood serum, but not in that of the Endo M/exoglycosidase treated tryptic digest, is associated with beta-2-glycoprotein 1 and is shown in Figure 5c (Table 4).
Manual inspection of the LC-MS/MS analysis revealed, as shown in Figure 5c, a core fucosylated N-glycan structure occupying this site, thus explaining why it was not observed in the LC-MS/MS analysis of the tryptic digest of human blood serum treated with Endo M/exoglycosidase. Although the inability to observe a glycosylation site in the LC-MS/MS analyses of both PNGase F and Endo M/exoglycosidase treated samples might suggest the possibility of core fucosylation, we also


Table 4. Glycosylation Sites (Glycopeptides) Identified for HBS Glycoproteins from LC-MS/MS Analyses of Trypsin and PNGase F Treated Sample and Trypsin and Endo-M/Exoglycosidase Treated Samples

| Glycoprotein | Glycosylation site | PNGase F (a) m/z | PNGase F (a) mass accuracy (ppm) | PNGase F (a) Mascot ion score | Endo M/exoglycosidases (b) m/z | Endo M/exoglycosidases (b) mass accuracy (ppm) | Endo M/exoglycosidases (b) Mascot ion score |
|---|---|---|---|---|---|---|---|
| Alpha-1-acid glycoprotein 1 | QDQCIYNTTYLNVQR | 958.9426(+2) | 2.0 | 67 | 1059.9906(+2) | 1.5 | 70 |
| Alpha-1-acid glycoprotein 2 | QNQCFYNSSYLNVQR | ND | ND | ND | 1062.4768(+2) | -0.1 | 49 |
| Alpha-1B-glycoprotein | EGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYR | ND | ND | ND | 1010.6952(+4) | -0.1 | 56 |
| Alpha-1-antichymotrypsin | YTGNASALFILPDQDKMEEVEAMLLPETLK | 1123.2274(+3) | 0.8 | 50 | ND | ND | ND |
| Alpha-1-antichymotrypsin | FNLTETSEAEIHQSFQHLLR | 801.0649(+3) | -0.2 | 31 | ND | ND | ND |
| Alpha-2-HS-glycoprotein | AALAAFNAQNNGSNFQLEEISR | 1183.5720(+2) | 0.6 | 79 | 1284.6193(+2) | 0.9 | 64 |
| Alpha-2-HS-glycoprotein | VCQDCPLLAPLNDTR | 886.9179(+2) | 1.8 | 86 | ND | ND | ND |
| Alpha-2-macroglobulin | VSNQTLSLFFTVLQDVPVR | 1082.5870(+2) | -0.4 | 93 | 1183.6365(+2) | -1.9 | 109 |
| Antithrombin-III | LGACNDTLQQLMEVFK | 934.4511(+2) | -0.6 | 89 | 1035.4977(+2) | 0.5 | 73 |
| Antithrombin-III | SLTFNETYQDISELVYGAK | 1090.0259(+2) | 0.3 | 86 | 1191.0729(+2) | 0.8 | 64 |
| Apolipoprotein D | ADGTVNQIEGEATPVNLTEPAKLEVK | 908.8015(+3) | 2.0 | 65 | ND | ND | ND |
| Beta-2-glycoprotein 1 | VYKPSAGNNSLYR | 735.3723(+2) | 1.4 | 40 | ND | ND | ND |
| Beta-2-glycoprotein 1 | DTAVFECLPQHAMFGNDTITCTTHGNWTK | ND | ND | ND | 1253.5513(+3) | 0.8 | 46 |
| Ceruloplasmin | ENLTAPGSDSAVFFEQGTTR | 1064.4952(+2) | 0.4 | 110 | 1165.5440(+2) | -0.6 | 75 |
| Ceruloplasmin | EHEGAIYPDNTTDFQR | 947.4147(+2) | 2.0 | 57 | 1048.4629(+2) | 1.3 | 60 |
| Ceruloplasmin | ELHHLQEQNVSNAFLDK | 1011.9969(+2) | 1.2 | 89 | ND | ND | ND |
| Clusterin | MLNTSSLLEQLNEQFNWVSR | 1205.5879(+2) | 1.3 | 128 | 1306.6385(+2) | -1.0 | 80 |
| Clusterin | LANLTQGEDQYYLR | 842.9141(+2) | -1.3 | 80 | ND | ND | ND |
| Complement C1q subcomponent subunit A | RNPPMGGNVVIFDTVITNQEEPYQNHSGR | 1090.8595(+3) | -0.8 | 81 | ND | ND | ND |
| Complement C3 | TVLTPATNHMGNVTFTIPANR | 752.7188(+2) | 1.7 | 35 | 1229.6229(+2) | 0.8 | 53 |
| Complement component C9 | AVNITSENLIDDVVSLIR | 986.5331(+2) | 0.9 | 91 | 1087.5814(+2) | 0.3 | 93 |
| Complement factor H | WDPEVNCSMAQIQLCPPPPQIPNSHNMTTTLNYR | 1337.9409(+3) | 1.7 | 115 | ND | ND | ND |
| Complement factor H | ISEENETTCYMGK | ND | ND | ND | 882.8683(+2) | 1.1 | 53 |
| Complement factor H | IPCSQPPQIEHGTINSSR | ND | ND | ND | 742.0260(+3) | 2.0 | 37 |
| Complement factor I | SIPACVPWSPYLFQPNDTCIVSGWGR | 1504.7063(+2) (a) | 0.9 | 52 | 1605.7572(+2) | -1.2 | 66 |
| Fibronectin | LDAPTNLQFVNETDSTVLVR | ND | ND | ND | 1218.1188(+2) | 0.3 | 74 |
| Fibronectin | RHEEGHMLNCTCFGQGR | ND | ND | ND | 764.6608(+3) | 1.7 | 46 |
| Haptoglobin | MVSHHNLTTGATLINEQWLLTTAK | 670.8488(+4) | 1.9 | 52 | 961.4962(+3) | -0.3 | 101 |
| Haptoglobin | VVLHPNYSQVDIGLIK | ND | ND | ND | 999.5483(+2) | 1.2 | 34 |
| Hemopexin | CSDGWSFDATTLDDNGTMLFFK | 1265.0327(+2) | 0.5 | 96 | 1366.0802(+2) | 0.6 | 74 |
| Hemopexin | ALPQPQNVTSLLGCTH | 868.9343(+2) | 1.9 | 69 | 969.9827(+2) | 1.0 | 43 |
| Hemopexin | GHGHRNGTGHGNSTHHGPEYMR | 600.2624(+4) | 0.8 | 30 | ND | ND | ND |
| Hemopexin | SWPAVGNCSSALR | ND | ND | ND | 804.3770(+2) | 1.6 | 54 |
| Heparin cofactor 2 | NLSMPLLPADFHK | 742.3840(+2) | 1.1 | 43 | ND | ND | ND |
| Ig alpha-1 chain C region | LSLHRPALEDLLLGSEANLTCTLTGLR | 988.8643(+3) | 1.8 | 87 | 1056.2300(+3) | 1.2 | 64 |
| Ig mu chain C region | GLTFQQNASSMCVPDQDTAIR | 1170.5326(+2) | 1.1 | 116 | ND | ND | ND |
| Inter-alpha-trypsin inhibitor heavy chain H1 | ANLSSQALQMSLDYGFVTPLTSMSIR | ND | ND | ND | 1517.2524(+2) | -1.9 | 76 |
| Inter-alpha-trypsin inhibitor heavy chain H4 | LPTQNITFQTESSVAEQEAEFQSPK | 1405.6711(+2) | 1.0 | 106 | 1004.8154(+3) | 0.6 | 32 |
| Kininogen-1 | HGIQYFNNNTQHSSLFMLNEVK | 874.7518(+3) | 0.5 | 69 | 942.1168(+3) | 0.6 | 50 |


Table 4. Continued

| Glycoprotein | Glycosylation site | PNGase F (a) m/z | PNGase F (a) mass accuracy (ppm) | PNGase F (a) Mascot ion score | Endo M/exoglycosidases (b) m/z | Endo M/exoglycosidases (b) mass accuracy (ppm) | Endo M/exoglycosidases (b) Mascot ion score |
|---|---|---|---|---|---|---|---|
| Leucine-rich alpha-2-glycoprotein | LPPGLLANFTLLR | 713.4260(+2) | 1.9 | 79 | ND | ND | ND |
| Leucine-rich alpha-2-glycoprotein | QLDMLDLSNNSLASVPEGLWASLGQPNWDMR | 1153.5493(+3) | 1.1 | 69 | ND | ND | ND |
| N-acetylmuramoyl-L-alanine amidase | LEPVHLQLQCMSQEQLAQVAANATK | 936.8053(+3) | 0.7 | 52 | ND | ND | ND |
| Plasma kallikrein | IYPGVDFGGEELNVTFVK | ND | ND | ND | 1094.0485(+2) | -1.4 | 51 |
| Plasma protease C1 inhibitor | VLSNNSDANLELINTWVAK | 1051.5411(+2) | 1.2 | 115 | 1152.5913(+2) | -1.1 | 79 |
| Plasminogen | GNVAVTVSGHTCQHWSAQTPHTHNR | ND | ND | ND | 747.0995(+4) | 0.9 | 35 |
| Prothrombin | WVLTAAHCLLYPPWDKNFTENDLLVR | 1058.2018(+3) | 1.0 | 64 | ND | ND | ND |
| Prothrombin | SRYPHKPEINSTTHPGADLQENFCR | ND | ND | ND | 790.1238(+4) | 1.4 | 45 |
| Vitronectin (2) | NISDGFDGIPDNVDAALALPAHSYSGR | ND | ND | ND | 992.4720(+3) | 0.6 | 34 |

(a) PNGase F treatment results in the deamidation of asparagine.
(b) Endo M/exoglycosidases treatment of this glycopeptide generates a peptide with HexNAc attached to asparagine.
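The two footnotes imply a fixed mass relationship between the peptide forms detected after each treatment, which can be checked numerically. The sketch below is an illustration rather than part of the reported workflow: the function name is hypothetical, and the constants are standard monoisotopic values (proton, asparagine deamidation, HexNAc residue) that are assumed here and are not quoted from the article. The example m/z values are the pair listed in Table 4 for QDQCIYNTTYLNVQR of alpha-1-acid glycoprotein 1.

```python
PROTON_MASS = 1.007276        # Da, assumed monoisotopic proton mass
DEAMIDATION_SHIFT = 0.984016  # Da, Asn -> Asp conversion after PNGase F
HEXNAC_RESIDUE = 203.079373   # Da, one HexNAc residue left by Endo-M


def expected_endo_m_mz(pngase_f_mz: float, pngase_f_charge: int,
                       endo_m_charge: int) -> float:
    """Predict the m/z of the HexNAc-bearing peptide from its deamidated form."""
    # Neutral mass of the deamidated peptide seen after PNGase F.
    deamidated_mass = (pngase_f_mz - PROTON_MASS) * pngase_f_charge
    # Remove the deamidation shift, then add the retained HexNAc residue.
    hexnac_mass = deamidated_mass - DEAMIDATION_SHIFT + HEXNAC_RESIDUE
    return hexnac_mass / endo_m_charge + PROTON_MASS


# Values listed in Table 4 for QDQCIYNTTYLNVQR, both observed at charge +2.
predicted = expected_endo_m_mz(958.9426, 2, 2)
observed = 1059.9906
print(f"predicted {predicted:.4f}, observed {observed:.4f}")
```

A check of this kind can help pair the rows of the two analyses when the same site is detected at different charge states, as for MVSHHNLTTGATLINEQWLLTTAK of haptoglobin (+4 after PNGase F, +3 after Endo-M/exoglycosidases).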

Figure 5. Tandem mass spectra of ions corresponding to (a) LGACNDTLQQLMEVFK glycopeptide derived from human blood serum antithrombin-III tryptic digest, (b) LGACNDTLQQLMEVFK peptide derived from human blood serum antithrombin-III tryptic digest treated with ENDO M/Exoglycosidases, and (c) VYKPSAGNNSLYR peptide derived from human blood serum beta-2-glycoprotein 1 tryptic digest.

have to keep in mind that this inability to identify such a glycosylation site might be attributed to the duty cycle of the mass spectrometer. Comparing the LC-MS/MS results of glycoprotein samples treated with the enzymes investigated here might be a good strategy to map core fucosylation in the case of complex samples such as human blood serum.
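The comparison strategy described above amounts to a set comparison of the sites identified in the two analyses. The following Python sketch assumes that each site is represented by its peptide sequence; the function name and the dictionary keys are hypothetical, and the example lists contain only the four peptides whose behavior is discussed in the text, not the full content of Table 4.

```python
def compare_site_sets(pngase_f_sites, endo_m_sites):
    """Split glycosylation sites into shared and treatment-specific groups."""
    pngase_f = set(pngase_f_sites)
    endo_m = set(endo_m_sites)
    return {
        "both": pngase_f & endo_m,            # likely little core fucosylation
        "pngase_f_only": pngase_f - endo_m,   # candidate core-fucosylated sites
        "endo_m_only": endo_m - pngase_f,     # missed in the PNGase F analysis
    }


# Peptides discussed in the text for human blood serum.
pngase_f_sites = ["LGACNDTLQQLMEVFK", "VYKPSAGNNSLYR"]
endo_m_sites = ["LGACNDTLQQLMEVFK", "QNQCFYNSSYLNVQR", "VVLHPNYSQVDIGLIK"]

result = compare_site_sets(pngase_f_sites, endo_m_sites)
for group, sites in result.items():
    print(group, sorted(sites))
```

Sites in the `pngase_f_only` group are candidates only: as noted above, a site can also be missing from one analysis because of the duty cycle of the mass spectrometer, so manual inspection of the spectra remains necessary.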

Conclusion

Like PNGase F, Endo-M has a wide spectrum of substrate specificity and is active on all three types of oligosaccharides. LC-MS/MS analyses of human blood serum show that Endo-M/exoglycosidase treatment is more informative than PNGase F treatment in terms of the number of identified glycosylation sites. However, a fucosyl group attached to the core GlcNAc residue in some hybrid and complex type glycans renders Endo-M inactive. Therefore, comparing the LC-MS/MS analysis of a PNGase F treated tryptic digest to that of its Endo-M counterpart could potentially be used to determine the presence of core fucosylation at certain glycosylation sites, which is important in cancer-associated glycosylation. The data in this report suggest that Endo-M alone does not possess adequate activity to completely release complex and large glycans. However, reducing the complexity and size of the attached N-glycans through the inclusion of a mixture of exoglycosidases allows Endo-M to effectively release the core N-glycan. The described approach might be very effective in mapping the glycosylation sites of glycoproteins as well as in associating core fucosylation with certain sites.

Acknowledgment. This work was partially supported by the Indiana Metabolomics and Cytomics Initiative (METACyt), funded by a grant from Eli Lilly Endowment. This work was also partially supported by grant No. GM24349 from the National Institute of General Medical Sciences, U.S. Department of Health and Human Services, and grant No. RR018942 from the National Center for Research Resources, a component of the National Institutes of Health (NIH-NCRR), for the National Center for Glycomics and Glycoproteomics (NCGG) at Indiana University.

References

(1) Hagglund, P.; Matthiesen, R.; Elortza, F.; Hojrup, P.; Roepstorff, P.; Jensen, O. N.; Bunkenborg, J. J. Proteome Res. 2007, 6, 3021–3031.
(2) Spiro, R. G. Glycobiology 2002, 12, 43r–56r.
(3) Kannagi, R.; Izawa, M.; Koike, T.; Miyazaki, K.; Kimura, N. Cancer Sci. 2004, 95, 377–384.
(4) Miyoshi, E.; Moriwaki, K.; Nakagawa, T. J. Biochem. 2008, 143, 725–729.
(5) Miyoshi, E.; Nakano, M. Proteomics 2008, 8, 3257–3262.
(6) Taketa, K.; Endo, Y.; Sekiya, C.; Tanikawa, K.; Koji, T.; Taga, H.; Satomura, S.; Matsuura, S.; Kawai, T.; Hirai, H. Cancer Res. 1993, 5419–5423.
(7) Jensen, O. N. Curr. Opin. Chem. Biol. 2004, 8, 33–41.
(8) Praizel, O. Y.; Evstigneeva, R. P.; Yamskov, I. A.; Shtil, A. A. Drug Des. Rev. 2005, 2, 349–359.
(9) Zhang, H.; Li, X.; Martin, D. B.; Aebersold, R. Nat. Biotechnol. 2003, 21, 660–666.
(10) Liu, X.; Chan, K.; Chu, I. K.; Li, J. Carbohydr. Res. 2008, 343, 2870–2877.
(11) Gonzalez, J.; Takao, T.; Hori, H.; Besada, V.; Rodriguez, R.; Padron, G.; Shimonishi, Y. Anal. Biochem. 1992, 205, 151–158.
(12) Kaji, H.; Saito, H.; Yamauchi, Y.; Shinkawa, T.; Taoka, M.; Hirabayashi, J.; Kasai, K.; Takahashi, N.; Isobe, T. Nat. Biotechnol. 2003, 21, 667–672.
(13) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431–1440.
(14) Xiong, L.; Regnier, F. E. J. Chromatogr., B 2002, 782, 405–418.
(15) Hagglund, P.; Bunkenborg, J.; Elortza, F.; Jensen, O. N.; Roepstorff, P. J. Proteome Res. 2004, 3, 556–566.
(16) Kadowaki, S.; Yamamoto, K.; Fujisaki, M.; Izumi, K.; Tochikura, T.; Yokoyama, T. Agric. Biol. Chem. 1990, 54, 97–106.
(17) Fujita, K.; Kobayashi, K.; Iwamatsu, A.; Takeuchi, M.; Kumagni, H.; Yamamoto, K. Arch. Biochem. Biophys. 2004, 432, 41–49.
(18) Mechref, Y.; Novotny, M. V. Anal. Chem. 1998, 70, 455–463.
(19) Yet, M. G.; Chin, C. C. Q.; Wold, F. J. Biol. Chem. 1988, 263, 111–117.
(20) Treuheit, M. J.; Costello, C. E.; Halsall, H. B. Biochem. J. 1992, 283, 105–112.
(21) Havanaar, E. C.; Hoff, R. C.; Eijnden, D. H. V.; Dijk, W. V. Glycoconjugate J. 1998, 15, 389–395.
(22) Bunkenborg, J.; Pilch, B. J.; Podtelejnikov, A. V.; Wisniewski, J. R. Proteomics 2004, 4, 454–465.
(23) Liu, T.; Qian, W. J.; Gritsenko, M. A.; Camp, D. G.; Monroe, M. E.; Moore, R. J.; Smith, R. D. J. Proteome Res. 2005, 4, 2070–2080.
(24) Miyoshi, E.; Moriwaki, K.; Nakagawa, T. J. Biochem. 2008, 143, 725–729.

PR100129N
