Article pubs.acs.org/jpr
LC−MS/MS Characterization of O‑Glycosylation Sites and Glycan Structures of Human Cerebrospinal Fluid Glycoproteins
Adnan Halim, Ulla Rüetschi, Göran Larson, and Jonas Nilsson*
Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, 413 45 Gothenburg, Sweden
Supporting Information
ABSTRACT: The GalNAc O-glycosylation on Ser/Thr residues of extracellular proteins has not been well characterized from a proteomics perspective. We previously reported a sialic acid capture-and-release protocol to enrich tryptic N- and O-glycopeptides from human cerebrospinal fluid (CSF) glycoproteins using nano-LC−ESI−MS/MS with collision-induced dissociation (CID) for glycopeptide characterization. Here, we have introduced peptide N-glycosidase F (PNGase F) pretreatment of CSF samples to remove the N-glycans, facilitating the selective characterization of O-glycopeptides and enabling the use of an automated CID−MS2/MS3 search protocol for glycopeptide identification. We used electron-capture and -transfer dissociation (ECD/ETD) to pinpoint the glycosylation site(s) of the glycopeptides, which predominantly carried core-1-like HexHexNAc-O- structures attached to one to four Ser/Thr residues. We characterized 106 O-glycosylations and found Pro residues preferentially in the n − 1, n + 1, and/or n + 3 positions in relation to the Ser/Thr attachment site (n). The characterization of glycans and glycosylation sites in glycoproteins from human clinical samples provides a basis for future studies addressing the biological and diagnostic importance of specific protein glycosylations in relation to human disease.
KEYWORDS: glycoproteomics, glycopeptide, tandem mass spectrometry, PNGase F, hydrazide chemistry
■
INTRODUCTION
Extracellular proteins are frequently modified post-translationally with N-glycans on Asn residues and O-glycans on Ser/Thr residues.1 Recently, O-glycosylation of Tyr residues has also been reported.2,3 Both N- and O-glycans are often terminated with sialic acids, with N-acetyl-5-neuraminic acid (Neu5Ac) being the dominant form in human glycoproteins,4 which are essential for a multitude of cellular interactions.5−8 Mucins, that is, glycoproteins with long stretches rich in Ser, Thr, and Pro residues, are heavily GalNAc O-glycosylated on these Ser/Thr residues.9,10 Such “mucin glycosylations” are known to protect epithelial cells from physical stress and to act as decoy mechanisms for microbes.11 Nonmucin glycoproteins also carry GalNAc O-glycosylations on site-specific Ser/Thr residues,3,12−15 and single or a few clustered O-glycans have been shown to selectively block proteases from cleaving their peptide target sites.16−18 The proteolytic destiny, processing pathway, lifetime, and biological function of a glycoprotein can thus be specifically determined by its glycosylation status. To better address the significance of site-specific O-glycosylation of specific glycoproteins, it is accordingly important to map the O-glycosylation sites. As opposed to the Asn-X-Ser/Thr consensus motif of N-glycosylation, no apparent consensus motif for O-glycosylation seems to exist. This is likely due to the existence and differential expression of up to 20 different mammalian genes coding for a
family of polypeptide GalNAc transferases (ppGalNAc-Ts), which are together responsible for addition of the initial GalNAcα1-O-Ser/Thr on the polypeptide substrates.19,20 Each ppGalNAc-T seems to exhibit a rather unique specificity for the O-glycosylation motif and also to show a tissue-specific distribution. Accordingly, it has been shown that model peptides containing S/T-X-X-P and P-S/T sequences are favorably subjected to initial glycosylation by ppGalNAc-T1 and -T2 (refs 21−23) due to substrate recognition by their catalytic domain. However, additional glycosylation of neighboring Ser/Thr residues might also be facilitated, because of binding of ppGalNAc-T1 and -T2 through their lectin domains24,25 to the newly formed O-glycopeptide, which undermines a strictly peptide-sequence-dependent O-glycosylation.26 Additionally, ppGalNAc-T4 (ref 27) and -T10 (ref 28) glycosylate several Ser/Thr residues by specific recognition of preformed GalNAc-O- through their lectin domains and also independently by recognition of GalNAc-O- through their catalytic domains.26,28 Two web resources are available where GalNAc O-glycosylation sites are predicted based on known glycosylation sites [NetOGlyc 3.1, http://www.cbs.dtu.dk/services/NetOGlyc/; and Isoform Specific O-Glycosylation Prediction (ISOGlyP),
Received: June 12, 2012
dx.doi.org/10.1021/pr300963h | J. Proteome Res. XXXX, XXX, XXX−XXX
Journal of Proteome Research
deidentified, that is, all patient information was removed, before use in this study. The use of deidentified clinical samples for method development is in agreement with Swedish law, and the study was permitted by the head of the Clinical Chemistry laboratory, Sahlgrenska University Hospital (Dnr 797-550/12).
http://isoglyp.utep.edu], but the validity of predicted sites must be questioned when experimental confirmation is lacking. Glycoproteomic techniques aimed at mapping O-glycosylation sites have recently been introduced as powerful tools for structural characterization of native glycoproteins.3,12,13,29 Darula and Medzihradszky used the jacalin lectin, recognizing GalNAcα1-O- of the core 1 structure, to purify tryptic O-glycopeptides and identified 23 O-glycosylation sites from bovine serum glycoproteins.13,30 Recently, they expanded their list to include 125 O-glycosylation sites by using prefractionation steps at both the glycoprotein and glycopeptide levels.31 Steentoft et al. used zinc-finger nuclease-induced knockout of the core 1 Gal T1 chaperone cosmc, inhibiting further elongation of GalNAc-O- precursor substrates, for studies of O-glycosylation in Simple cell cultures.3 Additionally, they employed Vicia villosa agglutinin (VVA) lectin chromatography to enrich GalNAc-modified tryptic glycopeptides and identified more than 350 O-glycosylation sites.3 Another glycoproteomics approach is the use of TiO2 solid phases for the enrichment of sialylated glycopeptides in combination with peptide N-glycosidase F (PNGase F) treatment to release formerly N-glycosylated peptides from the solid phase.32 The usability of this methodology for the purification of O-glycopeptides has yet to be demonstrated. We initially developed a protocol for sialic acid capture and release of both N- and O-glycoproteins/glycopeptides from clinical samples12 using hydrazide chemistry.33 Mild periodate oxidation was used to introduce an aldehyde on sialic acid terminated glycoproteins, which were then covalently captured onto hydrazide beads and trypsin digested, and finally, tryptic glycopeptides were released by formic acid hydrolysis of the acid-labile sialic acid glycosidic bond.
Using liquid chromatography coupled to tandem mass spectrometry (LC−MS/MS) for the glycopeptide analyses, we identified desialylated glycans of 36 N- and 44 O-glycosylation sites on human cerebrospinal fluid (CSF) glycoproteins.12 We also used this method to identify desialylated glycans of 58 N- and 63 O-glycosylation sites from human urine samples.34 The N-glycan structures were essentially all of the complex type, and the O-glycans were mainly of the core 1 type. For these CSF and urine samples, abundant N-glycopeptides were prominent in the ion chromatograms and reduced the likelihood of fragmenting less abundant coeluting O-glycopeptides. To specifically study the site-specific O-glycosylation of CSF proteins, we have now included a pretreatment step using PNGase F to selectively remove N-glycans from native glycoproteins and thus facilitate the selective MS analysis of O-glycopeptides. We have now also developed an automated protocol to search for the HexHexNAc-O-substituted peptides using the Mascot search engine. For the assignment of specific Ser/Thr/Tyr glycosylation site(s) for peptides containing multiple hydroxylated amino acids, we used electron-capture dissociation (ECD) and electron-transfer dissociation (ETD) to allow for selective peptide backbone fragmentation of O-glycopeptides.
PNGase F Pretreatment and Sialic Acid Capture-and-Release Protocols
Aliquots of CSF samples (1 mL) were dialyzed against water using membranes with a 12−14 kDa molecular-weight cutoff (MWCO) (Spectrum Lab) (n = 2) or desalted on Sephadex PD-10 columns (GE Healthcare) (n = 6). The samples were lyophilized, dissolved in 50 μL of water, and subjected to PNGase F treatment according to the manufacturer’s protocol (New England Biolabs). The samples were denatured at 50 °C for 10 min in the glycoprotein denaturing buffer. Temperatures higher than 60 °C should be avoided because of risk of irreversible sample denaturation. G7 buffer, NP40, and PNGase F were added and incubated at 37 °C for 16 h. The samples were then desalted against water using 10 kDa MWCO microdialysis (Pierce). Finally, the samples (100−200 μL) were subjected to sialic acid capture and release for the enrichment of desialylated glycopeptides, as described elsewhere.12
Liquid Chromatography−Mass Spectrometry
Mass spectrometric analysis was performed essentially as described in ref 12. In short, samples were dissolved in 20 μL of 0.1% formic acid and separated by nano-liquid chromatography on a 150 × 0.075 mm C18 reverse-phase column (Zorbax; Agilent Technologies) in 50 min for elution of narrow chromatographic peaks and 120 min for broader peaks, with a gradient from 0 to 50% acetonitrile in 0.1% formic acid at a flow rate of 200−300 nL/min. The eluting peptides were passed through a nano-ESI source to a hybrid linear quadrupole ion trap/FT ion cyclotron resonance (ICR) mass spectrometer equipped with a 7 T magnet (LTQ-FT; Thermo Fisher Scientific). All spectra were acquired in positive-ion mode, and the mass spectrometer was operated in the data-dependent mode to automatically switch between MS1, MS2, and MS3 acquisition. The FTICR precursor scan was acquired at an isotopic resolution of 50000, and the most intense ion was isolated and fragmented in the linear ion trap (LTQ) using a normalized collision energy of 30%. For each MS2 spectrum, the five most intense fragment ions were sequentially selected for CID fragmentation in MS3. A repeat count of two was used, and ions were then dynamically excluded for 180 s. For ECD, the precursor ions were guided to the ICR cell and fragmented. The most abundant ion from an inclusion list, obtained by initial use of the CID−MS2/MS3 approach, was selected for fragmentation and irradiated with low-energy electrons produced by an emitter cathode for 80 ms using an arbitrary energy setting of 4 or 5 in duplicate fragmentation events. For higher-energy collision dissociation (HCD) and ETD, we used Orbitrap Velos and Orbitrap XL instruments (Thermo), respectively. The reverse-phase C18 chromatography and ESI interface setups were as previously described.35 The MS run times were 70 min, and the gradient ranged from 0 to 40% acetonitrile in 0.1% formic acid.
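The data-dependent selection logic described above (most intense precursor, five most intense MS2 fragments for MS3, repeat count of two, 180 s dynamic exclusion) can be illustrated with a minimal Python sketch. This is not the instrument firmware; the m/z matching window, the class name, and the function names are assumptions introduced here for illustration only, and the repeat-duration window used by real acquisition software is ignored.

```python
REPEAT_COUNT = 2      # from the text: repeat count of two
EXCLUSION_S = 180.0   # from the text: dynamic exclusion for 180 s
MZ_TOL = 0.01         # assumption: m/z bin width used to match precursors


class DynamicExclusion:
    """Hypothetical bookkeeping for repeat count and exclusion time."""

    def __init__(self, repeat_count=REPEAT_COUNT, exclusion_s=EXCLUSION_S,
                 mz_tol=MZ_TOL):
        self.repeat_count = repeat_count
        self.exclusion_s = exclusion_s
        self.mz_tol = mz_tol
        self.counts = {}          # m/z bin -> number of selections so far
        self.excluded_until = {}  # m/z bin -> time (s) when exclusion ends

    def _key(self, mz):
        # Simple binning; ions close to a bin edge may be split (sketch only).
        return round(mz / self.mz_tol)

    def is_excluded(self, mz, t):
        return t < self.excluded_until.get(self._key(mz), float("-inf"))

    def record_selection(self, mz, t):
        key = self._key(mz)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.repeat_count:
            self.excluded_until[key] = t + self.exclusion_s
            self.counts[key] = 0


def pick_precursor(ms1_peaks, t, dx):
    """Return the most intense non-excluded m/z from (mz, intensity) pairs."""
    candidates = [(inten, mz) for mz, inten in ms1_peaks
                  if not dx.is_excluded(mz, t)]
    if not candidates:
        return None
    _, mz = max(candidates)
    dx.record_selection(mz, t)
    return mz


def top_fragments(ms2_peaks, n=5):
    """Return the m/z values of the n most intense MS2 fragments (for MS3)."""
    ranked = sorted(ms2_peaks, key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]
```

In this sketch, a precursor selected twice is skipped for the following 180 s, which is what gives less abundant coeluting ions a chance to be fragmented.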
For the Orbitrap Velos experiments, the MS1 precursor scans and CID−MS2 spectra were acquired with isotopic resolutions of 30000 and 7500, respectively, in the Orbitrap. The software could thus assign the charge states of MS2 peaks, which was necessary for attaining data-dependent CID−MS3 transitions from the five most abundant peaks in each MS2 spectrum. The CID−MS3 spectra
■
MATERIALS AND METHODS
The CSF samples (10 mL, n = 8) were taken on the suspicion of infection but were, upon analysis, found to have normal white blood cell count and blood−brain barrier function. The samples were collected by lumbar puncture and were centrifuged at 1800g for 10 min within 30 min after sample collection, aliquoted (1 mL fractions), and stored at −80 °C pending analysis. The aliquots of the CSF samples were
specifying the topic glycosylation in the FT line, the search terms (GalNAc...) and (HexNAc...), and experimentally verified. The neighboring ±10 amino acid residues were plotted, and Weblogos37 were constructed using version 3.1 (http://weblogo.threeplusone.com), where the previously reported sites from CSF were omitted to avoid bias from the methodology used both previously12 and in this report.
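The extraction of the ±10 neighboring residues around each glycosylation site, and the tallying of a residue such as Pro at each relative position, can be sketched as follows. This is a minimal illustration and not the WebLogo implementation; the function names, the padding character, and the example sequence in the usage note are hypothetical.

```python
from collections import Counter


def site_window(sequence, site, flank=10, pad="-"):
    """Return the residues from site-flank to site+flank (site is 0-based).

    Positions that fall outside the protein sequence are filled with `pad`.
    """
    left = sequence[max(0, site - flank):site]
    right = sequence[site + 1:site + 1 + flank]
    return left.rjust(flank, pad) + sequence[site] + right.ljust(flank, pad)


def residue_frequency(windows, residue="P", flank=10):
    """Fraction of windows carrying `residue` at each offset relative to n."""
    counts = Counter()
    for window in windows:
        for offset in range(-flank, flank + 1):
            if window[flank + offset] == residue:
                counts[offset] += 1
    total = len(windows)
    return {offset: counts[offset] / total
            for offset in range(-flank, flank + 1)}
```

For a made-up sequence such as "AAPTPAPGS" with the site at index 3 (Thr), `site_window(..., flank=2)` gives "APTPA", and `residue_frequency` on that single window reports Pro at offsets −1 and +1.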
were acquired as profile data in the LTQ. The normalized collision energies for CID−MS2 and −MS3 were set to 30%, and the minimum signal intensities for data-dependent triggering of CID were set to 10000 and 500 counts in the MS1 and MS2 steps, respectively. Also, one HCD-MS2 spectrum was acquired on the Orbitrap Velos, after the MS3 events, at a normalized collision energy of 40%. For the ETD experiments, using the Orbitrap XL, the normalized collision energy was set to 35% and the activation time was 200 ms. For each MS1 spectrum, at a resolution of 30000, three ETD spectra were collected, and the minimum signal required was set to 100000 counts. The ETD spectra were collected either as profile data from the Orbitrap, at a resolution of 7500, or as centroided data from the LTQ.
■
RESULTS
PNGase F Treatment
We subjected eight deidentified CSF samples to peptide N-glycosidase F (PNGase F) treatment and enriched O-linked glycopeptides (O-glycopeptides) with the sialic acid capture-and-release protocol (Figure 1A). For two CSF samples, half of the volumes were treated with PNGase F and the other half was left untreated, and then both were subjected to glycopeptide enrichment. Peaks of tryptic N-linked glycopeptides (N-glycopeptides) were virtually absent in the PNGase F treated CSF samples (Figure 1B) but were prominent in the untreated samples (Figure 1C). By inspection of the CID−MS2 and −MS3 spectra, we identified several O-glycopeptides with mainly HexHexNAc-O- structure, most likely corresponding to the core 1 (Galβ3GalNAcα-O-) glycan.
Analysis of MS Data
The LC−MS/MS files from CID acquisitions were converted to Mascot general format (.mgf) using the Raw2msm application.36 The top 12 peaks per 100 Da were selected, and MS3 spectra were included. The in-house Mascot server was accessed through Mascot Daemon (version 2.3.0), and searches were performed with the enzyme specificity set to Trypsin and then changed to Semitrypsin. The human sequences of the Swiss-Prot database were searched first (20249 sequences; January 25, 2011), and the NCBI database (16392747 sequences; December 27, 2011) was then used to account for sequence variations. HexHexNAc (365.1322 Da) on Ser, Thr, and Tyr residues was set as a variable modification, with neutral losses of HexHexNAc and Hex (162.0528 Da) specified both for scoring purposes and as “peptide” neutral losses to account for loss of HexHexNAc and Hex from the precursor. Alternatively, Hex2HexNAc2 (730.2644 Da) and HexHexNAc2 (568.2116 Da) on Ser, Thr, and Tyr residues were set as variable modifications together with neutral losses of the same masses in separate searches. Other variable modifications were Asn-to-Asp conversion (+0.9840 Da), methionine oxidation, and loss of NH3 for peptides with N-terminal Gln and N-terminal carbamidomethyl-Cys. Carbamidomethyl-Cys was set as a fixed modification. The Instrument setting of ion trap was selected. Peptide tolerance was set to 10 ppm, and fragment tolerance was set to 0.6 Da. All MS2 and MS3 spectra of Mascot-proposed O-glycopeptides were manually checked to contain the anticipated HexHexNAc-O- or (HexHexNAc-O-)2 structures and were further investigated for matches that pinpointed the glycan to a specific Ser/Thr/Tyr residue within the peptide. The ECD and ETD spectra were converted and aggregated using Mascot Distiller (version 2.3.2.0, Matrix Science), and the ions were presented as singly protonated in the output Mascot file.
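The glycan modification masses and the 10 ppm precursor tolerance quoted above can be checked with a short Python sketch. The Hex residue mass is taken from the text; the HexNAc residue mass (203.0794 Da) is derived here as 365.1322 − 162.0528 and is not stated explicitly in the text. The function names are illustrative and are not part of Mascot.

```python
HEX = 162.0528     # Da, monoisotopic residue mass given in the text
HEXNAC = 203.0794  # Da, derived here as 365.1322 - 162.0528


def glycan_mass(n_hex, n_hexnac):
    """Monoisotopic mass added to the peptide by a Hex/HexNAc composition."""
    return round(n_hex * HEX + n_hexnac * HEXNAC, 4)


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=10.0):
    """True if the observed mass lies within the ppm tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm
```

With these values, `glycan_mass(1, 1)`, `glycan_mass(2, 2)`, and `glycan_mass(1, 2)` reproduce the three modification masses used in the searches (HexHexNAc, Hex2HexNAc2, and HexHexNAc2, respectively).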
Search parameters were set as described above, except that the fragment tolerance was set to 0.03 Da, no neutral losses were allowed for the HexHexNAc modification, and the Instrument parameters were set to consider c, z, and z + 1 ions. Also, the precursor ion masses of ECD and ETD spectra were matched manually to those of glycopeptides that had been identified by the automated Mascot search protocol. The MS-Product tool from ProteinProspector (http://prospector.ucsf.edu) was used to prepare peak lists of c and z ions for glycopeptide matches, and O-glycosylation sites were pinpointed to unique Ser/Thr/Tyr residues by tracing c and z ions that included or lacked HexHexNAc-O- modifications.
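The principle of tracing c ions that include or lack the HexHexNAc-O- modification can be sketched in Python as follows. This is a simplified illustration, not the MS-Product algorithm: it computes singly protonated c ions only (z ions follow the same reasoning from the C-terminus), it ignores the fact that c ions from cleavage N-terminal to Pro are typically not observed in ECD/ETD, and the peptide used in the usage note is hypothetical.

```python
# Monoisotopic amino acid residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728      # Da
NH3 = 17.02655        # Da, added to the N-terminal fragment in a c ion
HEXHEXNAC = 365.1322  # Da, from the text


def c_ions(peptide, glyco_site=None):
    """Singly protonated c-ion m/z values c1..c(n-1).

    glyco_site is the 0-based index of the residue carrying HexHexNAc,
    or None for the unmodified peptide.
    """
    ions = []
    running = 0.0
    for i, aa in enumerate(peptide[:-1]):
        running += RESIDUE[aa]
        if i == glyco_site:
            running += HEXHEXNAC
        ions.append(running + NH3 + PROTON)
    return ions


def discriminating_c_ions(peptide, site_a, site_b, tol=0.01):
    """Return the c-ion numbers whose m/z differs between two candidate sites."""
    ladder_a = c_ions(peptide, site_a)
    ladder_b = c_ions(peptide, site_b)
    return [i + 1 for i, (a, b) in enumerate(zip(ladder_a, ladder_b))
            if abs(a - b) > tol]
```

For the hypothetical peptide "APSGTPK", the candidate sites Ser (index 2) and Thr (index 4) are distinguished only by the c3 and c4 ions, which carry the extra 365.1322 Da if and only if the glycan sits on Ser. Observing such ions with or without the glycan mass is what assigns the site.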
Automated Mascot Search to Identify O-Glycopeptides
To efficiently analyze the fragment-ion spectra, we designed a protocol to automate the Mascot searches for HexHexNAc-O-substituted peptides (Figure 2). Use of the Raw2msm application36 for the generation of Mascot .mgf search files allowed the precursor masses (MS1) to be assigned not only to the CID−MS2 spectrum but also to five consecutive MS3 spectra. Thus, the high mass accuracy (