Proteomics Analysis of O-GalNAc Glycosylation in ... - ACS Publications


Article

Proteomics Analysis of O-GalNAc Glycosylation in Human Serum by an Integrated Strategy Hongqiang Qin, Kai Cheng, Jun Zhu, Jiawei Mao, Fangjun Wang, Mingming Dong, Rui Chen, Zhimou Guo, Xinmiao Liang, Mingliang Ye, and Hanfa Zou Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b02887 • Publication Date (Web): 28 Dec 2016 Downloaded from http://pubs.acs.org on December 29, 2016


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Proteomics Analysis of O-GalNAc Glycosylation in Human Serum by an Integrated Strategy

Hongqiang Qin (‡, a), Kai Cheng (‡, a), Jun Zhu (‡, a), Jiawei Mao (a), Fangjun Wang (a), Mingming Dong (a), Rui Chen (a), Zhimou Guo (a), Xinmiao Liang (a), Mingliang Ye (*, a) and Hanfa Zou (§, a)

(a) CAS Key Lab of Separation Science for Analytical Chemistry, National Chromatographic Research and Analysis Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
* To whom correspondence should be addressed. E-mail: [email protected]
‡ These authors contributed equally to this work.
§ Deceased April 25, 2016.

ABSTRACT: The diversity of O-linked glycan structures has drawn increasing attention due to its vital biological roles. However, intact O-glycopeptides with different glycans are typically not well elucidated using the current methods. In this work, an integrated strategy was developed for comprehensive analysis of O-GalNAc glycosylation by combining hydrophilic interaction chromatography (HILIC) tip enrichment, beam-type collision induced decomposition (beam-CID) detection and in silico deglycosylation method for spectra interpretation. In this strategy, the intact O-GalNAc glycopeptides were selectively enriched and the original spectra obtained by TOF-CID were preprocessed using an in silico deglycosylation method, enabling direct searching without setting multiple glycosylation modifications, which could significantly decrease the search space. This strategy was applied to analyze the O-GalNAc glycoproteome of human serum, leading to identification of 407 intact O-GalNAc glycopeptides from 93 glycoproteins. About 81% of the glycopeptides contained at least one sialic acid, which could reveal the micro-heterogeneity of O-GalNAc glycosylation. Up to now, this is the largest dataset of intact O-GalNAc glycoforms from complex biological samples at proteome level. Furthermore, this method is readily applicable to study O-glycoform heterogeneity in other complex biological systems.

O-GalNAc glycosylation, one of the most important post-translational modifications of proteins, plays crucial roles in tumor growth and metastasis.1-4 Many O-GalNAc glycoproteins, such as CA125 and CA15-3, have been adopted as clinical biomarkers for cancer diagnosis.5-6 The analysis of aberrant glycoforms of O-GalNAc glycoproteins could enhance the specificity of biomarkers in cancer diagnosis.7 Therefore, it is important to investigate the heterogeneity of GalNAc-type O-glycoforms at the proteome level. However, this type of O-glycosylation is generated by attaching glycans onto Ser/Thr residues and elongated by more than 20 distinct polypeptide GalNAc-transferases, which makes it much more complex than GlcNAc-type O-glycosylation.8-10 Additionally, there are up to eight core structures and, unlike N-linked glycosylation (N-X-S/T, where X is not proline), no consensus sequence motif, which makes the characterization of O-glycosylation much more challenging.

Compared with N-linked glycosylation, O-GalNAc glycosylation remains difficult to analyze because robust methods for glycopeptide enrichment are lacking. Up to now, lectin chromatography has been one of the main tools for glycopeptide enrichment.11-12 Yet most lectins can enrich only certain types of glycan structures, which limits their application in comprehensive analysis of O-GalNAc glycosylation. Hydrazide chemistry was first employed to enrich N-linked glycopeptides/glycoproteins with high selectivity by oxidation and covalent linkage of cis-diol groups in glycan structures.13-16 This method was modified to enrich O-GalNAc peptides by selective oxidation of the terminal sialic acids.17-19 However, it can enrich only glycopeptides containing sialic acids, and the information on sialylation is lost during the enrichment. Therefore, it cannot be applied to characterize the heterogeneity of O-GalNAc glycosylation. Additionally, the presence of N-linked sialylated glycopeptides could interfere with the enrichment of O-GalNAc glycopeptides. HILIC has been widely adopted to enrich N-linked glycopeptides because it is convenient and is not biased toward particular glycan structures. Yet only a few studies have reported enrichment of O-GalNAc glycopeptides using HILIC, owing to its limited efficiency and the interference of N-linked glycopeptides.20-22 Recently, glycopeptide metabolic labeling has also been adopted for the enrichment of O-GalNAc glycopeptides.23 In this strategy, unnatural sugars (such as Ac4GalNAz and Ac4ManNAz) were used for metabolic labeling, and the labeled O-GalNAc glycopeptides were then enriched by 'click chemistry'. Although the metabolic-chemistry strategy enhanced the specificity and efficiency of glycopeptide enrichment, it can be applied only to the analysis of O-GalNAc glycosylation in cell lines.

Because no glycosidase can release O-GalNAc glycans with high efficiency, the analysis of O-GalNAc glycosylation has mainly relied on partial deglycosylation. For example, Darula et al. developed a trimming strategy for identification of O-GalNAc glycosites, in which intact O-glycopeptides were subjected to partial deglycosylation by neuraminidase and galactosidase. The resulting glycopeptides, carrying only core GalNAc residues, were analyzed by HCD/ETD tandem mass spectrometry (MS2).12 This method could improve the determination of O-GalNAc glycopeptides. However, the low efficiency of the neuraminidase and galactosidase limited its application to the analysis of O-GalNAc glycosylation at the proteome scale. Instead of partial deglycosylation in vitro, Steentoft et al. introduced zinc-finger nucleases to truncate the O-glycan elongation pathway in human cells, which could simplify all O-glycan structures to GalNAc and NeuAcGalNAc.24-26 This strategy proved to be well suited for large-scale analysis of O-GalNAc glycosylation, and nearly 3000 glycosites were identified from 12 human cell lines in a recent study.26 In these approaches, the glycan chains on glycopeptides were trimmed to leave only the core GalNAc in order to facilitate glycosite identification.11-12, 24-26 As a result, the diverse glycan structures were lost during the analysis.

Instead of trimming glycans, intact O-GalNAc glycopeptides can be identified using a CID MS2-MS3 strategy.17, 27 The glycan fragment ions in MS2 allowed the identification of O-glycoforms, while the peptide fragment ions in MS3, generated by further fragmentation of the Y0 and Y1 ions from MS2, enabled the identification of peptide backbones. This strategy could be applied to characterize the glycosites and glycan structures of glycopeptides simultaneously. However, the low probability of fragmenting the Y0 ions limited its efficiency. Compared with in-trap CID, beam-type CID, e.g. CID in Q-TOF MS (TOF-CID), is able to break bonds in the peptide as well as bonds in the glycans of O-GalNAc glycopeptides simultaneously.28-31 The resulting MS2 spectra contained sufficient information for the identification of both the peptide sequence and the attached O-glycan structures.
However, owing to the lack of an effective approach for interpreting the spectra, the application of TOF-CID to intact O-glycopeptide identification at the proteome level has rarely been reported.

In this study, we present an integrated workflow for proteomics analysis of O-GalNAc glycosylation that combines HILIC tip enrichment, Q-TOF MS detection and an in silico deglycosylation method for spectra interpretation. In this workflow, enzyme-assisted HILIC tip enrichment enabled selective enrichment of intact O-GalNAc glycopeptides with various glycoforms from human serum, which made it possible to characterize glycosylation heterogeneity at the proteome scale. Then, an in silico deglycosylation approach was developed to interpret the spectra of intact glycopeptides acquired by Q-TOF MS. This method could significantly decrease the search space and allow the identification of peptide sequences, as well as the determination of glycan compositions, with high confidence. The strategy was applied to comprehensive analysis of O-GalNAc glycosylation in human serum, resulting in the identification of 407 intact glycopeptides, which is the largest dataset for the O-GalNAc glycoproteome of serum.

Experimental section

Reagents and materials

Bovine fetuin, trypsin and elastase were purchased from Sigma (St. Louis, MO). PNGase F and neuraminidase were purchased from New England Biolabs (Ipswich, MA, USA). Iodoacetamide (IAA), 1,4-dithiothreitol (DTT) and trifluoroacetic acid (TFA) were obtained from Sigma (St. Louis, MO). Formic acid (FA) was obtained from Fluka (Buches, Germany). Acetonitrile (ACN, HPLC grade) was from Merck (Darmstadt, Germany). Pure water used in all experiments was purified with a Milli-Q system (Millipore, Milford, MA). The centrifugal filter units (Amicon Ultra-0.5 mL) were purchased from Millipore (Milford, MA). GELoader tips (20 µL) were purchased from Eppendorf (Hamburg, Germany). The human serum used in all experiments was obtained from the Second Affiliated Hospital of Dalian Medical University (Dalian, China). The normal serum samples were collected from ten volunteers and pooled in equal volumes. The cancer serum samples used to verify the platform were collected from 10 hepatocellular carcinoma (HCC) patients and pooled in equal volumes. All 10 HCC patients were at moderate or advanced stages, and most cases had an HBV-related etiology. The samples were stored at -80 °C until use. The use of human serum complied with the guidelines of the Ethics Committee of the hospital. Click maltose-HILIC beads were prepared in house as reported.32

Enrichment of O-GalNAc glycopeptides

Digestion of proteins

The protein samples were prepared following the FASP procedure according to previous work, with some modifications.33 Briefly, 200 µg of bovine fetuin was first added to a 0.5 mL centrifugal filter unit (Millipore, Amicon Ultra-0.5) with a 10 kDa cutoff. For the serum proteins, 100 µL of human serum was divided into twelve portions and added to 12 filter units. Then 300 µL of 8 M urea/100 mM NH4HCO3 (pH 8.2) was added, followed by centrifugation at 14,000 g for 15 min; this step was repeated once. After that, DTT and IAA were added to the filter unit for reduction and alkylation as reported previously.34 The samples were desalted by centrifugation for 15 min at 14,000 g and washed twice with 400 µL of H2O, and the protein sample was then treated with PNGase F (100 U per tube, 37 °C overnight). After two further washes with 400 µL of H2O, the standard glycoprotein was digested with elastase (trypsin for human serum samples) at a ratio of 1:25 (enzyme to protein, w/w) at 37 °C overnight.
Finally, the protein digest was collected by centrifugation and lyophilized to dryness.

Enrichment of O-GalNAc glycopeptides

HILIC enrichment was performed according to our previous report using the centrifugation-assisted click maltose-HILIC approach.34 The protein digests prepared by the modified FASP workflow were first redissolved in the loading buffer (80% ACN/1% TFA). Then, 20 µL of digest solution, equivalent to 20 µg of standard glycoprotein digest (or equivalent to 20 µL serum digest), was pipetted into a HILIC tip containing 5 mg of click maltose material. After centrifugation at 4,000 g for 10 min, the HILIC tip was washed three times with 10 µL of loading buffer. Finally, the enriched O-glycopeptides were eluted twice with 20 µL of 0.1% FA/H2O. The eluates were combined and dried for MS analysis.

Fractionation of serum digest and O-GalNAc peptides

For the initial serum sample analysis, glycopeptide enrichment was performed before fractionation. The glycopeptides enriched from 4 tips (equivalent to 80 µL serum) were fractionated using an SCX column (100 mm × 2.1 mm i.d., PolyLC). Mobile phases A (10 mM PBS buffer containing 20% ACN, adjusted to pH 2.7 with HCl) and B (500 mM KCl buffer containing 20% ACN, adjusted to pH 2.7 with HCl) were used in the following gradient: 0% B, 4 min; 0%-25% B, 30 min; 25%-100% B, 10 min; 100% B, 6 min. The peptides were separated at a flow rate of 0.2 mL/min and monitored by UV at 214 nm. Fractions were collected every 2 minutes and dried under vacuum for MS analysis. This fractionation process is termed the SCX workflow.

For parallel analysis of the normal and HCC serum samples, fractionation was performed before O-glycopeptide enrichment. The serum digest equivalent to 100 µL serum was fractionated using a column (250 mm × 4.6 mm i.d.) packed with C18 particles (5 µm, 150 Å, Agela). Mobile phases A (98% H2O, adjusted to pH 11.0 with NH3·H2O) and B (98% acetonitrile, adjusted to pH 11.0 with NH3·H2O) were used to develop a gradient. The solvent gradient was set as follows: 0.1%-8% B, 2 min; 8%-18% B, 17 min; 18%-30% B, 11 min; 30%-80% B, 2 min; 80% B, 5 min; 80%-5% B, 3 min. The peptides were separated at a flow rate of 1.0 mL/min and monitored by UV at 214 nm. Fractions were collected every minute, and the 40 fractions were combined into 10 fractions according to the previous report.35 The samples were dried under vacuum for subsequent glycopeptide enrichment as described above. This sample fractionation scheme is termed the RP workflow.

Liquid Chromatography-Mass Spectrometry

The LC-MS/MS analyses were performed on a quadrupole time-of-flight mass spectrometer (Triple-TOF 5600, AB Sciex, Foster City, CA) with a NanoACQUITY UPLC system (Waters, Milford, MA) for separation. The LC-MS/MS system included a 3 cm C18 capillary trap column (200 µm i.d.), a 12 cm C18 capillary analytical column (75 µm i.d.) and a PicoTip Emitter (New Objective, Woburn, MA). The trap column was packed with C18AQ beads (5 µm, 120 Å) and the separation column was packed with C18AQ beads (3 µm, 120 Å). The RP gradient was developed as follows: from 0 to 5% buffer B (ACN/0.1% FA) for 2 min, from 5 to 22% buffer B for 60 min, from 25 to 35% buffer B for 20 min, and from 35 to 80% buffer B for 15 min. The O-glycopeptides were first loaded onto the C18 trap column and then separated by reversed-phase liquid chromatography (RPLC) with gradient elution at a flow rate of 300 nL/min.
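As a quick consistency check on the two offline fractionation schemes, the step durations quoted above can be summed and divided by the collection interval. This is only an arithmetic sketch: the dictionary layout and the function name `fraction_count` are assumptions made for illustration, and the SCX fraction count is derived here rather than stated in the text.

```python
# Step durations (min) of the two offline gradients, as quoted in the text.
GRADIENTS = {
    "SCX": {"steps": [4, 30, 10, 6], "collect_every_min": 2},
    "high-pH RP": {"steps": [2, 17, 11, 2, 5, 3], "collect_every_min": 1},
}


def fraction_count(steps, collect_every_min):
    """Return (total run time in min, number of collected fractions)."""
    total = sum(steps)
    return total, total // collect_every_min


for name, g in GRADIENTS.items():
    total, n = fraction_count(g["steps"], g["collect_every_min"])
    print(f"{name}: {total} min, {n} fractions")
```

The high-pH RP gradient sums to 40 min, which is consistent with the 40 one-minute fractions mentioned in the text before they were combined into 10.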
The Triple-TOF 5600 MS was equipped with a Digital PicoView ESI source (New Objective, Woburn, MA). A spray voltage of 2.3 kV was applied between the spray tip and the MS interface. The detailed instrument parameters are given in Supplementary Note 1 in the Supporting Information.

Data analysis

Mass spectrometric data acquired by the Triple-TOF 5600 MS (.wiff) were converted into peak list files (.mgf) using PeakView™ Software version 1.1.1.2. Further processing was performed with an in-house program, ArMone 2.0, which has a graphical user interface written in Java. ArMone 2.0 includes additional modules that process the data for the identification of intact glycopeptides. ArMone 2.0 can be downloaded for free from http://www.bioanalysis.dicp.ac.cn/proteomics/software/ArMone.html. A user manual and test data are also provided.36 The details are given in Supplementary Note 2 in the Supporting Information.

Results and Discussion

The integrated strategy for proteomics analysis of O-GalNAc glycosylation has three main steps: enrichment of O-GalNAc glycopeptides, detection of intact glycopeptides and interpretation of glycopeptide spectra (shown in Scheme 1). In this strategy, the O-GalNAc glycopeptides were selectively enriched using a modified HILIC method, and the interference of N-linked glycopeptides was reduced by pretreatment with PNGase F. The enriched O-GalNAc glycopeptides were analyzed by Q-TOF MS, which generated spectra rich in fragment ions from both peptide backbones and O-glycans. Finally, the raw spectra were converted to deglycosylated forms using an in silico deglycosylation method, so that peptides could be identified without setting multiple glycosylation modifications. The performance of each step is evaluated in the following sections.

Scheme 1. The workflow for identification of intact O-GalNAc glycopeptides by combining HILIC enrichment and the in silico deglycosylation method.

Enzyme-assisted removal of N-glycopeptides and HILIC enrichment of O-GalNAc glycopeptides

To study the heterogeneity of O-GalNAc glycosylation, intact O-glycopeptides must be enriched with high specificity. HILIC has often been used to enrich N-linked glycopeptides in a non-glycan-specific fashion.37,38 However, it is still difficult to selectively enrich O-glycopeptides from complex biological samples by HILIC. As shown in Figure S1A, the whole LC-MS/MS chromatogram was covered by N-glycopeptides when HILIC was used directly to enrich glycopeptides from the digest of bovine fetuin, a glycoprotein with both N-linked and O-GalNAc glycosylation. This indicated that the N-glycopeptides seriously hampered the enrichment of O-GalNAc peptides. Therefore, the dominant N-glycosylation should be removed prior to O-glycopeptide enrichment. For this purpose, PNGase F was adopted to remove the interference of N-linked glycosylation before HILIC enrichment. Briefly, protein samples were first denatured in centrifugal filter units, and PNGase F was added to enzymatically release the N-linked glycans. The proteins were then digested, and intact O-glycopeptides could be selectively enriched by HILIC. As shown in Figure S1B, the LC-MS/MS chromatogram was dominated by O-GalNAc glycopeptides, indicating the effectiveness of this approach. By applying the in silico deglycosylation method (described below), we obtained 646 spectral matches of intact O-glycopeptides after removing N-glycosylation, whereas only 302 intact O-glycopeptide spectral matches were obtained using HILIC alone without removing N-glycosylation. Clearly, intact O-GalNAc glycopeptides could be selectively enriched by combining PNGase F pretreatment with HILIC enrichment.
In silico deglycosylation method for interpretation of intact O-glycopeptide MS/MS spectra


After enrichment, the intact O-glycopeptides had to be fragmented for identification. As a beam-type CID, TOF-CID is able to break bonds in peptide backbones as well as in glycans. As shown in Figure 1, one glycopeptide from histidine-rich glycoprotein was fragmented by LTQ-CID, ETD and TOF-CID. Plenty of fragments from both the peptide backbone and the glycan structure were obtained with TOF-CID, which favored the identification of intact glycopeptides. By comparison, only glycan fragments or a few peptide backbone fragments were detected using LTQ-CID or ETD. This was in accord with previous work.31 Thus, the abundant information in the MS/MS spectra acquired by Q-TOF made the identification of intact glycopeptides theoretically possible. However, it is very challenging to interpret the spectra with a conventional database searching strategy because of the various glycan structures and the huge search space caused by multiple O-glycosylation modifications.

Figure 1. The spectra for identification of the intact O-glycopeptide (SSTTKPPFKPHGS(GalNAc-Gal-NeuAc)R from histidine-rich protein, M.W. 2182.0277 Da) with different fragmentation modes: (A) LTQ-CID, (B) ETD and (C) TOF-CID. Symbols in the original figure denote GalNAc, Gal and NeuAc (sialic acid).

Instead, we proposed an in silico deglycosylation method for this purpose (shown in Figure S2A). First, MS/MS spectra of O-glycopeptides were extracted according to the observed oxonium ions. Because the oxonium ions would compromise the subsequent identification of peptide sequences, they were removed from the spectra in this step. After that, the possible compositions of the O-glycan attached to the peptide backbone were determined, based on the fact that glycopeptide spectra contain notable ions formed by loss of glycans from the precursor ion (Y0 ions). For example, a cluster of high-intensity peaks was observed at m/z 1021.5707, 1224.6327 and 1386.6842, with m/z intervals of 203.0620 and 162.0515 matching the monosaccharides HexNAc and Hex, respectively (shown in Figure S2B). Because the monosaccharide attached directly to the peptide backbone is GalNAc, the peaks at 1021.5707 and 1224.6327 were probably the peptide backbone (Y0 ion) and the peptide backbone plus GalNAc (Y1 ion). As the molecular weight of the precursor ion is 2041.8873, the mass difference between the precursor ion and the peptide backbone ion is 1021.3239, which could correspond to the composition (HexNAc)2(Hex)2(NeuAc)1. After the potential O-glycan compositions were determined, the following steps were performed to generate a new spectrum: i) deduct the mass of the O-glycan modifications from the precursor ion to determine the molecular weight (MW) of the peptide backbone (for this spectrum, the MW of the peptide backbone is 1020.5634 Da); ii) remove the tentatively identified glycosylated peptide fragments (Y ions). After this step, the spectrum was "deglycosylated" and could be searched like that of a non-glycopeptide without setting any O-glycosylation modifications (shown in Figure S3). The "deglycosylated" peptide sequence was matched with an ion score of 38 (expectation value 1.6E-4). After combining the glycan composition information, the intact glycopeptide could be determined as 268EAPSAVPDAAGPTPS282-(HexNAc)1(Hex)1(NeuAc)2 with high confidence. If the in silico deglycosylation processing was not performed, the corresponding O-glycan modification(s) had to be set as variable modifications in the database search. Although the same peptide sequence was obtained, the glycopeptide received an ion score of only 16 (expectation value 0.078), much lower than that of the "in silico deglycosylated" spectrum (shown in Figure S3A). The improvement could be attributed to the removal of interfering high-abundance glycan fragments. As shown in Table S1, the b7+ fragment was lost in the direct search, and a low peptide sequence score was obtained. By comparison, the low-abundance b7+ fragment was matched after in silico deglycosylation, giving five consecutive fragments and thereby enhancing the identification confidence. Additionally, the search space could be significantly reduced by the in silico deglycosylation method, which decreased random matches and favored the interpretation of complex O-GalNAc glycopeptide spectra.
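The mass arithmetic described in this section can be sketched in a few lines of Python. The sketch below is illustrative only and is not the ArMone 2.0 implementation: the function names, the oxonium-ion list, the 0.02 Da peak-matching window, the 0.05 Da composition tolerance and the limit of five residues per monosaccharide are all assumptions, and the Y0 ion is assumed to be singly charged. The monosaccharide residue masses are standard monoisotopic values.

```python
from itertools import product

PROTON = 1.007276  # mass of a proton, Da
# Monoisotopic residue masses (Da) of the monosaccharides discussed in the text.
RESIDUES = {"HexNAc": 203.0794, "Hex": 162.0528, "NeuAc": 291.0954}
# Common glycan oxonium ions (m/z); an illustrative, non-exhaustive list (assumption).
OXONIUM = (138.0550, 204.0867, 274.0921, 292.1027, 366.1395)


def has_oxonium(peaks, tol=0.02):
    """True if any (m/z, intensity) peak lies within tol of a listed oxonium ion."""
    return any(abs(mz - ox) <= tol for mz, _ in peaks for ox in OXONIUM)


def strip_ions(peaks, targets, tol=0.02):
    """Drop peaks whose m/z matches any target m/z (oxonium or Y ions)."""
    return [(mz, inten) for mz, inten in peaks
            if all(abs(mz - t) > tol for t in targets)]


def backbone_mass(y0_mz, charge=1):
    """Neutral peptide backbone mass from the m/z of the Y0 ion."""
    return (y0_mz - PROTON) * charge


def glycan_compositions(delta, max_count=5, tol=0.05):
    """Enumerate (HexNAc, Hex, NeuAc) counts whose summed mass matches delta."""
    names = ("HexNAc", "Hex", "NeuAc")
    hits = []
    for counts in product(range(max_count + 1), repeat=3):
        mass = sum(n * RESIDUES[k] for n, k in zip(counts, names))
        if sum(counts) > 0 and abs(mass - delta) <= tol:
            hits.append(counts)
    return hits


# Worked example with the values quoted for Figure S2B.
precursor_mw = 2041.8873
peptide_mw = backbone_mass(1021.5707)    # about 1020.5634 Da
glycan_mass = precursor_mw - peptide_mw  # about 1021.3239 Da
print(round(peptide_mw, 4), round(glycan_mass, 4))
print(glycan_compositions(glycan_mass))
```

Under these assumed settings the enumeration returns a single candidate, (HexNAc)2(Hex)2(NeuAc)1, the composition named in the text. The 0.05 Da window matters here: the theoretical mass of that composition (1021.3598 Da) differs from the quoted mass difference by about 0.036 Da, so a tighter tolerance would return no candidate.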
Analysis of O-GalNAc glycosylation in bovine fetuin

Bovine fetuin-A, the major component of bovine fetuin, has been reported to carry both N-linked and O-GalNAc glycans.39-40 The O-GalNAc glycosylation of bovine fetuin was analyzed using the integrated strategy. Because there are no trypsin cleavage sites near the glycosites, elastase (a nonspecific protease) was employed to digest fetuin. Intact O-glycopeptides enriched from the bovine fetuin digest were subjected to TOF-CID analysis, and the acquired spectra were processed by the in silico deglycosylation method. As shown in Table S2, 30 peptide sequences and 15 glycan compositions were identified with FDR < 1%. The repeatability of the strategy was also evaluated. As shown in Figure S5, the numbers of intact glycopeptides identified were similar, and most glycopeptides were identified in all three experiments, indicating good repeatability. In total, 82 intact glycopeptides were detected, and 18 of them had only one potential glycosite (one Ser/Thr per peptide), which allowed the glycosylation site to be determined unambiguously; these sites included S271, S296 and S341 in fetuin-A (shown in Table S2). There were 5, 3 and 5 glycoforms linked to the three glycosites, respectively. Interestingly, all 3 glycoforms on S296 were sialylated, whereas only 3 of the 5 glycan structures on S271 and S341 were sialylated (shown in Figure 2F), indicating the heterogeneity of O-glycosylation. The spectra of the 5 glycans linked to S341 are shown in Figure 2A-E. Additionally, we presumed that T334 was not glycosylated in our fetuin sample, based on a comparison of the glycan compositions linked to 337VGQPSIPGGPV347 and 332GKTPIVGQPSIPGGPV347. Thus, only the glycan compositions linked to T280 and S282 in fetuin-A could not be determined. The O-GalNAc glycosites identified in this study covered all of the glycosites of fetuin-A in the UniProt database (Table S2), which indicated that identification with in silico deglycosylation was highly sensitive and confident for complex biological samples.
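The site-assignment reasoning used for fetuin (a glycopeptide containing a single Ser/Thr localizes the glycan unambiguously) can be illustrated with a short sketch. The helper names `candidate_sites` and `is_unambiguous` are hypothetical; the two sequences and their residue numbering are those quoted in the text.

```python
def candidate_sites(sequence, start):
    """Return (residue, protein position) for every Ser/Thr in a peptide.

    `start` is the protein position of the first residue of the peptide.
    """
    return [(aa, start + i) for i, aa in enumerate(sequence) if aa in "ST"]


def is_unambiguous(sequence, start):
    """A glycosite is unambiguous when the peptide holds exactly one Ser/Thr."""
    return len(candidate_sites(sequence, start)) == 1


short = candidate_sites("VGQPSIPGGPV", 337)        # residues 337-347
longer = candidate_sites("GKTPIVGQPSIPGGPV", 332)  # residues 332-347
print(short, is_unambiguous("VGQPSIPGGPV", 337))
print(longer, is_unambiguous("GKTPIVGQPSIPGGPV", 332))
```

The shorter peptide yields only S341, whereas the longer one adds T334 as a second candidate, which is why comparing the glycan compositions found on the two peptides bears on whether T334 is occupied.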

Figure 2. MS2 spectra of five O-glycoforms identified on the same site, S341, in bovine fetuin: (A) HexNAc, (B) (HexNAc)1(Hex)1, (C) (HexNAc)1(Hex)1(NeuAc)1, (D) (HexNAc)1(Hex)1(NeuAc)2 and (E) (HexNAc)2(Hex)1(NeuAc)2; (F) O-glycan structures identified on the glycosites S271, S296 and S341 in bovine fetuin, respectively. Symbols in the original figure denote GalNAc, Gal, NeuAc (sialic acid) and GlcNAc. (Detailed information on the matched ions is shown in Figure S4.)

Proteomics analysis of O-GalNAc glycosylation in human serum

The newly developed integrated strategy was then applied to the analysis of the O-glycoproteome in human serum. In 1D LC-MS/MS analysis, 47 and 54 intact glycopeptides were identified from normal human serum in two runs, and 38 of them were detected in both runs, indicating the good repeatability of this method (shown in Figure S6). To analyze O-glycosylation in serum comprehensively, 2D LC-MS/MS was employed to detect the intact glycopeptides enriched from normal human serum. The initial analysis of the pooled human serum sample yielded 5337 PSMs for the de-glycosylated spectra of intact O-glycopeptides with FDR