Article
A New Searching Strategy for the Identification of O-linked Glycopeptides Jiawei Mao, Xin You, Hongqiang Qin, Chengye Wang, Liming Wang, and Mingliang Ye Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04184 • Publication Date (Web): 25 Feb 2019 Downloaded from http://pubs.acs.org on February 27, 2019
Analytical Chemistry
A New Searching Strategy for the Identification of O-linked Glycopeptides
Jiawei Mao#1,2, Xin You#1,2, Hongqiang Qin1,2, Chengye Wang3, Liming Wang3, and Mingliang Ye*1,2
1 CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning 116023, China
Abstract
For the analysis of homogeneous post-translational modifications such as protein phosphorylation and acetylation, modified peptides are identified in database searching by setting a variable modification on the specific residue(s). However, this approach is often not applicable to the identification of intact mucin-type O-glycopeptides because of the high microheterogeneity of the glycosylation. Because there is virtually no carbohydrate-related tag on the peptide fragments after O-glycopeptides are dissociated in HCD, we find it unnecessary to set variable mass tags on the Ser/Thr residues to identify the peptide sequences. In this study, we present a novel approach, termed O-Search, for the interpretation of HCD spectra of O-glycopeptides. Instead of setting variable mass tags on the Ser/Thr residues, we set variable mass tags at the peptide level. The precursor mass of each MS/MS spectrum was reduced by every possible summed mass of O-glycan combinations on at most three S/T residues. All spectra with these new precursor masses were searched against a protein sequence database without setting variable glycan modifications. This method had a much smaller search space and excellent sensitivity in the identification of O-glycopeptides. Compared with the conventional searching approach, O-Search yielded 96%, 86% and 79% improvements in glycopeptide spectrum matching, glycopeptide identification and peptide sequence identification, respectively. O-Search also enabled the consideration of more glycan structures and is well suited to analyzing the microheterogeneity of O-glycosylation.
ACS Paragon Plus Environment
Introduction
Protein glycosylation is a highly heterogeneous post-translational modification (PTM), which mainly includes N- and O-linked glycosylation. It participates in various biological processes such as signal transduction, cell-cell adhesion and cell division1,2. Aberrant glycosylation is associated with many diseases such as cancers and neurodegenerative diseases3,4. It was reported that the glycan structures on one site varied among different samples and that site-specific glycoforms could be potential biomarkers for some diseases3. Therefore, the global profiling of site-specific glycans has attracted increasing attention5,6. Rapid progress has been made in the analysis of site-specific N-glycoforms, and several excellent approaches have been developed to decipher the MS/MS spectra of intact glycopeptides. For example, in 2014, Cheng et al. presented a twin-spectra approach that combines the spectra of intact and deglycosylated peptides to identify intact N-glycopeptides, which revealed 1769 site-specific N-glycans on 453 glycosites from HEK 293T cells7. Instead of using two spectra, Eshghi et al. introduced a novel algorithm named “GPQuest” in 2015 for the automatic identification of intact glycopeptides from their HCD (higher-energy collisional dissociation) spectra based on spectral library matching8. Using this algorithm, 769 unique intact N-glycopeptides were identified from 1008 glycan-containing spectra for LNCaP cells. Recently, Zeng et al.9 developed software (pGlyco) for the identification of intact N-glycopeptides with higher data interpretation efficiency, and 10,009 distinct site-specific N-glycans on 1988 glycosites from 955 glycoproteins were identified in five mouse tissues. Some of these approaches were also applied to identify intact O-glycopeptides, with limited success10. To date, the identification of intact mucin-type O-glycopeptides remains challenging.
For the analysis of homogeneous PTMs such as protein phosphorylation and acetylation, modified peptides are identified in database searching by setting a variable modification mass, e.g. 79.97 Da for phosphorylation and 42.01 Da for acetylation, on the specific residue(s) when the restricted search scheme is applied. However, this approach is often not applicable to intact mucin-type O-glycopeptides, as the search space expands exponentially when multiple O-glycans are considered. The large search space increases random matches, which results in poor peptide identification sensitivity and a high false positive identification rate11. Mass-tolerant search (open search)12 and error-tolerant search13 can identify modifications without setting variable modifications. They are efficient for discovering new modifications but have relatively lower sensitivity for the identification of specified modifications than the restricted search. To enable sensitive identification of O-glycopeptides by restricted search, the O-glycan structures are often simplified prior to LC-MS/MS analysis. One excellent example is the SimpleCell technique, in which the O-glycan elongation pathway in living cells was truncated to simplify all mucin-type O-glycans into GalNAc and NeuAc-GalNAc14. Thereby, O-glycopeptides could be identified by setting two variable masses on the S/T residues, as in conventional database searching. Almost 3000 O-glycosites in over 600 O-glycoproteins were identified from 12 human cell lines15, indicating the excellent performance of this approach. However, the SimpleCell technique is only applicable to the O-glycoproteome of cultured cell lines. Recently, we developed an acid-assisted glycoform simplification strategy to analyze mucin-type O-glycosylation in human serum16. An obvious drawback of these approaches is the loss of information on the heterogeneous O-glycans.
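The growth of the search space under residue-level variable modifications can be made concrete with a small count. The following is a minimal sketch, not taken from the manuscript: it assumes a hypothetical peptide with six S/T residues and a cap of three occupied sites, and the function name `site_level_forms` is illustrative only.

```python
from math import comb


def site_level_forms(n_sites: int, n_glycans: int, max_occupied: int = 3) -> int:
    """Count the modified forms of one peptide sequence when every glycan is
    a variable modification on individual S/T residues.

    For each j, choose j of the n_sites residues to occupy, then assign one
    of n_glycans glycans to each chosen residue. j = 0 is the bare peptide.
    """
    top = min(max_occupied, n_sites)
    return sum(comb(n_sites, j) * n_glycans ** j for j in range(top + 1))


# Hypothetical peptide with six S/T residues (assumption for illustration).
for n_glycans in (1, 3, 12):
    print(n_glycans, site_level_forms(6, n_glycans))
```

Under these assumptions the candidate count for a single peptide sequence rises from 42 forms with one glycan to 694 with three glycans and 36,793 with twelve, which illustrates why residue-level tags become impractical as more O-glycans are considered.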
To obtain this information, we previously developed an in silico deglycosylation strategy (ArMone) to identify intact O-glycopeptides17. However, the presence of the Y0 ion is mandatory for a successful in silico deglycosylation of the spectra, which hampers the identification of sequences from spectra without a Y0 ion.
Orbitrap-based mass spectrometers are the main workhorses for glycoproteomics analysis. ETD is helpful for the localization of glycosites; however, it is more expensive and is not available on the Q Exactive instrument. Instead, HCD is more popular for the analysis of the glycoproteome. Compared with CID in the Orbitrap, HCD records both precursor ions and fragment ions with high mass accuracy, which makes fragment ion matching more confident. More importantly, HCD provides more fragment ions, especially b- and y-ions for peptide backbone sequencing, owing to the higher collision energy. A typical HCD spectrum of an O-glycopeptide is given in Figure 1A. Three types of fragment ions were observed: fragments from the glycans (B ions or oxonium ions), fragments from the peptide backbone (Y0, b/y ions) and fragments derived from the loss of B-ions from the intact glycopeptide (Y1, Y2 … ions). In the conventional database searching strategy, a sufficient number of peptide fragments carrying the PTM, such as phosphorylation, is required to determine the modified sites. However, the corresponding fragment ions for determining glycosites, i.e. peptide fragments carrying carbohydrate-related fragments, were not observed in the spectrum. According to our observation, these ions were rarely observed in the HCD spectra of other O-glycopeptides either. This means that localizing glycosites to specific S/T residues is not feasible for most HCD data because of the lack of evidence. Because of the high diversity of O-glycans, the conventional searching approach of setting multiple variable mass tags on the S/T residues has a huge search space and is not applicable to studying the microheterogeneity of O-glycosylation.
Since there is virtually no carbohydrate-related tag on the peptide fragments, it is unnecessary to set variable mass tags on the S/T residues to identify the peptide sequences from the HCD spectra of intact glycopeptides. Herein, we present a novel approach called O-Search for the interpretation of HCD spectra of O-glycopeptides. Instead of setting variable mass tags on the S/T residues, we set variable mass tags at the peptide level. This method has a much smaller search space and excellent sensitivity in the identification of O-glycopeptides.
Materials and Methods
HILIC Enrichment of O-GalNAc Glycopeptides
The tryptic peptides were dissolved in GlycoBuffer 2 (pH 7.5), and PNGase F was added to release the N-linked glycans at 37 °C overnight. The de-N-glycosylated sample was desalted and subjected to O-GalNAc glycopeptide enrichment using HILIC tip materials according to our previous reports18. Briefly, the de-N-glycosylated digest was re-dissolved in the loading buffer (80% ACN/1% TFA), and 60 μL of solution, equivalent to 50 μL of serum digest, was pipetted into a HILIC tip containing 5 mg of click-maltose materials. After capture, the HILIC tip was washed once with 60 μL of loading buffer and twice with 20 μL of loading buffer. Finally, the captured O-glycopeptides were eluted twice with 100 μL of 30% ACN/1% TFA. The eluate was dried for high-pH RPLC fractionation (Supporting Information) or direct RPLC-MS analysis.
LC-MS/MS Analysis of O-GalNAc Glycopeptides
The analysis of enriched human serum O-glycopeptides was performed on a Q-Exactive mass spectrometer equipped with a nanospray ion source and a U3000 RSLC nano-system (Thermo, San Jose, CA, USA). The native glycopeptides were dissolved in 0.1% FA/water and subsequently loaded onto a trap column (4 cm length, 200 μm i.d.) packed with C18 AQ beads (5 μm, Daison, Osaka, Japan). The peptides were separated on a capillary analytical column (15 cm length, 150 μm i.d.) packed with C18 AQ beads (1.9 μm, Dr. Maisch, Germany). For the analysis of glycopeptides from human serum, a 120 min gradient method was employed: loading at 3% Buffer B (80% ACN/20% H2O/0.1% FA) for 10 min, from 3% to 7% Buffer B in 5 min, from 7% to 45% Buffer B in 88 min, and from 45% to 90% Buffer B in 2 min. The separation system was then equilibrated with Buffer A (98% H2O/2% ACN/0.1% FA) for 15 min. The full mass scan was acquired from m/z 400 to 2000 with a resolution of 70,000. Data-dependent acquisition (DDA) mode was employed for the identification of glycopeptides, with the top 15 precursors subjected to MS/MS analysis at a resolution of 17,500. HCD was used for MS2 fragmentation with stepped NCE (normalized collision energy) of 23, 25 and 27.
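As a quick consistency check on the gradient description, the step durations can be tabulated and summed. This is only a sketch: the list name `GRADIENT_STEPS` is illustrative, and counting the 15 min equilibration as part of the 120 min method is an assumption suggested by the arithmetic rather than something the text states explicitly.

```python
# Gradient steps as (description, duration in minutes), transcribed from the text.
GRADIENT_STEPS = [
    ("load at 3% Buffer B", 10),
    ("3% to 7% Buffer B", 5),
    ("7% to 45% Buffer B", 88),
    ("45% to 90% Buffer B", 2),
    ("equilibrate with Buffer A", 15),  # assumed to be inside the 120 min method
]

total_minutes = sum(minutes for _, minutes in GRADIENT_STEPS)
print(total_minutes)  # 120
```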
Data Analysis
Mass spectrometric data acquired on the Q-Exactive mass spectrometer (.raw) were converted into mzML format using Proteome Discoverer (version 1.4.0.0). Mascot (version 2.30) was used for error-tolerant search19. MSFragger (version 20181110) was used for open search20. MS-GF+ (version 20180605) was used for the traditional restricted search21, which was done by setting the three most abundant O-linked glycans in human serum, NeuAc-Gal-GalNAc (~60%), NeuAc-Gal-(NeuAc-)GalNAc (~20%) and Gal-GalNAc (~10%)22, as variable modifications on Ser and Thr. The same glycans were applied to O-Search for comparison. Because ArMone uses a glycan database of 12 glycans, a second O-Search analysis was performed with these 12 glycans. The error-tolerant search and the open search were done with similar settings, except that these glycans were not set as variable modifications. The detailed parameter settings are provided in the Supporting Information.
Results and Discussion
The workflow of O-Search. This new searching scheme, termed O-Search, is designed to identify O-linked glycopeptides from higher-energy collisional dissociation (HCD) MS2 spectra. As shown in Figure 2, the analysis involves the following steps: (i) filtering the spectra to keep only glycopeptide spectra; (ii) generating in silico deglycosylated spectra; (iii) peptide sequence identification. In this study, the presence of either of the two abundant B-ions, HexNAc-2H2O-CH2O+ (m/z 138.05) and HexNAc+ (m/z 204.09) (Figure S1B), was used in the first step to filter the raw data and keep only the glycopeptide spectra. Removing the non-glycopeptide spectra by pre-filtering reduces the database search time and possibly produces more PSMs, because non-glycopeptide spectra can only generate false positive glycopeptide spectrum matches and thus could increase the score threshold determined by the target-decoy analysis23. In our dataset, about 50% of the spectra acquired in an LC-MS run were determined to be glycopeptide spectra. A typical spectrum of an O-glycopeptide is given in Figure 2A. The two abundant B-ions as well as some low-abundance B-ions were present. To facilitate peptide sequence identification, all the B-ions were removed from the spectrum, which generated a much cleaner spectrum, as shown in Figure 2B.
O-GalNAc glycosylation often occurs simultaneously on multiple nearby S/T residues. For this reason, a tryptic glycopeptide can carry multiple O-glycans. In this study, a maximum of three glycosylation sites is considered for every peptide sequence. The masses of all possible combinations of glycans are generated; these masses are termed glycan delta masses. For each combination, a new spectrum is generated by subtracting the glycan delta mass from the precursor mass. At the same time, the corresponding Y-ions are removed from the spectrum. In this way, a glycopeptide spectrum is converted into a series of in silico deglycosylated spectra with different precursor masses. The number of in silico deglycosylated spectra corresponding to a glycopeptide spectrum depends on the number of O-glycan structures considered for searching. If only the three highly abundant O-linked glycans in human serum, i.e., NeuAc-Gal-GalNAc (~60%), NeuAc-Gal-(NeuAc-)GalNAc (~20%) and Gal-GalNAc (~10%)22, are considered, there are 19 types of combinations. Because some combinations yield the same glycan composition, the 19 types of combinations yield only 15 different masses (Table S1). Thus, each glycopeptide spectrum generates 15 in silico deglycosylated spectra. Many more spectra are generated if more types of O-glycan structures are considered. For example, when the number of O-linked glycan structures increases to 12, the numbers of glycan combinations and delta masses increase to 454 and 184, respectively (Figure 1B). In this case, each raw spectrum has 184 deglycosylated spectra.
All of the newly generated spectra were searched with the traditional database search method without setting any glycosylation variable modification. After searching, only the top-matched peptide identification is kept for each spectrum. MS-GF+ is a universal database search tool optimized for a variety of spectral types21. As an open-source application, it can be easily integrated into our platform to perform database searching for peptide sequence identification. Peptide spectrum matches (PSMs) are filtered with a minimum MS-GF:RawScore of 15 and a maximum MS-GF:EValue of 0.01. Among all the in silico deglycosylated spectra of each spectrum, only the best PSM is kept. These PSMs are regarded as glycopeptide spectrum matches (GPSMs).
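The spectrum filtering, delta-mass generation and PSM selection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the 0.02 m/z tolerance for the oxonium ions is assumed, the precursor is treated as a neutral monoisotopic mass, "best PSM" is taken to mean the lowest E-value, and the removal of B- and Y-ions from the peak list is omitted.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da, in the order (Hex, HexNAc, NeuAc).
RESIDUE_MASS = (162.05282, 203.07937, 291.09542)

# The three abundant serum O-glycans from the text, as (Hex, HexNAc, NeuAc) counts.
GLYCANS = {
    "Gal-GalNAc": (1, 1, 0),
    "NeuAc-Gal-GalNAc": (1, 1, 1),
    "NeuAc-Gal-(NeuAc-)GalNAc": (1, 1, 2),
}

OXONIUM_MZ = (138.05, 204.09)  # diagnostic B-ions quoted in the text


def is_glycopeptide_spectrum(peak_mzs, tol=0.02):
    """Step (i): keep a spectrum if either diagnostic B-ion is present.
    The tolerance (in m/z units) is an assumption for this sketch."""
    return any(abs(mz - ref) <= tol for mz in peak_mzs for ref in OXONIUM_MZ)


def glycan_combinations(glycans, max_sites=3):
    """All unordered selections of 1..max_sites glycans, repeats allowed."""
    names = sorted(glycans)
    return [combo
            for k in range(1, max_sites + 1)
            for combo in combinations_with_replacement(names, k)]


def glycan_delta_masses(glycans, max_sites=3):
    """Distinct delta masses: combinations with the same summed composition
    collapse into a single entry."""
    compositions = set()
    for combo in glycan_combinations(glycans, max_sites):
        summed = tuple(sum(glycans[name][i] for name in combo) for i in range(3))
        compositions.add(summed)
    return sorted(sum(n * m for n, m in zip(comp, RESIDUE_MASS))
                  for comp in compositions)


def deglycosylated_precursor_masses(precursor_mass, glycans, max_sites=3):
    """Step (ii): one candidate peptide precursor mass per glycan delta mass."""
    return [precursor_mass - delta
            for delta in glycan_delta_masses(glycans, max_sites)]


def best_gpsm(psms, min_raw_score=15, max_evalue=0.01):
    """Step (iii): apply the score filters, then keep one PSM per raw spectrum.
    Each PSM is a dict with 'raw_score' and 'evalue' keys (hypothetical layout);
    choosing the lowest E-value as 'best' is an assumption."""
    kept = [p for p in psms
            if p["raw_score"] >= min_raw_score and p["evalue"] <= max_evalue]
    return min(kept, key=lambda p: p["evalue"]) if kept else None


print(len(glycan_combinations(GLYCANS)), len(glycan_delta_masses(GLYCANS)))
```

For the three serum glycans this reproduces the 19 combinations and 15 delta masses quoted in the text. For any 12 distinct glycans the same enumeration gives 454 combinations; the number of distinct delta masses (184 in the text) depends on the actual glycan list, which is not reproduced here.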
GPSM level false discovery rate (FDR) was kept to be