gFinder: A Web-Based Bioinformatics Tool for the Analysis of N-Glycopeptides

Aug 30, 2016 - gFinder: A Web-Based Bioinformatics Tool for the Analysis of N-Glycopeptides. Ju-Wan Kim, Heeyoun Hwang, Jong-Sun Lim...

Article

gFinder: a web-based bioinformatics tool for the analysis of N-glycopeptides

Ju-Wan Kim, Heeyoun Hwang, Jong-Sun Lim, Hyoung-Joo Lee, Seul-Ki Jeong, Jong Shin Yoo, and Young-Ki Paik

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b00772 • Publication Date (Web): 30 Aug 2016

Downloaded from http://pubs.acs.org on September 5, 2016




Journal of Proteome Research

JPR SI:pr-2016-00333q.R2

gFinder: a web-based bioinformatics tool for the analysis of N-glycopeptides

Ju-Wan Kim1,2, Heeyoun Hwang3, Jong-Sun Lim2, Hyoung-Joo Lee2, Seul-Ki Jeong2, Jong Shin Yoo3 and Young-Ki Paik1,2*

1Graduate Program in Functional Genomics, College of Life Sciences and Biotechnology, Yonsei University, Seoul, 03722, Korea

2Yonsei Proteome Research Center, Seoul, 03722, Korea

3Korea Basic Science Institute, Ochang, 28199, Chungbuk, Korea

*To whom correspondence should be addressed: Room 423, Industry-Academic Cooperation Building, Yonsei Proteome Research Center, Yonsei University, 50 Yonsei-Ro, Sudaemoon-ku, Seoul, 03722, Korea. Phone: +82-2-2123-4242; Fax: +82-2-393-6589; E-mail: [email protected]


ABSTRACT

Glycoproteins influence numerous indispensable biological functions, and changes in protein glycosylation have been observed in various diseases. The identification and characterization of glycoproteins and glycosylation sites by mass spectrometry (MS) remain challenging tasks, and great efforts have been made toward developing proteome informatics tools that facilitate the MS analysis of glycans and glycopeptides. Here, we report on the development of gFinder, a web-based bioinformatics tool that analyzes mixtures of native N-glycopeptides that have been profiled by tandem MS. gFinder not only enables the simultaneous integration of collision-induced dissociation (CID) and high-energy collisional dissociation (HCD) fragmentation, but also merges the paired spectra for high-throughput analysis. These merged spectra expedite the identification of both glycans and N-glycopeptide backbones in tandem MS data using a glycan database and a proteomic search tool (e.g., Mascot). These data can be used to simultaneously characterize peptide backbone sequences and possible N-glycan structures using assigned scores. gFinder also provides many convenient functions that make it easy to perform manual calculations while viewing the spectrum on-screen. Using gFinder, we detected an additional N-glycosylated protein (Q8N9B8) that had been missed in a previously published dataset. For N-glycan analysis, we used the GlycomeDB glycan structure database, which integrates the structural and taxonomic data from all of the major carbohydrate databases available in the public domain. Thus, gFinder is a convenient, high-throughput analytical tool for interpreting the tandem mass spectra of N-glycopeptides, which can then be used to identify potential missing proteins that carry glycans. gFinder is publicly available at http://gFinder.proteomix.org/.


KEYWORDS glycoprotein, N-glycan, mass spectrometry, chromosome-centric human proteome project, C-HPP, bioinformatics.

INTRODUCTION

Glycosylation, one of the most common post-translational modifications (PTMs), affects almost half of all gene products.1,2 Protein glycosylation is related to various biological functions such as cell growth and development, immune response, microbial-host interactions and intercellular adhesion.3 Moreover, differential glycosylation offers an opportunity to identify new biomarkers for various diseases.4 For example, abnormal forms of glycoproteins are readily observed in specimens from patients with cancer, metabolic diseases and inflammation.5–7

Human glycoproteins contain two major types of glycosylation: N- and O-linked.8,9 N-linked glycosylation is usually found at the amide nitrogen of asparagine (Asn or N) residues in the consensus tripeptide sequence N-X-S/T, where X can be any amino acid except proline. Approximately 90% of glycoproteins are known to carry an N-linked glycan, which is classified as high-mannose, hybrid or complex type.10 Unlike template-driven processes such as DNA replication, transcription and translation, glycosylation is a “non-template-driven process,”11 and it remains difficult to analyze and predict the types of oligosaccharides attached to proteins. All N-linked glycans are derived from the precursor Glc3Man9GlcNAc2, which is transferred to the protein in the endoplasmic reticulum and then processed in the endoplasmic reticulum and Golgi apparatus. Many of the residues on this initial building block are eventually removed, leaving two GlcNAc and three mannose residues as a well-defined core structure.12 O-glycans are usually produced by the sequential enzymatic transfer of individual monosaccharides, starting with the glycosylation of hydroxyl groups


of serine (S) and threonine (T) side chains.13 Unlike N-linked glycans, O-glycans are attached with no consensus sequence and are structurally diverse in mammals.14

There are two common mass spectrometry (MS)-based approaches to the characterization of N-glycosylation. One approach, which identifies glycans after they have been released from glycoproteins by PNGase F treatment,15–17 has limitations when a glycoprotein carries more than one glycosylation site, because it cannot correlate glycan composition with the different attachment sites. The other approach, the characterization of intact glycopeptides without any glycosidase treatment,12,18 is generally advantageous because it provides information about both the glycan composition and its attachment site. Despite this advantage, there are still limitations: the identification and characterization of glycopeptides are highly complex and time-consuming tasks.

To resolve these problems, many bioinformatic algorithms have been developed. For example, GlycoMod, freely accessible from the ExPASy server, predicts oligosaccharide compositions corresponding to a glycopeptide mass if either the peptide or protein sequence is known.19 Although GlycoMod was created as a first-step tool and is highly cited, it uses only single-stage MS data and thus cannot handle multiply charged precursors.11,20,21 GlycoMiner is a software tool that uses MS/MS spectra to identify and characterize both the sugar and peptide parts of glycopeptides.22 This program is useful for searching glycopeptide composition without prior information on the peptide or glycan portion. However, it has only been optimized for qTOF data and requires a separate process that converts MS/MS data to singly charged ions.23 Moreover, it appears to be efficient only for spectra with high signal-to-noise ratios, as it accepts only high-quality spectra.20,22 Another software program, Byonic, has a comprehensive scoring algorithm and a user-friendly graphical user interface (GUI),24 but is not freely available or open-source, which limits its use.25 Its algorithm also requires high resolution and accuracy, which limits its applicability across different MS platforms. Given the


various limitations of these current tools, several tools must be used for each analysis, which requires tedious and cumbersome manual curation. Thus, it is necessary to develop a better method for the efficient fragmentation of N-glycopeptides and more convenient customized tools for analyzing the datasets.20,21

Although collision-induced dissociation (CID) has been used to analyze peptides and PTMs, it has some limitations when used to analyze glycans with complex branched structures and isobaric structures. Recently, a mathematical model was developed to predict the CID spectra of N-glycopeptides for reliable glycopeptide identification.26 Although CID is effective in characterizing glycan moieties, it does not provide sequence-related information about the peptide backbone. Therefore, multiple tandem MS approaches are usually required to fully characterize the structures of intact glycopeptides.27 Given that higher-energy C-trap dissociation (HCD) fragmentation not only facilitates the identification of glycan oxonium ions at low m/z, but also provides b/y ions for sequencing the peptide backbone,28,29 we designed gFinder to use CID and HCD spectra simultaneously.

In the Chromosome-centric Human Proteome Project (C-HPP), which was established to annotate all human proteins including the three major PTMs (glycosylation, phosphorylation and acetylation), one of the most obvious barriers is the presence of at least 2,949 ‘missing’ proteins (neXtProt 1-15-2016 release); that is, gene products that have yet to be annotated with sufficient MS evidence at the protein level in biological samples.30 It is important to characterize the three major PTMs for each protein.31 In particular, these modifications make it difficult to identify missing proteins (those lacking protein-level evidence) using routine MS strategies. Here, we report on the development of gFinder, which not only efficiently analyzes N-glycopeptides by merging MS/MS data containing both CID and HCD spectra, but is also convenient for identifying backbone sequences, which it detects automatically using a Mascot search when there is no protein


information. We also note the possibility that gFinder could facilitate the analysis of N-glycoproteins among the missing proteins encoded by human chromosomes.

MATERIALS AND METHODS

Program overview

We designed gFinder for the high-throughput identification of N-glycopeptides using Mascot and GlycomeDB. Figure 1A depicts the workflow for glycopeptide analysis implemented in gFinder. gFinder is largely composed of four functional modules. First, the CID and HCD spectra obtained from LC-MS/MS are merged, so that the individual spectra are combined into one spectrum. Second, the merged spectra are divided into glycopeptide and non-glycopeptide spectra. Third, a Mascot search is conducted on the non-glycopeptide spectra to obtain protein information. Fourth, glycopeptides are identified by a backbone sequence search among the glyco spectra using the protein information obtained from the Mascot search. The spectra with backbone sequence information are then used to predict possible glycan structures through a GlycomeDB search. Finally, a visualization section shows the backbone sequences, glycosylation sites and predicted glycan structures together with the corresponding spectra. These processes are executed automatically.

As there is currently no good way to calculate the false discovery rate (FDR) for these types of raw data, which lack both true and false datasets, we did not attempt the task. Further, because merging spectra from HCD and CID seems uncommon, an N-glycopeptide decoy database was not established. We only


describe the false positive rate of glyco-marker testing (see the “Selection of glyco-spectra for gFinder analysis” subsection of Results and Discussion).

Datasets. The mass spectral data used in this work were previously published.29 Additional MS/MS data for IgG were provided by Prof. JS Yoo, Korea Basic Science Institute (KBSI). To identify the N-glycosylation of missing proteins, we obtained datasets from the ProteomeXchange (PXD) repository. We used the dataset acquired from HILIC-enriched N-glycopeptides of human plasma with both HCD and CID MS/MS fragmentation (published by Park et al., PXD003227).32

Proteome search. Raw files from the MS were converted to ‘.mgf’ format using RawConverter, which links the accurate precursor mass-to-charge (m/z) value with the tandem mass spectrum.33 Mascot (version 2.2) was used for protein and peptide identification against the UniProt Human database (version 2015_08). The Mascot database search criteria were as follows: taxonomy, all entries; fixed modification, carbamidomethylation (+57) of cysteine; variable modifications, oxidation (+16) of methionine and HexNAc (+203) at asparagine residues; a maximum of one missed cleavage; an MS tolerance of 20 ppm; and an MS/MS tolerance of 0.1 Da. Only peptides resulting from trypsin digests were considered in this work.

Method for merging two spectra into one spectrum and spectra analysis. Multiple tandem MS approaches are usually required to fully characterize the structures of intact glycopeptides.27 Therefore, gFinder was designed to merge the CID and HCD spectra into a single spectrum. Combining the CID and HCD spectra into a single spectral dataset made them easier to process. The individual spectra were simply combined into one spectrum, and all peaks were sorted by their m/z values.
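The merging step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not gFinder's actual code: the `Peak` type, its `source` label and the function name `merge_spectra` are hypothetical names introduced here. The `source` field is kept only so that later steps could still tell which fragmentation a peak came from.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    mz: float         # mass-to-charge ratio
    intensity: float  # intensity as read from the peak list
    source: str       # "CID" or "HCD" (hypothetical label, not from the paper)


def merge_spectra(cid: List[Peak], hcd: List[Peak]) -> List[Peak]:
    """Combine the CID and HCD peak lists of one precursor and sort by m/z.

    If one list is empty (only one fragmentation type was acquired), the
    other spectrum is returned with its peaks sorted by m/z.
    """
    return sorted(cid + hcd, key=lambda p: p.mz)


# Toy peak lists (made-up values, for illustration only).
cid = [Peak(366.14, 50.0, "CID"), Peak(1216.50, 20.0, "CID")]
hcd = [Peak(204.09, 100.0, "HCD"), Peak(892.40, 35.0, "HCD")]
merged = merge_spectra(cid, hcd)
print([p.mz for p in merged])
```

Peaks are concatenated rather than summed or averaged, which matches the description that the individual spectra were simply combined and then sorted.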


The concept of how the CID and HCD spectra were merged into a single spectrum, and the subsequent data processing, are outlined in Fig. 2B. The merged spectra were passed to an oxonium ion filtering module to distinguish glyco spectra from non-glyco spectra. If a precursor has only a single type of fragmentation, gFinder uses the original spectrum without merging. gFinder thus automatically separates glyco from non-glyco spectra before a Mascot search is performed on the preprocessed spectra.

We used oxonium ion peaks to distinguish glycopeptide spectra in the MS/MS data (Table 1). As summarized in Table 1, these oxonium ion peaks have previously been used for manual searches.26,29,34 Using these ions as glyco-markers, we classified a spectrum as a glyco spectrum if at least 3 of the 9 marker ions were present with intensities of at least 20% of the maximum intensity below m/z 700. Of these ions, HexNAc (m/z 204.0866) was observed in 50.3% of glycopeptide spectra.35 The y2 ion of a C-terminal Gly-Lys has an m/z of 204.1343, which differs from that of HexNAc by only 0.048.36 To better distinguish between these ions, a tolerance of 20 ppm was applied.

For precursors containing N-glycosylation, both Y-series ion and glycan searches were conducted. From the list of proteins identified by the Mascot search or entered as UniProt accession numbers, N-glycosylation sites were identified by in silico tryptic digestion of the corresponding protein sequences. Peptides with a molecular weight > 500 Da were designated as backbone candidates for glycopeptides, with one missed cleavage allowed. Because an N-glycopeptide contains a core composed of 2 HexNAc + 3 Man, its MS/MS spectrum contains Y-series ions including Y0 (peptide only), Y1 (peptide + GlcNAc), Y2 (peptide + 2GlcNAc) and a cross-ring cleavage ion of GlcNAc that has lost 120.0423 Da from Y1.29,37 In gFinder, a peptide is designated as a backbone candidate only if at least two of its Y-series ions are found in the MS/MS spectrum.
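The glyco-marker rule described above can be illustrated with a short Python sketch. Several things here are assumptions rather than details taken from gFinder: Table 1 is not reproduced in this text, so the nine-ion list below mixes the m/z values quoted in the Methods (204.0866, 274.0874, 292.1032, 657.2354) with commonly used oxonium ions added for illustration; the function name and parameters are hypothetical; and counting a spectrum as glyco when at least three markers are found is an interpretation (the Results describe non-glyco spectra as those with fewer than three oxonium ions).

```python
from typing import List, Tuple

# Illustrative marker list (assumption): values quoted in the text plus
# commonly used oxonium ions; the paper's Table 1 may differ.
GLYCO_MARKERS = [
    138.0550, 163.0601, 168.0655, 186.0761, 204.0866,
    274.0874, 292.1032, 366.1395, 657.2354,
]


def is_glyco_spectrum(
    peaks: List[Tuple[float, float]],   # (m/z, intensity) pairs
    markers: List[float] = GLYCO_MARKERS,
    ppm: float = 20.0,                  # matching tolerance from the text
    rel_intensity: float = 0.20,        # 20% of the maximum intensity
    mz_limit: float = 700.0,            # only peaks below m/z 700 count
    min_hits: int = 3,                  # assumed threshold (see lead-in)
) -> bool:
    low = [(mz, inten) for mz, inten in peaks if mz < mz_limit]
    if not low:
        return False
    threshold = rel_intensity * max(inten for _, inten in low)
    hits = 0
    for marker in markers:
        tol = marker * ppm * 1e-6       # ppm tolerance converted to m/z units
        if any(abs(mz - marker) <= tol and inten >= threshold
               for mz, inten in low):
            hits += 1
    return hits >= min_hits


# Toy spectra (made-up intensities). The second one contains the y2 ion of
# Gly-Lys at m/z 204.1343, which lies outside 20 ppm of HexNAc at 204.0866.
glyco = [(138.0550, 40.0), (204.0866, 100.0), (366.1395, 80.0), (500.30, 10.0)]
non_glyco = [(147.1128, 60.0), (204.1343, 100.0), (366.1395, 5.0)]
print(is_glyco_spectrum(glyco), is_glyco_spectrum(non_glyco))
```

At m/z 204, a 20 ppm window is about 0.004 wide on each side, which is why the 0.048 gap between HexNAc and the Gly-Lys y2 ion is enough to keep them apart.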
We calculated the theoretical molecular weight of the N-glycopeptide backbone peptides from the Y-series ion search.
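As a rough sketch of the Y-series calculation, the Python below computes the monoisotopic mass of a backbone peptide and the m/z values of Y0, Y1, Y2 and the cross-ring ion (Y1 minus 120.0423 Da), then applies the "at least two Y-series ions" rule. The function names, the per-charge-state counting and the charge range are assumptions made for illustration; cysteine is left unmodified here, whereas the search in the paper used a fixed +57 modification.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
HEXNAC = 203.07937          # HexNAc residue mass
CROSS_RING_LOSS = 120.0423  # loss from Y1 quoted in the text


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER


def y_series(sequence: str, charge: int = 1) -> dict:
    """Theoretical m/z of the Y-series ions at the given charge state."""
    m = peptide_mass(sequence)
    neutral = {
        "Y0": m,                               # peptide only
        "Y1-120": m + HEXNAC - CROSS_RING_LOSS,  # cross-ring cleavage ion
        "Y1": m + HEXNAC,                      # peptide + GlcNAc
        "Y2": m + 2 * HEXNAC,                  # peptide + 2 GlcNAc
    }
    return {k: (v + charge * PROTON) / charge for k, v in neutral.items()}


def is_backbone_candidate(peak_mzs, sequence, charges=(1, 2, 3),
                          tol=0.1, min_ions=2) -> bool:
    """True if at least `min_ions` Y-series ions match at one charge state."""
    for z in charges:
        ions = y_series(sequence, z)
        matched = sum(any(abs(mz - t) <= tol for mz in peak_mzs)
                      for t in ions.values())
        if matched >= min_ions:
            return True
    return False


ions = y_series("GK", charge=1)
print(round(ions["Y0"], 4))  # Gly-Lys as a sanity check against 204.1343
```

Using Gly-Lys as a check reproduces the m/z 204.1343 value quoted for the y2 ion in the text, which gives some confidence that the mass bookkeeping is consistent.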


gFinder uses GlycomeDB for glycan prediction. Because the de novo approach takes a long time and produces data with low accuracy, we used a database search approach, assuming that each peptide carries only one glycan. Thus, the glycan mass was obtained by subtracting the backbone sequence mass from the precursor ion mass of the glycopeptide, and gFinder searched for this mass in GlycomeDB (tolerance of 0.02 Da). If the ions at m/z 274.0874, 292.1032 and 657.2354 are found, NeuAc may be present in the glycan. In the same way, if the ions at m/z 350.0545, 512.1953 and 803.2894 are found, fucose may be present in the glycan. If a spectrum contains these ions, gFinder searches the database with the NeuAc and fucose options; otherwise it searches by mass only. Thus, gFinder filters the glycan search with this information when a user searches the database. For the identified glycan lists, the theoretical and experimental spectra are then cross-compared to help end users work out glycan structures in a way that mimics manual calculation.

We also searched the precursor mass against a theoretical tryptic N-glycopeptide database constructed from the UniProt Human database (tolerance of 10 ppm). The spectrum was then matched against the b/y ion patterns of sequences with similar molecular weights. Because Y-series and b/y ion peaks were mainly found in HCD spectra, we used a tolerance of 0.1 Da for them; because glycan fragmentation peaks were mainly found in CID spectra, we used a tolerance of 0.5 Da for them.

Signal peak selection for spectrum match. To match ion peaks against a mass spectrum, it is necessary to filter signal peaks. To this end, we selected the 50 most intense peaks and then retained as signal peaks those whose intensity exceeded the mean plus two standard deviations of the intensities within a 30 m/z window. We compared these signal peaks with the theoretical spectra of glycopeptides.
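The signal peak selection can be read in more than one way, so the following Python sketch fixes one interpretation and should be taken as such. Assumptions introduced here: the 30 m/z window is centred on each candidate peak (15 on either side), the local mean and standard deviation are computed from the neighbouring peaks excluding the candidate itself, and a candidate with fewer than two neighbours is kept because no local noise estimate is available. The function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def select_signal_peaks(
    peaks: List[Tuple[float, float]],  # (m/z, intensity) pairs
    top_n: int = 50,                   # keep the 50 most intense peaks
    window: float = 30.0,              # width of the local m/z window
    n_sd: float = 2.0,                 # mean + 2 standard deviations
) -> List[Tuple[float, float]]:
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    half = window / 2.0
    signal = []
    for mz, inten in top:
        # Intensities of the other peaks inside the window around this peak.
        neighbours = [i for m, i in peaks
                      if abs(m - mz) <= half and (m, i) != (mz, inten)]
        if len(neighbours) < 2:
            signal.append((mz, inten))  # assumed: no noise estimate, keep it
        elif inten > mean(neighbours) + n_sd * pstdev(neighbours):
            signal.append((mz, inten))
    return sorted(signal)               # return in ascending m/z order


# Toy spectrum: ten low noise peaks and one strong peak among them.
noise = [(500.0 + k, 9.0 if k % 2 == 0 else 11.0) for k in range(10)]
spectrum = noise + [(505.5, 100.0)]
signal = select_signal_peaks(spectrum)
print(signal)
```

Excluding the candidate from its own noise estimate is a deliberate choice in this sketch: if the candidate were included, a peak in a sparsely populated window could never exceed the mean plus two standard deviations.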


Glycan database. GlycomeDB38 (version 1.0.0), which was used as the glycan database, integrates the structural and taxonomic data of all major carbohydrate databases. The N-glycan database, which contains 7,959 N-glycans, was constructed by selecting the entries with N-glycan core motifs from the 33,283 entries present in the GlycomeDB release downloaded for this work.37 This N-glycan database was converted to the glycoCT39 XML format and modified through a two-step process. To this end, the monosaccharide masses were added to each GlycomeDB entry, and the resulting total mass was stored in the database. The theoretical spectrum of each glycan, created and stored in the database to enhance the search speed, was prepared from the glycoCT format based on the residue IDs, as follows. First, residue masses were summed starting from the smallest residue ID. Then, if a branch was present, a theoretical peak was prepared by adding all of the linked residues, followed by the addition of the residue with the next smallest ID. Among the 7,959 glycans, 4,310 had theoretical spectra that did not overlap with those of any other glycan. Among the 2,735 N-glycan structures with taxonomic annotation, Homo sapiens accounted for the most structures, followed by Bos taurus and Sus scrofa (Figure S1). There were 695 N-glycans with more than 2 taxa.

Construction of the web interface. The web interface was implemented using Apache (2.2.4, www.apache.org) and PHP (5.2.11, www.php.net), which allow users to query the database and display the results in HTML format. The spectrum viewer was built on a modified version of the JavaScript plugin Lorikeet (uwpr.github.io/Lorikeet/). The database was constructed using MySQL (5.0.45, www.mysql.com), which was linked to Apache and PHP, enabling query results to be received and reported through the web.
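The mass bookkeeping behind the glycan database search (summing monosaccharide masses per entry, then matching the precursor mass minus the backbone mass within 0.02 Da) can be sketched as follows. This is an illustration under assumptions: the four-entry `TOY_DB` is a hypothetical stand-in for the N-glycan database and does not reproduce GlycomeDB entries, the function names and the `need_neuac`/`need_fucose` flags are invented here to mirror the NeuAc and fucose options, and the example masses are made up.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
MONOSACCHARIDE = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,   # fucose
    "NeuAc": 291.09542,
}

# Hypothetical miniature database of glycan compositions.
TOY_DB = [
    {"HexNAc": 2, "Hex": 3},
    {"HexNAc": 2, "Hex": 5},
    {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    {"HexNAc": 4, "Hex": 5, "NeuAc": 2, "dHex": 1},
]


def composition_mass(composition: dict) -> float:
    """Sum of residue masses; this is the mass a glycan adds to a peptide."""
    return sum(MONOSACCHARIDE[name] * n for name, n in composition.items())


def search_glycans(precursor_mass, backbone_mass, database=TOY_DB,
                   tol=0.02, need_neuac=False, need_fucose=False):
    """Return compositions whose mass matches precursor minus backbone."""
    glycan_mass = precursor_mass - backbone_mass
    hits = []
    for comp in database:
        if need_neuac and comp.get("NeuAc", 0) == 0:
            continue
        if need_fucose and comp.get("dHex", 0) == 0:
            continue
        if abs(composition_mass(comp) - glycan_mass) <= tol:
            hits.append(comp)
    return hits


# Made-up neutral masses: a 1800.0 Da backbone carrying HexNAc4Hex5NeuAc2.
backbone = 1800.0
precursor = backbone + 2204.7724
hits = search_glycans(precursor, backbone, need_neuac=True)
print(hits)
```

Residue masses are used rather than free-sugar masses because each glycosidic bond releases one water molecule, so the difference between the glycopeptide and the bare peptide is the sum of the residue masses.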

RESULTS AND DISCUSSION


Verification of CID and HCD spectra merging

We used both CID and HCD spectra for glycopeptide identification, and gFinder has a module that merges them. We tested this merging method using human IgG MS/MS data. As summarized in Table 2, the merged spectra yielded more matches in the Y-ion search, the glycan DB search and the b/y ion search than the CID or HCD spectra alone. Because the dominant oxonium ion peaks in HCD spectra were used to distinguish glycopeptide spectra, no glyco spectra were found when only CID spectra were used. Glyco spectra filtering and Y-ion matching both use peaks from HCD spectra; therefore, the merged spectra and the HCD-only spectra gave similar results. Table 3 summarizes the glycan list obtained from the IgG analysis by merging HCD and CID spectra in gFinder, which resulted in the identification of 30 glycopeptides. In addition, 25 of the 30 glycan structures were fucosylated, indicating that the merged spectra seem more efficient for the identification of glycopeptides.

Selection of glyco-spectra for gFinder analysis

It has been reported that the combination of spectra from both HCD and CID obtained from peptide precursor analysis is more efficient in identifying glycoproteins.29 gFinder contains a unique function that automatically merges both spectra (HCD and CID) for the same precursor. To test whether the glyco-marker filter was valid, we applied it to the ISB Human plasma non-glyco spectral library (2013-07) downloaded from PeptideAtlas;40 only 36 of 104,720 spectra were classified as glyco spectra (false positive rate: 0.034%). We also attempted to identify N-glyco precursors among 3,773 precursors from a vitronectin standard. To this end, 998 spectra were used to search for Y-ions in the backbone peptide search module. Of these, 843 (84.47%) contained both the 204.09 and 366.14 oxonium ions typically used for manual calculation. There


were 152 (15.23%) and 1 (0.1%) cases featuring spectra with only the 204.09 or only the 366.14 oxonium ion, respectively. Although this method is quite similar to current programs or the manual method, we were able to search the glycopeptide spectra easily and automatically using the common markers. When this result was further tested with the decoy library, we found a very low false positive rate, indicating that the selection of the 998 spectra seems accurate.

Prediction of glycan structure by gFinder

To obtain the list of peptides, a Mascot search was performed on the non-glyco spectra, in which fewer than three oxonium ions were found in the MS/MS data, suggesting that these peptides are not glycosylated. In the Mascot search of the standard sample using non-glyco spectra, 188 spectra were matched to the vitronectin standard. In addition, 111 of the 998 glyco spectra were matched to glycopeptides. For the serum samples, which had previously been processed by hydrophilic interaction liquid chromatography (HILIC), non-glyco spectra were not in the majority; therefore, UniProt accession numbers were assigned to the glyco spectra using the manual option instead of conducting a Mascot search. The peptide backbone sequence search of plasma samples revealed that 56 of 1,122 glyco spectra were identified in healthy controls, whereas 54 of 1,150 were identified in hepatocellular carcinoma (HCC) samples.

Table 4 summarizes the glycan list that was obtained from the vitronectin standard by gFinder. Vitronectin is a multi-functional glycoprotein present in the extracellular matrix and plasma.41,42 It has three N-glycosylation sites at Asn-86, Asn-169 and Asn-242.43 Altered N-glycosylation of vitronectin is a known candidate biomarker of HCC.44 Using previous data from the MS analysis of vitronectin,29 we wanted to demonstrate that gFinder can be a versatile tool for analyzing complex N-glycan molecules. To this end, 30 glycan structures were identified in


which 6, 14 and 10 glycans were predicted to localize at Asn-86, Asn-169 and Asn-242, respectively. In addition, 10 of the 30 glycan structures were fucosylated. Notably, 8 of the 30 glycopeptides were newly identified by gFinder (Figure S2). Furthermore, 5 and 9 glycopeptides were identified from the sera of healthy volunteers and HCC patients, respectively (Tables S1 and S2). Figure 2AB and Figure S3 show that only 4 glycopeptides (1 from Asn-86 and 3 from Asn-169) were found in HCC sera. All of these identified glycopeptides were validated manually. Thus, gFinder not only identifies the various glycan structures present at each N-glycosylation site of a single protein, but also finds the glycopeptides present in different types of samples (serum). With this tool, it is possible to study serologic disease biomarkers and biological pathways.

Characterization of N-glycopeptides of previously detected missing proteins

To characterize the N-glycosylation of missing proteins, we used MS/MS data (PXD003227) obtained from depleted human serum. Of the missing proteins listed in UniProt, 852 were annotated with N-glycosylation (28.89%, release 2016_02). This means that the missing proteins have a higher N-glycosylation ratio than the annotated human proteins, of which 3,518 of 17,106 (20.57%) are N-glycoproteins. Figure S4A shows that, on average, the missing proteins contained more N-glycosylation in each chromosome. The distribution of N-glycosylated missing proteins across chromosomes is shown in Figure S4B. About 75.1% of the N-glycosylated missing proteins belonged to the PE2 group (proteins with transcript expression evidence), whereas 24.9% belonged to the PE3 group (proteins with homologous protein evidence in a related species). We used gFinder to analyze the N-glycopeptide list from the in silico digest of missing proteins and found an N-glycopeptide of Q8N9B8 (Figure 3). This protein (Q8N9B8), “Ras-GEF domain-containing family member 1A”, is encoded on chromosome 10 and has only transcript-level evidence (PE2). We identified N-glycosylation only at Asn-371 of Q8N9B8, where


the backbone sequence is TALQGATQRSQMAN*SSR and exists only in the Swiss-Prot database. Table S3 shows the b/y ion match scores of glycopeptides with masses similar to that of this backbone sequence; TALQGATQRSQMANSSR was the best match found by gFinder. Table S4 shows the 4 b/y ions of this backbone sequence that matched the MS/MS spectrum. The glycopeptide carries a bi-antennary glycan with the composition HexNAc4Hex5NeuAc2. This identified glycopeptide was validated manually. This observation shows that gFinder can detect N-glycosylated missing proteins present in human serum samples. Although this is not a case of directly identifying missing proteins, gFinder proved useful for characterizing the Y-series ions and the fragmentation of less characterized N-glycopeptides of missing proteins that have not been previously explored. Thus, gFinder can be used to screen previously published datasets for more missing proteins that contain PTMs.

Visualization of glycopeptide spectra by gFinder

gFinder helps the end user identify glycopeptides by performing calculations directly on the spectrum, which can be viewed using a modified version of Lorikeet, a JavaScript MS/MS spectrum viewer. Figure 4AB shows the gFinder interface. Users can select search options, and gFinder shows the merged spectra of CID and HCD on the results screen, along with the backbone sequence and glycan structure. This unique system allows users not only to directly identify the b/y ions and glyco-marker ions of a glyco spectrum, but also to search for Y-series ions (Figure 4C). The results screen also shows the m/z values and charges of the ions identified by gFinder using different colors: black for oxonium ions, brown for Y-series ions and green for the theoretical spectrum of a glycan candidate. More importantly, users can analyze glycopeptides by calculating masses on the view screen. In the left panel, default values are selected and the view is easily manipulated (enlarged or reduced), allowing users to examine the search results.


CONCLUSIONS

In this work, we demonstrated that gFinder can analyze N-linked glycopeptide data by simultaneously integrating CID and HCD fragmentation for high-throughput analysis. The merged spectra expedite the identification of both glycans and N-glycopeptide backbones from tandem MS data using the glycan database and a proteomic search tool. Recent advances in glycobiology and glycomics have accelerated the development of bioanalytical and bioinformatic tools for studying glycan structure–function relationships. This motivated us to create new tools for the rapid, automated, and reliable analysis of glycopeptides.

Glycopeptide analysis is still far from being easily incorporated into automated proteomic workflows. Despite the availability of various bioinformatics tools, end users still need manual curation because of the limitations of each tool. Hence, gFinder provides many convenient functions that allow users to perform manual calculations while viewing the spectrum on-screen. Because gFinder helps users predict the potential structure of glycopeptide glycans, better results should be possible once the theoretical spectrum can be produced more accurately. gFinder successfully identified glycopeptides not only in standard glycoproteins (e.g., IgG and vitronectin) but also in clinical samples (plasma of HCC patients). Furthermore, gFinder can be applied to find glycopeptides of missing proteins. Note that the current HPP guidelines (ref 45) may require MS-based validation of additional peptides for this kind of glycoprotein or antibody detection to definitively verify their existence. We were not able to find other unique peptides because the samples in this work were treated with HILIC, which leaves open the possibility that this missing protein (Q8N9B8) is a 'one-hit wonder'. Lastly, gFinder is a freely accessible web-based tool that aims to support scientists in analyzing the overall site heterogeneity of glycopeptides, and its workflow is easily incorporated into a proteomics laboratory. The program is available at http://gfinder.proteomix.org/.


ABBREVIATIONS

Mass spectrometry (MS); post-translational modification (PTM); signal-to-noise ratio (S/N); graphical user interface (GUI); collision-induced dissociation (CID); higher-energy C-trap dissociation (HCD); chromosome-centric human proteome project (C-HPP); hepatocellular carcinoma (HCC); multiple affinity removal column (MARC); hydrophilic interaction liquid chromatography (HILIC); hexose (Hex); N-acetylhexosamine (HexNAc); N-acetylneuraminic acid (NeuAc); N-glycolylneuraminic acid (NeuGc).

ASSOCIATED CONTENT

Supporting Information

Table S1: Results of human plasma (normal) analysis by gFinder.
Table S2: Results of human plasma (HCC) analysis by gFinder.
Table S3: b/y ion match score of missing protein backbone sequence.
Table S4: b/y ion match table of TALQGATQRSQMANSSR.
Figure S1: Frequency distribution of carbohydrates in the top 10 taxonomic categories.
Figure S2: Eight newly identified N-glycopeptides of the vitronectin standard.
Figure S3: Differently identified N-glycopeptide from an HCC human serum sample.
Figure S4: (A) Ratio of N-linked glycoproteins in human chromosomes. (B) Distribution of missing proteins containing N-linked glycosylation present in human chromosomes (UniProt release 2016_02).

ACKNOWLEDGMENTS

Data were provided by Prof. J. S. Yoo, Korea Basic Science Institute (KBSI). This work was supported by a grant from the Korean Ministry of Health and Welfare (HI13C2098-International Consortium Project to Y.-K.P.; HI16C0257 to Y.-K.P.).

REFERENCES

(1) Pan, S.; Chen, R.; Aebersold, R.; Brentnall, T. A. Mass spectrometry based glycoproteomics--from a proteomics perspective. Mol. Cell. Proteomics 2011, 10, R110.003251.

(2) Apweiler, R.; Hermjakob, H.; Sharon, N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim. Biophys. Acta 1999, 1473, 4–8.

(3) Joenväärä, S.; Ritamo, I.; Peltoniemi, H.; Renkonen, R. N-Glycoproteomics - An automated workflow approach. Glycobiology 2008, 18, 339–349.

(4) Kirmiz, C.; Li, B.; An, H. J.; Clowers, B. H.; Chew, H. K.; Lam, K. S.; Ferrige, A.; Alecio, R.; Borowsky, A. D.; Sulaimon, S.; et al. A serum glycomics approach to breast cancer biomarkers. Mol. Cell. Proteomics 2007, 6, 43–55.

(5) Van den Steen, P.; Rudd, P. M.; Dwek, R. A.; Opdenakker, G. Concepts and principles of O-linked glycosylation. Crit. Rev. Biochem. Mol. Biol. 1998, 33, 151–208.

(6) Gruppen, E. G.; Connelly, M. A.; Otvos, J. D.; Bakker, S. J. L.; Dullaart, R. P. F. A novel protein glycan biomarker and LCAT activity in metabolic syndrome. Eur. J. Clin. Invest. 2015, 45, 850–859.

(7) Fernandes, C. L.; Ligabue-Braun, R.; Verli, H. Structural glycobiology of human α1-acid glycoprotein and its implications for pharmacokinetics and inflammation. Glycobiology 2015, 25, 1125–1133.

(8) Spiro, R. G. Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology 2002, 12, 43R–56R.

(9) Hart, G. W.; Copeland, R. J. Glycomics hits the big time. Cell 2010, 143, 672–676.

(10) Kornfeld, R.; Kornfeld, S. Assembly of asparagine-linked oligosaccharides. Annu. Rev. Biochem. 1985, 54, 631–664.

(11) Woodin, C. L.; Maxon, M.; Desaire, H. Software for automated interpretation of mass spectrometry data from glycans and glycopeptides. Analyst 2013, 138, 2793–2803.

(12) Dalpathado, D. S.; Desaire, H. Glycopeptide analysis by mass spectrometry. Analyst 2008, 133, 731–738.

(13) Zauner, G.; Kozak, R. P.; Gardner, R. A.; Fernandes, D. L.; Deelder, A. M.; Wuhrer, M. Protein O-glycosylation analysis. Biol. Chem. 2012, 393, 687–708.

(14) Harvey, D. J. Proteomic analysis of glycosylation: structural determination of N- and O-linked glycans by mass spectrometry. Expert Rev. Proteomics 2005, 2, 87–101.

(15) Patel, T.; Bruce, J.; Merry, A.; Bigge, C.; Wormald, M.; Jaques, A.; Parekh, R. Use of hydrazine to release in intact and unreduced form both N- and O-linked oligosaccharides from glycoproteins. Biochemistry 1993, 32, 679–693.

(16) Goldberg, D.; Sutton-Smith, M.; Paulson, J.; Dell, A. Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra. Proteomics 2005, 5, 865–875.

(17) Drake, P.; Schilling, B.; Gibson, B.; Fisher, S. Elucidation of N-glycosites within human plasma glycoproteins for cancer biomarker discovery. Methods Mol. Biol. 2013, 951, 307–322.

(18) Morelle, W.; Canis, K.; Chirat, F.; Faid, V.; Michalski, J.-C. The use of mass spectrometry for the proteomic analysis of glycosylation. Proteomics 2006, 6, 3993–4015.

(19) Cooper, C. A.; Gasteiger, E.; Packer, N. H. GlycoMod--a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics 2001, 1, 340–349.

(20) Li, F.; Glinskii, O. V.; Glinsky, V. V. Glycobioinformatics: current strategies and tools for data mining in MS-based glycoproteomics. Proteomics 2013, 13, 341–354.

(21) Desaire, H. Glycopeptide analysis, recent developments and applications. Mol. Cell. Proteomics 2013, 12, 893–901.

(22) Ozohanics, O.; Krenyacz, J.; Ludányi, K.; Pollreisz, F.; Vékey, K.; Drahos, L. GlycoMiner: A new software tool to elucidate glycopeptide composition. Rapid Commun. Mass Spectrom. 2008, 22, 3245–3254.

(23) Woodin, C. L.; Hua, D.; Maxon, M.; Rebecchi, K. R.; Go, E. P.; Desaire, H. GlycoPep grader: A web-based utility for assigning the composition of N-linked glycopeptides. Anal. Chem. 2012, 84, 4821–4829.

(24) Bern, M.; Kil, Y. J.; Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinformatics 2012, Chapter 13, Unit13.20.

(25) Liu, G.; Neelamegham, S. Integration of systems glycobiology with bioinformatics toolboxes, glycoinformatics resources, and glycoproteomics data. Wiley Interdiscip. Rev. Syst. Biol. Med. 7, 163–181.

(26) Mayampurath, A. M.; Wu, Y.; Segu, Z. M.; Mechref, Y.; Tang, H. Improving confidence in detection and characterization of protein N-glycosylation sites and microheterogeneity. Rapid Commun. Mass Spectrom. 2011, 25, 2007–2019.

(27) Mechref, Y. Use of CID/ETD mass spectrometry to analyze glycopeptides. Curr. Protoc. Protein Sci. 2012, Chapter 12, Unit 12.11.1–11.

(28) Cheng, K.; Chen, R.; Seebun, D.; Ye, M.; Figeys, D.; Zou, H. Large-scale characterization of intact N-glycopeptides using an automated glycoproteomic method. J. Proteomics 2014, 110, 145–154.

(29) Hwang, H.; Lee, J. Y.; Lee, H. K.; Park, G. W.; Jeong, H. K.; Moon, M. H.; Kim, J. Y.; Yoo, J. S. In-depth analysis of site-specific N-glycosylation in vitronectin from human plasma by tandem mass spectrometry with immunoprecipitation. Anal. Bioanal. Chem. 2014, 406, 7999–8011.

(30) Jeong, S.-K.; Hancock, W. S.; Paik, Y.-K. GenomewidePDB 2.0: A Newly Upgraded Versatile Proteogenomic Database for the Chromosome-Centric Human Proteome Project. J. Proteome Res. 2015, 14, 3710–3719.

(31) Paik, Y. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; et al. Standard guidelines for the chromosome-centric human proteome project. J. Proteome Res. 2012, 11, 2005–2013.

(32) Park, G. W.; Kim, J. Y.; Hwang, H.; Lee, J. Y.; Ahn, Y. H.; Lee, H. K.; Ji, E. S.; Kim, K. H.; Jeong, H. K.; Yun, K. N.; et al. Integrated GlycoProteome Analyzer (I-GPA) for Automated Identification and Quantitation of Site-Specific N-Glycosylation. Sci. Rep. 2016, 6, 21175.

(33) He, L.; Diedrich, J.; Chu, Y.-Y.; Yates, J. R. Extracting Accurate Precursor Information for Tandem Mass Spectra by RawConverter. Anal. Chem. 2015, 87, 11361–11367.

(34) Nanni, P.; Panse, C.; Gehrig, P.; Mueller, S.; Grossmann, J.; Schlapbach, R. PTM MarkerFinder, a software tool to detect and validate spectra from peptides carrying post-translational modifications. Proteomics 2013, 13, 2251–2255.

(35) Medzihradszky, K. F.; Kaasik, K.; Chalkley, R. J. Tissue-specific glycosylation at the glycopeptide level. Mol. Cell. Proteomics 2015, 1–21.

(36) Medzihradszky, K. F.; Kaasik, K.; Chalkley, R. J. Characterizing sialic acid variants at the glycopeptide level. Anal. Chem. 2015, 87, 3064–3071.

(37) Pompach, P.; Chandler, K. B.; Lan, R.; Edwards, N.; Goldman, R. Semi-automated identification of N-glycopeptides by hydrophilic interaction chromatography, nano-reverse-phase LC-MS/MS, and glycan database search. J. Proteome Res. 2012, 11, 1728–1740.

(38) Ranzinger, R.; Herget, S.; Wetter, T.; von der Lieth, C.-W. GlycomeDB - integration of open-access carbohydrate structure databases. BMC Bioinformatics 2008, 9, 384.

(39) Herget, S.; Ranzinger, R.; Maass, K.; Lieth, C. W. V. D. GlycoCT-a unifying sequence format for carbohydrates. Carbohydr. Res. 2008, 343, 2162–2171.

(40) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. The PeptideAtlas project. Nucleic Acids Res. 2006, 34, D655–D658.

(41) Uchibori-Iwaki, H.; Yoneda, A.; Oda-Tamai, S.; Kato, S.; Akamatsu, N.; Otsuka, M.; Murase, K.; Kojima, K.; Suzuki, R.; Maeya, Y.; et al. The changes in glycosylation after partial hepatectomy enhance collagen binding of vitronectin in plasma. Glycobiology 2000, 10, 865–874.

(42) Sano, K.; Asanuma-Date, K.; Arisaka, F.; Hattori, S.; Ogawa, H. Changes in glycosylation of vitronectin modulate multimerization and collagen binding during liver regeneration. Glycobiology 2007, 17, 784–794.

(43) Bunkenborg, J.; Pilch, B. J.; Podtelejnikov, A. V.; Wiśniewski, J. R. Screening for N-glycosylated proteins by liquid chromatography mass spectrometry. Proteomics 2004, 4, 454–465.

(44) Lee, H.-J.; Cha, H.-J.; Lim, J.-S.; Lee, S. H.; Song, S. Y.; Kim, H.; Hancock, W. S.; Yoo, J. S.; Paik, Y.-K. Abundance-ratio-based semiquantitative analysis of site-specific N-linked glycopeptides present in the plasma of hepatocellular carcinoma patients. J. Proteome Res. 2014, 13, 2328–2338.

(45) Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y.-K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; et al. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J. Proteome Res. 2016.


Figure legends

Figure 1. (A) Workflow of glycopeptide analysis with gFinder. Mass spectrometry data are divided into glycopeptide and non-glycopeptide spectra. Protein information is obtained by a Mascot search of the non-glycopeptide spectra for peptide backbone sequences. With these results, backbone sequences are searched and glycan structures are obtained through a GlycomeDB search. (B) Scheme of the merging procedure for CID and HCD spectra.

Figure 2. Example spectra of differently identified N-glycopeptides (A, B) present only in HCC human serum samples. The glycan symbols follow CFG nomenclature. The additional symbol marks cross-ring cleavage fragments of GlcNAc with a loss of 120.0423 Da.

Figure 3. An example spectrum of an N-glycosylated missing protein from human serum samples: the N-glycopeptide of Q8N9B8 at Asn-371. The glycan symbols follow CFG nomenclature. The additional symbol marks cross-ring cleavage fragments of GlcNAc with a loss of 120.0423 Da.

Figure 4. (A) Search option window. (B) Search result window. (C) An example result window with the spectrum viewer: black, brown, and green represent glycan oxonium ions, Y-series ions, and glycopeptide fragment ions, respectively. Users can change the settings in the left panel.


Table 1. Oxonium ions of the glyco spectrum*

| Mass (Da) | Saccharide |
|---|---|
| 138.0550 | HexNAc-2H2O-CH2O |
| 163.0601 | Hex |
| 168.0602 | HexNAc-2H2O |
| 186.0760 | HexNAc-H2O |
| 204.0866 | HexNAc |
| 274.0874 | NeuAc-H2O |
| 292.1032 | NeuAc |
| 366.1400 | HexHexNAc |
| 657.2354 | HexHexNAcNeuAc |

*These oxonium ions were collected from previous research (refs 26, 29, and 34).
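A list of oxonium ions like the one in Table 1 is typically used to flag spectra that come from glycopeptides. The sketch below shows one way such a screen could look; it is not gFinder's implementation. The masses are copied from Table 1, while the 0.01 Da tolerance, the minimum of two matches, and the function names are assumptions made for illustration.

```python
# Oxonium ion masses (Da) and labels, copied from Table 1.
OXONIUM_IONS = {
    138.0550: "HexNAc-2H2O-CH2O",
    163.0601: "Hex",
    168.0602: "HexNAc-2H2O",
    186.0760: "HexNAc-H2O",
    204.0866: "HexNAc",
    274.0874: "NeuAc-H2O",
    292.1032: "NeuAc",
    366.1400: "HexHexNAc",
    657.2354: "HexHexNAcNeuAc",
}


def matched_oxonium(peaks_mz, tol_da=0.01):
    """Return labels of the Table 1 ions found among the observed m/z values.

    tol_da is an assumed absolute tolerance, not a value from the paper.
    """
    found = []
    for ref_mz, label in OXONIUM_IONS.items():
        if any(abs(mz - ref_mz) <= tol_da for mz in peaks_mz):
            found.append(label)
    return found


def is_glyco_spectrum(peaks_mz, min_matches=2, tol_da=0.01):
    """Flag a spectrum as a glyco spectrum if enough oxonium ions are present.

    min_matches=2 is an assumed threshold chosen for this example.
    """
    return len(matched_oxonium(peaks_mz, tol_da)) >= min_matches


if __name__ == "__main__":
    spectrum = [204.0870, 366.1395, 500.3000]
    print(matched_oxonium(spectrum), is_glyco_spectrum(spectrum))
```

Requiring more than one marker ion reduces false positives from isolated noise peaks, at the cost of missing weak glycopeptide spectra.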


Table 2. Beneficial effect of combining CID and HCD spectra

| | CID only | HCD only | CID+HCD |
|---|---|---|---|
| Total spectra | 6174 | 6174 | 6174 |
| Glyco spectra | 0 | 706 | 706 |
| Y-ion matched | 0 | 148 | 155 |
| Glycan DB matched | 0 | 29 | 42 |
| b/y ion matched | 0 | 17 | 30 |
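To make the idea of a merged CID+HCD spectrum concrete, the following sketch combines two peak lists into one. The merging rule is an assumption for illustration (peaks closer than a fixed tolerance are fused, intensities are summed, and the m/z is intensity-weighted); the actual procedure used by gFinder is the one shown in Figure 1B and may differ. The function name and the 0.02 Da tolerance are hypothetical.

```python
def merge_spectra(cid, hcd, tol_da=0.02):
    """Merge two peak lists of (mz, intensity) tuples into one sorted list.

    Assumed rule: a peak within tol_da of the previously merged peak is fused
    with it; intensities are added and the m/z becomes the intensity-weighted
    mean. Intensities are expected to be positive.
    """
    merged = []
    for mz, inten in sorted(list(cid) + list(hcd)):
        if merged and mz - merged[-1][0] <= tol_da:
            prev_mz, prev_int = merged[-1]
            total = prev_int + inten
            merged[-1] = ((prev_mz * prev_int + mz * inten) / total, total)
        else:
            merged.append((mz, inten))
    return merged


if __name__ == "__main__":
    cid_peaks = [(204.087, 100.0), (500.000, 10.0)]
    hcd_peaks = [(138.055, 50.0), (204.088, 300.0)]
    for mz, inten in merge_spectra(cid_peaks, hcd_peaks):
        print(f"{mz:.5f}\t{inten:.1f}")
```

With a merged list of this kind, oxonium ions that appear mainly in HCD and backbone or Y-series fragments that appear mainly in CID can be searched in a single pass.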


Table 3. Results of analysis of standard protein (IgG) by gFinder

| Peptide | Glycan | Precursor mass, Da (charge) | RT (min) | ppm |
|---|---|---|---|---|
| TKPREEQFN*STFR | [HexNAc]4[Hex]3[Fuc]1 | 774.84 (+4) | 56.98 | 0.23 |
| | [HexNAc]4[Hex]5[NeuAc]1[Fuc]1 | 1233.85 (+3) | 57.95 | 0.40 |
| | [HexNAc]5[Hex]4[Fuc]1 | 1150.50 (+3) | 56.90 | 0.06 |
| EEQFN*STFR | [HexNAc]4[Hex]5[Fuc]1 | 976.06 (+3) | 59.37 | 2.90 |
| | [HexNAc]4[Hex]4[Fuc]1 | 922.04 (+3) | 59.50 | 2.06 |
| | [HexNAc]5[Hex]5[Fuc]1 | 1043.75 (+3) | 59.82 | 2.30 |
| | [HexNAc]5[Hex]4[Fuc]1 | 989.73 (+3) | 59.86 | -0.74 |
| | [HexNAc]4[Hex]3[Fuc]1 | 868.02 (+3) | 59.99 | 0.65 |
| | [HexNAc]5[Hex]3[Fuc]1 | 1403.07 (+2) | 60.01 | -1.22 |
| | [HexNAc]4[Hex]5[NeuAc]1[Fuc]1 | 1073.09 (+3) | 60.87 | 0.75 |
| | [HexNAc]5[Hex]5[NeuAc]1[Fuc]1 | 1140.78 (+3) | 60.90 | 3.68 |
| | [HexNAc]4[Hex]4[NeuAc]1[Fuc]1 | 1019.07 | 61.06 | 2.55 |
| TKPREEQYN*STYR | [HexNAc]4[Hex]4[Fuc]1 | 1093.47 (+3) | 52.75 | 0.75 |
| | [HexNAc]4[Hex]3[Fuc]1 | 1039.45 (+3) | 52.82 | -0.21 |
| | [HexNAc]5[Hex]4[Fuc]1 | 1161.16 (+3) | 52.90 | -0.36 |
| | [HexNAc]4[Hex]4 | 1044.78 (+3) | 52.97 | 1.61 |
| EEQYN*STYR | [HexNAc]4[Hex]4[Fuc]1 | 932.70 (+3) | 54.18 | 1.63 |
| | [HexNAc]4[Hex]3[Fuc]1 | 1317.53 (+2) | 54.25 | 0.58 |
| | [HexNAc]5[Hex]4[Fuc]1 | 1000.40 (+3) | 54.32 | 1.19 |
| | [HexNAc]4[Hex]5[Fuc]1 | 986.72 (+3) | 54.49 | 2.59 |
| | [HexNAc]3[Hex]4[Fuc]1 | 1297.01 (+2) | 54.54 | 1.82 |
| | [HexNAc]5[Hex]5[Fuc]1 | 1054.41 (+3) | 54.70 | 2.30 |
| | [HexNAc]5[Hex]3[Fuc]1 | 946.38 (+3) | 54.80 | -1.00 |
| | [HexNAc]4[Hex]5 | 938.03 (+3) | 54.89 | 2.37 |
| | [HexNAc]4[Hex]4 | 884.02 (+3) | 54.92 | 2.13 |
| | [HexNAc]4[Hex]3 | 1244.50 (+2) | 55.05 | 2.06 |
| | [HexNAc]4[Hex]5[NeuAc]1[Fuc]1 | 1083.75 (+3) | 55.49 | 0.48 |
| | [HexNAc]5[Hex]5[NeuAc]1[Fuc]1 | 1151.45 (+3) | 55.65 | 2.80 |
| | [HexNAc]4[Hex]4[NeuAc]1[Fuc]1 | 1029.74 (+3) | 55.69 | 2.93 |
| | [HexNAc]4[Hex]4[NeuAc]1 | 1035.07 (+3) | 55.94 | 2.02 |

* N-glycosylation site. Glycan symbols (mannose, galactose, N-acetylglucosamine, fucose, N-acetylneuraminic acid) follow CFG nomenclature.
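The precursor values in Table 3 can be related to the peptide backbone and glycan composition with a few lines of arithmetic. The sketch below is an illustration under stated assumptions and not part of gFinder: it uses standard monoisotopic residue masses, parses composition strings of the form used in the table, and assumes that the 1317.53 (+2) entry for [HexNAc]4[Hex]3[Fuc]1 belongs to the backbone EEQYN*STYR. All function names are hypothetical.

```python
import re

PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # water mass (Da)

# Monoisotopic glycan residue masses (Da).
GLYCAN = {"HexNAc": 203.079373, "Hex": 162.052824, "NeuAc": 291.095417,
          "NeuGc": 307.090331, "Fuc": 146.057909}

# Monoisotopic amino acid residue masses (Da); only those needed for the example.
AA = {"E": 129.042593, "Q": 128.058578, "Y": 163.063329, "N": 114.042927,
      "S": 87.032028, "T": 101.047679, "R": 156.101111}


def parse_composition(text):
    """'[HexNAc]4[Hex]3[Fuc]1' -> {'HexNAc': 4, 'Hex': 3, 'Fuc': 1}.

    Residue names are matched case-insensitively, so '[HexNac]4' also works.
    """
    lookup = {name.lower(): name for name in GLYCAN}
    return {lookup[name.lower()]: int(count)
            for name, count in re.findall(r"\[(\w+)\](\d+)", text)}


def glycopeptide_mz(seq, composition, charge):
    """Theoretical m/z of a peptide carrying the given glycan composition."""
    pep = sum(AA[a] for a in seq) + WATER
    gly = sum(GLYCAN[name] * n for name, n in parse_composition(composition).items())
    return (pep + gly + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    mz = glycopeptide_mz("EEQYNSTYR", "[HexNAc]4[Hex]3[Fuc]1", charge=2)
    print(f"theoretical m/z: {mz:.4f}")
```

Because the table reports precursor values to two decimals, a ppm error recomputed from the rounded numbers will not reproduce the tabulated ppm column exactly.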


Table 4. Results of standard protein (vitronectin) analysis by gFinder

| Peptide | Glycan | Precursor mass, Da (charge) | RT (min) | ppm |
|---|---|---|---|---|
| NN*ATVHEQVGGPSLTSDLQAQSK (Asn-86) | [HexNAc]4[Hex]5[NeuAc]1 | 1432.28 (+3) | 56.07 | -0.18 |
| | [HexNAc]4[Hex]5[NeuAc]2 | 1529.32 (+3), 1147.24 (+4) | 57.10, 57.11 | 1.40, 0.58 |
| | [HexNAc]5[Hex]6[NeuAc]2** | 1238.52 (+4) | 57.16 | 1.91 |
| | [HexNAc]5[Hex]6[NeuAc]2[Fuc]1** | 1275.04 (+4) | 57.13 | 0.84 |
| | [HexNAc]5[Hex]6[NeuAc]3 | 1311.29 (+4) | 58.61 | -0.33 |
| | [HexNAc]5[Hex]6[NeuAc]3[Fuc]1 | 1347.81 (+4) | 58.64 | 0.35 |
| N*GSLFAFR (Asn-169) | [HexNAc]3[Hex]4** | 1084.96 (+2) | 60.03 | 2.94 |
| | [HexNAc]3[Hex]4[NeuAc]1 | 1230.51 (+2), 820.68 (+3) | 63.12, 63.19 | 1.62, 1.51 |
| | [HexNAc]3[Hex]5 | 1165.99 (+2) | 59.49 | -0.78 |
| | [HexNAc]3[Hex]5[NeuAc]1 | 874.69 (+3) | 62.83 | 2.14 |
| | [HexNAc]3[Hex]6[NeuAc]1 | 928.71 (+3) | 62.66 | 2.37 |
| | [HexNAc]4[Hex]5** | 1267.53 (+2), 845.36 (+3) | 59.31, 59.37 | 2.99, 3.40 |
| | [HexNAc]4[Hex]5[NeuAc]1 | 942.39 (+3) | 62.48 | 1.42 |
| | [HexNAc]4[Hex]5[NeuAc]2 | 1558.63 (+2), 1039.42 (+3) | 65.63, 65.62 | 3.84, 3.69 |
| | [HexNAc]4[Hex]5[NeuAc]2[Fuc]1** | 1088.10 (+3) | 65.82 | 0.89 |
| | [HexNAc]4[Hex]6[NeuAc]1** | 996.40 (+3) | 63.05 | 0.27 |
| | [HexNAc]5[Hex]6[NeuAc]2** | 1161.13 (+3) | 65.19 | 2.96 |
| | [HexNAc]5[Hex]6[NeuAc]2[Fuc]1 | 1209.82 (+3) | 65.16 | 3.03 |
| | [HexNAc]5[Hex]6[NeuAc]3 | 1258.16 (+3) | 68.19 | 2.31 |
| | [HexNAc]5[Hex]6[NeuAc]3[Fuc]1 | 1306.85 (+3) | 68.22 | 0.95 |
| N*ISDGFDGIPDNVDAALALPAHSYSGR (Asn-242) | [HexNAc]4[Hex]5[Fuc]1 | 1135.995 (+4) | 62.33 | 1.47 |
| | [HexNAc]4[Hex]5[NeuAc]1 | 938.01 (+5) | 71.27 | 1.09 |
| | [HexNAc]4[Hex]5[NeuAc]1[Fuc]2** | 1245.28 (+4) | 91.03 | -0.50 |
| | [HexNAc]4[Hex]5[NeuAc]2 | 1245.53 (+4) | 94.47 | -0.82 |
| | [HexNAc]4[Hex]5[NeuAc]2[Fuc]1 | 1281.54 (+4), 1025.43 (+5) | 72.53, 72.56 | -0.80, -0.07 |
| | [HexNAc]5[Hex]6[NeuAc]1 | 1263.54 (+4), 1011.03 (+5) | 70.60, 70.65 | -0.10, 0.03 |
| | [HexNAc]5[Hex]6[NeuAc]1[Fuc]1 | 1300.52 (+4), 1040.24 (+5) | 70.54, 70.59 | 1.06, 0.02 |
| | [HexNAc]5[Hex]6[NeuAc]2 | 1069.25 (+5) | 72.40 | 1.43 |
| | [HexNAc]5[Hex]6[NeuAc]2[Fuc]1 | 1372.82 (+4), 1098.46 (+5) | 72.31, 72.32 | 1.58, 0.84 |
| | [HexNAc]5[Hex]6[NeuAc]3 | 1127.47 (+5) | 74.58 | -0.15 |

* N-glycosylation site. ** Newly found N-glycopeptide. Glycan symbols (mannose, galactose, N-acetylglucosamine, fucose, N-acetylneuraminic acid) follow CFG nomenclature.

[Figure 1: Workflow]

[Figure 2: Examples of glycopeptides present in human sample]

[Figure 3: An example spectrum of N-glyco missing proteins]

[Figure 4: Visual options]

[TOC Figure]