Identification of N-Glycosylation Sites on Secreted Proteins of Human Hepatocellular Carcinoma Cells with a Complementary Proteomics Approach

Jing Cao,†,‡ Chengping Shen,‡ Hong Wang,‡ Huali Shen,‡ Yaohan Chen,‡ Aiying Nie,‡ Guoquan Yan,†,‡ Haojie Lu,†,‡ Yinkun Liu,*,‡,§ and Pengyuan Yang*,†,‡

Department of Chemistry, Fudan University, Shanghai, China, Institutes for Biomedical Sciences, Fudan University, Shanghai, China, and Liver Cancer Institute Zhongshan Hospital, Fudan University, Shanghai, China

Received October 1, 2008

N-linked glycosylation is prevalent in proteins destined for extracellular environments; nearly all secreted proteins are glycosylated. However, little attention has been paid to their glycosylation sites. Here, we report the analysis of N-glycosylation sites on secreted proteins of human hepatocellular carcinoma (HCC) cells. For the enrichment of glycopeptides, capture methods based on hydrophilic affinity (HA) and hydrazide chemistry (HC) were used complementarily. Using both methods in combination with nano-LC-ESI-MS/MS analysis, 300 different glycosylation sites within 194 unique glycoproteins were identified, of which 172 glycosites had not previously been determined experimentally. A direct comparison between the HA and HC methods was also made for the first time. In brief, in terms of selectivity for glycopeptides, HC is superior to HA (92.9% vs 51.3%); however, in terms of the number of glycosites identified, HA outperforms HC (265 vs 159). Furthermore, unavoidable contaminants such as actin and bovine serum albumin, which are not N-glycosylated, could be easily depleted by using this glycoproteomic strategy. As a consequence, more low-abundance and genuinely secreted proteins were identified. Among the glycoproteins identified, α-fetoprotein, CD44, and laminin have been reported to be implicated in HCC and its metastasis.

Keywords: N-glycosylation • Secreted proteins • Hepatocellular carcinoma • Mass spectrometry • Glycopeptide-capture • Hydrophilic affinity • Hydrazide chemistry • LTQ-Orbitrap

* To whom correspondence should be addressed. E-mails: (P.Y.) pyyang@fudan.edu.cn; (Y.L.) [email protected]. † Department of Chemistry, Fudan University. ‡ Institutes for Biomedical Sciences, Fudan University. § Liver Cancer Institute Zhongshan Hospital, Fudan University.

Journal of Proteome Research 2009, 8, 662–672. Published on Web 01/13/2009. 10.1021/pr800826u CCC: $40.75. © 2009 American Chemical Society

1. Introduction

Given the importance of secreted proteins as a source for early detection and diagnosis of disease, secreted proteins have attracted considerable attention, and the field of the secreted proteome (secretome) has advanced substantially in recent years. In addition to comparative proteomic analyses that characterize differentially expressed proteins in two different, well-controlled states, for example, normal and diseased,1,2 the global profiling of the secretome has received considerable attention.3,4 However, most of these investigations paid little attention to the post-translational modifications (PTMs) of the proteins investigated, for example, glycosylation and phosphorylation sites. Most data in this respect are still obtained from studies of individual proteins and body fluid proteomes.5,6

Glycosylation is one of the most important and abundant post-translational modifications in nature.7 Typically, carbohydrates are linked to serine or threonine residues (O-linked glycosylation) or to asparagine residues (N-linked glycosylation). Generally, N-linked glycosylation sites fall into the N-X-S/T sequon, in which X denotes any amino acid except proline. Since glycosylation research has potential for diagnostic and therapeutic purposes,8 interest in investigating the glycosylation status of glycoproteins has greatly increased in recent years.

The attachment site of the carbohydrate on the peptide backbone is of primary interest. For elucidation of N-glycosylation sites, a very convenient method that makes use of the properties of PNGase F and the N-X-S/T (X ≠ P) sequon is commonly used. PNGase F not only cleaves all types of N-glycans from the polypeptide backbone, with few exceptions,9 but also possesses an additional amidase activity during this process. Therefore, PNGase F converts asparagine to aspartic acid during the cleavage reaction. This results in a mass shift of about 1 Da (+0.984 Da) detectable by most currently available mass spectrometers, especially FTICR or Orbitrap instruments with their superior mass accuracy.10-12

For analysis of N-glycosylation sites, a major drawback is that glycopeptides are usually present in relatively low abundance (2-5%) in peptide mixtures compared to nonglycosylated peptides. The coexistence of nonglycosylated peptides would significantly suppress the mass spectrometric signals of glycopeptides, making it almost impossible to identify low-abundance glycoproteins and their glycosylated sites in biological samples without glycopeptide enrichment steps.13

To overcome this problem, solid-phase extraction approaches such as lectin affinity,14-16 hydrazide chemistry (HC),17-19 hydrophilic affinity (HA),20-22 and boronate affinity methods23-25 are currently powerful tools for the enrichment of glycoproteins/glycopeptides. Among these approaches, lectin affinity is a double-edged sword, since lectins have a high degree of specificity for particular glycan structures. On the one hand, the binding selectivity of lectins for specific conformations of different carbohydrate moieties has limited the utility of lectins in global glycoprotein analysis;26 on the other hand, the selective specificity for limited subsets of glycopeptides is favorable for focused glycosylation studies.27-29 In essence, the other three methods place few limits on the enrichment of glycopeptides; HC in particular has been widely and successfully used to identify N-glycosites.30,31 HC, like the boronate affinity method, makes use of the cis-diol groups of carbohydrates, while HA relies on a different mechanism to immobilize glycopeptides on the solid phase, based on hydrogen bonding between the carbohydrate oxygens of the gel matrix (Sepharose or cellulose) and the glycopeptides. However, owing to the complexity and heterogeneity of glycosylation, none of these methods can capture all the glycoproteins/glycopeptides in complex biological samples.

Accordingly, to maximize the profile of N-glycosylated secretory proteins in human hepatocellular carcinoma (HCC) cells, we applied in this study two enrichment methods based on different mechanisms, HC and HA. As a first step, a large-scale glycoproteomic identification and N-glycosylation site elucidation, in combination with nano-LC-ESI-MS/MS, was performed for secreted proteins of HCC cells.
Furthermore, the two methods have been compared, and the merits and shortcomings of each method are discussed based on the obtained results.
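The site-assignment logic outlined in the Introduction (an Asn inside an N-X-S/T sequon, X ≠ P, that PNGase F converts to Asp) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the function name `sequon_positions` is hypothetical, and the residue masses are standard monoisotopic values rather than numbers taken from this section.

```python
import re

# N-X-S/T sequon with X != P. The lookahead keeps the match one residue
# wide, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Standard monoisotopic residue masses in Da (assumed reference values).
ASN_RESIDUE = 114.04293
ASP_RESIDUE = 115.02694
DEAMIDATION_SHIFT = ASP_RESIDUE - ASN_RESIDUE  # about +0.984 Da


def sequon_positions(sequence):
    """Return 0-based indices of Asn residues inside N-X-S/T (X != P)."""
    return [match.start() for match in SEQUON.finditer(sequence)]


# Peptide from Table 1 with modification marks and flanking residues removed.
peptide = "VNASTTEPNSTVEQSALMR"
print(sequon_positions(peptide))          # [1, 8]
print(round(DEAMIDATION_SHIFT, 5))        # 0.98401
```

A peptide-level check like this cannot see a sequon whose S/T lies beyond the peptide's C-terminus, so in practice the motif would probably be tested against the parent protein sequence.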

2. Materials and Methods

2.1. Chemicals and Materials. Bradford assay reagent, sodium periodate, and hydrazide resin were obtained from Bio-Rad. A 0.45 µm filter and 3000 Da MWCO spin column were from Millipore. Sequencing grade modified trypsin was from Promega. C18 spin columns were from Waters. Sepharose CL-4B was purchased from Amersham Bioscience. PNGase F was from New England Biolabs. The water used was Milli-Q grade. All other chemicals were purchased from Sigma.

2.2. Cell Culture and Preparation of Secretory Proteins. The HCCLM3 cell line had been established previously.32 The cells were grown in DMEM culture medium containing 10% fetal bovine serum until they approached 60-70% confluence. Cells were washed thoroughly but gently, twice with Dulbecco's phosphate buffered saline with calcium and magnesium (DPBS) and once with serum- and phenol red-free DMEM (conditioned medium). Cells were then incubated in the conditioned medium at 37 °C. After 24 h, the culture supernatants containing secreted proteins were collected, centrifuged at 1000g for 5 min (4 °C), filtered through a 0.45 µm filter, and then concentrated using a 3000 Da MWCO spin column. Protein concentrations of the concentrated medium were measured by the Bradford assay.

2.3. Tryptic Digestion of Samples. The secreted proteins were digested in solution using the method previously described33 with minor modification. In each experiment, the concentrated proteins were heated at 100 °C for 10 min. After the sample had cooled to room temperature, DTT was added to a final concentration of 10 mM, and the sample was incubated at 57 °C for 30 min. To prevent disulfide bond formation, cysteine residues were alkylated with iodoacetamide, which was added to the sample solution at 20 mM final concentration. Cysteine derivatization was carried out for 30 min in the dark at room temperature. The reaction was quenched by adding DTT at half the molar concentration of the iodoacetamide for 10 min. After iodoacetamide deactivation, the sample solution was diluted 10-fold with 50 mM NH4HCO3 buffer. Trypsin was added to the sample (1:50), and the mixture was incubated overnight at 37 °C. All digested peptide mixtures were passed over a C18 column to remove excess DTT and salt. Tryptic peptides were eluted from the column with 80% acetonitrile (ACN) in 0.1% trifluoroacetic acid (TFA), and the peptide solution was divided equally; each aliquot was then dried in a vacuum centrifuge for later use.

2.4. Glycopeptide Capture. 2.4.1. Hydrophilic Affinity Separation of Glycopeptides (HA Method). The hydrophilic affinity enrichment of glycopeptides was carried out by the method previously described.34,35 About 100 µg of the digest was added to a microcentrifuge tube containing 20 µL of Sepharose. Then, 1 mL of an organic solvent mixture of butanol/ethanol/water (4:1:1 by volume) was added to the digest/Sepharose mixture. The resulting mixture was gently mixed for 1 h and then centrifuged for 5 min. The supernatant was removed, and the pellet was washed three times with the same organic solvent. The glycopeptides were extracted by incubating the pellet in 50% aqueous ethanol for 30 min. The samples were vortexed and then centrifuged, and the supernatant was transferred to another microcentrifuge tube. The same extraction procedure was repeated once.
Finally, the combined supernatant was lyophilized, and the captured glycopeptides were deglycosylated with PNGase F (500 units/µL) at a concentration of 1 µL of PNGase F per 1 mg of crude proteins in 100 mM NH4HCO3 (pH 8.0).36

2.4.2. Glycopeptide-Capture Using Hydrazide Chemistry (HC Method). The glycopeptides were captured using the method previously described33 with slight modification. Dried tryptic peptides were dissolved in a coupling buffer (100 mM sodium acetate, 150 mM NaCl, pH 5.5) at a concentration of 1 mg/100 µL. The undissolved solids were removed by centrifugation, and the supernatant was used for the following reactions. First, to oxidize the cis-diol groups of carbohydrates to aldehydes, sodium periodate was added to the peptide solution at 10 mM final concentration, and the sample was incubated in the dark at room temperature for 1 h under agitation. Second, sodium sulfite was added to 20 mM final concentration, and the mixture was incubated for 10 min to deactivate the excess oxidant in the peptide solution. The coupling reaction was initiated by introducing hydrazide resin (prepared by washing three times with coupling buffer) into the quenched peptide solution at 10 mg/mL of resin, and extra coupling buffer was added to make a solid to liquid ratio of 1:5. The coupling reaction was performed at 37 °C overnight under agitation. After the coupling reaction, the resin was washed twice thoroughly and successively with 1.5 M NaCl, 80% (v/v) aqueous ACN, water, and 50 mM NH4HCO3 (made fresh). Finally, enzymatic cleavage of the N-linked peptides from the sugar moiety was carried out as mentioned above. The supernatant, containing the released deglycosylated peptides, was collected by centrifugation and combined with the supernatant of an 80% ACN wash.

2.5. Automated Nano-LC-ESI-MS/MS Analysis of Peptides. All the eluted peptide solutions above were dried thoroughly using a vacuum centrifuge, resuspended in 5% ACN in 0.1% formic acid, separated by nano-LC, and analyzed by online electrospray tandem mass spectrometry. The experiments were performed on an LC-20AD system (Shimadzu, Tokyo, Japan) connected to an LTQ Orbitrap mass spectrometer (Thermo Electron, Bremen, Germany) equipped with an online nanoelectrospray ion source (Michrom Bioresources, Auburn, CA). The peptides were separated on a 15-cm reverse phase column (100 µm i.d., Michrom Bioresources, Inc., Auburn, CA). The peptide mixtures were injected onto the trap column at a flow of 60 µL/min, subsequently eluted with a gradient of 5-45% solvent B (95% ACN in 0.1% formic acid) over 90 min, and then injected into the mass spectrometer at a constant column-tip flow rate of ∼500 nL/min. Eluted peptides were analyzed by MS and data-dependent MS/MS acquisition, selecting the 8 most abundant precursor ions for MS/MS with a dynamic exclusion duration of 1 min. The peptide mixtures from the HC and HA methods were analyzed under the same nano-LC-ESI-MS/MS conditions.

2.6. Peptide Sequencing and Data Interpretation. The mass spectra were searched against the human International Protein Index (IPI) database (IPI human v3.45 fasta with 71 983 entries) using the Bioworks software (Version 3.3.1; Thermo Electron Corp.) based on the SEQUEST algorithm. The parameters for the SEQUEST search were as follows: enzyme, fullTrypsin; missed cleavages, two; fixed modification, carboxyamidomethylation (C); variable modifications, deamidation (N) and oxidation (M); peptide tolerance, 10 ppm; MS/MS tolerance, 1.0 Da.
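To make the 10 ppm peptide tolerance listed among the search parameters concrete, the sketch below shows how a parts-per-million mass error is computed and why high mass accuracy matters for deamidation. The function names are hypothetical, and the 13C isotope spacing (1.00335 Da) and deamidation shift (0.98402 Da) are standard reference values assumed here, not figures from the paper.

```python
DEAMIDATION_SHIFT = 0.98402   # Asn -> Asp, Da (assumed reference value)
C13_SPACING = 1.00335         # 13C minus 12C, Da (assumed reference value)


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=10.0):
    """True if the observed mass is within tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def isotope_gap_ppm(neutral_mass):
    """Gap, in ppm, between a deamidated peptide and the first 13C isotope
    peak of its unmodified form at the given neutral mass."""
    return (C13_SPACING - DEAMIDATION_SHIFT) / neutral_mass * 1e6


print(within_tolerance(1500.012, 1500.000))   # True  (8 ppm)
print(within_tolerance(1500.030, 1500.000))   # False (20 ppm)
print(round(isotope_gap_ppm(1500.0), 1))      # 12.9
```

Under these assumptions the two species lie about 13 ppm apart for a 1500 Da peptide, so a 10 ppm window can separate them, whereas for much heavier peptides the gap shrinks and the distinction becomes less reliable.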
Positive protein identification was accepted for a peptide with Xcorr greater than or equal to 2.5 for triply charged and 1.8 for doubly charged ions, all with ΔCn ≥ 0.1.37-39 Furthermore, database search results were statistically analyzed using PeptideProphet,40 which computes a probability that statistically validates the spectrum-to-peptide sequence assignments made by MS/MS search engines. A minimum PeptideProphet probability score (P) filter of 0.95 was used to remove low-probability peptides. Since N-glycosylation occurs at a consensus N-X-S/T (X ≠ P) sequon, the remaining peptide sequences were additionally filtered to remove non-motif-containing peptides. Additional peptides that did not pass the filtering criteria, with Xcorr ≥ 1.8 (2+) or Xcorr ≥ 2.5 (3+) but P < 0.95, were also investigated by the method previously described.41 First, a modified IPI human v3.45 database was created, in which the asparagines (N) contained within the N-X-S/T (X ≠ P) sequon were replaced with the letter 'J'. Prior to database searching, 'J' was defined as deamidated N having a monoisotopic and average mass of 115.02694 and 115.0886, respectively. Then, the peak lists corresponding to the additional peptides were searched against the modified database.
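The acceptance criteria and the 'J'-substituted database described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bioworks or PeptideProphet implementation: the function names are hypothetical, the motif is tested on the peptide sequence alone, and charge states other than 2+ and 3+ are simply rejected because the text gives no threshold for them.

```python
import re

# Asn inside an N-X-S/T sequon with X != P (lookahead allows overlaps).
SEQUON = re.compile(r"N(?=[^P][ST])")

# Xcorr thresholds from the text, keyed by precursor charge.
XCORR_MIN = {2: 1.8, 3: 2.5}

# Masses defined for 'J' (deamidated Asn) in the text, in Da.
J_MONOISOTOPIC = 115.02694
J_AVERAGE = 115.0886


def passes_filters(xcorr, charge, delta_cn, probability, sequence):
    """Apply the Xcorr, delta-Cn, PeptideProphet and sequon filters."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False  # assumption: only 2+ and 3+ ions are considered
    return (
        xcorr >= threshold
        and delta_cn >= 0.1
        and probability >= 0.95
        and SEQUON.search(sequence) is not None
    )


def mask_sequon_asn(protein_sequence):
    """Replace every sequon Asn with 'J', as in the modified database."""
    return SEQUON.sub("J", protein_sequence)


print(passes_filters(2.0, 2, 0.15, 0.97, "ALVNFTR"))   # True
print(passes_filters(2.0, 3, 0.15, 0.97, "ALVNFTR"))   # False
print(mask_sequon_asn("VNASTTEPNSTVEQSALMR"))          # VJASTTEPJSTVEQSALMR
```

Restricting the deamidation mass to 'J' means a search engine can only place the modification on sequon asparagines, which is presumably why the rescued low-probability peptides were searched this way.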

3. Results and Discussion

3.1. Glycoproteomic Strategy for Secretome. A total of 194 glycoproteins were identified from six LTQ-Orbitrap experiments of two biological replicates per method (12 LTQ-Orbitrap runs in total), 70.6% (137/194) of which are annotated in the Swiss-Prot database as secreted, localized either extracellularly or in the membrane. The other 57 glycoproteins (including 29 glycoproteins that have no localization definition in TrEMBL) were further analyzed using bioinformatic software programs as reported previously.42 Consequently, 48 glycoproteins were predicted by SignalP to be secreted via the classical secretory pathway, which is characterized by the presence of a signal peptide.43 Two glycoproteins were predicted by SecretomeP to be released through the nonclassical secretory pathway.44 Collectively, 96.4% (187/194) of the identified glycoproteins could be verified as secretory proteins, and such a high ratio implies that this glycoproteomic strategy also favors the detection of genuinely secreted proteins. To illustrate, it is well-known that contamination of the authentic secreted proteins is unavoidable because of residual bovine serum albumin (BSA) and cytoplasmic proteins (e.g., actin) released into the medium by cell lysis, which occurs even in the best cultures.45,46 Fortunately, such inevitable contaminants are not N-glycosylated, while nearly all cell surface and secreted proteins are glycosylated;47 thus, the N-glycopeptide enrichment procedure could deplete them, and more low-abundance and genuinely secreted proteins would be identified. Furthermore, N-glycoproteins and classical secreted proteins share the same dependence on the endoplasmic reticulum (ER)-Golgi network, because glycans are formed on their protein or lipid scaffolds by glycosyltransferases and glycosidases as they traffic through the secretory pathway.8 The fact that only two of the glycoproteins we identified were secreted without a leader sequence (i.e., without an N-terminal signal peptide) supports this point. It is worth mentioning that the proposed N-glycoproteomic strategy is not only a good way to profile genuinely secreted proteins but also a useful gateway to the discovery of potential biomarkers.
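The secretion tally reported in this section can be re-derived from the individual counts. The short check below uses only the numbers stated in the text; the variable names are illustrative.

```python
total_glycoproteins = 194
swissprot_secreted = 137     # annotated as extracellular or membrane
signalp_predicted = 48       # classical pathway (signal peptide)
secretomep_predicted = 2     # nonclassical pathway

remaining = total_glycoproteins - swissprot_secreted
verified = swissprot_secreted + signalp_predicted + secretomep_predicted

print(remaining)                                        # 57
print(f"{swissprot_secreted / total_glycoproteins:.1%}")  # 70.6%
print(f"{verified / total_glycoproteins:.1%}")            # 96.4%
```

The counts are internally consistent: 57 proteins lacked a Swiss-Prot secretion annotation, 50 of them were recovered by prediction, and 7 of the 194 remain unverified.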
In our data, many of the N-glycosylated secreted proteins have been implicated in HCC and its metastasis. For instance, clusterin and α-1 antitrypsin are overexpressed in HCC serum compared to the healthy group.48,49 The adhesion molecules (CD44 and laminin)50,51 and extracellular matrix (fibronectin)52 have been reported to be associated with the invasion and metastasis of HCC. Furthermore, α-fetoprotein is well-known as a marker of HCC and is widely used in biochemical blood tests. These observations support the contention that the complementary glycoproteomic approach we developed here is a powerful tool for the discovery of potential biomarkers. Meanwhile, CD proteins, which play important immune functions in cells, are a class of membrane proteins that are often glycosylated and also make good drug targets and biomarkers.33 In the current study, we identified 23 CD proteins, which could be of use in future studies.

3.2. Identification of N-Glycosites in Secretome of HCC. A total of 300 unique N-glycosylation sites were detected in 296 different glycopeptides within the 194 identified glycoproteins, and 128 of these sites are currently documented in the Swiss-Prot database (version of July 2008) based on experimental evidence (Supporting Information Table 1). However, 144 N-glycosylation sites in the Swiss-Prot database were annotated as potential (141 sites), probable (2), or by similarity (1), which are three types of nonexperimental qualifiers indicating that the information given is not based on experimentally proven findings. To our knowledge, this study provides the first experimental evidence for these 144 N-glycosylation sites (Table 1). Interestingly, 28 N-glycosylation sites, which hitherto were not documented in the Swiss-Prot database, were

isoform a of protocadherin-7 precursor leucine-rich repeat transmembrane protein flrt3 precursor palmitoyl-protein thioesterase 1 precursor cd44 antigen isoform 5 precursor tumor necrosis factor receptor superfamily member 21 precursor tumor necrosis factor receptor superfamily member 21 precursor lysosomal-associated membrane protein 1 lysosomal-associated membrane protein 1 lysosomal-associated membrane protein 1 isoform 1 of adam 28 precursor isoform bmp1-3 of bone morphogenetic protein 1 precursor trophoblast glycoprotein precursor CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase glia-derived nexin precursor CD66a antigen chloride intracellular channel protein 1 isoform 1 of sialate O-acetylesterase precursor N-acetylglucosamine-6-sulfatase precursor lysosomal alpha-mannosidase precursor laminin subunit β-1 precursor laminin subunit β-1 precursor laminin subunit β-1 precursor proto-oncogene tyrosine-protein kinase receptor ret precursor lama5 protein cd166 antigen glypican-1 precursor isoform 1 of receptor-type tyrosine-protein phosphatase kappa precursor tetraspanin-8 sema4g protein isoform 3 of seizure 6-like protein 2 precursor isoform 3 of seizure 6-like protein 2 precursor CD276 antigen CMP-N-acetylneuraminate-poly alpha-2,8-sialyltransferase prolow-density lipoprotein receptor-related protein 1 precursor prolow-density lipoprotein receptor-related protein 1 precursor prolow-density lipoprotein receptor-related protein 1 precursor lumican precursor low-density lipoprotein receptor-related protein 12 precursor CD121b antigen CD265 antigen

IPI00001893.2 IPI00002320.1 IPI00002412.1 IPI00002541.4 IPI00004413.1

IPI00015872.3 IPI00016629.2 IPI00018276.1 IPI00018276.1 IPI00019275.3 IPI00020201.1 IPI00020557.1 IPI00020557.1 IPI00020557.1 IPI00020986.2 IPI00021027.1 IPI00021382.1 IPI00021968.1

IPI00009890.1 IPI00009938.1 IPI00010896.3 IPI00010949.3 IPI00012102.1 IPI00012989.2 IPI00013976.3 IPI00013976.3 IPI00013976.3 IPI00013983.1 IPI00014138.1 IPI00015102.2 IPI00015688.1 IPI00015756.2

IPI00009629.1

IPI00009629.1

IPI00004503.5 IPI00004503.5 IPI00004503.5 IPI00007709.2 IPI00009054.1 IPI00009111.1 IPI00009629.1

IPI00004413.1

description

protein IPI

peptide sequence

R.IVN#ETLYENTK.L R.SQGYN#SSQDLPSLVLDFVK.L R.LLAN#SSM*LGEGQVLR.S R.IVSPEPGGAVGPN#LTCR.W R.TALFPDLLAQGN#ASLR.L R.LSMLN#DSVLWIPAFMVK.G K.LTSCATN#ASICGDEAR.C R.LN#GTDPIVAADSK.R K.WTGHN#VTVVQR.T K.LHINHNN#LTESVGPLPK.S K.IN#CSWFIR.A R.IN#LTWHK.N K.ALVAVVAGN#STTPR.R

K.N#ASEIEVPFVTR.N R.N#DTGPYECEIQNPVSAN#R.S K.GVTFN#VTTVDTK.R K.N#LTFEGPLPEK.I R.GASN#LTWR.S K.AN#LTWSVK.H K.M*EM*PSTPQQLQN#LTEDIR.E R.VN#ASTTEPN#STVEQSALM*R.D R.VN#ASTTEPN#STVEQSALM*R.D R.M*ERPDN#CSEEMYR.L R.LN#TTGVSAGCTADLLVGR.A R.N#ATVVWMK.D R.SFDDHFQHLLN#DSER.T R.IAVDWESLGYN#ITR.C

R.FN#QTMQPLLTAQNALLEDDTYR.W

K.TGVHDADFESN#VTATLASINK.I

K.N#MTFDLPSDATVVLN#R.S K.N#MTFDLPSDATVVLN#R.S R.GHTLTLN#FTR.N K.NLLAPGYTETYYN#STGK.E K.IILN#FTSLDLYR.S R.N#LTEVPTDLPAYVR.N R.ELGDN#VSM*ILVPFK.T

K.CPAGTYVSEHCTN#TSLR.V

K.IDN#LTGELSTSER.R K.VFFNLVN#LTELSLVR.N R.N#HSIFLADINQER.G K.AFN#STLPTM*AQM*EK.A K.VLSSIQEGTVPDN#TSSAR.G

Table 1. A Detailed List of the 172 N-Glycosites Not Determined Experimentally Previously^a

118 388 247 373 104 219 3788 4075 1511 127 75 66 105

118 208 42 401 405 367 1542 1336 1343 975 272 361 116 416

79

323

62 76 103 89 363 81 201

82

79 226 197 57 278

site

yes yes yes yes yes yes

yes yes yes

yes yes

yes yes yes yes

yes yes yes yes yes yes yes

yes

yes

yes yes yes yes yes

yes

yes

yes yes yes

potential

yes

yes

yes

yes yes

yes yes

yes

unknown

SEP SEP HA, SEP SEP SEP SEP HA, SEP HA, SEP SEP SEP SEP SEP HA, SEP

HA, SEP SEP HA, SEP SEP HA, SEP SEP HA, SEP HA HA SEP HA, SEP HA, SEP SEP HA, SEP

SEP

SEP

SEP SEP HA SEP SEP SEP HA, SEP

SEP

HA, SEP HA, SEP SEP SEP HA, SEP

method


protein creg1 precursor protein creg1 precursor protein creg1 precursor CD315 antigen CD315 antigen leucine-rich α-2-glycoprotein precursor isoform 1 of myelin protein zero-like protein 1 precursor torsin-1b precursor isoform 1 of neogenin precursor isoform 1 of neogenin precursor integral membrane protein dgcr2/idd precursor basement membrane-specific heparan sulfate proteoglycan core protein precursor basement membrane-specific heparan sulfate proteoglycan core protein precursor aspartate β-hydroxylase isoform e isoform 2a of desmocollin-2 precursor α-2-macroglobulin receptor-associated protein precursor serine protease 23 precursor procollagen-lysine,2-oxoglutarate 5-dioxygenase 1 precursor endoplasmin precursor leukocyte elastase inhibitor CD66e antigen CD66e antigen CD98 antigen CD98 antigen CD98 antigen dystroglycan precursor dystroglycan precursor isoform 1 of hepatocyte growth factor receptor precursor isoform 1 of hepatocyte growth factor receptor precursor N-acetylgalactosamine-6-sulfatase precursor isoform 1 of complement factor h precursor sialidase-1 precursor procollagen-lysine,2-oxoglutarate 5-dioxygenase 3 precursor procollagen-lysine,2-oxoglutarate 5-dioxygenase 3 precursor transmembrane 9 superfamily member 3 precursor tyrosine-protein kinase receptor tyro3 precursor isoform 1 of amyloid-like protein 2 precursor isoform 1 of amyloid-like protein 2 precursor protocadherin fat 1 precursor protocadherin fat 1 precursor protocadherin fat 1 precursor α-N-acetylgalactosaminide α-2,6-sialyltransferase 1 α-N-acetylgalactosaminide α-2,6-sialyltransferase 1 metalloproteinase inhibitor 1 precursor metalloproteinase inhibitor 1 precursor lactosylceramide 4-α-galactosyltransferase

IPI00021997.1 IPI00021997.1 IPI00021997.1 IPI00022048.8 IPI00022048.8 IPI00022417.4 IPI00022558.2 IPI00023137.1 IPI00023814.2 IPI00023814.2 IPI00024272.1 IPI00024284.4

IPI00024572.3 IPI00025846.1 IPI00026848.3 IPI00026941.1 IPI00027192.5 IPI00027230.3 IPI00027444.1 IPI00027486.3 IPI00027486.3 IPI00027493.1 IPI00027493.1 IPI00027493.1 IPI00028911.1 IPI00028911.1 IPI00029273.1 IPI00029273.1 IPI00029605.1 IPI00029739.5 IPI00029817.1 IPI00030255.1 IPI00030255.1 IPI00030847.3 IPI00030887.1 IPI00031030.1 IPI00031030.1 IPI00031411.3 IPI00031411.3 IPI00031411.3 IPI00031534.3 IPI00031534.3 IPI00032292.1 IPI00032292.1 IPI00033541.1

IPI00024284.4

description

protein IPI

Table 1. Continued

R.YN#LSEVLQGK.L K.N#VTLHVPSK.L R.VIDLWDLAQSAN#LTDK.E K.QYLSYETLYAN#GSR.T R.EQIN#ITLDHR.C R.EEEAIQLDGLN#ASQIR.E R.LGVQDLFN#SSK.A R.TLTLFN#VTR.N R.TLTLFN#VTRN#DTASYK.C K.DASSFLAEWQN#ITK.G R.LLIAGTN#SSDLQQILSLLESNK.D K.SLVTQYLN#ATGNR.W R.N#CSTITLQN#ITR.G R.N#CSTITLQN#ITR.G R.N#FTVACQHR.S R.VLLGN#ESCTLTLSESTMNTLK.C K.TGEAN#LTQIYLQEALDFIK.R K.MDGASN#VTCINSR.W R.WSFSN#GTSWR.K R.SAEFFN#YTVR.T K.EQYIHEN#YSR.A R.IVDVN#LTSEGK.V R.AN#LTGWDPQK.D R.RN#QSLSLLYK.V R.N#QSLSLLYK.V K.TGALTVQN#TTQLR.S K.IN#SSVTDIEEIIGVR.I R.QVTQEM*LN#HTIAIR.F K.LFLPN#LTLFLDSR.H R.DYEWLEALLMN#QTVMSK.N K.FVGTPEVN#QTTLYQR.Y R.SHN#RSEEFLIAGK.L R.N#LTNVLGTQSR.Y

R.SLTQGSLIVGDLAPVN#GTSQGK.F

K.IVTPEEYYN#VTVQ.K.LN#ITNIWVLDYFGGPK.I K.VN#ETEMDIAK.H K.LEN#WTDASR.V K.ELDLTCN#ITTDR.A K.LPPGLLAN#FTLLR.T K.EIFVAN#GTQGK.L R.EERPLN#ASALK.L R.GSSVILN#CSAYSEPSPK.I R.TLSDVPSAAPQN#LSLEVR.N R.LN#GSLATFSTDQELR.F R.ALVN#FTR.S

peptide sequence

64 34 268 93 197 62 298 204 208 264 280 323 641 649 785 607 204 1034 352 63 548 174 380 541 541 1943 3718 3642 300 460 53 101 203

3780

216 193 160 618 300 186 50 64 73 639 149 89

site

yes yes yes yes yes yes yes

yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes

yes

yes yes yes yes yes yes yes yes yes yes yes yes

potential

yes yes yes

unknown

HA, SEP HA, SEP SEP HA, SEP HA, SEP HA, SEP HA, SEP SEP SEP HA, SEP HA, SEP SEP SEP SEP HA, SEP HA, SEP SEP HA, SEP SEP SEP SEP HA SEP SEP SEP HA HA, SEP SEP HA, SEP SEP HA, SEP SEP SEP

SEP

SEP SEP SEP HA HA SEP HA, SEP HA HA, SEP HA, SEP SEP HA, SEP

method


cdna flj77775, highly similar to Homo sapiens mhc class i polypeptide- related sequence b (micb), mrna isoform 2 of calumenin precursor CD316 antigen isoform 2c2a′ of collagen alpha-2(vi) chain precursor isoform 2c2a′ of collagen alpha-2(vi) chain precursor protein jagged-1 precursor isoform 1 of carbohydrate sulfotransferase 11 isoform 1 of soluble calcium-activated nucleotidase 1 receptor-type tyrosine-protein phosphatase f precursor isoform 1 of cd109 antigen precursor isoform 1 of cd109 antigen precursor angiopoietin-related protein 4 precursor isoform 1 of ectonucleotide pyrophosphatase/phosphodiesterase family member 2 precursor isoform 1 of ectonucleotide pyrophosphatase/phosphodiesterase family member 2 precursor muscle type neuropilin 1 muscle type neuropilin 1 glucosaminyl (N-acetyl) transferase 2, i-branching enzyme isoform a α-2-glycoprotein 1, zinc cDNA flj13286 fis, clone ovarc1001154, highly similar to Homo sapiens clone 24720 epithelin 1 and 2 mRNA CD49c antigen CD49c antigen cd63 antigen isoform 2 of UDP-GlcNAc:betaGal beta-1,3-Nacetylglucosaminyltransferase 2 signal peptide, CUB and EGF-like domain-containing protein 1 uncharacterized protein c12orf25 isoform 2 of inositol 1,4,5-trisphosphate receptor type 1 isoform b of fibulin-1 precursor myristoylated alanine-rich c-kinase substrate isoform 1 of roundabout homologue 1 precursor isoform 2 of neuromedin-b precursor amphoterin-induced protein 2 precursor IgGFc-binding protein precursor IgGFc-binding protein precursor IgGFc-binding protein precursor cadherin-17 precursor cadherin-17 precursor CD148 antigen CD148 antigen collagen α-1(vi) chain precursor collagen α-1(vi) chain precursor insulin-like growth factor-binding protein-like 1 precursor aspartyl/asparaginyl β-hydroxylase

IPI00044790.2

IPI00217435.4 IPI00217802.2 IPI00218658.1 IPI00218803.2 IPI00219301.7 IPI00219798.1 IPI00220630.1 IPI00238262.4 IPI00242956.4 IPI00242956.4 IPI00242956.4 IPI00290089.4 IPI00290089.4 IPI00290328.3 IPI00290328.3 IPI00291136.4 IPI00291136.4 IPI00291987.1 IPI00294834.6

IPI00215995.1 IPI00215995.1 IPI00215998.5 IPI00217345.1

IPI00166729.4 IPI00181753.4

IPI00165438.2 IPI00165438.2 IPI00166086.1

IPI00156171.2

IPI00045396.1 IPI00056478.1 IPI00073454.2 IPI00073454.2 IPI00099650.2 IPI00099831.1 IPI00103175.1 IPI00107831.3 IPI00152540.3 IPI00152540.3 IPI00153060.4 IPI00156171.2

description

protein IPI

Table 1. Continued

R.N#GTSSGLGPSCSDAPTTPIK.Q K.AQN#ATGQAEDLAEVSVDSPGPSER.E K.LVTN#LSGQLSELK.D R.CATPHGDN#ASLEATFVK.R K.EELQAN#GSAPAADKEEPAAAGSGAASPSAAEK.G K.VWCLGN#ETR.Y K.VLN#GTLLM*APSGCK.S R.HNN#ITSISTGSFSTTPNLK.C R.GLCVLSVGAN#LTTFDGAR.G R.NPNNDQVFPN#GTLAPSIPIWGGSWR.A R.YLPVN#SSLLTSDCSER.C K.APKPVEMVEN#STDPHPIK.I K.KQDTPQYN#LTIEVSDKDFK.T K.AALSWSN#GN#GTASCR.V K.IHVAGETDSSNLN#VSEPR.A R.N#FTAADWGQSR.D K.N#VTAQICIDK.K R.SVHN#VTGAQVGLSCEVR.A R.LVQLFPN#DTSLK.N

R.VWN#STFIEDYR.D K.N#ITIVTGAPR.H K.CCGAAN#YTDWEK.I R.ESWGQESNAGN#QTVVR.V

K.DIVEYYN#DSN#GSHVLQGR.F K.EN#ATTDLLTK.L

K.EGFSAN#YSVLQSSVSEDFK.C R.GPECSQN#YTTPSGVIK.S R.AALSN#ASLLAEACHQIFEGK.V

R.AEGWEEGPPTVLSDSPWTN#ISGSCK.G

R.N#VTYGTYLDDPDPDDGFNYK.Q R.IGPGEPLELLCN#VSGALPPAGR.H R.GTFTDCALAN#MTEQIR.Q R.N#MTLFSDLVAEK.F R.TTPCEVIDSCTVAMASN#DTPEGVR.Y K.LDFLMFN#YSVPSYLK.L R.LGQAPANWYN#DTYPLSPPQR.T R.DINSQQELQN#ITTDTR.F R.N#VSTNVFFK.Q R.N#YTEYWSGSNSGNQK.M R.LPEMAQPVDPAHN#VSR.L K.AIIAN#LTCK.K

R.AQTLAMN#VTNFWK.E

peptide sequence

466 323 2719 98 61 820 142 104 5242 3031 5186 250 419 231 413 212 804 166 452

935 265 150 173

112 85

261 150 41

54

131 327 140 785 585 342 88 956 337 397 177 411

164

site

yes yes yes yes yes yes yes yes

yes

yes yes yes yes

a

yes yes a

yes

yes yes yes

yes

yes yes yes yes yes yes yes

yes yes yes yes

potential

yes

yes yes

yes

yes

yes

yes

yes

yes

unknown

SEP SEP HA, SEP SEP SEP SEP HA, SEP SEP HA, SEP SEP HA, SEP HA, SEP SEP SEP HA HA, SEP SEP SEP HA, SEP

SEP SEP SEP SEP

HA, SEP HA, SEP

HA, SEP HA, SEP HA, SEP

SEP

SEP SEP HA, SEP SEP SEP SEP SEP HA, SEP SEP HA SEP HA, SEP

SEP

method


peptide sequence

R.LN#ASIADLQSQLR.S R.AQQLLAN#STALEEAM*LQEQQR.L R.DN#ATLQATLHAAR.D R.LN#LTSPDLFWLVFR.Y R.TLSELMSQTGHLGLAN#ASAPSGEQLLR.T K.LN#ASLPALLLIR.L K.N#ATYGYVLDDPDPDDGFNYK.Q R.ALSN#ISLR.L

R.EEQFN#STYR.V K.DTN#GSQFFITTVK.T R.N#CSAAPQPEPAAGLASYPELR.L R.GYYN#QSEAGSHTVQR.M R.N#ATSVDSGAPGGAAPGGPGFR.A

R.WTPLN#SSTIIGYR.I K.NELM*LN#SSLM*R.I K.EVIDTN#LTTLR.D R.SQFN#ATSVADVDKR.V R.SN#FTLTAAR.H K.QDQQLQN#CTEPGEQPSPK.Q R.LNAIN#ITSALDR.G R.N#FSSCSAEDFEK.L K.VQDLVLEPTQN#ITTK.G R.EAALN#DSPCRK.S K.LVGGPVAGGDPN#QTIR.G R.PGAM*N#FSYSPLLR.E K.LFN#TSVEVLPFDNPQSDK.E

R.YN#CSIESPR.K

R.QAIHVGN#QTFNDGTIVEK.Y R.AN#QSWEDSNTDLVPAPAVR.I K.N#ASGIYAEIDGAK.S R.N#LTEVVPQLLDQLR.T K.YQPIN#STHELGPLVDLK.I R.VNVVN#STLAEVHWDPVPLK.S K.DGDDEWTSVVVAN#VSK.Y R.QKDGDDEWTSVVVAN#VSK.Y R.EAIN#ITLDHK.C

R.PLTQQN#TSCDFLPAM*K.S R.N#STIEAANLAGLK.I

R.N#TTWQAGHNFYNVDMSYLK.R K.VSCPIMPCSN#ATVPDGECCPR.C K.DYLTDLITN#DSVSFFR.T R.HIGHAN#LTFEQLR.S

229 588 2423 921 323 170 131 127

323 140 87 110 62

1291 135 2267 255 242 212 200 381 110 552 95 243 465

696

346 70 557 787 993 858 802 802 209

141 184

38 360 198 289

site

yes yes yes yes yes yes yes

b

yes yes

yes yes yes

yes yes yes yes yes yes yes yes

yes

yes yes yes yes yes yes

yes

yes yes

yes yes yes

potential

yes

yes

yes

yes

yes

yes

yes

yes

unknown

method

HA, SEP HA, SEP HA, SEP SEP SEP HA, SEP SEP HA

HA, SEP SEP SEP HA SEP

SEP HA, SEP HA, SEP SEP SEP HA, SEP HA SEP HA, SEP HA HA, SEP SEP SEP

HA

HA, SEP SEP SEP HA, SEP HA, SEP HA, SEP SEP SEP HA, SEP

SEP SEP

SEP SEP HA, SEP HA, SEP

a Protein accession numbers and protein names are from the IPI database, and glycosylation sites are annotated according to the Swiss-Prot database. Sites are marked as unknown or potential according to the definitions of the Swiss-Prot database. a, probable; b, by similarity. HC, hydrazide chemistry; HA, hydrophilic affinity; C, carboxyamidomethylation; #, glycosylation (N); *, oxidation (M).

IPI00783665.2 IPI00783665.2 IPI00783665.2 IPI00783665.2 IPI00783665.2 IPI00784119.1 IPI00789155.1 IPI00816626.1

IPI00550640.2 IPI00646304.4 IPI00748502.2 IPI00760554.2 IPI00783492.1

IPI00339227.4 IPI00374563.3 IPI00377045.3 IPI00395866.2 IPI00398918.5 IPI00414896.1 IPI00414984.3 IPI00440932.1 IPI00451450.3 IPI00470874.1 IPI00477868.1 IPI00477868.1 IPI00477905.4

IPI00337495.3

IPI00301395.4 IPI00306543.3 IPI00329482.4 IPI00329482.4 IPI00333776.5 IPI00333776.5 IPI00333776.5 IPI00333776.5 IPI00337495.3

IPI00298405.3 IPI00299299.3

description

cathepsin b precursor; thrombospondin-1 precursor; isoform 1 of extracellular sulfatase sulf-2 precursor; tumor necrosis factor receptor superfamily member 11b precursor; promethin; stress 70 protein chaperone microsome-associated 60 kDa protein precursor; probable serine carboxypeptidase cpvl precursor; growth/differentiation factor 15 precursor; isoform 1 of laminin subunit α-4 precursor; isoform 1 of laminin subunit α-4 precursor; isoform 1 of neuronal cell adhesion molecule precursor; isoform 1 of neuronal cell adhesion molecule precursor; isoform 1 of neuronal cell adhesion molecule precursor; isoform 1 of neuronal cell adhesion molecule precursor; isoform 2 of procollagen-lysine,2-oxoglutarate 5-dioxygenase 2 precursor; isoform 2 of procollagen-lysine,2-oxoglutarate 5-dioxygenase 2 precursor; isoform 7 of fibronectin precursor; agrin precursor; laminin α-3 chain variant 1; scube1 protein; hypothetical protein loc374383; isoform 1 of ribonuclease t2 precursor; sarcoglycan, epsilon isoform 1; isoform 1 of adam 9 precursor; inactive serine protease 35 precursor; isoform 2 of formin-1; lama5 protein; lama5 protein; isoform 1 of α-1,3-mannosyl-glycoprotein 4-β-N-acetylglucosaminyltransferase b; ighg4 protein; peptidylprolyl isomerase b precursor; cartilage-associated protein precursor; hla class histocompatibility antigen, a-69 alpha chain; isoform 2 of latent-transforming growth factor beta-binding protein 4 precursor; laminin subunit α-5 precursor; laminin subunit α-5 precursor; laminin subunit α-5 precursor; laminin subunit α-5 precursor; laminin subunit α-5 precursor; vacuolar ATP synthase subunit s1 precursor; calumenin precursor; plxnb2 protein

protein IPI

IPI00295741.4 IPI00296099.6 IPI00297252.6 IPI00298362.3

Table 1. Continued

research articles Cao et al.


Figure 1. Classification of N-glycosylation sites. The classification into known, unknown, potential, and by similarity is based on comparison of the experimental data set with the annotated Swiss-Prot database. In total, 300 sites were identified from secreted N-glycoproteins.

confirmed in this study (Table 1). Thus, 172 N-glycosites, accounting for 57.3% of the identified N-glycosites, had not been determined experimentally before; as shown in Figure 1, this group combines unknown, potential, and probable sites with sites annotated by similarity alone. Typical examples are site 57 of CD44 antigen isoform 5 precursor within the sequence (K)AFN#STLPTMAQMEK(A), site 38 of cathepsin b precursor within the sequence (R)N#TTWQAGHNFYNVDMSYLK(R), and site 1291 of isoform 7 of fibronectin precursor within the sequence (R)WTPLN#SSTIIGYR(I). Importantly, these three glycoproteins may be involved in invasion and metastasis of HCC.53 Figure 2A shows a representative nano-LC-ESI-MS/MS spectrum of a deglycosylated glycopeptide [(M + 2H)2+ at m/z 490.26] from α-fetoprotein precursor. The N-glycosylation site was localized by a mass increase of 0.984 Da within the N-X-S/T (X ≠ P) sequon, which results from deamidation of the asparagine residue to aspartic acid. The y-series of product ions clearly displayed the mass shift indicating this conversion. Figure 2B shows another example, a peptide [(M + 2H)2+ at m/z 702.38] from protocadherin fat 1 precursor; residue 6 of this peptide was shown to be glycosylated, which has not been reported in the literature. In this study, for the sake of credible identification, an Orbitrap instrument with high mass accuracy was used, which allowed a series of stringent criteria to be applied to safeguard the experimental results. In addition, to assess potential false negative identifications, 32 peptides with Xcorr ≥ 1.8 (2+) or Xcorr ≥ 2.5 (3+) but P < 0.95 were further investigated. All of these peptides contained more than two asparagine residues, which interfered with the determination of N-glycosites and resulted in ΔCn < 0.1 and P < 0.95.
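The site-localization logic described above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' search pipeline: it assumes standard monoisotopic residue masses, treats every `#`-marked asparagine as deamidated (+0.984 Da), and ignores cysteine carboxyamidomethylation. The function names (`parse_annotated`, `sequon_positions`, `mz_deamidated`) are hypothetical and introduced only for this example.

```python
import re

# Standard monoisotopic residue masses in Da (unmodified residues only;
# cysteine carboxyamidomethylation is NOT included in this sketch).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565        # one water per peptide (N-terminal H + C-terminal OH)
PROTON = 1.007276        # mass added per positive charge
DEAMIDATION = 0.984016   # Asn -> Asp mass shift

# Consensus sequon N-X-S/T with X != P; the lookahead keeps overlapping matches.
SEQUON = re.compile(r"N(?=[^P][ST])")


def parse_annotated(peptide):
    """Split e.g. 'VN#FTEIQK' into ('VNFTEIQK', [2]); positions are 1-based."""
    seq, sites = [], []
    for ch in peptide:
        if ch == "#":
            sites.append(len(seq))  # '#' marks the residue just before it
        else:
            seq.append(ch)
    return "".join(seq), sites


def sequon_positions(seq):
    """1-based positions of Asn residues inside an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def mz_deamidated(peptide, charge):
    """m/z of the peptide with every '#'-marked Asn converted to Asp."""
    seq, sites = parse_annotated(peptide)
    neutral = sum(MONO[aa] for aa in seq) + WATER + DEAMIDATION * len(sites)
    return (neutral + charge * PROTON) / charge


for pep in ("VN#FTEIQK", "TGALTVQN#TTQLR"):
    seq, sites = parse_annotated(pep)
    print(pep, sites, sequon_positions(seq), round(mz_deamidated(pep, 2), 2))
```

Under these assumptions the two doubly charged peptides come out within 0.01 of the m/z values quoted for Figure 2 (490.26 and 702.38), and in each case the marked asparagine is the only one that sits in a consensus sequon.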
Therefore, to study these peptides, the corresponding peak lists were searched against a modified database in which each asparagine residue within a consensus sequon was replaced with J ('J' is defined as deamidated N), so that N-glycosylation was deemed to take place only at the consensus J-X-S/T sequon. Consequently, 28 peptides met the criteria described above. According to the Swiss-Prot database, only one of these asparagine residues had never been suggested to be glycosylated; the others were annotated as known (9), potential (17), or by similarity (1). These probable N-glycosites were not incorporated into the confirmed results mentioned above but are listed in Supporting Information Table 2. For instance, the sequence (K)VASVININPN#TTHSTGSCR(S), which contains three N residues, gave P = 0.2045, yet residue 10 of this sequence is documented as a known N-glycosite in Swiss-Prot. We also noticed that 700 pairs of tryptic peptides in the IPI human v3.45 database have identical amino acid sequences except that one contains N-X-S/T (X ≠ P) where its counterpart contains D-X-S/T (X ≠ P). Because deamidated asparagine is indistinguishable from aspartic acid, such pairs interfere with data interpretation and result in ΔCn = 0. In our experiments, only one pair of sequences among the peptides produced, (K)EFHLNESGNPSSK(S) and (K)EFHLNESGDPSSK(S), from two different proteins, fell under this condition.

3.3. Evaluation of the Two Enrichment Methods. Among the 300 nonredundant glycosites, 124 were captured and identified by both HC and HA (41% overlap), whereas 35 and 141 glycosites were identified only by HC and only by HA, respectively (Figure 3A). Furthermore, for glycoprotein identification, the complementary nature of our approach can be demonstrated in two respects. First, many proteins were identified through different unique peptides captured by the different methods. This type of protein identification is exemplified by galectin-3-binding protein precursor (IPI00023673), for which two unique N-glycosylated peptides were identified only by HA, while one unique peptide was identified only by HC. Second, a significant number of proteins were detected by only one method. More specifically, 21 and 89 glycoproteins were observed only with HC and only with HA, respectively (Figure 3B). Thus, evaluating the results of the two individual affinity-directed glycopeptide analyses revealed complementary behavior. Although some overlap was observed, many of the glycosites and glycoproteins were determined by either HA or HC but not both. In addition, we assessed the selectivity of the two methods for glycopeptides, defined as the number of glycopeptides relative to the total number of identified peptides.
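The overlap and selectivity figures of this section can be rechecked from the reported counts. The sketch below is only an arithmetic check under stated assumptions: the glycosite counts are those of Figure 3A, the glycopeptide and total peptide counts are those quoted for Figure 4 (157 of 169 for HC, 262 of 511 for HA), and the number of glycoproteins found by both methods is derived here from the total of 194 rather than stated in the text. The helper names `venn_summary` and `selectivity_pct` are hypothetical.

```python
def venn_summary(both, only_hc, only_ha):
    """Per-method totals and percentage overlap for a two-way Venn diagram."""
    total = both + only_hc + only_ha
    return {
        "HC": both + only_hc,
        "HA": both + only_ha,
        "total": total,
        "overlap_pct": round(100.0 * both / total, 1),
    }


def selectivity_pct(glycopeptides, all_peptides):
    """Selectivity = glycopeptides / all identified peptides, in percent."""
    return round(100.0 * glycopeptides / all_peptides, 1)


# Glycosites (Figure 3A): 124 shared, 35 HC-only, 141 HA-only.
sites = venn_summary(both=124, only_hc=35, only_ha=141)
print(sites)

# Glycoproteins (Figure 3B): 21 HC-only and 89 HA-only out of 194 in total;
# the shared count is derived here, not quoted from the text.
proteins_both = 194 - 21 - 89
print(proteins_both)

# Selectivity from glycopeptide counts over all nonredundant peptides.
print(selectivity_pct(157, 169), selectivity_pct(262, 511))
```

The per-method glycosite totals (159 for HC, 265 for HA) agree with the abstract, and the selectivities of 92.9% and 51.3% are reproduced when the glycopeptide counts (157 and 262) are used as numerators.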
First, we examined the reproducibility of the nano-LC-ESI-MS/MS analysis for each of the two methods. With a PeptideProphet probability greater than 0.95, the HC method identified a total of 169 nonredundant peptides, with an average of 109 ± 7 peptides (n = 6) per nano-LC-ESI-MS/MS experiment; the HA approach identified a total of 511 nonredundant peptides, with an average of 285 ± 18 (n = 6). Given this reproducibility, the selectivity of the two methods for glycopeptides could be evaluated. As shown in Figure 4, based on the number of glycopeptides identified, HA outweighs HC (262 vs 157). However, in terms of selectivity for glycopeptides, HC reaches 92.9% (157/169), while HA reaches only 51.3% (262/511). These differences and the complementary nature of the two methods are not surprising, since the two glycopeptide enrichment methods rely on different mechanisms to immobilize glycopeptides on a solid phase. In essence, the HC method takes advantage of the cis-diol groups of carbohydrates. The carbohydrates on glycopeptides are oxidized to form aldehyde groups, which react with the hydrazide groups on the resin to form covalent hydrazone bonds that conjugate the glycopeptides to the solid phase; the N-glycosylated peptides are then specifically released by PNGase F. However, the N-glycan structure remains on the resin, which hinders application of the method to the analysis of the detailed structures of the attached glycans. Compared with the HC method, the HA method is low in cost and simple to perform, requiring no multistep derivatization. Typically, the HA method enriches glycopeptides through hydrophilic interaction of carbohydrates with gel matrices. However, since hydrophilic peptides may not be completely separated from glycopeptides, the relatively low selectivity of HA is reasonable.

Two further points are worth mentioning. First, we also tried to enrich glycopeptides with SiMAG-Boronic acid beads.25 All the results obtained by the boronate affinity method were covered by the results of either HC or HA (data not shown); that is, the boronate affinity method did not contribute additional glycosites among the three methods. In essence, the boronate affinity method, like the HC method, makes use of the cis-diol groups of carbohydrates and is not limited to certain types of N-glycans. This unexpected result may therefore be due to secondary interactions, for example, hydrophobic interactions, ionic interactions, hydrogen bonding, and coordination interactions, which sometimes play an important role in retention.54 Second, the HC and HA methods can also capture O-linked glycopeptides; given the close relationship between N-glycoproteins and classically secreted proteins, we examined only the N-linked glycopeptides in the present study.

Figure 2. Nano-LC-MS/MS mass spectra of doubly charged N-glycosylated peptides (A) VN#FTEIQK [(M + 2H)2+ at m/z 490.26] from α-fetoprotein precursor and (B) TGALTVQN#TTQLR [(M + 2H)2+ at m/z 702.38] from protocadherin fat 1 precursor. The # denotes the site of N-glycosylation.

Figure 3. Two-way Venn diagram depicting the overlap of glycosylation sites (A) and glycoproteins (B) detected by HC and HA.

Figure 4. Comparison of the two methods for enriching glycopeptides, including the number of glycopeptides identified and the selectivity for glycopeptides.

4. Conclusions. We report here an in-depth identification of N-glycosylation sites on secreted proteins of human hepatocellular carcinoma (HCC) cells using a complementary proteomic approach that integrates hydrophilic affinity (HA) and hydrazide chemistry (HC) for glycopeptide enrichment, followed by nano-LC-ESI-MS/MS analysis. Our strategy has several advantages. (1) The N-glycoproteomic strategy is an effective way to profile genuinely secreted proteins. Because unavoidable contaminants such as actin and residual bovine serum albumin (BSA) are not N-glycosylated, they can be depleted completely in our N-glycopeptide enrichment procedures. In this study, the fact that 96.4% of the identified glycoproteins could be secretory proteins confirmed this advantage. (2) This investigation, which for the first time employed both HA and HC methods for glycopeptide enrichment, significantly enhanced the identification of N-glycosylation sites on secreted proteins. Using stringent criteria, we identified 300 different N-glycosites within 194 different glycoproteins. Of these, 172 N-glycosites had not been determined experimentally before. (3) The two methods for glycopeptide enrichment, HA and HC, which are based on different separation mechanisms, were compared directly for the first time, and the merits and shortcomings of each method were discussed on the basis of the results obtained. Abbreviations: HA, hydrophilic affinity; HC, hydrazide chemistry.

Acknowledgment. This work was supported in part by National 863 project (2006AA02A308, 2006AA02Z134), 973 and S973 project (2006CB910801, 2007CB914100, 2009CB825607), NFS fund (30700990, 20735005, 20875016), Shanghai STDP fund (07JC14003), and LAD program (B109). We are also deeply grateful to Koichi Tanaka for his long-standing help in glycoprotein identification.

Supporting Information Available: A detailed list of the 128 glycosylation sites reported in the Swiss-Prot database (Supporting Table 1), and the 27 peptides that met the criteria after searching against the modified database (Supporting Table 2). This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Grønborg, M.; Kristiansen, T. Z.; Iwahori, A.; Chang, R.; Reddy, R.; Sato, N.; Molina, H.; Jensen, O. N.; Hruban, R. H.; Goggins, M. G.; Maitra, A.; Pandey, A. Mol. Cell. Proteomics 2006, 5, 157–171.
(2) Mbeunkui, F.; Metge, B. J.; Shevde, L. A.; Pannell, L. K. J. Proteome Res. 2007, 6, 2993–3002.
(3) Zwickl, H.; Traxler, E.; Staettner, S.; Parzefall, W.; Grasl-Kraupp, B.; Karner, J.; Schulte-Hermann, R.; Gerner, C. Electrophoresis 2005, 26, 2779–2785.
(4) Mbeunkui, F.; Fodstad, O.; Pannell, L. K. J. Proteome Res. 2006, 5, 899–906.
(5) Lewandrowski, U.; Moebius, J.; Walter, U.; Sickmann, A. Mol. Cell. Proteomics 2006, 5, 226–233.
(6) Pan, S.; Wang, Y.; Quinn, J. F.; Peskind, E. R.; Waichunas, D.; Wimberger, J. T.; Jin, J. H.; Li, J. G.; Zhu, D.; Pan, C.; Zhang, J. J. Proteome Res. 2006, 5, 2769–2779.
(7) Jeremy, E. T.; Robert, A. F. Nat. Chem. Biol. 2007, 3, 74–77.
(8) Danielle, H. D.; Carolyn, R. B. Nat. Rev. Drug Discovery 2005, 4, 477–488.
(9) Tretter, V.; Altmann, F.; Marz, L. J. Biochem. 1991, 199, 647–652.
(10) Everley, P. A.; Bakalarski, C. E.; Elias, J. E.; Waghorne, C. G. J. Proteome Res. 2006, 5, 1224–1231.
(11) Olsen, J. V.; de Godoy, L. M. F.; Li, G. Q.; Macek, B.; Mortensen, P.; Pesch, R.; Makarov, A.; Lange, O.; Horning, S.; Mann, M. Mol. Cell. Proteomics 2005, 4, 2010–2021.
(12) Wang, Y. H.; Wu, S. L.; Hancock, W. S. Biotechnol. Prog. 2006, 22, 873–880.
(13) Liu, X.; Ma, L.; Li, J. J. Anal. Lett. 2008, 41, 268–277.
(14) Zhao, J.; Qiu, W.; Simeone, D. M.; Lubman, D. M. J. Proteome Res. 2007, 6, 1126–1138.
(15) Zhao, J.; Patwa, T. H.; Qiu, W.; Shedden, K.; Hinderer, R.; Misek, D. E.; Anderson, M. A.; Simeone, D. M.; Lubman, D. M. J. Proteome Res. 2007, 6, 1864–1874.
(16) Qiu, Y. H.; Patwa, T. H.; Xu, L.; Shedden, K.; Misek, D. E.; Tuck, M.; Jin, G.; Ruffin, M. T.; Turgeon, D. K.; Synal, S.; Bresalier, R.; Marcon, N.; Brenner, D. E.; Lubman, D. M. J. Proteome Res. 2008, 7, 1693–1703.
(17) Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R. Nat. Biotechnol. 2003, 21, 660–666.
(18) Liu, T.; Qian, W. J.; Gritsenko, M. A.; Camp, D. G., II; Monroe, M. E.; Moore, R. J.; Smith, R. D. J. Proteome Res. 2005, 4, 2070–2080.
(19) Ramachandran, P.; Boontheung, P.; Xie, Y. M.; Sondej, M.; Wong, D. T.; Loo, J. A. J. Proteome Res. 2006, 5, 1493–1503.
(20) Hagglund, P.; Bunkenborg, J. E. F.; Jensen, O. N.; Roepstorff, P. J. Proteome Res. 2004, 3, 556–566.
(21) Sekiya, S.; Wada, Y.; Tanaka, K. Anal. Chem. 2005, 77, 4962–4968.
(22) Zhang, Y.; Go, E. P.; Desaire, H. Anal. Chem. 2008, 80, 3144–3158.
(23) Sparbier, K.; Wenzel, T.; Kostrzewa, M. J. Chromatogr., B 2006, 840, 29–36.
(24) Sparbier, K.; Asperger, A.; Resemann, A.; Kessler, I.; Koch, S.; Wenzel, T.; Stein, G.; Vorwerg, L.; Suckau, D.; Kostrzewa, M. J. Biomol. Tech. 2007, 18, 252–258.
(25) Zhang, Q. B.; Tang, N.; Brock, J. W. C.; Mottaz, H. M.; Ames, J. M.; Baynes, J. W.; Smith, R. D.; Metz, T. O. J. Proteome Res. 2007, 6, 2323–2330.
(26) Gerardo, A. M.; Atwood, J.; Guo, Y.; Warren, N. L.; Orlando, R.; Pierce, M. J. Proteome Res. 2006, 5, 701–708.
(27) Dai, Z.; Fan, J.; Liu, Y. K.; Zhou, J.; Bai, D. S.; Tan, C. J.; Guo, K.; Zhang, Y.; Zhao, Y.; Yang, P. Y. Electrophoresis 2007, 28, 4382–4391.
(28) Zhao, J.; Simeone, D. M.; Heidt, D.; Anderson, M. A.; Lubman, D. M. J. Proteome Res. 2006, 5, 1792–1802.
(29) Nakagawa, T.; Uozumi, N.; Nakano, M.; Horikawa, Y. M.; Okuyama, N.; Taguchi, T.; Gu, J. G.; Kondo, A.; Taniguchi, N.; Miyoshi, E. J. Biol. Chem. 2006, 281, 29797–29806.
(30) Zhang, H.; Yi, E. C.; Li, X. J.; Mallick, P.; Kelly-Spratt, K. S.; Masselon, C. D.; Camp, D. G., II; Smith, R. D.; Kemp, C. J.; Aebersold, R. Mol. Cell. Proteomics 2005, 4, 144–155.
(31) Bernhard, O. K.; Kapp, E. A.; Simpson, R. J. J. Proteome Res. 2007, 6, 987–995.
(32) Li, Y.; Tang, Z. Y.; Ye, S. L.; Liu, B. B.; Liu, Y. K.; Chen, J.; Xue, Q. J. Cancer Res. Clin. Oncol. 2003, 129, 43–51.
(33) Sun, B. Y.; Ranish, J. A.; Utleg, A. G.; White, J. T.; Yan, X. W.; Lin, B. Y.; Hood, L. Mol. Cell. Proteomics 2007, 6, 141–149.
(34) Wada, Y.; Tajiri, M.; Yoshida, S. Anal. Chem. 2004, 76, 6560–6565.
(35) Tajiri, M.; Yoshida, S.; Wada, Y. Glycobiology 2005, 15, 1332–1340.
(36) Zhou, Y.; Aebersold, R.; Zhang, H. Anal. Chem. 2007, 79, 5826–5837.
(37) Kreunin, P.; Zhao, J.; Rosser, C.; Urquidi, V.; Lubman, D. M.; Goodison, S. J. Proteome Res. 2007, 6, 2631–2639.
(38) Liu, T.; Qian, W. J.; Gritsenko, M. A.; Xiao, W. Z.; Moldawer, L. L.; Kaushal, A.; Monroe, M. E.; Varnum, S. M.; Moore, R. J.; Purvine, S. O.; Maier, R. V.; Davis, R. W.; Tompkins, R. G.; Camp, D. G., II; Smith, R. D. Mol. Cell. Proteomics 2006, 5, 1899–1913.
(39) Chen, M.; Ying, W. T.; Song, Y. P.; Liu, X.; Yang, B.; Wu, S. F.; Jiang, Y.; Cai, Y.; He, F. C.; Qian, X. H. Proteomics 2007, 7, 2479–2488.
(40) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Anal. Chem. 2002, 74, 5383–5392.
(41) Atwood, J. A., III; Sahoo, S. S.; Alvarez-Manilla, G.; Weatherly, D. B.; Kolli, K.; Orlando, R.; York, W. S. Rapid Commun. Mass Spectrom. 2005, 19, 3002–3006.
(42) Wu, C. C.; Cheng, H. C.; Chen, S. J.; Liu, H. P.; Hsieh, Y. Y.; Yu, C. J.; Tang, R.; Hsieh, L. L.; Yu, J. S.; Chang, Y. S. Proteomics 2008, 8, 316–332.
(43) Bendtsen, J. D.; Nielsen, H.; Heijne, G. V.; Brunak, S. J. Mol. Biol. 2004, 340, 783–795.
(44) Bendtsen, J. D.; Jensen, L. J.; Blom, N.; Heijne, G. V.; Brunak, S. Protein Eng. Des. Sel. 2004, 17, 349–356.
(45) Pellitteri-Hahn, M. C.; Warren, M. C.; Didier, D. N.; Winkler, E. L.; Mirza, S. P.; Greene, A. S.; Olivier, M. J. Proteome Res. 2006, 5, 2861–2864.
(46) Chevallet, M.; Diemer, H.; Dorssealer, A. V.; Villiers, C.; Rabilloud, T. Proteomics 2007, 7, 1757–1770.
(47) Krueger, K. E.; Srivastava, S. Mol. Cell. Proteomics 2006, 5, 1799–1810.
(48) Feng, J. T.; Liu, Y. K.; Song, H. Y.; Dai, Z.; Qin, L. X.; Almofti, M. R.; Fang, C. Y.; Lu, H. J.; Yang, P. Y.; Tang, Z. Y. Proteomics 2005, 5, 4581–4588.
(49) Sun, W.; Xing, B. C.; Sun, Y.; Du, X. J.; Lu, M.; Hao, C. Y.; Lu, Z.; Mi, W.; Wu, S. F.; Wei, H. D.; Gao, X.; Zhu, Y. P.; Jiang, Y.; Qian, X. H.; He, F. C. Mol. Cell. Proteomics 2007, 6, 1798–1808.
(50) Qin, L. X.; Tang, Z. Y. World J. Gastroenterol. 2002, 8, 385–392.
(51) Qin, L. X.; Tang, Z. Y. J. Cancer Res. Clin. Oncol. 2004, 130, 497–513.
(52) Guo, H. B.; Zhang, Y.; Chen, H. L. J. Cancer Res. Clin. Oncol. 2001, 127, 231–236.
(53) Fedorowski, A.; Steciwko, A.; Rabczynski, J. Med. Sci. Monit. 2004, 10, BR144–BR150.
(54) Liu, X. C. Chin. J. Chromatogr. 2006, 24, 73–80.

PR800826U