Large-scale Identification of N-Glycosylated Proteins of Mouse Tissues

Article pubs.acs.org/jpr

Large-scale Identification of N-Glycosylated Proteins of Mouse Tissues and Construction of a Glycoprotein Database, GlycoProtDB

Hiroyuki Kaji,*,†,‡ Toshihide Shikanai,† Akiko Sasaki-Sawa,‡ Hongling Wen,† Mika Fujita,† Yoshinori Suzuki,† Daisuke Sugahara,† Hiromichi Sawaki,† Yoshio Yamauchi,‡ Takashi Shinkawa,‡ Masato Taoka,‡ Nobuhiro Takahashi,§ Toshiaki Isobe,‡ and Hisashi Narimatsu†

† Research Center for Medical Glycoscience, National Institute of Advanced Industrial Science and Technology, Tsukuba Central-2, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, Japan
‡ Department of Chemistry, Graduate School of Science and Engineering, Tokyo Metropolitan University, Minami-osawa 1-1, Hachioji, Tokyo 192-0397, Japan
§ Department of Applied Life Science, United Graduate School of Agriculture, Tokyo University of Agriculture and Technology, Saiwai-cho 3-5-8, Fuchu, Tokyo 183-8509, Japan

Supporting Information

ABSTRACT: Protein glycosylation is a common post-translational modification that plays important roles in protein function. However, analyzing the relationship between glycosylation and protein function remains technically challenging because the attached glycans have diverse and heterogeneous structures. We believe that the first step toward elucidating glycan function is to systematically determine the status of protein glycosylation under physiological conditions. Such studies involve analyzing differences in glycan structure by cell type (tissue), sex, and age, as well as changes associated with perturbations such as knockout of genes encoding glycan biosynthesis-related enzymes, disease, and drug treatment. Therefore, we analyzed a series of glycoproteomes in several mouse tissues to identify glycosylated proteins and their glycosylation sites. Comprehensive analysis was performed by lectin- or HILIC-capture of glycopeptide subsets, followed by enzymatic deglycosylation in stable isotope-labeled water (H₂¹⁸O, IGOT) and finally LC−MS analyses. In total, 5060 peptides derived from 2556 glycoproteins were identified. We then constructed a glycoprotein database, GlycoProtDB, from our experimentally derived information to facilitate future studies in glycobiology.

KEYWORDS: glycoprotein, glycoproteome, lectin, glycosylation site map, LC−MS, GlycoProtDB, mouse, IGOT



INTRODUCTION

Many proteins are modified after translation so that they function at the proper place and at the appropriate moment with their cognate partners. Protein glycosylation is known to regulate the folding, targeting, interactions and half-life of the modified protein. As such, glycosylation is related to various biological processes, including cell−cell interactions and the immune response.1,2 A specific glycan motif, such as phosphorylated mannose,3 α1,6-fucose,4,5 polylactosamine,6,7 LacdiNAc8−11 and Lewis antigen,12,13 is often associated with a specific role. However, functional studies of glycans remain technically demanding because glycans have diverse and heterogeneous structures. As an approach to elucidating the function of a specific glycan motif, model animals lacking a glycogene responsible for the biosynthesis of that motif can be generated, and their phenotypes can then be analyzed by biochemical and pathological investigations.14−24 If any aberration is observed, the carrier protein derived from the corresponding tissue or cell type is then analyzed.

© 2012 American Chemical Society

Several methods have been developed to comprehensively identify glycoproteins bearing specific glycan motif(s). One approach for capturing a subset of glycopeptides possessing a specific monosaccharide (e.g., sialic acid, GlcNAc or fucose) utilizes in vivo incorporation of reactive sugar derivatives, so-called click chemistry.25−29 However, this method is not readily applicable to living organisms and cannot be performed in humans. Another simple approach is to use a lectin cognate for a specific motif to enrich a target subset of glycopeptides. In general, lectins are immobilized onto a solid support and used for affinity chromatography. Recently, a unique filter-aided sample preparation (FASP) method has been applied to capture glycopeptides with free lectin, which has facilitated the identification of a large number of mouse glycoproteins.30 We have utilized multiple lectin columns to enrich glycopeptide subsets31,32 and then identified the glycopeptides by enzymatic stable isotope-tagging of the N-glycosylated site (Asn residue) in combination with LC−MS analysis of the deglycosylated (tagged) peptides. This method was named IGOT (isotope-coded glycosylation site-specific tagging)-LC−MS.33

In this study, we systematically analyzed glycopeptide subsets obtained from nine mouse tissues and serum using multiple lectins of distinct specificity and a hydrophilic interaction chromatography (HILIC) column. Our findings reveal the glycosylation status of mouse proteins under physiological conditions, which may serve, for example, as a reference for comparison with data derived from glycogene-knockout mice and diseased mice. In total, 2556 glycoproteins (1819 gene families/symbols) were identified and 5657 sites were mapped on the sequences. In addition, we constructed a glycoprotein database, named GlycoProtDB (http://jcggdb.jp/rcmg/gpdb/), which makes our experimentally derived information available to the wider research community to facilitate future studies in the field of glycobiology.

Received: April 10, 2012
Published: July 23, 2012

dx.doi.org/10.1021/pr300346c | J. Proteome Res. 2012, 11, 4553−4566

Peptide Preparation of Mouse Tissue Proteins



Peptide preparation was performed as described previously.32 Briefly, each frozen tissue was weighed (0.5−10.0 g), thawed on ice, and then homogenized using a Polytron homogenizer in a 10-fold volume of denaturing buffer (0.5 M Tris-HCl, pH 8.5, containing 7 M guanidine-HCl and 10 mM EDTA). The homogenate was sonicated to reduce the viscosity of the solution by breaking up the DNA. The sonicated solution was then centrifuged and the supernatant recovered. The serum was diluted 9-fold with denaturing buffer. After reduction with DTT (1:1 w/w protein) and carbamoylmethylation with iodoacetamide (2.5:1 w/w protein), the protein solution was dialyzed against 10 mM HEPES-NaOH, pH 7.5 (for liver, brain, kidney, lung and testis; this solvent was selected to allow peptide assay by the o-phthalaldehyde method32), or 10 mM ammonium bicarbonate, pH 8.6 (for serum, colon, stomach, skeletal muscle and heart). Dialyzed samples were then digested with TPCK-treated trypsin at an E/S ratio of 1/100 (w/w) at 37 °C for 16 h. Progression of the digestion was monitored by SDS-polyacrylamide gel electrophoresis followed by CBB staining. Protein concentration was assayed by the Bradford method (DC protein assay kit, Bio-Rad) using bovine serum albumin as a standard.
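The weight ratios given above translate directly into reagent amounts for a given sample. The following is a minimal sketch of that arithmetic, not the authors' own procedure. The function and field names are hypothetical, and it assumes that "10-fold" means 10 mL of denaturing buffer per gram of tissue.

```python
from dataclasses import dataclass


@dataclass
class DigestPlan:
    buffer_ml: float         # denaturing buffer for homogenization
    dtt_mg: float            # reductant, 1:1 (w/w) to protein
    iodoacetamide_mg: float  # alkylating reagent, 2.5:1 (w/w) to protein
    trypsin_mg: float        # enzyme at E/S = 1/100 (w/w)


def plan_digest(tissue_g: float, protein_mg: float) -> DigestPlan:
    """Scale reagent amounts from the ratios stated in the text.

    Assumption: the 10-fold buffer volume is taken as 10 mL per g of tissue.
    """
    if tissue_g <= 0 or protein_mg <= 0:
        raise ValueError("tissue weight and protein amount must be positive")
    return DigestPlan(
        buffer_ml=10.0 * tissue_g,
        dtt_mg=1.0 * protein_mg,
        iodoacetamide_mg=2.5 * protein_mg,
        trypsin_mg=protein_mg / 100.0,
    )


# Example: 2 g of tissue yielding 100 mg of protein.
plan = plan_digest(tissue_g=2.0, protein_mg=100.0)
print(plan)
```

Under these assumptions, 100 mg of protein calls for 100 mg of DTT, 250 mg of iodoacetamide and 1 mg of trypsin.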

MATERIALS AND METHODS

Mouse Tissues

Tissues used in this study were dissected from male mice of the inbred strain C57BL/6J, aged 9−13 weeks, purchased from Nihon Clea (Tokyo, Japan). The mice were anesthetized with ether. First, about 1 mL of blood was taken from the orbital plexus of each individual. After addition of EDTA to a final concentration of 0.5 mM, the blood samples were centrifuged at 800 × g, 4 °C for 5 min to separate blood cells from plasma. The pellet was discarded. Liver, kidney, heart, brain, lung, stomach, skeletal muscle (legs), colon and testis were removed after perfusion with PBS. All tissues were frozen in liquid nitrogen and stored at −80 °C until required.

Affinity Capture of Glycopeptides from the Tryptic Digests

The tryptic digest (50−100 mg protein) was applied to a lectin column and a subset of glycopeptides carrying glycans reactive to the lectin was isolated as described previously.32 The method used for sample preparation is summarized in Supplementary Table 1 (Supporting Information) for each tissue-lectin combination. Briefly, aliquots of the tryptic digest of liver proteins were applied to five lectin columns in parallel (i.e., ConA, RCA120, AAL, SSA and WGA). In the same way, aliquots of samples derived from brain, lung, kidney and testis were applied to two lectin columns (ConA and RCA120) in parallel. Additionally, the unbound fractions from both lectin columns were combined and applied to an AAL column. Tryptic digests of proteins from heart, stomach, skeletal muscle and colon were applied to RCA120 and ConA columns in a serial fashion; namely, each digest was loaded onto an RCA120 column and the flow-through fraction was applied to a ConA column. Glycopeptides captured on these lectin columns were eluted with an appropriate sugar solution and further purified by HILIC using Sepharose CL-4B (ref 32) or TSKgel Amide-80 columns. The Sepharose CL-4B column (5 mm i.d. × 50 mm) was equilibrated with water/EtOH/1-BuOH = 1:1:4 (v/v/v). The sample was prepared by adding an equal volume of EtOH to the lectin eluate and then 4 volumes of 1-BuOH, and was loaded onto the column. After washing with the equilibration solvent, glycopeptides were eluted with 50% EtOH while monitoring the absorbance of the eluate at 220 nm. Binding of glycopeptides to an Amide-80 column was carried out in 70% MeCN containing 0.1% TFA. After washing with the same solvent, glycopeptides were eluted with 50% MeCN containing 0.1% TFA. Serum glycopeptides were collected directly with an Amide-80 column (2 mm i.d. × 50 mm).
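The Sepharose CL-4B loading step works because adding one volume of EtOH and four volumes of 1-BuOH to the aqueous lectin eluate reproduces the 1:1:4 equilibration solvent. A small sketch of that bookkeeping follows. The function name is hypothetical, the eluate is treated as pure water, and volumes are treated as additive, which is only a nominal approximation for alcohol-water mixtures.

```python
def hilic_loading_mix(eluate_ml: float) -> dict:
    """Nominal volumes for bringing a lectin eluate to water/EtOH/1-BuOH = 1:1:4.

    Assumptions: the eluate counts as the water component, and volumes add
    without contraction on mixing.
    """
    if eluate_ml <= 0:
        raise ValueError("eluate volume must be positive")
    ethanol_ml = 1.0 * eluate_ml   # equal volume of EtOH
    butanol_ml = 4.0 * eluate_ml   # 4 volumes of 1-BuOH
    total_ml = eluate_ml + ethanol_ml + butanol_ml
    return {
        "eluate_ml": eluate_ml,
        "ethanol_ml": ethanol_ml,
        "butanol_ml": butanol_ml,
        "total_ml": total_ml,
        "water_fraction": eluate_ml / total_ml,
    }


# Example: 0.5 mL of lectin eluate.
mix = hilic_loading_mix(0.5)
print(mix)
```

The nominal water content of the loaded sample is one part in six, which matches the equilibration solvent, so the sample does not disturb the column conditions on loading.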
Purified glycopeptides were evaporated using a centrifugal vacuum concentrator with glycerol (50% aqueous, 2−5 μL) and PNGase F buffer (1 M Tris-HCl, pH 7.5, 2 μL) to remove the solvent (EtOH or MeCN) and ordinary water (H₂¹⁶O).
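Removing ordinary water matters because the IGOT tag is read out as a mass shift. PNGase F converts the glycosylated Asn to Asp and takes one oxygen atom from the solvent, so the shift depends on which oxygen isotope the water supplies. The sketch below works through that arithmetic using standard monoisotopic masses. It is an illustration rather than the authors' software, and the function names are hypothetical.

```python
# Monoisotopic atomic masses in Da (standard values).
MASS_H = 1.00782503
MASS_N = 14.00307400
MASS_O16 = 15.99491462
MASS_O18 = 17.99915961


def asn_to_asp_shift(oxygen_mass: float) -> float:
    """Mass change when the Asn side chain (-CONH2) becomes Asp (-COOH).

    Net change in composition: lose one N and one H, gain one O from water.
    """
    return oxygen_mass - MASS_N - MASS_H


def mz_shift(mass_shift: float, charge: int) -> float:
    """Shift observed on the m/z axis for a peptide ion of the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


shift_in_h2o16 = asn_to_asp_shift(MASS_O16)  # deglycosylation in ordinary water
shift_in_h2o18 = asn_to_asp_shift(MASS_O18)  # deglycosylation in 18O-labeled water

print(f"Asn->Asp in H2(16)O: {shift_in_h2o16:+.4f} Da")
print(f"Asn->Asp in H2(18)O: {shift_in_h2o18:+.4f} Da")
print(f"doubly charged ion, 18O: {mz_shift(shift_in_h2o18, 2):+.4f} m/z")
```

The ¹⁸O-labeled product is about 2.99 Da heavier than the unmodified Asn peptide, which corresponds to the nominal 3 Da variable modification used in the database search. The ¹⁶O product is only about 0.98 Da heavier, the same shift as ordinary deamidation, so residual ordinary water would make the site-specific tag less distinctive.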

Chemicals

Guanidine-HCl, HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), ammonium bicarbonate, and neuraminidase (from Arthrobacter ureafaciens, EC 3.2.1.18) were purchased from Nacalai Tesque (Kyoto, Japan). Dithiothreitol (DTT), iodoacetamide (IAA), trichloroacetic acid (TCA), methyl α-D-mannopyranoside, lactose, ethanol, 1-butanol and trifluoroacetic acid (TFA) were from Wako Pure Chemicals (Osaka, Japan). Tosyl phenylalanyl chloromethyl ketone (TPCK)-treated trypsin was obtained from Thermo Fisher Scientific (Waltham, MA). Stable isotope-labeled water (H₂¹⁸O, 99 atom % ¹⁸O) was a product of Taiyo Nippon Sanso Corp. (Tokyo, Japan). L-Fucose and phenylmethylsulfonyl fluoride (PMSF) were obtained from Sigma-Aldrich (St. Louis, MO). Glycopeptidase F (peptide-N-glycanase) was from Takara Bio (Kyoto, Japan). HPLC lectin columns (4.6 mm i.d. (inner diameter) × 150 mm; LA-AAL, Aleuria aurantia lectin; LA-ConA, concanavalin A from Jack Bean (Canavalia ensiformis); LA-WGA, wheat germ agglutinin; LA-SSA, Red-Berried Elder (Sambucus sieboldiana); LA-RCA120, Castor oil bean (Ricinus communis)) and chitooligosaccharide were from Seikagaku Biobusiness (Tokyo, Japan). RCA (Ricinus communis agglutinin) 120-Agarose was from Vector Laboratories, Inc. (Burlingame, CA). HPLC column TSKgel Amide-80 (2 mm i.d. × 50 mm, 5 μm particles) was a product of TOSOH (Tokyo, Japan). All solutions were prepared using ultrapure water and analytical grade reagents unless stated otherwise.


Deglycosylation and Stable Isotope-labeling (IGOT) of N-Glycopeptides

The ten most intense ions were fragmented in a data-dependent mode by CID with a normalized collision energy of 35, activation q of 0.25, activation time of 10 ms and one microscan, and analyzed in the ion trap MS. The following conditions were used: spray voltage, 2.0 kV; ion transfer tube temperature, 200/250 °C; ion selection threshold for MS/MS, 10000 counts; maximum ion accumulation time, 500 ms for full scans; dynamic exclusion duration, 60 s (10 ppm window; maximum number of excluded peaks, 500). The MS/MS spectral data from the Q-TOF were processed using MassLynx software (version 4.0, Micromass) to create peak list files with smoothing by the Savitzky-Golay method (window channels, ±3). The MS/MS data from the LTQ Orbitrap were converted to the Mascot generic format by Proteome Discoverer software (Thermo Scientific, ver. 1.1). The files were processed by the MASCOT algorithm (Version 2.1−2.2, Matrix Science) to assign peptides using the NCBI RefSeq protein sequence database (30,041 entries, downloaded on 25 Feb., 2011). The database search was performed with the parameters described previously, including a variable modification of Asn with a 3 Da increase for IGOT. All results of peptide searches were exported as CSV files and processed by Microsoft Excel. We first selected the peptides with rank 1 and expectation value