Identification of Glycoproteins in Human Cerebrospinal Fluid with a Complementary Proteomic Approach
Sheng Pan,§,# Yan Wang,§,‡ Joseph F. Quinn,† Elaine R. Peskind,‡,| Dana Waichunas,† Jake T. Wimberger,‡ Jinghua Jin,‡ Jane G. Li,‡ David Zhu,‡ Catherine Pan,‡ and Jing Zhang*,‡
Departments of Pathology and Psychiatry & Behavioral Sciences, University of Washington, Seattle, Washington 98104, Institute for Systems Biology, Seattle, Washington 98103, Oregon Health and Science University and Portland VA Medical Center, Portland, Oregon 97239, and VA Northwest Network Mental Illness Research, Education, and Clinical Center, VA Puget Sound Health Care System, Seattle, Washington 98108
Received May 25, 2006
Biomarkers are urgently needed to assist with the clinical diagnosis of neurodegenerative diseases and/or the monitoring of disease progression. Glycoproteins are enriched in bodily fluids such as human cerebrospinal fluid (CSF), an ideal source for discovering biomarkers due to its proximity to the central nervous system (CNS), and consequently can serve as diagnostic and/or therapeutic markers for CNS diseases. We report here an in-depth identification of glycoproteins in human CSF using a complementary proteomic approach that integrated hydrazide chemistry and lectin affinity column for glycoprotein enrichment, followed by multidimensional chromatography separation and tandem mass spectrometric analysis. Using stringent criteria, a total of 216 glycoproteins, including many low-abundance proteins, were identified with high confidence. Approximately one-third of these proteins were already known to be relevant to the CNS structurally or functionally. This investigation, for the first time, not only categorized many glycoproteins in human CSF but also expanded the existing overall CSF protein database.
Keywords: mass spectrometry • proteomics • cerebrospinal fluid • glycoprotein • hydrazide chemistry • lectin affinity column
* Correspondence to Jing Zhang, MD, Ph.D., Division of Neuropathology, University of Washington School of Medicine, Box 359635, Harborview Medical Center, Seattle, WA 98104. Phone: 206-341-5245. Fax: (206) 341-5249. E-mail: [email protected].
§ Authors who contributed to the work equally.
# Institute for Systems Biology.
‡ University of Washington.
† Oregon Health and Science University and Portland VA Medical Center.
| VA Northwest Network Mental Illness Research, Education, and Clinical Center, VA Puget Sound Health Care System.
10.1021/pr060251s CCC: $33.50 © 2006 American Chemical Society
Journal of Proteome Research 2006, 5, 2769-2779
Published on Web 08/30/2006

Introduction
Human cerebrospinal fluid (CSF), which circulates within the ventricles of the brain and the subarachnoid space of the central nervous system (CNS),1,2 is an important target for proteomic studies of many neurodegenerative disorders, including Alzheimer’s disease (AD) and Parkinson’s disease (PD).3 The growing interest in applying proteomics to the study of neurodegenerative diseases is driven by recent advances in proteomic technologies.4 It is further stimulated by the hypothesis that the discovery of disease-specific biomarkers can assist with clinical diagnosis, monitoring of disease progression, and identification of new therapeutic targets.5 Proteomic profiling of human CSF for neurodegenerative biomarkers is particularly appealing because unique markers discovered in CSF, which is intimately associated with the tissue of pathology, likely reflect the pathogenesis of the disease of interest, thereby shedding more light on novel mechanisms of each disease.6,7 The study of CSF proteins is also advantageous in biomarker discovery for at least two additional reasons: (1) CNS-specific markers are typically low in abundance, meaning that they are more likely to be identified in CSF than in plasma, assuming these proteins cross the blood-brain barrier (BBB),6 and (2) markers unique to CNS diseases may not cross the BBB. In fact, accumulating evidence suggests that very few markers known to be related to CNS diseases have been detected in human plasma.8

Comprehensive characterization of the human CSF proteome by nonbiased profiling, however, presents several analytical challenges.9 First, human CSF is very complex; it is assumed to consist of at least thousands of proteins in addition to other compounds, such as organic and inorganic salts, various sugars, and lipids.10 Protein heterogeneity due to posttranslational modifications (PTMs) may further contribute to the complexity of human CSF. In addition, the total protein content in human CSF is approximately 300-700 µg/mL,10 which is significantly lower than that in plasma or serum. Of this, a few predominant proteins represent more than 70% of the total protein amount in CSF; for example, albumin and immunoglobulins constitute >50% and >15% of the total CSF protein content, respectively. Finally, the dynamic range of protein concentration in human CSF is approximately 10⁹,11 which is similar to that in serum and plasma.12 Indeed, before our recent publications,9,13-15 fewer than 300 proteins, most of which are abundant proteins, had been identified in human CSF.

To date, we have identified over 1500 proteins in human CSF.9,13 However, proteins derived from the CNS are typically low in abundance and may be overshadowed by abundant proteins during the analysis due to the sample complexity and the large dynamic range in protein concentration. To circumvent this difficulty in the proteomic analysis of biological fluids (for example, plasma or serum, CSF, and saliva), direct analysis of a “subproteome” within a complex sample may provide an alternative way to effectively characterize proteins that are relatively low in abundance. Such efforts have been reported in several recent studies, including proteomic analysis of phosphorylated proteins16 and glycosylated proteins17-19 in human plasma/serum.

In protein glycosylation, one of the most common PTMs, carbohydrates are linked to serine or threonine residues (O-linked glycosylation) or to asparagine residues (N-linked glycosylation). N-linked glycosylation sites generally fall into the N-X-S/T sequence motif, in which X denotes any amino acid except proline.20 Glycosylation, N-linked glycosylation in particular, is prevalent in proteins destined for extracellular environments,21 for example, plasma, and such proteins consequently may serve as diagnostic and therapeutic biomarkers. Examples of well-established glycosylated protein markers include Her2/neu (breast cancer), human chorionic gonadotropin and α-fetoprotein (germ cell tumors), prostate-specific antigen (PSA, prostate cancer), and CA125 (ovarian cancer).
Furthermore, changes in the extent of glycosylation or the carbohydrate structure have been shown to correlate with cancer and other disease states.22-24 Thus, although no systematic attempt has been made to date at the global characterization of glycoproteins in human CSF, it is expected that markers unique to CNS diseases or their progression may be enriched in this compartment, and that changes in CSF glycosylated proteins may serve as an indicator or effector of pathologic mechanisms of CNS diseases.

Currently, there are two major methods of isolating glycoproteins: lectin affinity purification and hydrazide chemistry. While lectin affinity enrichment can isolate multiple types of glycoproteins in complex biological samples,25,26 hydrazide chemistry for glycoprotein extraction is specific for N-glycopeptide capture.27 Recent studies have demonstrated that both hydrazide chemistry and lectin affinity methods are effective means for glycoprotein enrichment and separation from human plasma,18,19 although no direct comparison has been made between these two methods. Accordingly, in the current study, we applied both methods in combination with integrated 2D liquid chromatography (LC) separation and tandem mass spectrometry (MS) as the first step toward a large-scale proteomic identification of CSF glycoproteins. This investigation identified more than 200 glycoproteins in human CSF, many of which were not only novel but also structurally and functionally relevant to the CNS.
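As an illustrative aside, the N-X-S/T motif described in the Introduction (X being any residue except proline) can be located in a protein sequence with a short script. The following is a minimal Python sketch and is not part of the workflow reported in this study; the function name `find_sequons` and the example peptide are hypothetical and chosen only for illustration. A lookahead is used so that overlapping motifs (for example, N-N-S-T) are each reported.

```python
import re

# N followed by any residue except proline, then serine or threonine.
# The lookahead keeps the match to the single "N", so overlapping
# motifs such as "NNST" (NNS and NST) are both found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based position of N, three-residue motif) for each
    candidate N-linked glycosylation motif in a one-letter sequence."""
    seq = sequence.upper()
    return [
        (m.start() + 1, seq[m.start():m.start() + 3])
        for m in SEQUON.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the study's data.
    print(find_sequons("MNGTANPSQNNSTK"))
    # -> [(2, 'NGT'), (10, 'NNS'), (11, 'NST')]
```

In this example, the asparagine at position 6 is skipped because it is followed by proline. The presence of the motif marks only a candidate site; whether a given site is actually glycosylated has to be established experimentally.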
Materials and Methods
Characterization of the Subjects. This study was approved by the Human Subjects Committee of the University of Washington and the Oregon Health and Science University. All subjects provided written informed consent. Subjects included 19 normal community volunteers aged 35-45 (gender ratio, 10 male:9 female). All subjects were in good general health. Clinical evaluation consisted of medical history, physical and neurological examinations, laboratory tests, and neuropsychological assessment. Laboratory evaluation included complete blood count, serum electrolytes, blood urea nitrogen, creatinine, glucose, vitamin B12, and thyroid stimulating hormone; all results were within normal limits. Neuropsychological evaluation included the Mini-Mental State Exam (MMSE),28 Trail-Making Tests A and B,29 Clinical Dementia Rating Scale (CDR),30 the Mattis and Coblentz Dementia Rating Scale score (DRS),31 and the New York University (NYU) version of the Logical Memory II subscale (Immediate and Delayed Paragraph Recall) from the Wechsler Memory Scale-Revised.32 None of the subjects had any signs or symptoms suggesting cognitive decline or neurological disease; all subjects had an MMSE score between 28 and 30, a CDR score of 0, and NYU paragraph recall scores (immediate and delayed) >6. Other exclusion criteria included heavy cigarette smoking (more than 10 packs/year), alcohol use other than socially, and any psychoactive medications.

Collection and Characterization of Human CSF. Following written informed consent, individuals were placed in the lateral decubitus position and the L4-5 interspace was infiltrated with 1% lidocaine. Lumbar puncture was performed with a Sprotte 24-gauge spinal needle. Individuals remained on bed rest for 1 h following lumbar puncture. It should be noted that human CSF is closely regulated via balanced secretion and absorption, with an average circulating volume between 125 and 150 mL in an adult,11,33,34 and the amount of CSF that can be obtained is usually limited to less than 25-30 mL. All CSF samples were collected in the morning after overnight fasting.
Although 25 mL of CSF was typically drawn from each subject, only 1.0 mL from each subject was used to generate a pooled sample, that is, a total of 19 mL of CSF pooled from 19 subjects. Also, because the protein concentration in CSF is relatively low compared to plasma (CSF/plasma = 1/100-200) and the protein profiles in CSF are similar to those in plasma,11 even a minor contamination of CSF with blood could significantly confound the interpretation of quantitative or qualitative proteomic analysis of CSF. The study was therefore limited to CSF samples with 6000. This approach has been utilized successfully in our previous CSF proteomics studies.9,13-15

Isolation of N-Linked Deglycosylated Peptides from CSF Using Hydrazide Resin. N-linked deglycosylated peptides were isolated using the method previously described27 with minor modification. In each experiment, 2 mL of pooled CSF sample was exchanged into coupling buffer (100 mM NaAc and 150 mM NaCl, pH 5.5) using a Zeba Desalt Spin column (Pierce, Rockford, IL). After sodium periodate solution was added (final concentration: 15 mM), the sample was incubated at room temperature for 1 h in the dark. Next, sodium periodate was removed from the sample using the same type of desalting column, followed by addition of hydrazide resin equilibrated in coupling buffer and conjugation of glycoproteins to the resin at room temperature overnight. After the coupling reaction was completed, the resin was collected by centrifugation at 3000g for 5 min, and nonglycoproteins were removed by washing the resin five times with 1 mL of urea solution (8 M urea/0.4 M NH4HCO3, pH 8.3). The proteins on the resin were denatured and alkylated in urea solution at room temperature for 30 min, followed by three washes with urea solution and 50 mM NH4HCO3. After the last wash and removal of the NH4HCO3 solution, the resin was suspended in 2 bed volumes of NH4HCO3 with a final concentration of 0.05% SDS. Trypsin was added at a ratio of 1 µg of trypsin per 200 µg of CSF protein, and the proteins were digested at 37 °C overnight. The trypsin-released peptides were removed by washing the resin three times with 3 bed volumes each of 1.5 M NaCl, 80% acetonitrile, 100% methanol, water, and 50 mM NH4HCO3. Finally, N-linked glycopeptides were released by PNGase F (1 µg diluted in 0.3 mL of 50 mM NH4HCO3) at 37 °C overnight and dried in a SpeedVac (Thermo Savant, Holbrook, NY).

Extraction of Glycoproteins from CSF Using Lectin Affinity Column. A 2 mL sample of pooled human CSF was used for glycoprotein enrichment with a lectin affinity column in each experiment. The sample was dried to approximately 200 µL with a SpeedVac and then centrifuged at 1000g for 5 min; the supernatant was transferred to another tube, and the pellet was dissolved in 100 µL of the detergent supplied with the kit accompanying the lectin column (Qiagen, Valencia, CA). The resultant samples were combined with the Binding Buffer along with Protease Inhibitor Solution (100×) before they were loaded onto the lectin column according to the manufacturer’s instructions. Finally, the glycoproteins were eluted six times, each with 100 µL of Elution Buffer containing Protease Inhibitor Solution, digested with trypsin overnight, and dried in a SpeedVac before MS analysis.

Automated 2D LC-MS/MS Analysis of Glycopeptides. The samples prepared by both methods (n = 3 for each method) were separated by a 2D microcapillary high-performance LC system, which integrated a strong cation-exchange (SCX) column (100 mm in length × 0.32 mm in i.d., particle size: 5 µm) with two alternating reverse-phase (RP) C18 columns (100 mm in length × 0.18 mm in i.d.), followed by MS/MS analysis using an LCQ DECA PLUS XP ion trap instrument (ThermoElectron, San Jose, CA).
Six fractions were eluted from the SCX column using a binary gradient of 2-90% solvent D (1.0 M ammonium chloride, 0.1% formic acid, and 5% acetonitrile) versus solvent C (0.1% formic acid and 5% acetonitrile). Each fraction was injected onto the RP columns automatically, and the peptides were resolved using a 300 min binary gradient of 5-80% solvent B (0.1% formic acid in acetonitrile) versus solvent A (0.1% formic acid in water). A flow rate of 160 µL/min with a split ratio of 1/80 was used. The MS acquisition was operated in a data-dependent MS/MS mode in which each survey scan mass spectrum was followed by MS/MS analysis of one of the available precursor ions from the prior survey scan. Ions selected for collision-induced dissociation (CID) were dynamically excluded for 3 min.

Figure 1. Illustration of the analytical workflow for glycoprotein enrichment and separation from human CSF using a complementary approach combining hydrazide chemistry and lectin affinity column.

MS/MS Database Search and Protein Identification. MS/MS data were searched against the International Protein Index (IPI) human protein database version 3.01 from the European Bioinformatics Institute (EBI) using the SEQUEST algorithm.35 For the MS/MS database search, the search criteria were set to expect the following modifications: (1) for the hydrazide chemistry approach, carboxymethylated cysteines (fixed), oxidized methionines (variable), and an enzyme-catalyzed conversion of asparagine to aspartic acid at the site of carbohydrate attachment (variable); (2) for the lectin affinity approach, carboxymethylated cysteines (fixed) and oxidized methionines (variable). The search results were validated using PeptideProphet36 and ProteinProphet37 for peptide and protein identification, respectively. In validation of protein identification, a ProteinProphet probability score of ≥0.9 was used as the filtering criteria to ensure a