Chromosome 11-Centric Human Proteome Analysis of Human Brain

Article pubs.acs.org/jpr

Chromosome 11-Centric Human Proteome Analysis of Human Brain Hippocampus Tissue

Kyung-Hoon Kwon,† Jin Young Kim,† Se-Young Kim,† Hye Kyeong Min,† Hyoung-Joo Lee,‡ In Jung Ji,†,§ Taewook Kang,†,§ Gun Wook Park,†,§ Hyun Joo An,§ Bonghee Lee,∥ Rivka Ravid,⊥ Isidro Ferrer,# Chun Kee Chung,¶ Young-Ki Paik,‡ William S. Hancock,● Young Mok Park,*,†,§ and Jong Shin Yoo*,†,§

† Division of Mass Spectrometry Research, Korea Basic Science Institute, Ochang, Chungbuk, Republic of Korea
‡ Yonsei Proteome Research Center, Yonsei University, Seoul, Republic of Korea
§ Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon, Republic of Korea
∥ Lee Gil Ya Center and Diabetes Institute, Gachon University, Incheon, Republic of Korea
⊥ Royal Dutch Academy of Sciences, Amsterdam, The Netherlands
# Institut de Neuropatologia, Servei Anatomia Patològica, IDIBELL-Hospital Universitari de Bellvitge, Universitat de Barcelona, Spain
¶ Seoul National University College of Medicine, Seoul, Republic of Korea
● Barnett Institute, Northeastern University, Boston, Massachusetts, United States

S Supporting Information *

ABSTRACT: Human chromosome 11 is the third most gene-rich chromosome, with 1304 protein-coding genes. According to GeneCards, it contains 240 disease-related genes and is well known as a disease-rich chromosome. Although it carries many protein-coding genes, its proteomic identification ratio is rather low. As a model study, human hippocampal tissues from patients with Alzheimer’s disease and epilepsy were prepared to evaluate gene-centric statistics related to gene expression and disorders of chromosome 11. Proteins from brain tissues were extensively off-gel fractionated and profiled by high-resolution mass spectrometry with collision induced dissociation and electron transfer dissociation, giving a total of 8828 protein-coding genes. Of the proteins from brain tissues, 523 were determined to belong to chromosome 11, representing 37% of the proteins reported in the Global Proteome Machine Database. We extracted clusters of genes that share a specific gene ontology biological process or molecular function; among these, the olfactory receptor genes formed the largest cluster on chromosome 11. Analysis of the proteome data set from the hippocampus provides a significant network associated with genes and proteins and leads to new insights into the biological and genetic mechanisms of chromosome 11-specific diseases such as Alzheimer’s disease.

KEYWORDS: chromosome 11, chromosomal Human Proteome Project, hippocampus, Alzheimer’s disease, epilepsy, collision induced dissociation/electron transfer dissociation, off-gel fractionation, olfactory receptor



Special Issue: Chromosome-centric Human Proteome Project
Received: September 1, 2012
Published: January 4, 2013
dx.doi.org/10.1021/pr3008368 | J. Proteome Res. 2013, 12, 97−105
© 2013 American Chemical Society

INTRODUCTION

Human chromosome 11 includes 1304 protein-coding genes.1 Relative to its length in nucleotide pairs, it is a gene-rich and disease-rich chromosome, and it ranks third among the autosomes in number of protein-coding genes. In the GeneCards database,2 240 genes of chromosome 11 are related to diseases. Concerning chromosomal location, the genes of the apolipoprotein family are located at 11q23.3 Domain 11p15.5 has been examined,4 reflecting increased interest in the domain as a genomic imprinting control region. Freed et al.5 showed that several genes, including TSSC4, HBG2, IGF2 and CDKN1C, located on chromosome 11p15.5 are associated with the function of dopaminergic neurons.

For the chromosomal analysis of the human proteome, we used the human brain proteome. There have been numerous proteomics studies on human and mouse brains that have analyzed gene expression profiles. In 2006, Park et al.6 analyzed the whole proteome of temporal lobe human brain tissue with 7T FT-LTQ/MS (Finnigan, San Jose, CA) as part of the pilot phase of the HUPO Brain Proteome Project (HBPP).7 After separation of samples into soluble and membrane fractions, multidimensional liquid chromatography (LC) separation technology enabled the acquisition of high-throughput proteomics data from temporal lobe human brain tissue. Finally, 1533 proteins were identified within a false discovery rate of 5%. Since the brain is a structurally very complex organ, we expected that the subproteomes of different localizations would have different gene expression profiles.8 Odreman et al.9 analyzed human brain gliomas by two-dimensional (2D) electrophoresis proteomics and found differentially expressed proteins between low-grade and high-grade fibrillary astrocytomas. Bode et al.10 published a study on the transcriptomics and proteomics of the murine hippocampus. Another study explored the differential protein expression levels of the left and right hippocampi in adult rat tissue samples.11

In this study, large scale proteome mapping of brain tissue from the human hippocampus was performed to better understand various brain diseases such as Alzheimer’s disease (AD) and epilepsy. Neuronal cells in the hippocampus disappear or die quickly in all neuro-diseases by different signal transduction pathways. Using a succession of separation methods, we intended to enlarge the dynamic range of protein identification. First, we extracted proteins into soluble and membrane fractions. We then applied OFFGEL fractionation12 to these fractions, followed by 2-D LC online separation. For mass spectrometry, we utilized two dissociation methods: collision induced dissociation (CID) and electron transfer dissociation (ETD).

The main functions of the hippocampus, the region in which the serious damage associated with AD takes place, are long-term memory and spatial relational memory. Therefore, as the first step of the chromosomal human proteome project, we decided to analyze hippocampus tissues from patients with AD and epilepsy to obtain the gene expression profile. The common symptoms of the three neurologic diseases, AD, Parkinson’s disease, and epilepsy, have been reported in detail.13 AD is a neurodegenerative disorder that involves the hippocampus and that subsequently affects the control of cognition. The hippocampus is similarly affected by temporal lobe epilepsy. The molecular relationship between epilepsy and neurodegenerative disorders such as AD can be illuminated by research on the proteomics of the hippocampus. In this study, we compared the proteins of these two diseases with control tissues. By high-resolution mass spectrometry using different dissociation methods and multidimensional separation technology, we attempted to improve the throughput of proteome analysis. Using the proteomics results for AD and epilepsy, the protein-coding genes located on human chromosome 11 were analyzed from the viewpoint of protein identification by proteomics, following the standard guidelines for the chromosome-centric human proteome project.14,15

RESULTS AND DISCUSSION

Hippocampus Proteomics

Protein samples were prepared using extensive off-gel fractionation from soluble and membrane fractions of the control, epilepsy, and Alzheimer tissue samples. Twelve protein fractions spanning a pH range of 3−10 were enzymatically digested. The resulting peptides were separated by reversed phase chromatography and identified by high resolution mass spectrometry with a p-value higher than 0.99 at the protein level. A total of 8828 protein-coding genes from brain tissues were profiled by high resolution Orbitrap mass spectrometry with both CID and ETD analyses. Of these genes, 6314 were from Alzheimer patients, 7138 were from epilepsy patients and 5068 were from the control sample. Figure 1 depicts the proteomic result of the control, AD and epilepsy samples. The number of proteins identified by CID and ETD dissociation is listed in Table 1.

Figure 1. Number of protein-coding genes of whole chromosomes identified in each hippocampus sample.

Table 1. Number of Proteins and Protein-coding Genes Identified by CID and ETD Dissociation for Each Hippocampus Sample

sample      proteins     proteins     proteins identified   total number    number of identified
            identified   identified   by both CID           of identified   protein-coding
            from CID     from ETD     and ETD               proteins        genes
Control     18948        14321        13252                 20017           5068
Epilepsy    20005        20188        13606                 26587           7138
Alzheimer   18681        18218        13133                 23766           6314
Total                                                       32409           8828

From the figure and table, it is evident that the protein identification coverage on the chromosome was markedly increased by changing the sample condition or analysis protocol, even when the analysis was performed with tissues from the same organ. In the control sample, 20017 proteins corresponding to 5068 protein-coding genes were identified. Because of alternative splicing, single amino acid polymorphisms, and isoforms, more proteins than protein-coding genes were identified. The AD samples were merged and the number of genes was increased by 1890. Epilepsy was added to integrate all of the control, AD, and epilepsy samples, resulting in 1798 more proteins and identification of 8828 genes. Our proteome data set corresponded well to the KEGG pathway16 associated with genes and proteins in the mechanism of AD. For example, alpha Crystallin B chain and apolipoprotein A-I were identified in all of the control, epilepsy, and Alzheimer samples. Functional proteomic analysis corresponding to epilepsy and Alzheimer specific genes from brain tissues will be performed in future studies.

For the control, Alzheimer, and epilepsy samples, we identified 18948, 18681 and 20005 proteins, respectively, by the CID method. Incorporation of the ETD results identified 20017, 23766 and 26587 proteins, respectively. This additional experiment using the different analysis protocol of ETD increased the performance. Samples or protocols different from the ones used in this study will contribute to extended coverage of proteomics detection for each chromosome in the Chromosomal Human Proteome Project.
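The per-sample totals in Table 1 are consistent with simple inclusion-exclusion over the CID and ETD result sets (total = CID + ETD − both). The short sketch below only re-derives those totals from the published counts; the dictionary layout and function name are illustrative and not part of the original analysis pipeline.

```python
# Per-sample protein counts taken from Table 1: (CID, ETD, identified by both).
TABLE1 = {
    "Control": (18948, 14321, 13252),
    "Epilepsy": (20005, 20188, 13606),
    "Alzheimer": (18681, 18218, 13133),
}


def union_count(cid, etd, both):
    """Size of the union of two result sets by inclusion-exclusion."""
    return cid + etd - both


# Total proteins per sample, and the proteins that only ETD contributed.
totals = {sample: union_count(*counts) for sample, counts in TABLE1.items()}
etd_only = {sample: etd - both for sample, (cid, etd, both) in TABLE1.items()}

for sample in TABLE1:
    print(sample, totals[sample], etd_only[sample])
```

The `etd_only` values are the numbers of proteins gained by adding the ETD run to the CID run for each sample.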


Table 2. Number of Genes Identified by the HBPP Pilot Phase, Hippocampus Tissues and GPMDB

Columns: genes = protein coding genes (Ensembl v.69); PA brain = PeptideAtlas brain proteins (released Sept. 2012); hippo. = hippocampus proteins; (A) = PeptideAtlas brain and hippocampus; (B) = PeptideAtlas human proteins (released July 2012); (C) = GPMDB (released July 2012), log(e) ≤ −5.

chr.    genes   PA brain  hippo.  (A)    (A)/genes  (B)     (B)/genes  (C)     (C)/genes
1       2055    294       879     899    43.7%      1268    61.7%      1192    58.0%
2       1259    194       606     610    48.5%      798     63.4%      817     64.9%
3       1067    153       472     481    45.1%      677     63.4%      708     66.4%
4       768     94        331     333    43.4%      459     59.8%      517     67.3%
5       889     121       401     405    45.6%      543     61.1%      501     56.4%
6       2150    339       668     790    36.7%      1255    58.4%      662     30.8%
7       908     133       382     395    43.5%      545     60.0%      550     60.6%
8       695     85        278     283    40.7%      402     57.8%      172     24.7%
9       799     108       359     364    45.6%      482     60.3%      489     61.2%
10      767     113       366     370    48.2%      474     61.8%      474     61.8%
11      1304    187       523     535    41.0%      697     53.5%      304     23.3%
12      1068    193       486     499    46.7%      636     59.6%      214     20.0%
13      326     37        139     140    42.9%      196     60.1%      207     63.5%
14      651     101       285     289    44.4%      396     60.8%      169     26.0%
15      609     84        286     289    47.5%      365     59.9%      261     42.9%
16      871     125       345     355    40.8%      550     63.1%      324     37.2%
17      1281    211       510     545    42.5%      790     61.7%      402     31.4%
18      289     49        130     132    45.7%      170     58.8%      120     41.5%
19      1669    202       519     552    33.1%      894     53.6%      685     41.0%
20      547     80        207     213    38.9%      343     62.7%      381     69.7%
21      240     27        80      81     33.8%      137     57.1%      144     60.0%
22      456     76        214     216    47.4%      279     61.2%      302     66.2%
X       838     112       344     348    41.5%      493     58.8%      547     65.3%
Y       51      5         16      16     31.4%      16      31.4%      22      43.1%
MT      13      7         2       8      61.5%      12      92.3%      12      92.3%
total   21570   3130      8828    9148   42.4%      12877   59.7%      10176   47.2%

Table 2 lists, for each chromosome, the number of genes identified by the pilot phase of the Human Brain Proteome Project performed in 2006,7 and the number of genes from the hippocampus proteome analysis of this study. In addition, the genes registered in the Global Proteome Machine Database (GPMDB)17 (released July 1, 2012) were counted. GPMDB collects proteomics data sets and supplies a log(e) score for each gene from a statistical analysis of those data sets. A log(e) ≤ −5 indicates genes that were detected frequently and are assumed to encode abundant proteins. The proteins with log(e) ≤ −5 were colored green in the GPMDB human proteome guide released July 1, 2012. Among the human chromosomes, chromosome 11 contained relatively few genes with log(e) ≤ −5 and had a smaller ratio of identified proteins in brain proteomics (Table 2).
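The ratio columns of Table 2 are each count divided by the number of protein-coding genes on that chromosome. As a hedged illustration, the sketch below recomputes the three ratios for a few chromosomes from the counts in Table 2; the row subset, dictionary layout, and function name are choices made here for brevity, not part of the original analysis.

```python
# Counts copied from Table 2 for three chromosomes:
# (protein-coding genes, (A) brain + hippocampus, (B) PeptideAtlas human, (C) GPMDB log(e) <= -5)
TABLE2_ROWS = {
    "8": (695, 283, 402, 172),
    "11": (1304, 535, 697, 304),
    "12": (1068, 499, 636, 214),
}


def coverage_percent(identified, genes):
    """Identified genes as a percentage of protein-coding genes, one decimal place."""
    return round(100.0 * identified / genes, 1)


ratios = {
    chrom: tuple(coverage_percent(n, genes) for n in (a, b, c))
    for chrom, (genes, a, b, c) in TABLE2_ROWS.items()
}

for chrom, (ra, rb, rc) in ratios.items():
    print(f"chr {chrom}: (A)/genes={ra}%  (B)/genes={rb}%  (C)/genes={rc}%")
```

For chromosome 11 this reproduces the 41.0%, 53.5%, and 23.3% entries of the table, which is the basis for the statement that its GPMDB coverage is comparatively low.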

Protein Characteristics

The experimental protein evidence levels of chromosome 11 genes were classified by the log(e) score. The results are shown in Table 3. The neXtProt database18 (released Sept. 11, 2012) assigns five levels of protein evidence (PE). Evidence at the protein level, evidence at the transcript level, inferred from homology, predicted, and uncertain are assigned PE levels 1, 2, 3, 4, and 5, respectively. The protein evidence level originated from the definition of the Swiss-Prot database.19 The protein evidence level of the neXtProt database and the log(e) of GPMDB were highly correlated: among the genes with lower log(e) values, genes of protein evidence level 1 were more dominant. The hippocampus proteomic analysis results were compared with the log(e) score distribution. The results are summarized in Table 4. Of the 523 hippocampus-expressed genes of chromosome 11, 147 belong to the range log(e) ≤ −5. For the range log(e) ≤ −5, we identified 48.4% of the proteins of GPMDB, while only 14.2% of the proteins were identified in the range log(e) > −5.

Table 3. Relationship between GPMDB log(e) Score (released July 1, 2012) and Protein Evidence Level of neXtProt (released Sept. 11, 2012) Data

PE   log(e) ≤ −5     −5 < log(e) ≤ −3   −3 < log(e) ≤ −1   log(e) > −1
1    8746 (87.2%)    271 (39.1%)        402 (31.7%)        4 (50.0%)
2    1211 (12.1%)    397 (57.3%)        799 (63.0%)        4 (50.0%)
3    31 (0.3%)       13 (1.9%)          21 (1.7%)          0 (0.0%)
4    9 (0.1%)        5 (0.7%)           19 (1.5%)          0 (0.0%)
5    29 (0.3%)       7 (1.0%)           28 (2.2%)          0 (0.0%)

Table 4. log(e) Scores and Hippocampus Proteome Identifications of Chromosome 11 Proteins

GPMDB score         GPMDB genes   genes corresponding to   ratio of hippocampus
                                  hippocampus proteins     proteins vs GPMDB
log(e) ≤ −5         304           147                      48.4%
−5 < log(e) ≤ −3    32            4                        12.5%
−3 < log(e) ≤ −1    119           18                       15.1%
log(e) > −1         4             0                        0.0%
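The two percentages quoted for Table 4 (48.4% for log(e) ≤ −5 and 14.2% for log(e) > −5) can be checked directly from the table counts: the second figure pools the three bins above −5. The sketch below is only a worked check; the bin labels and variable names are illustrative.

```python
# (GPMDB genes, genes also found in the hippocampus proteome) per log(e) bin, from Table 4.
TABLE4 = {
    "log(e) <= -5": (304, 147),
    "-5 < log(e) <= -3": (32, 4),
    "-3 < log(e) <= -1": (119, 18),
    "log(e) > -1": (4, 0),
}


def percent(found, total):
    """Found genes as a percentage of GPMDB genes, one decimal place."""
    return round(100.0 * found / total, 1)


per_bin = {label: percent(found, total) for label, (total, found) in TABLE4.items()}

# Pool every bin with log(e) > -5, i.e. all bins except the first one.
weak_bins = [counts for label, counts in TABLE4.items() if label != "log(e) <= -5"]
pooled_total = sum(total for total, _ in weak_bins)
pooled_found = sum(found for _, found in weak_bins)
pooled_ratio = percent(pooled_found, pooled_total)

print(per_bin)
print(pooled_found, pooled_total, pooled_ratio)
```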


Figure 2. Current status of chromosome 11 genes. (a) Portion of the cluster including apolipoproteins. (b) Status of genes where olfactory receptor genes are located.

Figure 2 shows the protein status data for selected regions of chromosome 11. The PE column denotes the protein evidence level from the neXtProt database. Protein evidence, transcript evidence, and no evidence are colored green, yellow, and red, respectively. Mq and Mo denote the log(e) score and the number of observations in the GPMDB, respectively. For genes with log(e) ≤ −5, Mq is colored green. When −5 < log(e) ≤ −3 and −3 < log(e) ≤ −1, Mq is colored yellow and red, respectively; otherwise, Mq is not colored. Green, yellow, and red Mo indicate more than 20, 6−19, and fewer than six observations, respectively, in the GPMDB. The Ab, Ph, Ac, Gl, and AST columns denote the existence of an antibody, phosphorylation, acetylation, glycosylation, and alternative splicing transcripts, respectively. The antibody information was obtained from the HPA (Human Protein Atlas) database v.10.0. The phosphorylation, acetylation, and glycosylation information was filtered from the Swiss-Prot data file. The alternative splicing data were obtained from 1000 Genomes Project data.20 Each column is colored green when the feature is present. Among the 8828 genes identified from human hippocampus tissue, 333 genes were found that had not previously been identified according to neXtProt, GPMDB, and PeptideAtlas.

Figure 2a illustrates the region from gene 1114 to gene 1127 of chromosome 11. It was chosen as an example of a status panel in which the genes are identified frequently in proteomic analyses. APOA1 and BACE1 are Alzheimer-related genes included in the Alzheimer disease pathway of the KEGG database. Figure 2b shows a region with abundant olfactory receptor genes. Proteomic identification was very poor in this region. In particular, olfactory receptor genes are abundant on chromosome 11 and their GPMDB log(e) scores were poor. There are 164 olfactory receptors on chromosome 11, representing 12.7% of all genes on chromosome 11. These olfactory receptor genes are located in close proximity, forming clusters. Glycosylation was the main post-translational modification of the proteins encoded by the olfactory receptor genes (Table 5). To find the locations on chromosome 11 containing genes that were rarely detected in proteomics analysis, we considered the gene expression degree of five neighboring genes on the chromosome. Figure 3 plots the distribution of log(e), hippocampus identified proteins, olfactory related genes, membrane related genes, and uncharacterized proteins. We assigned the value x as the order of a gene on the chromosome. The x values spanned from 1 to 1304 along the horizontal axis of Figure 3.
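The color rules given for the Mq and Mo columns of Figure 2 amount to a small threshold lookup. The sketch below is one way to write them down; the function names are hypothetical, and because the text specifies "more than 20" and "6−19" without covering exactly 20 observations, the code assumes that 20 falls in the yellow band.

```python
def mq_color(log_e):
    """Color of the Mq cell from the GPMDB log(e) score; None means not colored."""
    if log_e <= -5:
        return "green"
    if log_e <= -3:
        return "yellow"
    if log_e <= -1:
        return "red"
    return None


def mo_color(observations):
    """Color of the Mo cell from the number of GPMDB observations."""
    if observations > 20:
        return "green"
    if observations >= 6:
        # Assumption: exactly 20 observations is treated as yellow,
        # since the text names only "more than 20" and "6-19".
        return "yellow"
    return "red"


# A few example genes with made-up scores, for illustration only.
for log_e, n_obs in [(-120.0, 350), (-4.2, 12), (-1.5, 3), (-0.3, 1)]:
    print(log_e, mq_color(log_e), n_obs, mo_color(n_obs))
```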


Concerning the log(e) score distribution, we obtained an average of the five log(e) scores. We then extracted the hippocampus proteome data and gene description at each gene corresponding to the value x. For the hippocampus proteome, we assigned the value 1 to each gene of identified proteins. If the gene was not identified in the hippocampus proteome, we assigned the value 0. In the gene description, we assigned the value 1 for “olfactory related genes” when it included the word “olfactory”. For all other cases, the value 0 was assigned. Similarly, the values of “membrane related genes” were assigned using the word “membrane”. The “uncharacterized protein” genes were also extracted and assigned the value 1. We took the summation of the three values for the gene corresponding to x and the next four neighboring genes, for the 1/0 values of the hippocampus proteome, olfactory genes, membrane genes and uncharacterized genes. The values of this plot were computed by the summation of the values for three neighboring genes. The sum of GPMDB log(e) values was scaled by a factor of 0.1 for better display of correlation behaviors.

Figure 3. Correlation of proteomics result, olfactory receptor genes, membrane related genes, and uncharacterized genes with GPMDB score distribution.

This figure shows roughly the correlation between log(e) and the hippocampus proteomics result. A better log(e) score, that is, a lower log(e) value, indicates better protein identification performance. In Figure 3, several regions with mostly poor log(e) scores (higher log(e) values) are evident. The olfactory related genes were distributed in many of the poor score regions of 99−212, 434−495, 520−576, 853−871, etc. All of the poor regions showed concentrated olfactory receptor genes except region 853−871. In this region, membrane-related genes containing many transmembrane domains were positioned in high densities. Olfactory receptors are well known to contain seven transmembrane domains, which confer a hydrophobic property. This is why olfactory receptors are found in the poor log(e) regions of chromosome 11. In these regions, proteins can hardly be identified by proteomics and many missing proteins may exist. By invoking more advanced proteomics technology with extensive sample preparation of membrane proteins, we should investigate the proteins of these regions with possible missing proteins.

Table 5. Occurrences of Post-translational Modifications at the Chromosome 11 Proteins

modification      class      PTM occurrence   PTM rate   no. of genes
Phosphorylation   SER        283              21.7%      336 (25.8%)
                  THR        132              10.1%
                  TYR        80               6.1%
Acetylation       MET        19               1.5%       125 (9.6%)
                  LYS        82               6.3%
                  ALA        37               2.8%
Glycosylation     N-linked   395              30.3%      400 (30.7%)
                  O-linked   11               0.8%

Table 6. GO Biological Processes and Molecular Functions Whose Genes Are Clustered Largely on the Localized Domain of Chromosome 11

The last column gives the number of genes with the same GO, followed in parentheses by the gene numbers where the cluster starts and ends.

GO Biological Process
rank   GO ID        GO Category                                                                        number of genes (cluster range)
1      GO:0007186   G-protein coupled receptor signaling pathway                                       57 (434−496)
2      GO:0050896   response to stimulus                                                               57 (434−499)
3      GO:0006508   proteolysis                                                                        14 (1010−1028)
4      GO:0007608   sensory perception of smell                                                        8 (138−147)
5      GO:0050911   detection of chemical stimulus involved in chemical sensory perception of smell    8 (138−147)
6      GO:0030574   collagen catabolic process                                                         7 (1018−1026)
7      GO:0042981   regulation of apoptotic process                                                    7 (1031−1037)
8      GO:0055085   transmembrane transport                                                            7 (650−657)
9      GO:0008152   metabolic process                                                                  7 (1018−1026)
10     GO:0070206   protein trimerization                                                              5 (150−154)

Molecular Function
rank   GO ID        GO Category                             number of genes (cluster range)
1      GO:0004872   receptor activity                       60 (433−504)
2      GO:0004871   signal transducer activity              57 (434−496)
3      GO:0004930   G-protein coupled receptor activity     57 (434−496)
4      GO:0005515   protein binding                         57 (862−976)
5      GO:0004984   olfactory receptor activity             56 (434−494)
6      GO:0008270   zinc ion binding                        10 (1014−1026)
7      GO:0046872   metal ion binding                       8 (397−421); 8 (777−796)
8      GO:0004222   metalloendopeptidase activity           8 (1018−1026)
9      GO:0008233   peptidase activity                      8 (1018−1026)
10     GO:0008237   metallopeptidase activity               8 (1018−1026)

In the human hippocampus tissues, 333 missing proteins were found that were included in neither PeptideAtlas nor GPMDB and were not assigned “protein level” protein evidence in neXtProt. On chromosome 11, we identified 29 missing proteins, including the proteins of 8 olfactory receptor genes. The missing proteins are listed in the Supporting Information. The missing proteins can be classified into two categories according to the reason why they have not yet been detected. Some missing proteins are hard to detect in proteomics because they have very low abundance in cells. The other missing proteins have structures that are difficult to detect because they contain many hydrophobic transmembrane domains. The former may be identified by analyzing many different kinds of samples from cells in various conditions. Identification of the latter, which are membrane proteins, requires different approaches such as membrane proteomics. The co-existence of genes encoding membrane proteins might contribute to improved performance of membrane proteomics technology. The olfactory receptor gene clusters can be good targets to explore in membrane proteomics research. Such clustering behaviors of similar functional genes on the chromosome have often been reported.21 We investigated such clusters in chromosome 11 by the neighbor genes analysis described in the Experimental section.
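The neighbor-window values plotted in Figure 3 can be sketched as a sliding sum along the gene order. This is a minimal sketch under stated assumptions, not the authors' code: it takes the window to be the gene at position x plus the next four genes (the text also mentions three neighboring genes, so the width is left as a parameter), derives the 1/0 flags from keyword matches in the gene description, truncates windows at the chromosome end, and scales the summed log(e) by 0.1. The function names and the toy gene records, including their log(e) values, are invented for illustration.

```python
def keyword_flag(description, keyword):
    """Return 1 if the gene description contains the keyword, else 0."""
    return 1 if keyword.lower() in description.lower() else 0


def window_sums(values, width=5):
    """Sum each value with the next (width - 1) values in chromosomal order.

    Windows near the end of the list are truncated (an assumption).
    """
    return [sum(values[i:i + width]) for i in range(len(values))]


# Toy records in chromosomal order:
# (gene description, identified in hippocampus proteome, GPMDB log(e)).
toy_genes = [
    ("olfactory receptor (toy 1)", False, -1),
    ("olfactory receptor (toy 2)", False, -1),
    ("soluble protein (toy 3)", True, -120),
    ("uncharacterized protein (toy 4)", False, -2),
    ("membrane protein (toy 5)", True, -30),
    ("soluble protein (toy 6)", True, -45),
]

hippo_flags = [1 if found else 0 for _, found, _ in toy_genes]
olfactory_flags = [keyword_flag(desc, "olfactory") for desc, _, _ in toy_genes]
membrane_flags = [keyword_flag(desc, "membrane") for desc, _, _ in toy_genes]

hippo_sum = window_sums(hippo_flags)
olfactory_sum = window_sums(olfactory_flags)
membrane_sum = window_sums(membrane_flags)

# Summed log(e) per window, scaled by 0.1 so it shares an axis with the flag sums.
scaled_log_e = [0.1 * s for s in window_sums([log_e for _, _, log_e in toy_genes])]

print(hippo_sum, olfactory_sum, membrane_sum, scaled_log_e)
```

In this toy example the first window has a high olfactory count and a low hippocampus count, which is the pattern the text describes for the poorly scored regions.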

Gene Ontology and Chromosomal Location

Table 6 lists the gene ontology (GO) biological processes and molecular functions for gene clusters of the protein-coding genes located on chromosome 11. GO IDs were found in the chromosomal regions where genes belonging to the same GO category of molecular function or biological process are closely located. Some gene sets of different GO ID categories overlap when one category is a subcategory of the other. The gene symbol list of each cluster is included in the Supporting Information.

Disorder Related Genes

The OMIM (Online Mendelian Inheritance in Man) database is a disease database.22 OMIM lists genes known to be important in genetic diseases. The GeneCards database, on the other hand, integrates information on disorders from several databases including OMIM. In Figure 2, the genes related to disorders in the GeneCards database are denoted. GeneCards classifies disorder related genes by the genes affected by disorders as well as the causes of disorders. It integrates the disorder related databases to list disorder biomarkers, which may be genes that evoke disorders or genes that are affected by disorders. According to GeneCards, there are 95 genes on chromosome 11 related to AD. Among these genes, 61 were found in the hippocampus tissue. Concerning epilepsy, we identified 19 genes among the 33 epilepsy related genes. For the Alzheimer related genes whose log(e) scores were less than −5, 65.6% of the genes were identified by our proteomics analysis. Table 7 classifies the genes related to AD and epilepsy by GPMDB score range and gives the number of genes that were identified from hippocampus tissue in each GPMDB score range. For AD, the mass spectral quality of 32 of the 95 genes was high.

Table 7. Number of Chromosome 11 Genes that were Identified by Proteome Analysis and Those Related to AD and Epilepsy

                         GPMDB score         number of genes   KBSI hippo. proteins   KBSI hippo. ratio
AD related genes         log(e) ≤ −5         74                48                     64.9%
                         −5 < log(e) ≤ −3    0                 0
                         −3 < log(e) ≤ −1    13                2                      15.4%
                         log(e) > −1         3                 1                      33.3%
Epilepsy related genes   log(e) ≤ −5         16                11                     68.8%
                         −5 < log(e) ≤ −3    1                 1                      100.0%
                         −3 < log(e) ≤ −1    1                 0                      0.0%
                         log(e) > −1         1                 0                      0.0%

CONCLUSIONS

We integrated the proteomics analysis result of the human hippocampus into the chromosomal human proteome analysis for chromosome 11. Comparing GPMDB, protein evidence level, and hippocampus proteome, we observed a consistent correlation. By analyzing the proteomics statistics by chromosomal location, we could survey the domains where proteins with special characteristics are clustered. Among the proteins identified from the hippocampus, we identified the genes and proteins of chromosome 11 included in the AD pathway. For example, alpha Crystallin B chain and apolipoprotein A-I were identified in all of the control, epilepsy, and Alzheimer samples. Functional proteomics analysis corresponding to epilepsy- and Alzheimer-specific genes from brain tissue will be further performed with multiple reaction monitoring and enzyme-linked immunosorbent assays targeting chromosome 11 proteins. Chromosome-centric analysis is useful in proteomics research and in the development of technology to overcome the associated difficulties. Eventually, the complete list of proteins encoded by the genes of chromosome 11 will be compiled through the Chromosome-Centric Human Proteome Project by combining the protein data sets from C-HPP teams around the world.

We compared the protein profile of the control sample with those of the AD and epilepsy samples. By focusing proteomics analysis on chromosomal location, we could understand the proteins at an intermediate level between whole proteome profiling and the classical protein biology approach. Chromosomal proteome analysis categorizes the whole proteome data by chromosome location and analyzes the proteins belonging to each chromosome by considering the positional relationship of proteins as well as their functional relationship. This chromosome-centric view can reveal new characteristics of the proteome such as the correlation of neighbor genes, protein interaction, and genetic functions.

EXPERIMENTAL SECTION

Human Brain Tissue

The epileptic hippocampus tissue was resected for therapeutic purposes in patients with medial temporal lobe epilepsy who had hippocampal sclerosis. All patients provided informed consent. During surgery, the head and body of the hippocampus were resected. After trimming of the head and the end of the body, about 6 mm of the hippocampus was immediately frozen using isopentane floated liquid nitrogen and kept in liquid nitrogen. Control and AD hippocampal specimens were acquired from the Barcelona Brain Bank, Spain. Control hippocampus was obtained from a donor who had no known history of neurological or psychiatric disease. The inclusion criterion for AD tissue was clinical diagnosis (AD-V, AD-VI). This tissue was received immediately frozen and stored at −80 °C until use. For this study, we used the following brain specimens: control (age 49 years, PMD 3 h), three AD samples (age 67, 75, and 86 years, PMD 8 h, 11.5 h, and 4 h 10 min, respectively) and an epileptic patient (age 40 years). Tissue acquisition and storage conformed to the guidelines of the Institutional Review Board of Seoul National University Hospital (H-0507-509-153) for hippocampi from epilepsy patients and to the informed consent and ethics committee approval at the Barcelona Brain Bank, Spain, for hippocampi from Alzheimer’s patients and controls.

Nanoflow Liquid Chromatography−Tandem Mass Spectrometry (LC−MS/MS) Analysis of Peptides

LC−MS/MS analysis of the peptides was conducted on an LTQ-Orbitrap XL ETD mass spectrometer (Thermo Fisher Scientific, San Jose, CA) equipped with a nanoelectrospray ion source and an EASY-nLC system (Thermo Fisher Scientific). All samples were analyzed in both CID MS/MS and ETD MS/MS modes. Five microliters of sample was injected at a flow rate of 10 μL/min into a C18 trap column (300 μm I.D. × 50 mm, 5 μm, 300 Å). The trapped peptides were then separated on a 200 mm homemade microcapillary column consisting of C18 (Aqua; particle size 3 μm) packed into 100 μm silica tubing with an orifice internal diameter of 5 μm. Mobile phases A and B were composed of 0 and 100% acetonitrile, respectively, and each contained 0.1% formic acid. The LC gradient began with 5% B for 5 min and was ramped to 15% B over 5 min, to 50% B over 90 min, and to 95% B over 5 min; it was then held at 95% B for 5 min and at 5% B for another 10 min. The column was re-equilibrated with 5% B for 15 min before the next run. The full-scan mass range was set to m/z 400−2000 with a resolution of 60000 at m/z 400. The 10 most intense ions were sequentially isolated for CID MS/MS and ETD MS/MS in the LTQ. The electrospray voltage was maintained at 1.8 kV and the capillary temperature was set at 200 °C. CID was performed with helium as the collision gas at a normalized collision energy (NCE) of 35% and a maximum injection time of 30 ms. For ion trap ETD MS/MS, an isolation width of 2 amu and one microscan with a maximum injection time of 300 ms were used. ETD fragmentations were performed based on the charge state with the anion AGC target set at 300000. Previously fragmented ions were excluded for 60 s.
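The gradient described above can be written as a list of segments, which makes the total programmed time and the composition at any time point easy to check. This is a sketch under assumptions: the ramps are taken to be linear, the return from 95% B to 5% B is taken to be an immediate step, and the 15 min re-equilibration before the next run is not included. The names `GRADIENT`, `total_minutes`, and `percent_b` are illustrative.

```python
# Each segment: (duration in min, %B at segment start, %B at segment end).
GRADIENT = [
    (5, 5, 5),     # hold at 5% B
    (5, 5, 15),    # ramp to 15% B
    (90, 15, 50),  # ramp to 50% B
    (5, 50, 95),   # ramp to 95% B
    (5, 95, 95),   # hold at 95% B
    (10, 5, 5),    # step back to 5% B and hold (assumed immediate step)
]


def total_minutes(segments):
    """Total programmed gradient time in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t, segments=GRADIENT):
    """%B at time t (min), interpolating linearly within each segment."""
    if t < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


print(total_minutes(GRADIENT))
print(percent_b(55), percent_b(105), percent_b(115))
```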

Cell Fractionation of Hippocampus in Human Brain

To extract proteins from the control, epileptic and AD hippocampus tissues, cell fractionation was performed according to a previous method.23 In brief, frozen brain tissues were ground in a mortar with 50 mM Tris buffer (pH 7.1) containing 100 mM KCl, 20% glycerol, and protease inhibitors. The homogenate was centrifuged at 50000 rpm for 30 min at 4 °C, and the supernatant (SI) was transferred to a new tube. The resulting pellet was suspended in the same buffer and ground with protease inhibitors in liquid nitrogen. After thawing of the frozen homogenate, sonication was performed six times for 10 s each. The supernatant (SII) was obtained after centrifugation. SI and SII were combined (SI+II). The remaining pellet was further extracted with 0.2 M KCl buffer (pH 7.1) containing 20% glycerol, 0.1 M phosphate, 9 M urea, and 4.5% CHAPS. The homogenate was centrifuged at 50000 rpm for 30 min at 17 °C, and the supernatant was used as the pellet extract (PE) fraction. The amounts of reagents added during the process were calculated as described previously.23 To remove salts, the two protein fractions were precipitated with acetone.

Protein Identification

A database search was performed against the IPI human database (IPI.HUMAN.v.3.85) using Mascot v.2.3 (Matrix Science, London, UK) with the following parameters. Carbamidomethylation of cysteine and oxidation of methionine were chosen as static and variable modifications, respectively. Precursor ion tolerance was set to 10 ppm and fragment ion tolerance was 600 ppm. The Mascot search results were processed with the Trans-Proteomic Pipeline (Institute for Systems Biology, Seattle, WA), and the ProteinProphet results were filtered at a probability (p value) higher than 0.99. PeptideProphet analysis was performed separately for the PE and S12 (SI+II) fractions of the CID and ETD data. For CID, the Mascot search results from the individual OFFGEL fractions were pooled within PE and within S12 to give two PeptideProphet results, which were then integrated for the ProteinProphet analysis. The ETD data set was processed in the same way. This gave two ProteinProphet results (CID and ETD) for each sample (control, epilepsy, and AD), and the six ProteinProphet results in total were used to count the identified proteins. The 0.99 threshold corresponded to a false discovery rate of nearly zero in each ProteinProphet result.
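The two numerical criteria in this section, a ppm mass tolerance and a probability cutoff, can be illustrated with a short sketch. It is not the Mascot or Trans-Proteomic Pipeline implementation: the function names and the toy protein IDs are hypothetical, and counting proteins as the union over results is an assumption, since the text does not state how the six ProteinProphet results were combined.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z window for a tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def confident_proteins(result, min_probability=0.99):
    """Keep protein IDs whose probability is higher than the cutoff."""
    return {pid for pid, prob in result.items() if prob > min_probability}


def count_identified(results, min_probability=0.99):
    """Count distinct proteins passing the cutoff in at least one result
    (union over results is an assumption of this sketch)."""
    union = set()
    for result in results:
        union |= confident_proteins(result, min_probability)
    return len(union)


# Hypothetical toy inputs: protein ID -> probability, one dict per result.
cid_result = {"PROT_A": 1.00, "PROT_B": 0.995, "PROT_C": 0.90}
etd_result = {"PROT_B": 0.999, "PROT_D": 0.992, "PROT_E": 0.50}

low, high = ppm_window(1000.0, 10)  # 10 ppm precursor tolerance
n_identified = count_identified([cid_result, etd_result])
```

For a precursor at m/z 1000, a 10 ppm tolerance corresponds to a window of about ±0.01 m/z units.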

Trypsin Digestion

The protein concentration of the prefractionated whole brain sample was determined with the Bradford assay (Bio-Rad, Hercules, CA) following the manufacturer’s recommendations. Once the concentration was determined, 670 μg of the protein was digested with Trypsin Gold (TPCK treated; Promega, Madison, WI). The sample was diluted in 10 mM Tris (pH 7.5) and denatured with 8 M urea. The sample was then sonicated for 1 min and incubated for 10 min at room temperature. The protein sample was first reduced with 20 mM TCEP for 60 min at room temperature and then treated with 40 mM iodoacetamide for 60 min at room temperature. The reagents were removed with a 10 kDa molecular weight cutoff filter, and the sample was sonicated for 5 min. Trypsin was added at a ratio of 1:50 (w/w, trypsin/protein), and digestion proceeded overnight for approximately 16 h at 37 °C. Once the digestion was completed, the sample was concentrated in a centrifugal vacuum concentrator to less than 50 μL but not to dryness.
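The enzyme amount implied by the 1:50 (w/w) trypsin-to-protein ratio follows from simple arithmetic. The helper below is a hypothetical illustration of that calculation only and is not part of the published protocol.

```python
def trypsin_mass_ug(protein_ug, enzyme_parts=1, protein_parts=50):
    """Mass of trypsin (in micrograms) for a given w/w enzyme:protein ratio."""
    return protein_ug * enzyme_parts / protein_parts


# 670 ug of protein at 1:50 (w/w, trypsin/protein), as stated in the text.
trypsin_ug = trypsin_mass_ug(670)
```

For the 670 μg of protein digested here, this corresponds to 13.4 μg of trypsin.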

OFFGEL Fractionation

Tryptic peptide samples were fractionated with a 3100 OFFGEL fractionator (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s protocol. Twelve fractions were collected from the fractionator. The fractions were desalted using C18, dried, reconstituted in 0.1% formic acid (FA), and subjected to mass spectrometry analysis.

dx.doi.org/10.1021/pr3008368 | J. Proteome Res. 2013, 12, 97−105

Chromosome-centric Data Analysis

For chromosome-centric analysis, the gene information was taken from Biomart,1 with Ensembl Genes 69 (Wellcome Trust Sanger Institute, Cambridge, UK). We downloaded the gene list with the attributes of chromosome name, gene starting point, Ensembl Gene ID, Ensembl Protein ID, gene symbol, gene description, GO, neXtProt accession number, and human IPI database ID. Ensembl Protein ID was used to link the GPMDB protein data (released July 1, 2012) to the chromosomal data. Since the IPI database was applied for protein identification using our mass spectral data, we needed a connection between gene ID and IPI database ID. Posttranslational modification information, such as phosphorylation, acetylation, and glycosylation, was extracted from the neXtProt database that was released Sept. 11, 2012. After downloading the neXtProt data file, we filtered the data to collect the modification information. From the neXtProt database, we also obtained the five levels of protein evidence. GPMDB provided the protein list for each human chromosome, containing the Ensembl Protein ID, gene symbol, protein description, and log(e) score. To merge the GPMDB proteins of one chromosome into the chromosome-centric data, we referred to the Ensembl Protein ID of Biomart. However, some protein IDs did not match because of differences in the version of the Ensembl database. Therefore, we also matched the gene symbols of the GPMDB protein list with those of Biomart. By comparing the matches by Ensembl Protein ID with the matches by gene symbol, we integrated these results to obtain the log(e) score for each gene of chromosome 11. GPMDB also provided the results of the pilot phase research in the Human Brain Proteome Project (HBPP).7 We downloaded the HBPP results to benchmark against our hippocampus proteomics data. The genetic disease information and the antibody information were acquired from the GeneCards database and the Human Protein Atlas.2,24 GeneCards lists the genes that are known to be related to disorders, and neXtProt and GPMDB provide the protein evidence levels of these genes, which allows us to classify the proteins that cannot be easily identified. GeneCards is therefore valuable for selecting the target proteins that we will make an effort to analyze in our proteomics approach.
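The two-step matching between the Biomart gene list and the GPMDB protein list (Ensembl Protein ID first, gene symbol as a fallback when IDs differ between Ensembl versions) can be sketched as follows. This is an illustrative sketch only: the record layout, the field names, and the toy identifiers are hypothetical, and it assumes a single GPMDB entry per protein ID and per gene symbol.

```python
def merge_log_e(biomart_genes, gpmdb_proteins):
    """Attach a GPMDB log(e) score to each gene symbol.

    Matching is tried by Ensembl Protein ID first, then by gene symbol.
    Returns {gene_symbol: (log_e, matched_by)} for genes with a match.
    """
    by_protein_id = {p["ensembl_protein_id"]: p for p in gpmdb_proteins}
    by_symbol = {p["gene_symbol"]: p for p in gpmdb_proteins}

    merged = {}
    for gene in biomart_genes:
        hit = by_protein_id.get(gene["ensembl_protein_id"])
        matched_by = "protein_id"
        if hit is None:
            # Fallback for IDs that changed between Ensembl versions.
            hit = by_symbol.get(gene["gene_symbol"])
            matched_by = "symbol"
        if hit is not None:
            merged[gene["gene_symbol"]] = (hit["log_e"], matched_by)
    return merged


# Hypothetical toy records (not real Ensembl IDs or scores).
biomart = [
    {"ensembl_protein_id": "ENSP_X1", "gene_symbol": "GENE1"},
    {"ensembl_protein_id": "ENSP_X2_NEW", "gene_symbol": "GENE2"},
    {"ensembl_protein_id": "ENSP_X3", "gene_symbol": "GENE3"},
]
gpmdb = [
    {"ensembl_protein_id": "ENSP_X1", "gene_symbol": "GENE1", "log_e": -120.5},
    {"ensembl_protein_id": "ENSP_X2_OLD", "gene_symbol": "GENE2", "log_e": -33.0},
]

merged = merge_log_e(biomart, gpmdb)
```

Recording how each gene was matched makes it possible to compare the ID-based and symbol-based matches afterward, as the text describes.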



ABBREVIATIONS

AD, Alzheimer’s disease; CID, collision induced dissociation; ETD, electron transfer dissociation; OMIM, Online Mendelian Inheritance in Man; GPMDB, Global Proteome Machine Database

REFERENCES

(1) Haider, S.; Ballester, B.; Smedley, D.; Zhang, J.; Rice, P.; Kasprzyk, A. BioMart Central Portal−unified access to biological data. Nucleic Acids Res. 2009, 37 (Web Server issue), W23−W27.
(2) Safran, M.; Solomon, I.; Shmueli, O.; Lapidot, M.; Shen-Orr, S.; Adato, A.; Ben-Dor, U.; Esteman, N.; Rosen, N.; Peter, I.; Olender, T.; Chalifa-Caspi, V.; Lancet, D. GeneCards 2002: towards a complete, object-oriented, human gene compendium. Bioinformatics 2002, 18 (11), 1542−1543.
(3) Karathanasis, S. K. Apolipoprotein multigene family: Tandem organization of human apolipoprotein AI, CIII, and AIV genes. Proc. Natl. Acad. Sci. 1985, 82, 6374−6378.
(4) Smith, A. C.; Choufani, S.; Ferreira, J. C.; Weksberg, R. Growth regulation, imprinted genes, and chromosome 11p15.5. Pediatr. Res. 2007, 61 (5 Pt 2), 43R−47R.
(5) Freed, W. J.; Chen, J.; Bäckman, C. M.; Schwartz, C. M.; Vazin, T.; Cai, J.; Spivak, C. E.; Lupica, C. R.; Rao, M. S.; Zeng, X. Gene Expression Profile of Neuronal Progenitor Cells Derived from hESCs: Activation of Chromosome 11p15.5 and Comparison to Human Dopaminergic Neurons. PLoS ONE 2008, 3 (1), e1422.
(6) Park, Y. M.; Kim, J. Y.; Kwon, K. H.; Lee, S. K.; Kim, Y. H.; Kim, S. Y.; Park, G. W.; Lee, J. H.; Lee, B.; Yoo, J. S. Profiling human brain proteome by multi-dimensional separations coupled with MS. Proteomics 2006, 6 (18), 4978−4986.
(7) Hamacher, M.; Apweiler, R.; Arnold, G.; Becker, A.; Blüggel, M.; Carrette, O.; Colvis, C.; Dunn, M. J.; Fröhlich, T.; Fountoulakis, M.; van Hall, A.; Herberg, F.; Ji, J.; Kretzschmar, H.; Lewczuk, P.; Lubec, G.; Marcus, K.; Martens, L.; Palacios Bustamante, N.; Park, Y. M.; Pennington, S. R.; Robben, J.; Stühler, K.; Reidegeld, K. A.; Riederer, P.; Rossier, J.; Sanchez, J. C.; Schrader, M.; Stephan, C.; Tagle, D.; Thiele, H.; Wang, J.; Wiltfang, J.; Yoo, J. S.; Zhang, C.; Klose, J.; Meyer, H. E. HUPO Brain Proteome Project: summary of the pilot phase and introduction of a comprehensive data reprocessing strategy. Proteomics 2006, 6 (18), 4890−4898.
(8) Tribl, F.; Marcus, K.; Bringmann, G.; Meyer, H. E.; Gerlach, M.; Riederer, P. Proteomics of the human brain: sub-proteomes might hold the key to handle brain complexity. J. Neural Transm. 2006, 113 (8), 1041−1054.
(9) Odreman, F.; Vindigni, M.; Gonzales, M. L.; Niccolini, B.; Candiano, G.; Zanotti, B.; Skrap, M.; Pizzolitto, S.; Stanta, G.; Vindigni, A. Proteomic studies on low-and high-grade human brain astrocytomas. J. Proteome Res. 2005, 4 (3), 698−708.
(10) Bode, M.; Irmler, M.; Friedenberger, M.; May, C.; Jung, K.; Stephan, C.; Meyer, H. E.; Lach, C.; Hillert, R.; Krusche, A.; Beckers, J.; Marcus, K.; Schubert, W. Interlocking transcriptomics, proteomics and toponomics technologies for brain tissue analysis in murine hippocampus. Proteomics 2008, 8 (6), 1170−1178.

ASSOCIATED CONTENT

Supporting Information

Supplemental tables. This material is available free of charge via the Internet at http://pubs.acs.org.

ACKNOWLEDGMENTS

This work was supported by a grant to Y.M.P., J.S.Y., and H.J.A. from the National Research Foundation of Korea funded by the Korean Government (MEST) (2009, University-Institute cooperation program). J.S.Y. was also supported by the Converging Research Center Program (2011K000884) through the Ministry of Education, Science and Technology. Y.M.P. and K.H.K. were supported by the grants “Operation of the Advanced Multipurpose Mass Spectrometers” (G32124) and “Construction of Knowledgebase for the Fusion Research Using Mass Spectrometers” (G32125) from the Korea Basic Science Institute, respectively.

Neighbor Genes Analysis

On a chromosome, some genes that participate in the same biological process or perform the same molecular function are located close together. To extract such related neighboring genes on chromosome 11, we classified the chromosome 11 genes by the biological processes and molecular functions of gene ontology. We screened all the gene ontology IDs, surveyed the chromosome 11 genes by their gene ontology numbers, and compared their positions on the chromosome. We defined a gene set as a cluster when it contained more than five neighboring genes separated from one another by gaps of fewer than five protein-coding genes.
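The neighbor-gene rule used in this analysis (more than five genes sharing a gene ontology term, with gaps of fewer than five protein-coding genes between neighbors) can be sketched as below. This is one possible reading of the rule, not the authors' code: the function name is hypothetical, gene positions are assumed to be given as ranks among the protein-coding genes ordered along the chromosome, and a gap is counted as the number of other protein-coding genes lying between two consecutive members.

```python
def find_clusters(member_ranks, max_gap=5, min_size=6):
    """Group genes sharing one GO term into positional clusters.

    member_ranks: ranks (0-based order among protein-coding genes along the
    chromosome) of the genes annotated with the GO term.
    Consecutive members stay in the same group while fewer than max_gap
    other protein-coding genes lie between them; groups with at least
    min_size members (more than five) are reported as clusters.
    """
    clusters = []
    current = []
    for rank in sorted(set(member_ranks)):
        if current and (rank - current[-1] - 1) >= max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(rank)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Hypothetical ranks for one GO term on one chromosome.
ranks = [10, 11, 13, 14, 18, 20, 40, 41, 42, 100, 101, 102, 103, 104, 105, 106]
clusters = find_clusters(ranks)
```

In this toy input the three genes at ranks 40 to 42 are close together but too few to count as a cluster, whereas the first six and the last seven genes each form one.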






AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected], [email protected]. Phone: +82 (43) 240 5150. Fax: +82 (43) 240 5159.

Notes

The authors declare no competing financial interest.


(11) Samara, A.; Vougas, K.; Papadopoulou, A.; Anastasiadou, E.; Baloyanni, N.; Paronis, E.; Chrousos, G. P.; Tsangaris, G. T. Proteomics reveal rat hippocampal lateral asymmetry. Hippocampus 2011, 21 (1), 108−119.
(12) Keidel, E. M.; Dosch, D.; Brunner, A.; Kellermann, J.; Lottspeich, F. Evaluation of protein loading techniques and improved separation in OFFGEL isoelectric focusing. Electrophoresis 2011, 32 (13), 1659−1666.
(13) Szot, P. Common factors among Alzheimer’s disease, Parkinson’s disease, and epilepsy: possible role of the noradrenergic nervous system. Epilepsia 2012, 53 (Suppl 1), 61−66.
(14) Paik, Y. K.; Omenn, G. S.; Uhlén, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; Na, K.; Jeong, S. K.; He, F.; Binz, P. A.; Nishimura, R.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. Standard guidelines for the Chromosome-centric Human Proteome Project. J. Proteome Res. 2012, 11 (4), 2005−2013.
(15) Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlén, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 2012, 30 (3), 221−223.
(16) Kanehisa, M.; Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28, 27−30.
(17) Craig, R.; Cortens, J. P.; Beavis, R. C. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 2004, 3 (6), 1234−1242.
(18) Lydie, L. neXtProt: a knowledge platform for human proteins. Nucleic Acids Res. 2012, 40, D76−D83.
(19) O’Donovan, C.; Martin, M. J.; Gattiker, A.; Gasteiger, E.; Bairoch, A.; Apweiler, R. High-quality protein knowledge resource: SWISS-PROT and TrEMBL. Brief. Bioinform. 2002, 3 (3), 275−284.
(20) The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 2010, 467, 1061−1073.
(21) Tsigelny, I.; Burton, D. W.; Sharikov, Y.; Hastings, R. H.; Deftos, L. J. Coherent expression chromosome cluster analysis reveals differential regulatory functions of amino-terminal and distal parathyroid hormone-related protein domains in prostate carcinoma. J. Biomed. Biotechnol. 2005, 2005 (4), 353−363.
(22) Hamosh, A.; Scott, A. F.; Amberger, J. S.; Bocchini, C. A.; McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005, 33 (Database Issue), D514−D517.
(23) Klose, J. Fractionated extraction of total tissue proteins from mouse and human for 2-D electrophoresis. Methods Mol. Biol. 1999, 112, 67−85.
(24) Uhlén, M.; Björling, E.; Agaton, C. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 2005, 4 (12), 1920−1932.
