Qualitative and Quantitative Expression Status of the Human

Dec 19, 2012 - Qingyu He, Tel/Fax 86-20-85227039; e-mail: [email protected]. Siqi Liu, Tel/Fax 86-10-80485324; e-mail: [email protected]...
Article pubs.acs.org/jpr

Qualitative and Quantitative Expression Status of the Human Chromosome 20 Genes in Cancer Tissues and the Representative Cell Lines

Quanhui Wang,†,‡,§ Bo Wen,‡,§ Guangrong Yan,∥,§ Junying Wei,⊥,#,§ Liqi Xie,¶,§ Shaohang Xu,‡ Dahai Jiang,‡ Tingyou Wang,‡ Liang Lin,‡ Jin Zi,‡ Ju Zhang,† Ruo Zhou,‡ Haiyi Zhao,‡ Zhe Ren,‡ Nengrong Qu,‡ Xiaomin Lou,† Haidan Sun,† Chaoqin Du,‡ Chuangbin Chen,‡ Shenyan Zhang,† Fengji Tan,‡ Youqi Xian,‡ Zhibo Gao,‡ Minghui He,‡ Longyun Chen,‡ Xiaohang Zhao,△ Ping Xu,⊥,# Yunping Zhu,⊥,# Xingfeng Yin,∥ Huali Shen,¶ Yang Zhang,¶ Jing Jiang,⊥,# Chengpu Zhang,⊥,# Liwei Li,⊥,# Cheng Chang,⊥,# Jie Ma,⊥,# Guoquan Yan,¶ Jun Yao,¶ Haojie Lu,¶ Wantao Ying,*,⊥,# Fan Zhong,*,¶ Qing-Yu He,*,∥ and Siqi Liu*,†,‡

† Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101318, China
‡ BGI-Shenzhen, Shenzhen 518083, China
∥ Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
⊥ State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China
# National Engineering Research Center for Protein Drugs, Beijing 102206, China
¶ Institutes of Biomedical Sciences and Department of Chemistry, Fudan University, Shanghai 200032, China
△ State Key Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100021, China

Supporting Information

ABSTRACT: Under the guidance of the Chromosome-centric Human Proteome Project (C-HPP),1,2 we conducted a systematic survey of the expression status of genes located on human chromosome 20 (Chr.20) in three cancer tissues (gastric, colon, and liver carcinoma) and their representative cell lines. We globally profiled the proteomes of these samples with LC−MS/MS and acquired the corresponding mRNA information from RNA-seq and RNA chip data. In total, 323 unique proteins were identified, covering about 60% of the coding genes (323/547) in Chr.20. With regard to the qualitative proteomics information, we evaluated the overall correlation of the identified Chr.20 proteins with target genes of transcription factors or of microRNA, conserved genes, and cancer-related genes. As for the quantitative information, the mRNA abundances of Chr.20 genes were found to be largely consistent between tissues and cell lines in all individual chromosome regions, whereas the abundances of Chr.20 proteins in cells differ from those in tissues, especially in the region of 20q13.33. Furthermore, the abundances of Chr.20 proteins were hierarchically evaluated according to tissue- or cancer-related distribution. The analysis revealed that several cancer-related proteins in Chr.20 are tissue- or cell-type dependent. By integrating all the acquired data, we established for the first time a solid database of the Chr.20 proteome.

KEYWORDS: proteome, chromosome, C-HPP, chromosome features, quantitative correlations



INTRODUCTION

Human chromosome 20 (Chr.20) spans about 63 Mb and represents approximately 2% of the total DNA in the genome.3 It comprises 547 protein-coding genes (GRCh37.p8, version 69.37), a gene density of 8.6 per Mb, slightly higher than the average of 6.2 per Mb for the human genome. As in other chromosomes, gene density is higher in the q arm of Chr.20 than in its p arm. A number of studies have demonstrated that changes in the structure or copy number of Chr.20 genes can cause multiple kinds of diseases, especially cancers. Recurrent gain and amplification of the Chr.20 genes have been observed in a wide variety of cancers, such as prostate, liver, colon, gastric, bladder, pancreas and breast cancer.4−9 For example, the region of 20q13 contains several genes, such as CAS, that are generally amplified in several neoplasias. The CAS gene located at 20q13.1 is translated into a nuclear transporter protein, which acts at the mitotic spindle checkpoint and controls proliferation and tumor necrosis factor-mediated apoptosis.10 Wellmann et al. proposed CAS as a prognostic marker for hepatic neoplasms because large amounts of CAS protein were found in liver tumors, possibly leading to genomic instability in liver cancer.11 Abnormality in Chr.20 has been widely observed in colorectal cancer as well. Peng et al. claimed that gene mutations in Chr.20 were closely associated with sporadic colorectal carcinogenesis, particularly in the regions of 20p and 20q11.1-q13.1.12 Xie et al. observed that colorectal cancer samples might carry high copy numbers of 20q genes and pointed out the overrepresentation of potential tumor-related genes in Chr.20, such as EEF1A2 and PTK6.13 Furthermore, a computational model predicted that 20q amplification induced deregulation of several specific cancer-related pathways, including the MAPK pathway, the p53 pathway and polycomb group factors, while activation of Myc, AML, β-catenin and the ETS family transcription factors was an important step in cancer development driven by 20q amplification.9 There is clearly increasing evidence that deletions or duplications of genetic material from Chr.20 have a variety of effects on cancer generation and development. The question, nevertheless, is how to scrutinize the relationships among the chromosomal positions, the Chr.20 genes, the corresponding expression status and biological behaviors, and how to extract clear clues about the functions regulated by Chr.20 genes from the plentiful omics information.

Genomic approaches are the main means of investigating chromosome structures, gene locations and functions. In spite of great developments in the global measurement of gene expression, qualitatively and quantitatively, there is still a lack of systematic investigation that specifically focuses on the correlation between gene expression status and the individual chromosome. A major question in the field is whether the gene expression messages, mRNA or protein, in contrast to the linear and continuous genomic messages, are able to reflect the biological characteristics of a chromosome. The initiation of the Chromosome-Centric Human Proteome Project (C-HPP) is expected to give this question an unambiguous answer.14 With improvements in the accuracy, resolution and scan speed of mass spectrometry, large-scale protein identification has changed dramatically. Currently, 50−70% coverage of the annotated genome can be achieved in eukaryotic cells, while as high as 90% coverage of a prokaryotic genome is reachable. Concentrating on the analysis of high-accuracy and quantitative proteomics data using the MaxQB database, Mann's group identified 11731 unique proteins from 11 human cell lines, covering about 50% of the human genes, and mapped the identified proteins onto the 23 pairs of chromosomes.15 Although the study did not pay great attention to the details of how the identified proteins relate to chromosomal features, the high protein coverage of each chromosome and the distribution of proteome abundances in every chromosome set up a successful case for the chromosome proteome. On the other hand, this case prompts us to consider how to proceed in a chromosome project. First, the study of gene expression differs from gene annotation: gene expression is always tied to the abundance of the expression product. To achieve a real profile of the chromosome proteome, quantitative information is fundamentally required. Second, gene expression is normally considered at two levels, transcription and translation, and the two events do not always proceed at a synchronous pace. Comparison of gene expression at the two levels is likely to deepen our understanding of the chromosomal proteome. Third, as gene expression status is time- and space-dependent, the chromosome proteome is expected to be tissue- or cell-type dependent as well. Recent chromosome proteome evidence from the groups of Mann and Uhlén supports this deduction.15−17 Hence, a comprehensive proteome for a chromosome should be built upon protein identification from multiple tissues or cells. Under the effective organization of the Chinese Human Proteome Organization (CNHUPO), we have initiated the C-HPP in China, which concentrates on profiling proteins in three gastrointestinal cancer tissues and the representative cells. In this paper, we present a thorough qualitative and quantitative analysis of the Chr.20-related transcriptional and translational information obtained from these samples. Through an overall evaluation of the correlation of the proteome with chromosomal features, and hierarchical clustering of the tissue- or cancer-related Chr.20 proteome, we report here for the first time the establishment of a comprehensive database of the Chr.20 proteome.

© 2012 American Chemical Society

Special Issue: Chromosome-centric Human Proteome Project
Received: August 31, 2012
Published: December 19, 2012

dx.doi.org/10.1021/pr3008336 | J. Proteome Res. 2013, 12, 151−161



MATERIALS AND METHODS

1. Sample Preparation for Chromosome Proteome

Tumor-related samples were collected from three tissues, stomach, liver and colon, and each sample set included three parts: tumor tissues, tumor-adjacent tissues and the representative cancer cell lines. Briefly, the colon tissues and their adjacent tissues, and the liver tissues, were collected from Beijing Cancer Institute & Hospital, while the gastric cancer tissues and their adjacent tissues were collected from Sun Yat-Sen University Cancer Center. All the specimens were handled under the control of the ethical review boards of the hospitals and institute. The representative cell lines were two gastric cell lines (AGS and BGC823), two colon cancer cell lines (HCT116 and SW480), and eight liver cell lines (MHCC97L (97L), MHCC97H (97H), HCCLM3 (LM3), HCCLM6 (LM6), SNU398, SNU449, SNU475 and Hep3B32−34). The proteins in the collected tissues and cell lines were extracted in lysis buffer, and the protein amount was determined using a BCA kit. The extracted proteins were digested with trypsin, and the digested peptides were then separated by high-pH reversed-phase chromatography.

2. Proteomic Profiling of the Tumor-related Samples

Proteomic analysis of the chromosome-related proteins was conducted with LC−MS/MS approaches, using TripleTOF 5600 (AB Sciex) and Orbitrap-based mass spectrometry (Thermo Scientific). Briefly, the acquired MS/MS data were searched using Mascot v2.3.2 on a local server against a database containing Swiss-Prot (20231 proteins, 2012_07 release) and common contaminants (115 proteins, ftp.thegpm.org/fasta/cRAP). A target-decoy-based strategy was applied to keep both peptide- and protein-level false discovery rates (FDRs) below 1%.18,19 To calculate the approximate abundance of each protein, we used the iBAQ algorithm, which normalizes the summed peptide intensities by the number of observable peptides of the protein.20,21

3. Transcriptomic Profiling of the Tumor-related Samples

The transcriptomic data were derived from two sources: mRNA measurements by RNA-seq at BGI-Shenzhen, and public databases. For the gastric cell lines,


transcriptions of Chr.20 genes and their translations, and whether Chr.20 proteins are tissue- or cancer-dependent.

AGS and BGC823, the total RNAs were isolated with TRIzol (Invitrogen) to build the library, followed by RNA sequencing using Illumina HiSeq 2000. For the hepatic cell lines, MHCC97L (97L), MHCC97H (97H), HCCLM3 (LM3), HCCLM6 (LM6), SNU398, SNU449, SNU475 and Hep3B32−34, the RNA samples were hybridized to Affymetrix HG-U133 Plus 2.0 oligonucleotide microarrays (Santa Clara), which included 54675 probes from 31948 genes. All other data sets were downloaded from http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-37, including the RNA-seq of 24 cancer and 6 adjacent gastric tissues,22 the microarray data of colon cancer and adjacent tissues,23 the RNA-seq of colon cancer cell lines, and the RNA-seq of 10 cancer and 10 adjacent hepatic tissues.24 The expression levels of genes in RNA-seq data were determined based on the value of RPKM (Reads Per Kilobase per Million reads),25 while the transcriptome profiles of microarrays were represented by MAS 5.0 (Affymetrix) at http://www.affymetrix.com/. The RNA-seq and microarray raw intensities were normalized by the corresponding median value in each sample. Whether a gene was transcribed was judged by an RPKM cutoff for RNA-seq, or by at least 50% of parallel chips giving the call "P" in MAS 5.0.
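The RPKM measure used for the RNA-seq data can be illustrated with a short calculation. The following is a minimal sketch in Python, not the pipeline used in this study: the function names, the example read counts and the `cutoff` value are all hypothetical, since the text does not state the RPKM cutoff that was applied.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Scale reads to a 1 kb gene (x 1e3) and a 1-million-read library (x 1e6).
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def is_transcribed(rpkm_value: float, cutoff: float) -> bool:
    """Call a gene transcribed when its RPKM reaches the cutoff.

    The cutoff is a hypothetical parameter; the paper does not give its value.
    """
    return rpkm_value >= cutoff


# Hypothetical example: 1000 reads on a 2 kb gene in a 10-million-read library.
example_rpkm = rpkm(gene_reads=1000, gene_length_bp=2000, total_mapped_reads=10_000_000)
example_call = is_transcribed(example_rpkm, cutoff=1.0)
```

Because RPKM divides by both gene length and library size, values from samples sequenced to different depths can be compared before the per-sample median normalization described above.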

1. Qualitative Proteomics Analysis of the Proteins Encoded by Chr.20 Genes

Based on current knowledge of chromosome-encoded genes, there is no generally accepted rule for evaluating gene expression status in a tissue or cell. To identify more proteins that are derived from Chr.20 genes and are possibly associated with diseases, we designed a chromosome proteomics strategy, as described in Materials and Methods, that qualitatively and quantitatively profiles the proteins in three gastrointestinal cancer tissues (stomach, liver and colon) and their representative cell lines. Meanwhile, we collected the relevant mRNA information acquired from RNA-seq or RNA chip data. In total, 323 proteins were identified as the translated products of Chr.20 genes across all the samples examined in this study, covering about 60% of the coding genes of Chr.20 (323/547). Of these proteins, 263 are from the liver tissues or cells, 254 from colon and 221 from stomach, and approximately 55% (179/323) were identified in all three types of samples. Although the numbers of proteins identified from the three tissues and their representative cell lines were close, the non-overlapping portions were large, which suggests that some proteins encoded by Chr.20 genes are tissue- or cell-type dependent. In contrast to the relatively low proteomic coverage of the Chr.20 coding genes, the mRNA data covered these genes much more completely (96%, 523/547). The qualitative information on Chr.20 gene expression, mRNA and protein, integrated over all samples, is illustrated in Figure 1. The distribution patterns of gene expression generally correlate with the gene densities along the chromosome, with regions of higher gene density showing more expressed genes.
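The coverage figures quoted above follow from simple ratios, and the shared and tissue-specific portions from set operations. The sketch below, in Python, recomputes the percentages from the counts given in the text; the three small gene sets use placeholder names (`GENE_A` and so on) and are purely hypothetical, included only to show how union and intersection would be taken over real identification lists.

```python
def coverage_percent(identified: int, total: int) -> float:
    """Percentage of `total` items that were identified."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * identified / total


# Counts quoted in the text.
protein_coverage = coverage_percent(323, 547)  # Chr.20 proteins / coding genes
mrna_coverage = coverage_percent(523, 547)     # Chr.20 mRNAs / coding genes
shared_fraction = coverage_percent(179, 323)   # proteins found in all three sample types

# Hypothetical placeholder sets standing in for per-tissue identification lists.
liver = {"GENE_A", "GENE_B", "GENE_C"}
colon = {"GENE_A", "GENE_B", "GENE_D"}
stomach = {"GENE_A", "GENE_E"}

all_identified = liver | colon | stomach   # union: identified in any sample type
common = liver & colon & stomach           # intersection: identified in all three
liver_only = liver - colon - stomach       # unique to one sample type
```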
On average, genomic annotation places 8.6 genes per Mb of Chr.20, while the mRNA and proteome data identified 8.4 mRNAs and 5.1 proteins per Mb, respectively. There are two regions with high gene densities in Chr.20: a 4.3 Mb region at q13.12 with 25 genes/Mb, and a 2.7 Mb region at q11.21 with 21 genes/Mb. In q13.12, the mRNA detection rate was approximately 24/Mb, close to the gene density in this region; however, the protein identification rate was significantly lower, approximately 11/Mb. A high frequency of genomic instability has been found in Chr.20 q13.12, such as numerous gene deletion events or DNA copy number aberrations.32−34 For instance, Akan et al. used ChIP technology to investigate histone-binding regulatory factors and annotated several putative regulatory elements in the region.35 The lower identification rate of the protein products encoded in q13.12 is in good agreement with the observation that abnormal regulation of gene expression is often found in this chromosomal region. In q11.21, both the protein and mRNA detection rates were clearly lower than the gene density in this region, approximately 18 mRNAs/Mb and 17 proteins/Mb. In total, only 23 Chr.20 genes had no transcripts detected by RNA-seq or RNA chip in our data, and approximately 26% (6/23) of them, located at 20q11.21, had no detectable protein signals either. Gene deletion within a common 2.6 Mb region of 20q11.21 is well recognized in some tumors such as acute leukemia, while MacKinnon et al. revealed a common retained region in 20q11.21 in myelodysplastic syndromes.36,37 These observations suggest that 20q11.21 is

4. Bioinformatics Analysis toward the Chr.20 Proteome

The identified proteins and mRNAs were mapped onto their corresponding gene locations in Chr.20 using a program written in R (R 2.12.1).26 From this mapping, the distributions of all the identified proteins and mRNAs and the gene densities along Chr.20 were generated. The graphics software Circos27 was employed to illustrate the distribution of several chromosomal features and the corresponding Chr.20 gene expression products identified. The information on transcriptional regulators with their target genes and on cancer-related genes was obtained from IPA (www.ingenuity.com), microRNAs and their target genes from miRBase (version 18),28 and the conserved genes from the Core Eukaryotic Genes data set (http://korflab.ucdavis.edu/datasets/cegma/).29 To quantitatively evaluate the correlation of gene expression status with chromosomal location, the identified proteins and mRNAs with abundance information were colocalized with the corresponding genes on Chr.20 and illustrated using R 2.12.1. The abundances of proteins and mRNAs were evaluated with iBAQ and RPKM, respectively, and further normalized by their median values. To better evaluate expression status, the abundances were log-transformed, and the log abundances were plotted against the corresponding genes. The abundance ranks of all the identified Chr.20 proteins were evaluated according to the iBAQ and SIn values. The quantitative correlation between mRNA and protein of Chr.20 genes in each sample was analyzed using Lowess, a regression algorithm that combines multiple regression models.30 The tissue- or cell line-dependent expression status of Chr.20 genes was evaluated according to the correlation coefficients. Hierarchical clustering (Cluster 3.0) was used to analyze correlations between the abundances of the Chr.20 cancer-related genes and the samples examined.31
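The abundance treatment described above (iBAQ, median normalization, log transformation) can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the R code used in the study: the function names and the example values are hypothetical, the log base is taken as 2 to match the log2 values plotted in Figure 3, and zero abundances are simply dropped before the log step, which is one possible convention among several.

```python
import math
from statistics import median


def ibaq(peptide_intensities, n_observable_peptides: int) -> float:
    """Summed peptide intensity divided by the number of observable peptides."""
    if n_observable_peptides <= 0:
        raise ValueError("number of observable peptides must be positive")
    return sum(peptide_intensities) / n_observable_peptides


def median_normalize_log2(abundances: dict) -> dict:
    """Divide each abundance by the sample median, then take log2.

    Non-positive values are dropped first (an assumption of this sketch),
    because their logarithm is undefined.
    """
    positive = {gene: value for gene, value in abundances.items() if value > 0}
    sample_median = median(positive.values())
    return {gene: math.log2(value / sample_median) for gene, value in positive.items()}


# Hypothetical example values, for illustration only.
example_ibaq = ibaq([2e6, 4e6, 6e6], n_observable_peptides=4)
example_log2 = median_normalize_log2({"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 4.0})
```

After this treatment a value of 0 marks a gene at the sample median, so positive and negative values correspond directly to above-median and below-median abundance.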



RESULTS

To extract Chr.20-related information from the cancer tissue and cell proteomes and transcriptomes, we framed the study around four questions: how Chr.20 genes are translated in these materials, what the genetic characteristics of Chr.20 proteins are, which correlation exists between the


Figure 1. Overview of the identified proteins and mRNAs in Chr.20. The identified proteins and mRNAs are mapped onto their correspondent genes in Chr.20. The rows of black, red and blue represent the genes in Chr.20, the identified proteins and mRNAs, respectively.

a susceptible region for DNA recombination, deletion or insertion. Additionally, we found that beta-defensin family genes were enriched in this region, such as DEFB115, DEFB116, DEFB118, DEFB119, DEFB121, DEFB123 and DEFB124, but the corresponding proteins were not found in our proteome data.38,39 Intriguingly, only one of these beta-defensins, DEFB121, is listed in GPMDB, the Global Proteome Machine Database. The members of this family are small proteins with molecular weights below 10 kDa, and were reported to be susceptible to diseases due to gene copy number variations. The low detection rate of the defensin proteins in Chr.20 may result from two causes: the abnormal association of the family genes with cancers, which induces their lower expression, and the small number of tryptic peptides generated from the small defensin proteins, which makes detection by mass spectrometry difficult.

2. Correlation Analysis of the Chromosomal Features and the Chromosome Proteome

We further asked whether the gene expression status in Chr.20 was correlated with particular chromosomal features. Four aspects were considered in this analysis: target genes of transcription factors, target genes of microRNA, conserved genes, and cancer-related genes. The correlation analysis is summarized in Figure 2, which shows that, except for the target genes of microRNA, over 70% of the genes associated with the other three chromosomal features were identified at the protein level, suggesting that proteomics can capture certain information released from the chromosome.

Figure 2. Overview of the identified proteins and mRNAs with their chromosomal features in Chr.20. The identified proteins and mRNAs with their chromosomal features are mapped to Chr.20 using Circos software. The outside circle represents the Chr.20 regions, in which the centromere is labeled in red. For each chromosomal feature, the black rings indicate the Chr.20 genes with that feature as predicted by databases, and the red and blue rings represent the identified proteins and mRNAs corresponding to the feature, respectively.

1). Target Genes of Transcription Regulators. On the basis of the IPA database (www.ingenuity.com), 106 target genes of transcription regulators are located on Chr.20. A total of 80 of the 323 identified proteins belong to these target genes and are possibly regulated by 19 transcription regulators, while 105 of the target genes were detected at the transcript level (Supporting Information Table S1). Importantly, the majority of the identified target genes are also cancer-related. For example, over 90% of the identified target proteins are possibly regulated by SP1 and p53, which are located on chr12 and chr17 and are positively involved in regulating a number of cancer-related genes.

2). Target Genes of microRNA in Chr.20. A total of 269 Chr.20 genes were predicted as the targets of 783 microRNAs through miRBase, of which 267 and 197 were identified at the mRNA and protein levels, respectively (Supporting Information Table S2). The expression products of the target genes are distributed along the whole of Chr.20 in a pattern similar to the gene density distribution (Figure 2). Interestingly, the 783 microRNAs that potentially regulate the Chr.20 target genes are enriched in chromosomes 7, 8, 10, 12, 14, 15 and X, particularly in Chr.X, indicating that the microRNAs of Chr.X interact closely with the transcription and translation processes of Chr.20 (Supporting Information Figure S1). Several target genes regulated by Chr.X microRNAs were found to be tissue-dependent. For instance, CSE1L was highly expressed in normal gastric tissues but lower in other samples.

3). Conserved Genes Identified in Chr.20. There are 458 genes in the Core Eukaryotic Genes data set, and 12 of them are located on Chr.20 (Figure 2). All 12 conserved genes were identified at both the transcript and protein levels, and the mRNAs and proteins of these conserved genes were generally found in most tissues at relatively high abundances. Furthermore, the iBAQ-based abundances of the Chr.20 conserved proteins in the different tissues were log-transformed and ranked. The quantitative information presented in Supporting Information Figure S2 reveals that the abundances of five conserved proteins, AHCY, EIF6, CSE1L, ITPA, and PCNA, always rank near the top of the identified Chr.20 proteins.

4). Cancer-related Genes in Chr.20. We collected the cancer-related genes from the IPA database and found 143 located on Chr.20, of which 107 were identified at the protein level, while all of them were transcribed to mRNAs (Supporting Information Table S3). Further analysis reveals that 67 proteins were shared by all the cancer samples, 9 were uniquely identified in colon samples, 11 uniquely in liver samples, and 2 uniquely in gastric samples. Mapping the detected cancer-related genes to Chr.20 shows three regions, 20p11.21, 20q11.22−20q11.23, and 20q13.33, where far fewer proteins were identified than mRNA signals detected (Figure 2). For instance, 20 cancer-related genes are located at 20q13.33; all of them were detected by mRNA measurement, but 8 were missed by proteomics, such as BIRC7, CDH4, and CHRNA4. As the detection ratio of protein to mRNA for the cancer-related genes in our database is approximately 75% (107/143), such a low ratio of 60% (12/20) in q13.33 raises the question of which factors are responsible. Several genomic abnormalities were found in this region, which could regulate the expression of some important genes such as EGFR, FASLG and GLUT4, and bring about many changes in cell processes associated with tumor development. In addition, as revealed by Figure 2, the cancer-related genes covered most of the transcriptional regulator target genes in Chr.20, strongly supporting the view that abnormal regulation of transcription is likely to accompany tumorigenesis.

Figure 3. Correlation analysis of gene expression abundances with the chromosomal positions in Chr.20. The normalized abundances of the identified proteins and mRNAs are log-transformed, and the log2 values are plotted against the corresponding genes in Chr.20. The black rows show the gene distribution in Chr.20, while the red and blue rows represent the log2 abundances of the identified proteins and mRNAs, respectively. Top, adjacent normal tissue of colon cancer; Middle, colon cancer tissue; and Bottom, colon cancer cell lines.


Figure 4. Analysis of scatter plot matrix to Chr.20 gene expression status among different samples. The correlation coefficients of the expressed gene abundances between the varied samples are statistically estimated using Lowess algorithm. The lower left panels represent the scatter plots with smoothing splines after regression treatment to paired sample abundances, and the upper right panels display the correspondent Pearson’s correlation coefficients.
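The Pearson coefficients reported in the upper right panels of Figure 4 can be computed as in the sketch below. This is a minimal Python illustration, assuming paired log-transformed abundances for the same genes in two samples; the function name and the input vectors are hypothetical, and the Lowess smoothing shown in the lower left panels is not reproduced here.

```python
import math


def pearson(x, y) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and the two sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical log2 abundances of the same three genes in two samples.
r_same_trend = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite_trend = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Only genes quantified in both samples of a pair can enter such a calculation, so the number of points behind each coefficient may differ from panel to panel.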

3. Quantitative Analysis of the Chr.20 Gene Expression in Chromosomal Regions

We analyzed the expression abundances of Chr.20 genes at the mRNA and protein levels, using RPKM and iBAQ respectively, to further investigate the correlation of these abundances with chromosomal location. As the abundances of gene expression are always tissue- and cell-type dependent, it is difficult to integrate abundance data acquired from different sample sources when studying the abundance distribution along chromosomal positions. We therefore present a typical distribution of gene expression abundance in Chr.20 for one set of tissues and cells: colon cancer tissues, their adjacent tissues and colon cancer cell lines. As illustrated in Figure 3, the abundances of gene expression in different chromosomal regions, either mRNA or protein, show quite variable patterns. To evaluate the differences among these patterns, we simply divided the genes at each chromosomal position into two groups according to their abundances, mRNA or protein. One group has abundances above the median values (Group-1), and the other below the medians (Group-2). We then estimated the ratio of each group within each chromosomal region and treated it as an indicator of the gene expression pattern. For mRNA, the Group-1 ratios across all the chromosomal regions range from 40 to 70%, while for protein they span about 20−80%. In most regions, the Group-1 ratios for mRNA remain similar among tissues and cell lines, except at 20q11.21 and 20q13.33, indicating that there are only a few tissue- or cell-dependent changes in mRNA abundance in this chromosome. Moreover, the Group-1 ratios at 20q11.21 and 20q13.33 for mRNA in cell lines are clearly higher than those in tissues. Compared to mRNA, the abundance patterns of Chr.20 proteins are tissue- or cell-type dependent. The ratios of Group-1 at 7 regions, 20p12.3, 20p11.23, 20p11.21, 20q11.23, 20q13.12 and 20q13.33 (Supporting Information Table S4), are almost comparable in the two tissues, whereas those in cell lines differ from the tissues, suggesting that there are tissue- or cell-dependent Chr.20 proteins, at least for colon samples. Intriguingly, the gene products of 20q13.33, both mRNA and protein, are clearly more abundant in cell lines than in tissues. The region of q13.33 is strongly suspected of disease association, for example with colorectal cancer, gastric cancer and glioma.40,41 For instance, by fine-mapping the q13.33 region, Song et al. found that the SNPs in the region exerted a complex effect by the independent signals contributing to diseases, two SNPs at the genome-wide significance (P-value