Why Are the Correlations between mRNA and Protein Levels so Low


Perspective

WHY ARE THE CORRELATIONS BETWEEN mRNA AND PROTEIN LEVELS SO LOW AMONG THE 275 PREDICTED PROTEIN-CODING GENES ON HUMAN CHROMOSOME 18?
Ekaterina V. Poverennaya, Ekaterina V. Ilgisonis, Elena A. Ponomarenko, Arthur T. Kopylov, Victor G. Zgoda, Sergey P. Radko, Andrey V. Lisitsa, and Alexander I. Archakov
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00348 • Publication Date (Web): 28 Sep 2017
Downloaded from http://pubs.acs.org on October 1, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

WHY ARE THE CORRELATIONS BETWEEN mRNA AND PROTEIN LEVELS SO LOW AMONG THE 275 PREDICTED PROTEIN-CODING GENES ON HUMAN CHROMOSOME 18?

Ekaterina V. Poverennaya*1, Ekaterina V. Ilgisonis1, Elena A. Ponomarenko1, Arthur T. Kopylov1, Victor G. Zgoda1, Sergey P. Radko1, Andrey V. Lisitsa1, Alexander I. Archakov1

*Corresponding author: Dr. Ekaterina Poverennaya, 119121, Russia, Moscow, Pogodinskaya Street, 10 E-mail: [email protected] Ph.: +7-499-246-69-80 Fax: +7-499-245-08-57

1 Institute of Biomedical Chemistry RAS, Moscow, Russia


Abstract

In this work, targeted (selected reaction monitoring, SRM; PASSEL: PASS00697) and panoramic (shotgun LC-MS/MS; PRIDE: PXD00244) mass-spectrometric methods, together with transcriptomic analysis of the same samples by RNA-Seq and PCR methods (SRA experiment IDs: SRX341198, SRX267708, SRX395473, SRX390071), were applied to quantify chromosome 18 encoded transcripts and proteins in human liver and HepG2 cells. The data obtained were used to estimate quantitative mRNA-protein ratios in the chromosome-centric (gene-centric) mode for the 275 genes of the selected chromosome in the selected tissues. The impact of the methodological limitations of existing analytical proteomic methods on gene-specific mRNA-protein ratios, and possible ways of overcoming these limitations in order to detect missing proteins, are also discussed.

Keywords: mRNA sequencing, quantitative PCR, selected reaction monitoring, Human Proteome Project, transcriptome, proteome, human chromosome 18, limit of detection, analytical sensitivity


Introduction

There are 2,563 missing proteins1–3 (proteins with PE2, PE3 or PE4 “protein existence” evidence in neXtProt4) in the human proteome (neXtProt, v2.9.0). These predicted proteins have not been detected in any biological sample by existing experimental methods. According to the HPP Consortium definition, missing proteins (MP) are proteins that currently lack sufficient evidence from MS or other proteomic methods5,6. What does this mean? In our view, it is fundamental to separate the biological and the technical reasons for MP status. Differences in the expression levels of the same genes and in translational rates between tissues7–9 lead to significant variability among the proteomes of tissues and organs10. A protein can be missing because expression of the encoding gene has not been confirmed in any of the types of biological material studied so far, or because the mRNA corresponding to the gene is detected but the protein is not translated. On the other hand, a protein may go undetected because of the limitations of existing proteomic methods11, which cannot reveal proteins present in the sample at low and ultra-low concentrations. In this case, chemical noise from other molecules in the biological matrix may affect the signal-to-noise ratio. As a result, MS signals from proteins that are present may be lost, or new MS signals may appear and cause false-positive results. In this context, we take missing proteins to be proteins that were not detected in a specific type of biological material, for example the liver tissue and HepG2 cell line analyzed in the Russian part of the C-HPP, which focuses on human chromosome 18 (Chr18)12. In particular, there are 20 MP encoded on Chr18, which has 276 predicted protein-coding genes according to information from 2017 (neXtProt v2.9.0).
At the time of the experimental work, the neXtProt database (v.2016-08-08) listed 275 predicted protein-coding genes on Chr18, and quantitative mRNA-protein ratios were studied for these genes. Compared with the 2016 data, the number of MP has decreased: three MP (Q5BJE1, Q8IYT4, Q8IYD9) were detected by deep analysis of different tissues13.


The design of the chromosome-centric study of Chr18 used in our investigation12,14–16 made it possible to obtain, for each protein-coding gene located on Chr18, the level of mRNA and the corresponding level of protein encoded by the same gene. In our study, we used the results obtained in 2012-2016 from the transcriptomic and proteomic profiling of two types of biological material: liver tissue cells and the hepatocellular carcinoma cell line (HepG2)12,14–16. Transcriptomic profiling was performed using RNA sequencing (RNA-Seq), carried out independently on two platforms, SOLiD and Illumina, and using methods based on the polymerase chain reaction (PCR): next-generation droplet digital PCR (ddPCR) and real-time quantitative PCR (qRT-PCR). The RNA-Seq datasets were deposited in the NCBI Sequence Read Archive (experiment IDs: SRX341198, SRX267708, SRX395473, SRX390071). Proteomic profiling was performed using selected reaction monitoring MS with stable isotope-labeled peptides as standards (SRM-SIS) for quantitative analysis of the proteins in the samples (see Tear317, C-HPP Guidelines6). The SRM datasets are available in PASSEL: PASS00697. A detailed description of the experiments and data analysis is given in previously published reports of the Chr18 C-HPP team12,14,15. Transcriptome and proteome profiling was performed only for master proteins12, excluding proteoforms arising from alternative splicing, mutations (single amino acid variants, SAAV) or post-translational modifications.

Missing proteins: not expressed, not translated or undetected?

The presence of a specific mRNA in a sample was taken as unambiguous confirmation of gene expression in that type of biomaterial. It confirms that the gene can be expressed and, consequently, that the corresponding protein is likely to be present in the sample. A priori, transcriptomic methods are more sensitive than proteomic methods because PCR can detect a single molecule of nucleic acid17. The strategy for finding missing proteins therefore starts with a search for the expression products of the corresponding gene at the transcriptomic level, followed by a search at the proteomic level6. The lifetimes of a transcript and of the corresponding protein differ: for example, the average half-life of a protein is 46 hours, whereas the half-life of a transcript is only 9 hours, and in half of the cases these parameters do not correlate9,18. However, transcriptomic and proteomic technologies use several thousand or even millions of heterogeneous cells, which averages the results obtained and determines the possibility of detecting gene products in each case. Despite the higher sensitivity of transcriptomic analysis, the list of transcripts detected in a sample depends strongly on the RPKM cut-off level applied to the RNA-Seq data. The problem of choosing the optimal RPKM cut-off has been approached in many ways19, but there is no single agreed algorithm for selecting a particular cut-off: RPKM > 0 (refs. 20,21), RPKM > 0.1 (ref. 22) and RPKM > 1 (ref. 23) are all mentioned in the literature. In our work, transcripts detected at a level of analytical sensitivity comparable to that of the proteomic methods were considered detected. The same units of measurement were used for quantitative analysis of the transcriptomic and proteomic data: the number of copies of the molecule (mRNA or protein) detected per cell of the sample (liver or HepG2 cells) at the same level of analytical sensitivity. Because transcriptomic profiling of the same sample was performed both by RNA-Seq and by qRT-PCR and ddPCR, the relationship between the quantitative values obtained by the different methods was evaluated at the data processing stage12,14–16. The correlation between the sequencing data and the PCR data (R² = 0.53 and R² = 0.64 for liver tissue and the HepG2 cell line, respectively) is not as high as that between the SOLiD and Illumina RNA-Seq platforms (R² = 0.82 and R² = 0.88 for liver tissue and the HepG2 cell line, respectively)12.
This can be explained as follows: in sequencing, the expression of a single gene is calculated from all of its transcription products, whereas in PCR the reference point is a unique section of the gene that corresponds to the canonical version of the protein. PCR-based methods have higher sensitivity because they target the transcript of a specific gene. Within the C-HPP project, it is feasible to carry out a targeted analysis of a specific human chromosome, which allows PCR-based methods to be used for quantitative estimation. By contrast, using PCR methods for quantitative analysis of the expression products of the complete human genome is currently difficult to implement. The high correlation between the two types of RNA-Seq transcriptomic data, together with the availability of PCR data for the same samples, makes it possible to convert RPKM values to numbers of transcript copies per cell. The calibration equation y = 0.99x + 1.34 (Figure S1a), where x is lg(RPKM) and y is lg(transcript copies per cell), was used to estimate the copy number of transcripts in the HepG2 cell line. Similarly, the number of transcript copies in liver tissue cells was obtained using the calibration equation y = 0.76x + 0.83 (Figure S1b). Figure 1 shows histograms of the distribution of the numbers of transcripts and proteins measured for chromosome 18 genes in liver and HepG2 cells. The quantitative method of proteomic analysis, selected reaction monitoring with stable isotope-labeled peptides as standards (SRM-SIS), can detect a protein if its quantity exceeds 10 copies per cell (Figure 1b). For comparison, a transcript can be detected at a level of 1 copy per 100 cells (Figure 1a). Transcripts detected at more than 10 copies per cell were selected, which allowed these results to be compared with the proteomic data (Table S1). The selected cut-off makes it possible to analyze transcripts detected at the RPKM>1 cut-off level. The transcriptomic data we obtained for Chr18 were compared with the results of the largest transcriptomic study, performed within the Genotype-Tissue Expression (GTEx) project24. For liver tissue, there is a high correlation between our RNA-Seq data and the GTEx data, which are average values from 119 different samples obtained using Illumina technology (Figure 2).
According to GTEx, expression was detected for a total of 231 genes located on Chr18, and ~60% of them had RPKM≥1. According to our data, 228 Chr18 genes were detected at the transcriptome level, 59% of them with RPKM≥1.
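
The conversion from RPKM to transcript copies per cell described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: only the slopes and intercepts come from the text (Figure S1a and S1b), lg is assumed to denote the base-10 logarithm, and the names CALIBRATION and rpkm_to_copies_per_cell are hypothetical.

```python
import math

# (slope, intercept) of lg(copies per cell) = slope * lg(RPKM) + intercept,
# as quoted in the text; the dictionary keys are illustrative labels.
CALIBRATION = {
    "HepG2": (0.99, 1.34),
    "liver": (0.76, 0.83),
}


def rpkm_to_copies_per_cell(rpkm: float, sample: str) -> float:
    """Estimate transcript copies per cell from an RPKM value."""
    if rpkm <= 0:
        raise ValueError("RPKM must be positive to take a logarithm")
    slope, intercept = CALIBRATION[sample]
    return 10 ** (slope * math.log10(rpkm) + intercept)


# Example: estimated copies per cell at a few RPKM values.
for sample in CALIBRATION:
    for rpkm in (0.1, 1.0, 10.0):
        copies = rpkm_to_copies_per_cell(rpkm, sample)
        print(f"{sample}: RPKM={rpkm} -> {copies:.1f} copies per cell")
```

Because the calibration is linear in log-log space, a transcript with RPKM ≤ 0 has no defined copy number under this model, which is why the sketch rejects such values instead of returning zero.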


Table 1 shows the number of Chr18 genes for which transcripts and proteins were detected in the selected types of biomaterial. We divided all 275 predicted protein-coding genes of Chr18 into three groups: genes for which expression was not shown in the biomaterial samples (neither transcript nor protein was detected); genes for which the protein quantity was measured; and missing proteins for each type of selected biomaterial, that is, proteins that were not detected in measurements of these two types of samples during our part of the C-HPP. Two different search strategies for missing proteins should be considered in order to overcome both the biological and the technical obstacles to protein detection. If the transcript is not detected in the sample, a biological reason for the absence of the protein, namely lack of gene expression in that sample, cannot be excluded. To find missing proteins of this group, it is reasonable to use samples of other types of biological material or to increase the number of biological replicates. Of the 20 missing proteins encoded on Chr18 (according to neXtProt data), transcripts were not detected for 12 genes in liver tissue and 13 genes in the cell line, and for 10 of them transcripts were not detected in either biomaterial (Table 1). Other types of biological material will be analyzed in future to search for these missing proteins. Unfortunately, for half of the missing proteins undetected in liver tissue and the HepG2 cell line (A6NLF2, Q3SY89, Q6ZT83, Q6ZTR6, Q96KH6), there are no gene expression data in GTEx or the Human Protein Atlas. Testis is a promising tissue for finding four of the MP (B2RU33, Q3B7S5, Q96P15 and Q9HC47), and A6NM36 is most likely to be detected in muscle tissues. Transcripts for the other MP on chromosome 18 were detected for 8 genes in liver tissue and 7 genes in the HepG2 cell line (Table S2 and S3).
Since gene expression was revealed in these samples, the further search for these missing proteins should establish why the proteins were not detected: is it because of a lack of translation, or because the proteins are translated in amounts insufficient for detection? The plan for the future search for Chr18 missing proteins within the neXt-50 C-HPP Challenge25 should include translational analysis of the same samples of biological material to evaluate the rate of translation. This will help to identify the group of proteins that go undetected for technical rather than biological reasons. If translation is proved, a variety of additional sample preparation methods can be used to improve the analytical sensitivity of targeted mass spectrometry analysis26, or other proteotypic peptides can be selected to reduce the noise from biological matrices. The number of transcripts detected in the sample, i.e., the number of known genes expressed in it, may serve as a “backbone” for assessing the quality of proteomic analysis. If the transcript is not detected, it is impossible to distinguish a situation in which there is no gene expression in the biosample from one in which expression exists but the limit of detection (LOD) of the technology is insufficient to detect the expression products (transcripts or proteins)7. The protein copy number was measured in liver tissue and HepG2 cells for 73 and 68 genes, respectively (26.5% and 24.7% of the total predicted protein-coding genes of human Chr18). The intersection comprises 58 proteins detected and measured in both types of biomaterial. Shotgun LC-MS/MS (panoramic) analysis helped to identify 8 proteins in liver tissue and 32 in the HepG2 cell line12. We used this proteomic method only to confirm the presence of proteins in the biological material, not for quantitative analysis. The gene-centric format allows the intersections between the transcriptome and proteome datasets to be calculated, where TP (true positive) is the number of predicted protein-coding genes of chromosome 18 for which both transcript and protein were detected; FP (false positive), protein detected but no transcript detected; TN (true negative), neither transcript nor protein detected in the sample; and FN (false negative), transcript detected but protein not detected.
As with FDA diagnostic methods (FDA, 2007), the sensitivity, specificity and accuracy of proteomic methods can be calculated relative to the transcriptome of the same sample. Sensitivity (Se) is the ability of the technology to give a positive result when one is expected, defined as the proportion of true positive results among all cases in which the transcript is present (TP + FN). In our case, it was the proportion of genes with a transcript detected in the sample for which the protein was also detected in the same sample. Specificity (Sp) is the ability of the technology not to give false positive results when the transcript is absent, that is, the proportion of genes with no transcript detected for which no protein was detected in the same sample either. Accuracy (Ac) is the proportion of correct results, that is, the sum of true positive and true negative results, among all the measurements. The following formulas were used for the calculation:

Sensitivity (Se):   Se = TP / (TP + FN) × 100%   (1)

Specificity (Sp):   Sp = TN / (TN + FP) × 100%   (2)

Accuracy (Ac):   Ac = (TP + TN) / N × 100%   (3)

where N = 269 is the number of predicted protein-coding genes on human chromosome 18 that can be detected at the proteomic level; six protein-coding genes were excluded from the analysis because no specific proteotypic tryptic peptides could be predicted for them (for details see ref. 12). The data obtained (Table 2) show that SRM-SIS, although it is the most sensitive of the modern proteomic methods27, does not have sufficient sensitivity and accuracy to detect all the proteins corresponding to the mRNAs detected in the same sample. The average sensitivity of the proteomic technology for both types of biomaterial is ~40% and the accuracy is ~65%. By comparison, the corresponding values for shotgun LC-MS/MS are Se 10 (8)%, Sp 97 (99)% and Ac 20 (18)% for the HepG2 cell line (liver tissue). The specificity is thus comparable with that of targeted analysis using isotope-labeled synthetic peptides as internal standards, whereas shotgun LC-MS/MS has significantly lower Se and Ac values. The sensitivity, specificity and accuracy indicators for the proteomic methods show that proteins are “missing” from a sample for different reasons. If we set aside the biological reasons for missing proteins, then the limit of detection (LOD) of the analytical method applied is essential. SRM-SIS has the lowest LOD among all proteomic technologies16,17. It is necessary to distinguish between the “sensitivity of the analytical method” (LOD), which is the minimum number of molecule copies necessary for detection, and “sensitivity” (Se, equation 1), which is the proportion of true positive results obtained by a specific method of detection. The standard SRM-SIS method currently makes it possible to detect proteins present at no fewer than 1,000 copies of protein molecules per 1 ml of plasma, which means that the limit of detection could reach 10⁻¹⁴ M (refs. 15,16). Figure 3 shows the distribution histogram of proteins detected by SRM-SIS according to their quantitative content in liver and HepG2 cells. Most of the proteins detected in liver tissue are low-abundance proteins, whereas most of those detected in HepG2 cells are of middle abundance. The number of high-abundance proteins is minimal in both biomaterials. On the one hand, this kind of distribution is to be expected from the transcriptomic data: there are essentially no highly expressed genes on Chr18 (refs. 9,11,19). On the other hand, previously published information (ref. 18) states that there are few proteins in the low-abundance range; given the distribution observed here, that finding can be explained by the insufficient LOD of the technology.
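
Equations (1)-(3) can be illustrated with a short Python sketch that counts TP, FP, TN and FN from sets of detected transcripts and proteins and then derives Se, Sp and Ac. It is an illustration under stated assumptions only: the gene identifiers are invented toy data rather than Chr18 genes, the function names confusion_counts and se_sp_ac are hypothetical, and N is taken here as the sum of the four counts (in the study N = 269).

```python
def confusion_counts(genes, transcript_detected, protein_detected):
    """Count TP, FP, TN, FN, treating the transcript as the reference."""
    tp = fp = tn = fn = 0
    for gene in genes:
        has_transcript = gene in transcript_detected
        has_protein = gene in protein_detected
        if has_transcript and has_protein:
            tp += 1  # transcript and protein both detected
        elif has_protein:
            fp += 1  # protein detected without a transcript
        elif has_transcript:
            fn += 1  # transcript detected, protein not detected
        else:
            tn += 1  # neither detected
    return tp, fp, tn, fn


def se_sp_ac(tp, fp, tn, fn):
    """Return sensitivity, specificity and accuracy in percent.

    Assumes at least one gene with a transcript and one without,
    so that neither denominator is zero.
    """
    n = tp + fp + tn + fn
    se = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    ac = 100.0 * (tp + tn) / n
    return se, sp, ac


# Toy data: ten hypothetical gene identifiers.
genes = [f"G{i}" for i in range(1, 11)]
transcripts = {"G1", "G2", "G3", "G4", "G5", "G6"}
proteins = {"G1", "G2", "G7"}

tp, fp, tn, fn = confusion_counts(genes, transcripts, proteins)
se, sp, ac = se_sp_ac(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")        # TP=2 FP=1 TN=3 FN=4
print(f"Se={se:.1f}% Sp={sp:.1f}% Ac={ac:.1f}%")  # Se=33.3% Sp=75.0% Ac=50.0%
```

The toy example reproduces the qualitative pattern reported in the text: few false positives give a high specificity, while many transcripts without a detected protein pull the sensitivity down.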

Differences between mRNA-protein levels in human liver tissue and HepG2 cell line

mRNA and protein levels in human liver tissue and the HepG2 cell line were compared for each Chr18 gene for which both mRNA and protein were detected (Figure 4). For liver tissue, the coefficient of determination (R²) is 0.07 for data obtained by RNA-Seq (N = 54) and 0.14 for data obtained by PCR (N = 59) (Figure 4a and 4b). For the HepG2 cell line, the correlation between transcriptomic and proteomic data is R² = 0.07 with RNA-Seq (N = 51) and R² = 0.09 with PCR (N = 54) (Figure 4d and 4e). When the combined transcriptomic data are compared with the proteomic data, the coefficient of determination for liver tissue coincides with the result of the PCR comparison (Figure 4c), whereas for the hepatocellular cell line it lies between the RNA-Seq and PCR results (Figure 4f). Protein products can be found for 147 transcripts in liver tissue and 159 transcripts in the HepG2 cell line, and 136 of those transcripts can be found in both types of biomaterial (Table S1). The recalculated values of the correlation between the transcriptome and proteome in this case are R² = 0.16 for liver tissue and R² = 0.05 for the HepG2 cell line (Figure S2). The lack of correlation in all these cases may be caused by the different sensitivities of the analytical methods. According to studies of UPS2 performed under different conditions (Ilgisonis et al., submitted), the SRM-SIS LOD is 10⁻¹¹ M, which is ~6 protein copies per liver cell and ~12 protein copies per HepG2 cell. The difference between these values for hepatocytes and HepG2 cells is related to their sizes, since proteomic analysis requires 100-150 times fewer HepG2 cells than liver cells28,29 (Table S2 and S3). The ratio between the number of copies of the transcript and the number of copies of the protein was obtained for each gene in both types of biomaterial (liver tissue cells and HepG2 cell line). Based on these results, we compared the protein-mRNA ratios between liver tissue and the HepG2 cell line and identified the fractions of genes for which these ratios are constant between samples. Evaluating the correlation between the quantitative content of transcript and protein is important for developing methods that predict the potential level of a protein in a sample9. It is known that the rate of translation varies considerably both between genes and for the same gene in different types of biological material. In our work, we evaluated the correlation between mRNA and protein for the genes of chromosome 18, considering only those genes for which both transcripts and proteins were measured in both types of biological material. There were 55 such genes on Chr18.
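
As an illustration of how a coefficient of determination of this kind can be computed, the sketch below takes paired mRNA and protein copy numbers per cell and returns R² for a simple linear fit. Two points are assumptions rather than statements from the text: the values are log10-transformed before fitting (the scale used for Figure 4 is not stated here), and the function name r_squared_log10 and the toy numbers are hypothetical. For a one-variable linear regression, R² equals the squared Pearson correlation, which is what the code evaluates.

```python
import math


def r_squared_log10(mrna_copies, protein_copies):
    """R^2 of a linear fit of lg(protein) against lg(mRNA)."""
    if len(mrna_copies) != len(protein_copies):
        raise ValueError("inputs must have the same length")
    xs = [math.log10(v) for v in mrna_copies]
    ys = [math.log10(v) for v in protein_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Squared Pearson correlation; requires variation in both variables.
    return sxy ** 2 / (sxx * syy)


# Toy data: protein strictly proportional to mRNA, so the fit is exact.
print(r_squared_log10([1, 10, 100], [10, 100, 1000]))
# Toy data: no linear trend between lg(mRNA) and lg(protein).
print(r_squared_log10([1, 10, 100], [10, 1000, 10]))
```

Values close to 0, such as those reported above for Chr18, mean that the transcript level explains almost none of the variance in the protein level under this linear model.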
Figure 5 presents histograms of the ratios of the number of protein molecules to the number of mRNA molecules measured for protein-coding genes of Chr18 in liver tissue cells and the HepG2 cell line. In liver tissue cells, half of the genes (N = 55) have a ratio of mRNA >> protein, and for the rest the ratio is the opposite. For both types of biological material, the number of genes encoding approximately equal numbers of mRNA and protein molecules is very small, and these genes are specific to the sample. Compared with liver tissue, the HepG2 cell line has more genes characterized by the ratio mRNA > protein; these genes make up 63% of the total number of genes studied. Interestingly, this list contains all the genes for which the same ratio was found in liver tissue cells. The ratios obtained demonstrate the importance of assessing the level of translation for gene-centric analysis and for building predictive mathematical models.
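
The grouping of genes by protein-to-mRNA ratio can be sketched as follows. This is a hedged illustration only: the text does not give the threshold that separates "mRNA >> protein" from "approximately the same number", so the tenfold cut-off (fold=10.0), the function name classify_ratio and the toy copy numbers are all assumptions introduced for the example.

```python
import math
from collections import Counter


def classify_ratio(mrna_copies, protein_copies, fold=10.0):
    """Assign a gene to a ratio class using a hypothetical fold threshold."""
    log_ratio = math.log10(protein_copies / mrna_copies)
    threshold = math.log10(fold)
    if log_ratio > threshold:
        return "protein > mRNA"
    if log_ratio < -threshold:
        return "mRNA > protein"
    return "comparable"


# Toy data: (mRNA copies per cell, protein copies per cell) for invented genes.
toy_genes = {
    "geneA": (1000.0, 10.0),
    "geneB": (10.0, 1000.0),
    "geneC": (50.0, 100.0),
}

counts = Counter(classify_ratio(m, p) for m, p in toy_genes.values())
for label, number in counts.items():
    print(f"{label}: {number} gene(s)")
```

Applying the same classification to liver and HepG2 measurements would show which genes keep their class in both samples, the comparison the paragraph above describes.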

Conclusions and perspectives

In recent years, many research teams have been working on the problem of predicting protein levels from transcriptomic data and additional information, such as the expression level of a transcript in a specific type of biomaterial and the gene-specific level of translation in a particular tissue9. The objective of this study was to understand why targeted selected reaction monitoring mass spectrometry with stable isotope-labeled standards (SRM-SIS) is not capable of detecting the majority of the proteins encoded by Chr18 in liver tissue and the HepG2 cell line. There will always be the question of whether a protein is not detected because its concentration is below the LOD or because it is not expressed or translated. It was revealed that the existing limit of detection is insufficient for detecting the majority of proteins. Applying the FDA equations shows that SRM-SIS technology, although considered the most analytically sensitive of the existing proteomic technologies, has low sensitivity and accuracy, while its specificity is very high. Therefore, to extend coverage of the proteome of the selected chromosome, it is necessary to increase the sensitivity, e.g., by decreasing the LOD, while maintaining the specificity.


The limit of detection (LOD) is constrained by a complex of causes, some of which apply to all proteomic approaches. For example, parameters such as the localization of a protein determine the degree of enzymatic hydrolysis. Proteomic analysis of membrane proteins has been shown to be hindered by the limited accessibility of sites of protein hydrolysis by trypsin, so special approaches to sample preparation are used in these cases for efficient analysis of proteins30,31. Some of the detection limitations of SRM-SIS are related to the choice of peptides (‘best flyers’) for monitoring in targeted proteomics. The choice is particularly difficult for short proteins and highly homologous proteins. However, there are a number of bioinformatics resources, such as PeptideAtlas32 and Unicity Checker33, that help to avoid many of the problems described above. The LOD of the SRM-SIS method can also be affected by interference and peptide ion suppression when proteins are analyzed in complex biological samples such as plasma or tissue lysates34. Reducing the complexity of peptide mixtures by fractionating the sample by different methods helps to avoid these effects35. During the Russian part of the C-HPP, the number of Chr18 genes for which both transcript and protein were detected in the same type of biological material was determined: 52 for HepG2 cells and 57 for liver tissue cells. These two examples show that, among 275 protein-coding genes, expression products can be detected at both the transcriptomic and proteomic levels for ~20% of the genes.
The total number of detected proteins is almost half the total number of detected transcripts: across both types of biomaterial, 81 proteins were detected (data obtained by SRM-SIS), while the number of genes expressed at the transcriptomic level is 164 (intersection of data obtained by RNA-Seq on the Illumina and SOLiD platforms, where RPKM≥1, and data obtained by qRT-PCR+ddPCR methods, where Ct [...] 10⁻⁸ M; then the number of proteins differs for liver tissue and the HepG2 cell line. Because the greatest number of proteins encoded by Chr18 were detected in the low-abundance and middle-abundance ranges, while the number of high-abundance proteins is minimal for both biomaterials, it was concluded that the LOD of the technologies plays the key role in the search for missing proteins.

This work helped to reveal the advantages and disadvantages of the chromosome-centric approach in the analysis of the transcriptome and proteome. Only within this approach was it possible to identify genes for which the transcript is present while the protein is missing, as well as the opposite situation. A possible explanation for the difference between proteomic and transcriptomic data may be related to the time factor. A researcher works at the molecular level with “pictures” taken at certain points of the experiment, whereas RNA molecules and proteins each have their own time scale, because the lifetimes of these molecules differ18. There will always be a question of whether a protein is not detected because its concentration is below the detection limit of the analyzer or because it is not synthesized.

Taking into account the results obtained in this work, it may be suggested that an objective assessment of the methodological capabilities of proteomic analysis in terms of sensitivity, specificity and accuracy reveals limitations in analytical sensitivity, which is insufficient to identify all the proteins for which the relevant transcripts were detected in the sample. The study we performed on the UPS2 standard protein samples (Ilgisonis et al., submitted) allows us to completely exclude the biological component, since all the proteins are a priori present in the analyzed sample. It shows that the technical component, namely the insufficient sensitivity of analytical proteomic methods (for various reasons, such as interference and biological matrix noise), may be a reason for not detecting proteins even when they are present in the sample; moreover, the more complex the biological matrix in the sample, the lower the number of detected proteins among those present in the standard set. The obtained results indicate that, when searching for missing proteins, it is necessary to take into account the analytical limitations of sensitivity and to distinguish a lack of gene expression or protein translation in the sample from errors of the method at the stage of proteomic analysis.
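The argument that a detection limit alone can make present proteins appear missing can be illustrated with a minimal sketch. All concentrations and LOD values below are hypothetical and chosen only for illustration; they are not measurements from this study or from the UPS2 experiment, and modeling a more complex matrix as a higher effective LOD is a simplifying assumption.

```python
def detected(concentrations_molar, lod_molar):
    """Return the sorted names of proteins whose concentration reaches the LOD."""
    return sorted(
        name for name, conc in concentrations_molar.items() if conc >= lod_molar
    )


# Hypothetical sample: every protein is present by construction,
# as in a standard protein set (values in mol/L are invented).
sample = {"P1": 1e-8, "P2": 5e-10, "P3": 2e-10, "P4": 3e-11, "P5": 8e-12}

# Hypothetical effective LODs: the more complex matrix is modeled
# simply as a higher (worse) effective detection limit.
lod_simple_matrix = 1e-11
lod_complex_matrix = 1e-10

seen_simple = detected(sample, lod_simple_matrix)
seen_complex = detected(sample, lod_complex_matrix)

print(len(seen_simple), "of", len(sample), "proteins detected in the simple matrix")
print(len(seen_complex), "of", len(sample), "proteins detected in the complex matrix")
```

In this toy case no protein is biologically absent, yet the number reported as detected drops as the effective LOD rises, which is the distinction between lack of expression and method limitation drawn in the text.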

Associated content

Supporting Information Available: This material is available free of charge via the Internet at http://pubs.acs.org.

Acknowledgements. This work was supported by the Russian Science Foundation grant #15-15-30041. The authors are grateful to the “Human Proteome” Core Facility, Institute of Biomedical Chemistry (IBMC), which is supported by the Ministry of Education and Science of the Russian Federation (agreement 14.621.21.0017, unique project ID RFMEFI62117X0017).

Supporting information

Figure S1. Correlation between RPKM data for transcripts of Chr18 measured by the RNA-Seq method and the results of quantitative measurement of transcripts by the PCR method.
Figure S2. Correlation between RPKM data for transcripts and proteins of Chr18 measured by methods with the same sensitivity in cells.
Table S1. Transcriptomic data for human Chr18.
Table S2. Liver tissue proteomic data for human Chr18 obtained using SRM-SIS.
Table S3. HepG2 proteomic data for human Chr18 obtained using SRM-SIS.


Tables

Table 1. Number of transcripts and proteins measured for human chromosome 18 in liver tissue cells and the HepG2 cell line. Total number of genes: n = 275. The experimental data were obtained before January 2017; counts for the 20 missing proteins are given in brackets.

Type of biomaterial   Protein present   Transcript present, protein missing   Transcript missing, protein missing
Liver tissue          73                157 (8)                               45 (12)
HepG2 cell line       68                151 (7)                               58 (13)
Common                58                143 (5)                               42 (10)

Table 2. Sensitivity (Sn), specificity (Sp) and accuracy (Ac) values of the SRM-SIS method demonstrated in the study of the human HepG2 cell line and liver tissue.

Type of biomaterial   Sensitivity (Sn), %   Specificity (Sp), %   Accuracy (Ac), %
Liver tissue          40.28                 100.00                68.03
HepG2 cell line       35.33                 89.08                 59.11
AVERAGE               37.81                 94.54                 63.57

Sn = TP/(TP + FN)*100%; Sp = TN/(TN + FP)*100%; Ac = (TP + TN)/N*100%, where TP is the number of genes for which both transcript and protein were detected; FP, protein detected but no transcript detected; TN, absence of both transcript and protein in the sample; FN, transcript detected but protein not detected.

The gene-centric format allows the calculation of the intersections between the transcriptomic and proteomic datasets, where TP (true positive) is the number of predicted protein-coding genes of chromosome 18 for which both the transcript and the protein were detected; FP (false positive), protein detected but no transcript detected; TN (true negative), absence of both transcript and protein in the sample; and FN (false negative), transcript detected but protein not detected. Similarly to FDA diagnostic methods (FDA, 2007), the sensitivity, specificity and accuracy of proteomic methods may be calculated by comparison with the transcriptome of the same sample.
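The intersection-based definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the ten-gene toy dataset are hypothetical (not data from the study), and N is assumed here to be the sum of the four categories.

```python
def confusion_counts(all_genes, transcript_detected, protein_detected):
    """Count TP, FP, TN, FN per gene, with the transcriptome as the reference."""
    all_genes = set(all_genes)
    t = set(transcript_detected) & all_genes
    p = set(protein_detected) & all_genes
    tp = len(t & p)              # transcript and protein both detected
    fp = len(p - t)              # protein detected, no transcript detected
    fn = len(t - p)              # transcript detected, protein not detected
    tn = len(all_genes - t - p)  # neither transcript nor protein detected
    return tp, fp, tn, fn


def sn_sp_ac(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy in percent."""
    n = tp + fp + tn + fn  # assumption: N is the sum of all four categories
    sn = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    sp = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    ac = 100.0 * (tp + tn) / n if n else float("nan")
    return sn, sp, ac


# Hypothetical toy data (NOT from the study): ten genes, g1..g10.
genes = [f"g{i}" for i in range(1, 11)]
transcripts = {"g1", "g2", "g3", "g4", "g5", "g6"}
proteins = {"g1", "g2", "g7"}

tp, fp, tn, fn = confusion_counts(genes, transcripts, proteins)
sn, sp, ac = sn_sp_ac(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"Sn={sn:.2f}% Sp={sp:.2f}% Ac={ac:.2f}%")
```

Using gene identifiers as set elements keeps the comparison gene-centric: each gene falls into exactly one of the four categories, so the counts always sum to the number of genes considered.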


References:

(1) Horvatovich, P.; Lundberg, E. K.; Chen, Y.-J.; Sung, T.-Y.; He, F.; Nice, E. C.; Goode, R. J.; Yu, S.; Ranganathan, S.; Baker, M. S.; et al. Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. J. Proteome Res. 2015, 14 (9), 3415–3431.
(2) Vandenbrouck, Y.; Lane, L.; Carapito, C.; Duek, P.; Rondel, K.; Bruley, C.; Macron, C.; Gonzalez de Peredo, A.; Couté, Y.; Chaoui, K.; et al. Looking for Missing Proteins in the Proteome of Human Spermatozoa: An Update. J. Proteome Res. 2016, 15 (11), 3998–4019.
(3) Paik, Y.-K.; Overall, C. M.; Deutsch, E. W.; Hancock, W. S.; Omenn, G. S. Progress in the Chromosome-Centric Human Proteome Project as Highlighted in the Annual Special Issue IV. J. Proteome Res. 2016, 15 (11), 3945–3950.
(4) Gaudet, P.; Argoud-Puy, G.; Cusin, I.; Duek, P.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Zahn-Zabal, M.; Zwahlen, C.; et al. neXtProt: organizing protein knowledge in the context of human proteome projects. J. Proteome Res. 2013, 12 (1), 293–298.
(5) Lane, L.; Bairoch, A.; Beavis, R. C.; Deutsch, E. W.; Gaudet, P.; Lundberg, E.; Omenn, G. S. Metrics for the human proteome project 2013-2014 and strategies for finding missing proteins. J. Proteome Res. 2014, 13 (1), 15–20.
(6) Omenn, G. S.; Lane, L.; Lundberg, E. K.; Beavis, R. C.; Overall, C. M.; Deutsch, E. W. Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications. J. Proteome Res. 2016, 15 (11), 3951–3960.
(7) Waldman, Y. Y.; Tuller, T.; Shlomi, T.; Sharan, R.; Ruppin, E. Translation efficiency in humans: tissue specificity, global optimization and differences between developmental stages. Nucleic Acids Res. 2010, 38 (9), 2964–2974.
(8) Castelo-Szekely, V.; Arpat, A. B.; Janich, P.; Gatfield, D. Translational contributions to tissue specificity in rhythmic and constitutive gene expression. Genome Biol. 2017, 18 (1), 116.
(9) Fortelny, N.; Overall, C. M.; Pavlidis, P.; Freue, G. V. C. Can we predict protein from mRNA levels? Nature 2017, 547 (7664), E19–E20.
(10) Uhlen, M.; Fagerberg, L.; Hallstrom, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, A.; Kampf, C.; Sjostedt, E.; Asplund, A.; et al. Tissue-based map of the human proteome. Science 2015, 347 (6220), 1260419.
(11) Archakov, A.; Zgoda, V.; Kopylov, A.; Naryzhny, S.; Chernobrovkin, A.; Ponomarenko, E.; Lisitsa, A. Chromosome-centric approach to overcoming bottlenecks in the Human Proteome Project. Expert Rev. Proteomics 2012, 9 (6), 667–676.
(12) Poverennaya, E. V.; Kopylov, A. T.; Ponomarenko, E. A.; Ilgisonis, E. V.; Zgoda, V. G.; Tikhonova, O. V.; Novikova, S. E.; Farafonova, T. E.; Kiseleva, Y. Y.; Radko, S. P.; et al. State of the Art of Chromosome 18-Centric HPP in 2016: Transcriptome and Proteome Profiling of Liver Tissue and HepG2 Cells. J. Proteome Res. 2016, 15 (11), 4030–4038.
(13) Wei, W.; Luo, W.; Wu, F.; Peng, X.; Zhang, Y.; Zhang, M.; Zhao, Y.; Su, N.; Qi, Y.; Chen, L.; et al. Deep Coverage Proteomics Identifies More Low-Abundance Missing Proteins in Human Testis Tissue with Q-Exactive HF Mass Spectrometer. J. Proteome Res. 2016, 15 (11), 3988–3997.
(14) Zgoda, V. G.; Kopylov, A. T.; Tikhonova, O. V.; Moisa, A. A.; Pyndyk, N. V.; Farafonova, T. E.; Novikova, S. E.; Lisitsa, A. V.; Ponomarenko, E. A.; Poverennaya, E. V.; et al. Chromosome 18 transcriptome profiling and targeted proteome mapping in depleted plasma, liver tissue and HepG2 cells. J. Proteome Res. 2013, 12 (1), 123–134.
(15) Ponomarenko, E. A.; Kopylov, A. T.; Lisitsa, A. V.; Radko, S. P.; Kiseleva, Y. Y.; Kurbatov, L. K.; Ptitsyn, K. G.; Tikhonova, O. V.; Moisa, A. A.; Novikova, S. E.; et al. Chromosome 18 Transcriptoproteome of Liver Tissue and HepG2 Cells and Targeted Proteome Mapping in Depleted Plasma: Update 2013. J. Proteome Res. 2014, 13 (1), 183–190.
(16) Ponomarenko, E. A.; Zgoda, V. G.; Kopylov, A. T.; Poverennaya, E. V.; Ilgisonis, E. V.; Lisitsa, A. V.; Archakov, A. I. The Russian part of the human proteome project: first results and prospects. Biomeditsinskaya Khimiya 2015, 61 (2), 169–175.
(17) Carr, S. A.; Abbatiello, S. E.; Ackermann, B. L.; Borchers, C.; Domon, B.; Deutsch, E. W.; Grant, R. P.; Hoofnagle, A. N.; Huttenhain, R.; Koomen, J. M.; et al. Targeted Peptide Measurements in Biology and Medicine: Best Practices for Mass Spectrometry-based Assay Development Using a Fit-for-Purpose Approach. Mol. Cell. Proteomics 2014, 13 (3), 907–917.
(18) Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Global quantification of mammalian gene expression control. Nature 2011, 473 (7347), 337–342.
(19) Hebenstreit, D.; Fang, M.; Gu, M.; Charoensawan, V.; van Oudenaarden, A.; Teichmann, S. A. RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol. Syst. Biol. 2014, 7 (1), 497.
(20) Guo, Y.; Sheng, Q.; Li, J.; Ye, F.; Samuels, D. C.; Shyr, Y. Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data. PLoS One 2013, 8 (8), e71462.

(21) Liu, S.; Lin, L.; Jiang, P.; Wang, D.; Xing, Y. A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. Nucleic Acids Res. 2011, 39 (2), 578–588.
(22) Lundberg, E.; Fagerberg, L.; Klevebring, D.; Matic, I.; Geiger, T.; Cox, J.; Algenäs, C.; Lundeberg, J.; Mann, M.; Uhlen, M. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 2010, 6, 450.
(23) Martin, J. S.; Kephart, W. C.; Haun, C. T.; McCloskey, A. E.; Shake, J. J.; Mobley, C. B.; Goodlett, M. D.; Kavazis, A.; Pascoe, D. D.; Zhang, L.; et al. Impact of external pneumatic compression target inflation pressure on transcriptome-wide RNA expression in skeletal muscle. Physiol. Rep. 2016, 4 (22).
(24) Mele, M.; Ferreira, P. G.; Reverter, F.; DeLuca, D. S.; Monlong, J.; Sammeth, M.; Young, T. R.; Goldmann, J. M.; Pervouchine, D. D.; Sullivan, T. J.; et al. The human transcriptome across tissues and individuals. Science 2015, 348 (6235), 660–665.
(25) web-resource: C-HPP | The neXt-50 Challenge http://c-hpp.webhosting.rug.nl/tikiindex.php?page=The neXt-50 Challenge (accessed Sep 5, 2017).
(26) Kopylov, A. T.; Zgoda, V. G.; Lisitsa, A. V.; Archakov, A. I. Combined use of irreversible binding and MRM technology for low- and ultralow copy-number protein detection and quantitation. Proteomics 2013, 13 (5), 727–742.
(27) Archakov, A.; Aseev, A.; Bykov, V.; Grigoriev, A.; Govorun, V.; Ivanov, V.; Khlunov, A.; Lisitsa, A.; Mazurenko, S.; Makarov, A. A.; et al. Gene-centric view on the human proteome project: the example of the Russian roadmap for chromosome 18. Proteomics 2011, 11 (10), 1853–1856.
(28) Lodish, H. F.; Berk, A.; Zipursky, S. L.; Matsudaira, P.; Baltimore, D.; Darnell, J. Molecular cell biology, 4th edition; Scientific American Books: New York, 2000.
(29) web-resource: HepG2 http://www.celeromics.com/en/Support/cell-lines/hepg2.php (accessed May 28, 2017).
(30) Kitata, R. B.; Dimayacyac-Esleta, B. R. T.; Choong, W.-K.; Tsai, C.-F.; Lin, T.-D.; Tsou, C.-C.; Weng, S.-H.; Chen, Y.-J.; Yang, P.-C.; Arco, S. D.; et al. Mining Missing Membrane Proteins by High-pH Reverse-Phase StageTip Fractionation and Multiple Reaction Monitoring Mass Spectrometry. J. Proteome Res. 2015, 14 (9), 3658–3669.
(31) Zgoda, V. G.; Moshkovskii, S. A.; Ponomarenko, E. A.; Andreewski, T. V.; Kopylov, A. T.; Tikhonova, O. V.; Melnik, S. A.; Lisitsa, A. V.; Archakov, A. I. Proteomics of mouse liver microsomes: performance of different protein separation workflows for LC-MS/MS. Proteomics 2009, 9 (16), 4102–4105.
(32) Deutsch, E. W. The PeptideAtlas Project. Methods Mol. Biol. 2010, 604, 285–296.

(33) Unicity checker https://matschaeff.github.io/unicity-checker/ (accessed Sep 5, 2017).
(34) Wu, H.-Y.; Goan, Y.-G.; Chang, Y.-H.; Yang, Y.-F.; Chang, H.-J.; Cheng, P.-N.; Wu, C.C.; Zgoda, V. G.; Chen, Y.-J.; Liao, P.-C. Qualification and Verification of Serological Biomarker Candidates for Lung Adenocarcinoma by Targeted Mass Spectrometry. J. Proteome Res. 2015, 14 (8), 3039–3050.
(35) Richard, V. R.; Domanski, D.; Percy, A. J.; Borchers, C. H. An online 2D-reversed-phase - Reversed-phase chromatographic method for sensitive and robust plasma protein quantitation. J. Proteomics 2017, 168, 28–36.
(36) Chang, C.; Li, L.; Zhang, C.; Wu, S.; Guo, K.; Zi, J.; Chen, Z.; Jiang, J.; Ma, J.; Yu, Q.; et al. Systematic Analyses of the Transcriptome, Translatome, and Proteome Provide a Global View and Potential Strategy for the C-HPP. J. Proteome Res. 2014, 13 (1), 38–49.
(37) Wang, T.; Cui, Y.; Jin, J.; Guo, J.; Wang, G.; Yin, X.; He, Q.-Y.; Zhang, G. Translating mRNAs strongly correlate to proteins in a multivariate manner and their translation ratios are phenotype specific. Nucleic Acids Res. 2013, 41 (9), 4743–4754.


Figure 1. Distribution histograms of the chromosome 18 transcriptome and proteome of liver tissue and the HepG2 cell line. Transcripts detected at a sensitivity of more than 10 copies per cell were selected, which made it possible to use the same “scale” when comparing these results with proteomic data. The selected cutoff allows the analysis of transcripts detected at the RPKM>1 cutoff level.


Figure 2. Comparison of transcriptomic data for human chromosome 18 in liver cells obtained by RNA-Seq during the realization of the Russian part of the 'Human Proteome Project' and by the GTEx consortium.


Figure 3. Histogram of the concentration distribution of proteins encoded by chromosome 18 detected by the SRM method using SIS in liver tissue cells and HepG2 cells, where low concentration is 10⁻¹¹–10⁻¹⁰ M; medium, 10⁻¹⁰–10⁻⁹ M; high, ≥10⁻⁸ M.


Figure 4. Correlation between the transcripts of human chromosome 18 for liver cells (blue) and the HepG2 cell line (green), measured by RNA-Seq (a, d), qRT-PCR (b, e) and combined data (c, f), and proteomic data measured by SRM using SIS.


Figure 5. Histograms of the mRNA/protein ratio for two types of biomaterial.
