Chromosome 18 Transcriptoproteome of Liver Tissue and HepG2


Article

Chromosome 18 TranscriptoProteome of Liver Tissue and HepG2 Cells and Targeted Proteome Mapping in Depleted Plasma: Update 2013 Elena A. Ponomarenko, Arthur T Kopylov, Andrey V. Lisitsa, Sergey P. Radko, Yana Yu. Kiseleva, Leonid K. Kurbatov, Konstantin G. Ptitsyn, Olga V. Tikhonova, Alexander A. Moisa, Svetlana E. Novikova, Ekaterina V. Poverennaya, Ekaterina V. Ilgisonis, Alexey D. Filimonov, Nadezhda A. Bogolubova, Valentina V. Averchuk, Pavel A. Karalkin, Igor V. Vakhrushev, Konstantin N. Yarygin, Sergei A Moshkovskii, Victor G. Zgoda, Alexey S. Sokolov, Alexander M. Mazur, Egor B. Prokhortchouck, Konstantin G. Skryabin, Elena N. Ilina, Elena S. Kostrjukova, Dmitry G. Alexeev, Alexander V. Tyakht, Alexey Yu. Gorbachev, Vadim M. Govorun, and Alexander I. Archakov J. Proteome Res., Just Accepted Manuscript • Publication Date (Web): 04 Nov 2013 Downloaded from http://pubs.acs.org on November 13, 2013

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Zgoda, Victor; Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences
Sokolov, Alexey; National Research Centre «Kurchatov Institute»
Mazur, Alexander; Center Bioengineering, Russian Academy of Sciences
Prokhortchouck, Egor; National Research Centre «Kurchatov Institute»; Center Bioengineering, Russian Academy of Sciences
Skryabin, Konstantin; National Research Centre «Kurchatov Institute»; Center Bioengineering, Russian Academy of Sciences
Ilina, Elena; Research Institute of Physical Chemical Medicine of the Federal Medical-Biological Agency of the Russian Federation
Kostrjukova, Elena; Research Institute of Physical Chemical Medicine of the Federal Medical-Biological Agency of the Russian Federation
Alexeev, Dmitry; Research Institute of Physical Chemical Medicine of the Federal Medical-Biological Agency of the Russian Federation
Tyakht, Alexander; Research Institute of Physical-Chemical Medicine
Gorbachev, Alexey; Research Institute of Physical Chemical Medicine of the Federal Medical-Biological Agency of the Russian Federation
Govorun, Vadim; Research Institute of Physical-Chemical Medicine
Archakov, Alexander; Orekhovich Institute of Biomedical Chemistry of Russian Academy of Medical Sciences

ACS Paragon Plus Environment


Chromosome 18 TranscriptoProteome of Liver Tissue and HepG2 Cells and Targeted Proteome Mapping in Depleted Plasma: Update 2013

Elena A. Ponomarenko¹, Arthur T. Kopylov¹, Andrey V. Lisitsa¹, Sergey P. Radko¹, Yana Yu. Kiseleva¹, Leonid K. Kurbatov¹, Konstantin G. Ptitsyn¹, Olga V. Tikhonova¹, Alexander A. Moisa¹, Svetlana E. Novikova¹, Ekaterina V. Poverennaya¹, Ekaterina V. Ilgisonis¹, Alexey D. Filimonov¹, Nadezhda A. Bogolubova¹, Valentina V. Averchuk¹, Pavel A. Karalkin¹, Igor V. Vakhrushev¹, Konstantin N. Yarygin¹, Sergei A. Moshkovskii¹, Victor G. Zgoda¹, Alexey S. Sokolov², Alexander M. Mazur³, Egor B. Prokhortchouck²,³, Konstantin G. Skryabin²,³, Elena N. Ilina⁴, Elena S. Kostrjukova⁴, Dmitry G. Alexeev⁴, Alexander V. Tyakht⁴, Alexey Yu. Gorbachev⁴, Vadim M. Govorun⁴, Alexander I. Archakov¹

Corresponding author: Prof. Alexander Archakov, 119121, Russia, Moscow, Pogodinskaya street, 10 E-mail: [email protected], [email protected] Tel.: +7-499-246-69-80 Fax: +7 499 245-08-57

1 – Orekhovich Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences; 2 – National Research Centre «Kurchatov Institute»; 3 – Center Bioengineering, Russian Academy of Sciences; 4 – Research Institute of Physical Chemical Medicine of the Federal Medical-Biological Agency of the Russian Federation.


Abstract

We report the results obtained in 2012-2013 by the Russian Consortium for the Chromosome-centric Human Proteome Project (C-HPP). The main scope of this work was the transcriptome profiling of genes on human chromosome 18 (Chr 18), as well as their encoded proteome, from three types of biomaterial: liver tissue, the hepatocellular carcinoma-derived cell line HepG2, and blood plasma. The transcriptome profiling for liver tissue was independently performed using two RNAseq platforms (SOLiD and Illumina), and also by Droplet Digital PCR (ddPCR) and quantitative RT-PCR. The proteome profiling of Chr 18 was accomplished by quantitatively measuring protein copy numbers in the three types of biomaterial (at a sensitivity of 10⁻¹³ M) using Selected Reaction Monitoring (SRM). In total, protein copy numbers were estimated for 228 master proteins, including quantitative data on 164 proteins in plasma, 171 in the HepG2 cell line, and 186 in liver tissue. Most proteins were present in plasma at 10⁸ copies per 1 µL, while the median abundance was 10⁴ and 10⁵ protein copies per cell in HepG2 cells and liver tissue, respectively. In summary, for liver tissue and HepG2 cells a «transcriptoproteome» was produced which reflects the relationship between transcript and protein copy numbers of the genes on Chr 18. The quantitative data acquired by RNAseq, PCR, and SRM were uploaded into the “Update_2013” dataset of our knowledgebase (www.kb18.ru) and investigated for linear correlations.

Keywords: mRNA sequencing, quantitative PCR, Selected Reaction Monitoring, human proteome project, transcriptome, proteome, transcriptoproteome, chromosome 18


INTRODUCTION

The final goal of the Russian Consortium for the Chromosome-centric Human Proteome Project (C-HPP) was the analysis of proteins encoded by chromosome 18 (Chr 18) in plasma, liver tissue, and HepG2 cells with a sensitivity of 10⁻¹⁸ M, which corresponds to one protein copy per 10⁶ liver cells, 10⁷ HepG2 cells, or 1 µL of plasma [1]. The size of the human proteome, which requires estimates of the width (number of protein species in a biosample) and depth (number of copies of the same protein molecule in a biosample), has not been determined. Theoretically, if one gene encodes one protein, there should be at least 21,000 non-modified human proteins [2]. However, expressed proteins include products of alternative splicing (AS), proteins containing amino acid substitutions (nonsynonymous single nucleotide polymorphisms (nsSNPs) realized as single amino acid polymorphisms (SAPs)), and post-translational modifications (PTMs) [3]. We have estimated the total number of protein species (according to neXtProt [4] data) to be approximately 1.8 million, with an average of about 85 protein variants per protein-coding gene. Thus, by applying this calculation to Chr 18, selected for the Russian part of the HPP, we can expect ~25 000 different protein species.

A sensitivity of 10⁻¹⁸ M for protein analytical methods (which corresponds to 1 protein copy per 1 µL of plasma) was considered sufficient for obtaining an inventory of the complete proteome of Chr 18. Theoretically, a sensitivity of one protein copy per liver cell or per HepG2 cell, which corresponds to a concentration of 10⁻¹³ M, would be sufficient for investigation of the cellular proteome at the single-cell level. However, millions of cells are routinely required to extract proteins in the amounts required by mass spectrometry (MS) to obtain an averaged proteome for one cell [5]. Thus, detecting a single protein molecule per averaged HepG2 cell requires a sensitivity of 10⁻¹⁸ M.
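The arithmetic behind these figures can be checked with a short sketch. It assumes only the Avogadro constant and the numbers quoted in the text; the function name `copies_in_volume` is illustrative, and the gene count of 289 is the Ensembl figure cited for Chr 18 in the Table 1 discussion, which may not be the exact count the authors used for their ~25 000 estimate.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
MICROLITER = 1e-6         # liters


def copies_in_volume(concentration_molar: float, volume_liters: float) -> float:
    """Expected number of molecules in a volume at a given molar concentration."""
    return concentration_molar * volume_liters * AVOGADRO


# 10^-18 M in 1 uL of plasma: on the order of one copy per microliter.
copies_per_ul = copies_in_volume(1e-18, MICROLITER)

# Width of the proteome: ~1.8 million species over ~21,000 genes.
variants_per_gene = 1.8e6 / 21_000

# Assumption: 289 protein-coding genes on Chr 18 (Ensembl figure from the text).
chr18_species = 289 * variants_per_gene

print(f"{copies_per_ul:.2f} copies per uL at 1e-18 M")
print(f"{variants_per_gene:.1f} variants per gene")
print(f"{chr18_species:.0f} expected protein species on Chr 18")
```

Under these assumptions the per-gene average comes out near 85 and the Chr 18 estimate near 25 000, in line with the rounded values given in the text.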


Table 1 provides an update of current data and databases concerning Chr 18 and its expression products compared with data obtained in 2012. Most characteristics remained approximately the same or slightly increased. The number of protein-encoding genes according to Ensembl [6] was 289, and the number of proteins confidently identified using MS was 202 in GPMdb [7] and 264 in PRIDE [8]. The number of proteins detectable using antibodies in 2013 was 45, an increase of 6 proteins compared to 2012.

This year, the Russian Consortium aimed to complete the objectives set forth in the conclusion of the 2012 C-HPP report [1]: to analyze the transcriptome of chromosome 18 in liver tissue and HepG2 cells using both the SOLiD and Illumina approaches, and to compare the RNAseq data from these approaches with each other and with measurements by quantitative RT-PCR and ddPCR. The analysis of blood plasma and HepG2 at the proteome level was also continued using SRM technology, and the quantitative data aggregated from the two platforms were assembled into a «transcriptoproteome» of HepG2 and liver cells.

Experimental section

Work performed on the C-HPP in 2012-2013 generally used the same biological specimens and experimental designs as previously reported [1]. Plasma samples were pooled from medically examined healthy volunteers and depleted; liver samples were purchased from the ILSBio Biobank and also pooled; HepG2 cells were grown, harvested, and processed by the standard protocol (see details in Zgoda et al. [1], SN2, section 2).


The liver tissue and HepG2 samples were prepared and divided into aliquots at the organization of the C-HPP core principal investigator. These aliquots were provided to the organizations collaborating on the C-HPP for RNAseq on the different platforms. This year our research program was supplemented by the Illumina RNAseq and droplet digital PCR (ddPCR) platforms, whereas the previously applied SOLiD and quantitative RT-PCR (qRT-PCR) methods were implemented as described earlier (see Zgoda et al. [1], SN1). The qualitative and quantitative SRM assays were also extended to advance the quantitative map of Chr 18. In this section we briefly introduce the previous experimental techniques and provide essential details of the new approaches.

Transcriptome profiling

The mRNA from the liver and HepG2 samples was detected and quantified in parallel using two RNAseq platforms, SOLiD and Illumina. The expression levels of a number of transcripts were also measured using qRT-PCR and ddPCR, and the results were compared with the transcriptome sequencing data.

RNAseq (Illumina). Total RNA was extracted using TRIzol (Invitrogen) from tissue samples prepared and distributed by the Institute of Biomedical Chemistry (responsible for the C-HPP in Russia). The quality of total RNA from tissue was estimated using a Bioanalyzer 2100 (Agilent). Aliquots from total RNA samples were subjected to Illumina sequencing following the manufacturer's protocol for the sequencing of cDNA samples (see Supplementary Note 1). The resulting “fastq” format files were used to align all reads to the hg19 reference genome using the bowtie2 aligner. Uniquely mapped reads were used to calculate the expression level as reads per kilobase per million reads (RPKM), using the following formula: RPKM = 10⁹ × n / (N × L), where n is the number of mapped reads localized within exons, N is the total number of uniquely mapped reads in the experiment, and L is the length of the gene body, summed over all union exons, in base pairs.

RNAseq (SOLiD). RNAseq using SOLiD was implemented as described in Zgoda et al. [1], SN1, section 1. New RNAseq data were stored in the Sequence Read Archive (SRA) under ID SRX267708.

Droplet Digital PCR (ddPCR). Droplet Digital PCR is a new method for the quantification of DNA molecules in a sample [9]. To prepare cDNA, RNA was treated with DNase I (Thermo Scientific) and used for cDNA synthesis with H-minus Mu-MLV reverse transcriptase (Thermo Scientific) and random hexanucleotides according to the manufacturer's instructions. ddPCR reaction mixtures were prepared in 20-µL volumes that contained final concentrations of 1× ddPCR Supermix (Bio-Rad, USA), 0.3 µM of each primer and probe, and 0.6-1 µg of the cDNA. Droplet generation and droplet reading for ddPCR were carried out according to the manufacturer's instructions using Bio-Rad reagents. More details on the implementation of ddPCR are given in Supplementary Note 1.

qRT-PCR. Experimental conditions were as described in an earlier version of the C-HPP report (Zgoda et al. [1], SN1, section 1).

Selected reaction monitoring

Sample preparation. Plasma depletion and digestion were performed using the method described in Zgoda et al. [1].

Liquid chromatography


Separation of peptides after enzymatic digestion with trypsin was performed on an Eclipse-XDB-C18 column (3.0 × 100 mm, 3.5 µm particle size, Agilent) in a gradient of mobile phase A (0.1% formic acid, 0.015% trifluoroacetic acid) and mobile phase B (80% acetonitrile, 0.1% formic acid, 0.015% trifluoroacetic acid) over 52 minutes at a flow rate of 100 µL/min. Depending on MS signal intensity, 10-50 µL of peptides were loaded on the column equilibrated with mobile phase A.

Mass spectrometry

LC-MS-SRM analysis was performed on a QQQ 6490 mass spectrometer (Agilent) equipped with a Jet-Stream ion source. The following parameters were set: capillary voltage 4000 V, nozzle voltage 1200 V, drying gas (nitrogen) flow 15 L/min, sheath gas (nitrogen) flow 9 L/min, drying gas temperature 300 °C, sheath gas temperature 270 °C, fragmentor voltage constant at 380 V, and cell accelerator voltage varied from 3.8 to 6.2 V depending on the m/z value and charge state of the precursor ion. Protein concentration measurements were carried out by time-scheduled SRM analysis with a retention time tolerance window of ±1.25 to ±1.5 minutes, depending on the position of the peptide in the elution gradient. External peptide calibrations were performed with synthesized peptides diluted stepwise, one order of magnitude per step, to give a series of final concentrations from 10⁻¹⁴ M to 10⁻⁹ M. Each calibration point was measured in five technical replicates. Protein concentrations were established as the average concentration of two corresponding peptides, using the same LC-MS method as applied for the synthetic peptide calibration. All measurements were uploaded into the in-house Chr 18 knowledgebase (www.pikb18.ru).
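As a brief illustration of the RPKM definition given in the RNAseq (Illumina) subsection of this Experimental section, the following sketch computes RPKM from the three quantities named there (n, N, and L). The function name and the read counts in the usage line are hypothetical and are not taken from the study.

```python
def rpkm(exon_reads: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """RPKM = 1e9 * n / (N * L).

    exon_reads         -- n, mapped reads localized within exons of the gene
    total_mapped_reads -- N, all uniquely mapped reads in the experiment
    gene_length_bp     -- L, summed length of the union exons in base pairs
    """
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * exon_reads / (total_mapped_reads * gene_length_bp)


# Hypothetical gene: 500 exon reads, 20 million mapped reads, 2 kb of exons.
example = rpkm(500, 20_000_000, 2_000)
print(example)
```

The factor 10⁹ combines the per-kilobase (10³) and per-million-reads (10⁶) normalizations, so the value is comparable across genes of different length and runs of different depth.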

Results and discussion


During 2012-2013 the Russian Consortium for C-HPP created the transcriptoproteome map for Chr 18 genes and their encoded proteins by profiling the transcripts and proteins in liver tissue and the HepG2 cell line. The transcripts in the tissue samples were independently measured using two RNAseq platforms, SOLiD and Illumina. Measurements with qRT-PCR and ddPCR were also conducted for cross-comparison, in order to select the optimal RNAseq platform for future experiments. The proteome profiling was conducted through targeted mass spectrometry using selected reaction monitoring (SRM) with a sensitivity of up to 10⁻¹³ M. To compare the transcriptome with the proteome, all measurements were expressed as numbers of mRNA or protein copies per HepG2 or liver cell [10].

Transcriptome profiling

We collected RPKM values using the SOLiD and Illumina RNAseq approaches for liver tissue samples. For both platforms, the dynamic range of RPKM values was five orders of magnitude, from 0.01 to 6500 (Fig. 1). For most transcripts an expression level of 50 RPKM was observed. The RPKM values for individual genes were concordant between the two RNAseq platforms. For example, the most abundant transcript, that of transthyretin, was measured at 6,400 RPKM using SOLiD and 6,800 RPKM using Illumina. Other transcripts measured in liver tissue illustrated the high concordance of SOLiD versus Illumina across the dynamic range: the RPKM was 750 vs. 616 for the CYP5A gene and 76 vs. 80 for the myosin regulatory light chain (see Supplementary Table 1). A high degree of agreement between the two RNAseq platforms for liver tissue was observed (see Fig. 2a; a similar cross-platform experiment with HepG2 cells is currently in progress). The transcripts of Chr 18 genes demonstrated an extremely strong relationship between the RPKM values acquired by SOLiD and Illumina, with a coefficient of determination of R² = 0.821. Because of this strong correlation, in our further analysis of 186 transcripts we used the average of the SOLiD and Illumina RPKM values as the abundance proxy.

Despite the high correlation in quantitative values, a number of transcripts were registered exclusively by either SOLiD or Illumina. Four transcripts (for the DSC1, ANKRD30B, DSG3, and LAS2 genes) were measured at 0.01-0.115 RPKM by Illumina RNAseq but were undetected by SOLiD. Conversely, 39 transcripts (RPKM values from 0.01 to ~140 000) were detected solely by SOLiD.

Over the past year, a new analytical method, Droplet Digital PCR (ddPCR), was applied for the high-throughput quantitation of the transcriptome. The reliability of ddPCR was tested on liver samples by comparing the mRNA measurements against the qRT-PCR reference dataset, which currently contains qRT-PCR-measured transcript copies for 153 Chr 18 genes; the abundance of 30 of these genes was additionally measured by ddPCR. We observed a strong relationship between ddPCR and qRT-PCR (R² = 0.5, Fig. 2b); however, the relationship was weaker than that between the SOLiD and Illumina platforms. We believe that in further studies both methods of quantitative PCR (qRT-PCR and ddPCR) should be used to verify the accuracy of measurements.

We compared the level of transcription between liver and HepG2 cells. Measurements for both were conducted identically for 153 genes using qRT-PCR. As shown in the histogram (Fig. 1b), the dynamic range of transcript expression levels in liver tissue spanned seven orders of magnitude. For liver cell transcripts, the dynamic range was an order of magnitude wider than that of HepG2 cell transcripts. The minimal level of transcription was measured at 3 copies per 1000 liver cells for the Q86VQ3 and centrin-1 genes, while the maximum level was measured at 1881 copies per cell (P02766, TTR). For HepG2 cells, centrin was also a low-copy transcript, at 1.2 copies per 1000 cells. The RPL17 (P18621) gene was an example of a high-copy transcript, measured at 936 copies per HepG2 cell.

A strong relationship was observed (R² = 0.6) between transcript copy numbers in HepG2 and liver cells (Fig. 2c). We assume that this degree of relationship between different types of biomaterial could indicate that native and cultivated cells generally share the same composition at the transcript level. The general expression profile could still determine the basic transcriptional profile of these eukaryotic cells independent of their genome size, as the HepG2 cell is known to contain over 50 chromosomes (http://hepg2.com).

It was interesting to compare the number of qRT-PCR transcript copies to the corresponding RNAseq data (Fig. 2d,e,f). The coefficient of determination between qRT-PCR data and the averaged values from the two RNAseq platforms (SOLiD and Illumina) was higher (R² = 0.59, Fig. 2f) than that between qRT-PCR data and either individual platform. For example, R² = 0.49 was obtained by correlating Illumina to qRT-PCR data (Fig. 2d). Based on these relationships, qRT-PCR data were used as the input for a linear regression to convert the averaged RNAseq RPKM values to copy numbers.

Summarizing the transcriptome section, 219 transcripts were found in both liver and HepG2 cells, a further 17 transcripts were found only in the liver, and four were observed only in HepG2 cells. There are currently 37 transcripts undetectable in both sample types, constituting 13% of chromosome 18. In the latter part of this manuscript we report that five of these 37 transcripts were undetectable at the proteome level as well.

Footnote 1. To characterize strong, moderate, and weak relationships between bivariate variables, we followed the criteria of Alan Rubin [11].
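The platform comparison and averaging described in this section can be sketched as an ordinary least-squares fit with its coefficient of determination. The sketch below uses only the three SOLiD/Illumina RPKM pairs quoted in the text, so it illustrates the calculation rather than reproducing Fig. 2a; fitting on log10-transformed values is an assumption, made because the data span several orders of magnitude, and `fit_line` is an illustrative helper name.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, returned with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# RPKM pairs quoted in the text: transthyretin, CYP5A, myosin regulatory light chain.
solid = [6400.0, 750.0, 76.0]
illumina = [6800.0, 616.0, 80.0]

# Assumption: compare the platforms on a log10 scale.
log_solid = [math.log10(v) for v in solid]
log_illumina = [math.log10(v) for v in illumina]
slope, intercept, r2 = fit_line(log_solid, log_illumina)

# Abundance proxy used in the text: the mean of the two platforms.
averaged = [(s + i) / 2 for s, i in zip(solid, illumina)]

print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
print(averaged)
```

The same `fit_line` helper could be applied to log10 qRT-PCR copy numbers against log10 averaged RPKM to obtain a conversion line of the kind the authors describe; the coefficients of that line depend on the full dataset and are not derived here.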


We have demonstrated that the results from two different RNAseq platforms are interchangeable. The estimates of transcript abundance are more accurate if averaged between RNAseq platforms (R² = 0.87); however, each platform can be used separately to obtain reliable results. Interestingly, the RNAseq quantitative data did not perfectly match the results from traditional qRT-PCR and modern ddPCR. In addition, these two methods corresponded at the level of R² = 0.59. Such an effect size is statistically very significant in regression analysis for converting RPKM to transcript-per-cell numbers. Despite the statistically significant correlations, the origin of the differences between estimates by RNAseq, qRT-PCR, and ddPCR requires further investigation.

Targeted proteome assay

Proteome profiling was conducted by SRM assays developed for 277 master proteins of Chr 18. The master protein is the primary translation product of the coding sequence; it is at least one of the known protein forms coded by the gene [12]. The proteotypic peptides for the assays were evaluated through the PRIDE [8], PeptideAtlas [13], and GPMdb [7] databases. The fragmentation patterns for the selected proteotypic peptides were calculated as described in the literature [14]. SRM assays designed in silico were used to collect responses from digested biosamples of liver tissue, HepG2 cells, and depleted human blood plasma (see Supplementary Table 2). In total, 1277 proteotypic peptides were assayed in the three types of biomaterial. Reliable SRM responses were collected for 516 of the assayed peptides, corresponding to 249 proteins encoded by Chr 18 (see Supplementary Table 2). For these 516 endogenous peptides we synthesized, purified, and quantified standards (at more than 95% purity) as described in Supplementary Note 1. On average, the calibration equations (quantifier-ion intensity versus concentration on a double-log scale) were characterized by a slope of 0.81 ± 0.2 and R² above 0.8. We used these calibrations to convert the SRM responses measured in the biomaterial into concentrations, which were then expressed as protein copy numbers per 1 µL of plasma or per cell for liver tissue and HepG2 culture. Proteotypic peptides that belonged to one protein but were discordant by an order of magnitude in the quantitative estimate were discarded.

Summarizing the results from the three types of biomaterial, in this paper we report the abundance of 228 master proteins encoded by the genes of chromosome 18. Copy numbers were not acquired for 49 proteins (see Supplementary Table 2, list “Undetected”). For 28 proteins, the responses were at the level of noise, while for 21 proteins the estimates obtained from two unique proteotypic peptides differed by an order of magnitude. Thus, we were able to compile the inventory of master proteins at a sensitivity level of 10⁻¹³ M using depleted blood plasma, liver tissue, and cell culture. There were 164 proteins detected in plasma (59% of all Chr 18 proteins), 171 proteins in HepG2 cells (62%), and 186 (67%) in liver tissue. The percentage of chromosome coverage is still below the benchmark set in our Roadmap for Chr 18 [12].

A Venn diagram demonstrates the intersection between the SRM profiles of the three types of biomaterial (Fig. 3a). There is a 40% intersection between the proteomes of the three types of biomaterial (114 proteins of Chr 18). The pairwise comparisons (liver to plasma, plasma to HepG2 cells, and liver to HepG2 cells) have approximately the same number of proteins in common.

Using external calibrations and the two-peptides-per-protein rule in the depleted plasma, quantitative estimates for 164 proteins were obtained by SRM over a dynamic range of seven orders of magnitude (Fig. 1c). The least abundant protein, P19022, was measured at 1.4 × 10⁵ copies per µL of plasma. The most abundant protein, P02766, was present at 1.2 × 10¹¹ copies per µL.
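The conversion from SRM response to protein concentration described in this section can be sketched as follows. This is a minimal illustration, assuming a calibration line of the form log10(intensity) = slope × log10(concentration) + intercept; the slope of 0.81 is the average quoted in the text, while the intercept, the intensities, and the function names are hypothetical. The concordance check reflects the stated rule that peptides of one protein differing by an order of magnitude are discarded.

```python
import math


def concentration_from_intensity(intensity: float, slope: float, intercept: float) -> float:
    """Invert log10(I) = slope * log10(C) + intercept to obtain C in mol/L."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)


def protein_concentration(peptide_concs, max_fold: float = 10.0):
    """Average the peptide-level concentrations of one protein.

    Returns None when the peptides disagree by max_fold or more,
    mirroring the order-of-magnitude rejection rule in the text.
    """
    lowest, highest = min(peptide_concs), max(peptide_concs)
    if highest / lowest >= max_fold:
        return None
    return sum(peptide_concs) / len(peptide_concs)


SLOPE = 0.81      # average calibration slope reported in the text
INTERCEPT = 12.0  # hypothetical intercept, for illustration only

# Forward-simulate an intensity for a 1e-11 M peptide, then invert it.
simulated_intensity = 10 ** (SLOPE * math.log10(1e-11) + INTERCEPT)
recovered = concentration_from_intensity(simulated_intensity, SLOPE, INTERCEPT)

concordant = protein_concentration([1e-11, 3e-11])   # within 10-fold: averaged
discordant = protein_concentration([1e-11, 2e-10])   # 20-fold apart: rejected
print(recovered, concordant, discordant)
```

In practice each peptide would have its own fitted slope and intercept from the dilution series; a single shared pair is used here only to keep the sketch short.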


In HepG2 cells, the SRM measurements enabled the quantification of 171 proteins covering five orders of magnitude. At the bottom of the dynamic range, the Q96K83 protein was detected at 221 copies per HepG2 cell, while at the top Q9BXW9 was measured at 9 × 10⁶ copies per cell. Unexpectedly, the widest dynamic range was observed for proteins in liver tissue cells, from 6 to 1.8 × 10⁸ copies per cell for Q8NA19 (L(3)mbt-like protein 4) and Q7L5Y1 (Mitochondrial enolase superfamily member 1), respectively. However, this may be due to the contamination of liver (a highly vascularized tissue) with blood.

Transcriptoproteome analysis

The goal of our “transcriptoproteome” analysis was to compare the results of transcriptome profiling and proteome mapping for the genes on Chr 18. The transcriptome RNAseq data for the HepG2 cell line were expressed as numbers of transcript copies (according to y = 0.72x + 0.37, where x = log10(RPKM)) and correlated with protein abundance, approximated as protein copy numbers using the Intensity Based Absolute Quantification (iBAQ) index [15]. The iBAQ data were taken from the deep proteome analysis of 11 cell lines performed using an Orbitrap instrument and extensive protein separation [16]. In total, 7130 proteins were reported by deep proteome mapping, of which 65 proteins attributed to Chr 18 genes were detected in the HepG2 cell line. We observed a moderate relationship between the proxies for transcript and protein copy numbers (R² = 0.3711, Fig. 2g). Housekeeping genes were at high mRNA levels, up to 100 million copies per cell, with respective protein levels up to 10 million copies per cell. For Chr 18, the genes with high-copy transcript and protein levels encoded the enzymes ATP-synthase and GDP-amino-transferase, as well as cytoskeleton proteins such as keratins and fibronectins. In contrast, signal transduction proteins (including serpins and kinases) were characterized by relatively low levels of abundance: 100-1000 copies per cell at the transcript and protein level. These results agree well with the general estimates of the maximum degree of mRNA-protein correlation achievable by current analytical methods [17].

We further expanded the information on protein abundance by using our original SRM data obtained for the Chr 18-encoded proteome. Protein copy numbers were estimated for 247 Chr 18 gene products in HepG2 cells, of which up to 70% (171 proteins) could be classified as high-confidence values. We observed a weak transcript-protein relationship (R² = 0.0023, Fig. 2h). This observation indicates a considerable discrepancy between the estimates of protein copy numbers by shotgun MS/MS and by the targeted SRM method.

For liver cells, the transcriptome data were expressed as the number of transcript copies per cell (SOLiD and Illumina average value converted according to y = 0.82x + 0.44, R² = 0.59, Fig. 2f) and correlated with the protein abundance, approximated as protein copy numbers attributed to Chr 18 genes in liver cells (Fig. 2i). Protein copy numbers were estimated for 186 Chr 18 gene products in liver cells, and 159 genes had relevant transcripts measured in liver tissue. We observed no transcript-to-protein correlation in these data.

We next compared the degree of similarity between the transcriptome and proteome (SRM) profiles obtained for HepG2 cells and liver tissue using Venn diagrams (Fig. 3b,c). The Jaccard index value (0.52) indicated that a substantial portion of Chr 18 genes was observed at both the transcriptome and proteome levels.

A subset of missing proteins could be identified from the transcriptoproteome analysis (Fig. 3d). There were 28 protein-coding genes missing at the proteome level, although 23 of them were detected as transcripts. On the other hand, there were 37 undetected transcripts, of which 32 had detected proteins. Thus, there were only five protein-coding genes on Chr 18 whose expression could not be confirmed at either the transcript or the protein level.
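The set comparisons in this section (the Jaccard index and the count of genes confirmed at neither level) follow from simple set operations. The sketch below uses the standard definition of the Jaccard index, |A ∩ B| / |A ∪ B|, and integer placeholders in place of real gene identifiers; the placeholder sets are hypothetical and chosen only to reproduce the counts stated in the text (28, 37, and their overlap of five).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy example: two profiles sharing two of four genes overall.
toy_index = jaccard({"A", "B", "C"}, {"B", "C", "D"})

# Hypothetical placeholder IDs reproducing the counts in the text:
# 28 genes missing at the proteome level, 37 at the transcript level.
missing_protein = set(range(0, 28))
missing_transcript = set(range(23, 60))

unconfirmed = missing_protein & missing_transcript            # neither level
protein_missing_but_transcribed = missing_protein - missing_transcript
transcript_missing_but_translated = missing_transcript - missing_protein

print(toy_index)
print(len(unconfirmed),
      len(protein_missing_but_transcribed),
      len(transcript_missing_but_translated))
```

With real data, the two sets would hold the accession numbers of genes undetected by SRM and by RNAseq or PCR, respectively, and the intersection would list the candidates that remain unconfirmed.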


In our survey of Chr 18 we focused on the undetected proteins. About 30% of protein species were already captured by high-resolution MS/MS (16). Many other protein products were measured by SRM, but these measurements require further rigorous validation. Resolving the remaining undetected proteins will require a new threshold of analytical sensitivity using other techniques (e.g., the irreversible binding protocol (18)).

Chr 18 Knowledgebase

The Chr 18 Knowledgebase (19) (kb18.ru) was populated with 25 tracks, 14 of which were integrated from international resources (e.g., UniProt, neXtProt, and PRIDE), while the others were created by expert estimation and biocuration. The Knowledgebase contains a special section for the registration of SRM data obtained in the Russian part of the C-HPP; information on 978 experiments (2-3 technical runs in each experiment) is currently loaded. In total, data on 820 peptides and 249 corresponding proteins encoded by chromosome 18 are maintained in the registration section of the Knowledgebase. Measurement results are given for three types of biological material, as well as for solutions of synthetic peptides and for calibration data for quantitative experiments. The flexible structure of the Knowledgebase permits not only the storage of gene-centric protein data but also the construction of spaces for other types of objects (for example, data on peptides or splice isoforms of proteins). A space for human chromosomes has been created in test mode; it will allow users to browse the features and resources of all human chromosomes (e.g., number of protein-coding genes, transcripts and proteins identified, available antibodies) in matrix mode.

Conclusions


The last two years of intensive work on the C-HPP have uncovered the master-transcripts and master-proteins produced by Chr 18 in three different types of biomaterial. The proteins were measured at a sensitivity level of 10⁻¹³ M. Currently, 37 transcripts and 28 proteins are missing and represent the ultimate targets for the next year of research. We started from the whole-chromosome survey and then focused on the limited set of “unidentifiable” genes, which will serve as targets for the technological development of the chromosome-centric approach. It is expected that at least some of the missing proteins could be identified through the irreversible binding technique, which provides sensitivity up to 10⁻¹⁸ M (18). At the transcriptome level, HepG2 cells will be subjected to parallel analysis on the SOLiD and Illumina platforms, which could bring additional transcripts into the Chr 18 Knowledgebase.
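To put the two sensitivity levels in perspective, a molar concentration can be converted into a number of molecules with N = C · N_A · V. The sketch below is only an illustration: the 1 µL sample volume is an assumption and not a value from this study, and the function name `molecules_in_sample` is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_in_sample(conc_molar: float, volume_liters: float) -> float:
    """Return N = C * N_A * V for a concentration C (mol/L) and a volume V (L)."""
    return conc_molar * AVOGADRO * volume_liters


ASSUMED_VOLUME_L = 1e-6  # 1 microliter; assumption for illustration only

for conc in (1e-13, 1e-18):
    n = molecules_in_sample(conc, ASSUMED_VOLUME_L)
    print(f"{conc:.0e} M -> {n:.3g} molecules per microliter")
```

Under this assumed volume, 10⁻¹³ M corresponds to roughly 6 × 10⁴ molecules, whereas 10⁻¹⁸ M corresponds to fewer than one molecule on average, which suggests why detection at that level would depend on concentrating the analyte from a much larger sample volume.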

Associated content

Supporting Information

This material is available via the Internet at http://pubs.acs.org.

Acknowledgements

This work was done in accordance with the Human Proteome Program of the Russian Academy of Medical Sciences and was funded by the Ministry of Education and Science of the Russian Federation (agreements #8781 and #8274).


Tables

Table 1. Gene-centric summary of Chromosome 18 (baseline metrics, 2013 update).

| Resource | Baseline count | Baseline release | Updated count | Updated release |
|---|---|---|---|---|
| NUMBER OF GENES | | | | |
| Ensembl, protein-coding | 285 | rel.69, Oct.2012 | 289 | rel.72, June 2013 |
| Ensembl, non-coding | 591 | rel.69, Oct.2012 | 591 | rel.72, June 2013 |
| Ensembl, pseudogenes | 227 | rel.69, Oct.2012 | 234 | rel.72, June 2013 |
| UniProt, protein-coding | 277 | rel.2012_10 | 278 | rel. 2013_08 |
| neXtProt, protein-coding | 277 | rel.2012-10-07 | 277 | rel. 2013-06-11 |
| RNAseq Atlas, number of transcripts | 240 | nov.2012 | 241 | 07_2013 |
| NUMBER OF PROTEINS IDENTIFIED FROM MS DATASETS | | | | |
| neXtProt, protein level, gold | 197 | rel.2012-10-07 | 197 | rel. 2013-06-11 |
| GPMdb, green | 200 | rel.2012-09-01 | 202 | rel.2013-07-15 |
| PeptideAtlas, FDR of 1% at protein level | 217 | 2012_07 | 223 | 2012-12 |
| PRIDE | 255 | v2.8.17 | 264 | v. 2.8.18 |
| ANTIBODY-BASED PROTEIN IDENTIFICATION | | | | |
| Human Protein Atlas (high and medium) | 39 | 20120_07 | 45 | v.11, 2013-03-11 |

Table 2. Progress in the Chromosome 18 C-HPP.

| Indicator | 2012 | 2013 |
|---|---|---|
| Peptides, assayed | 1277 | 1277 |
| Peptides, synthesized | 32 | 469 |
| Proteins, depleted human blood plasma | 29 | 164 |
| Proteins, HepG2 cell line | 29 | 171 |
| Proteins, human liver cells | 29 | 186 |


Legends

Figure 1. Chromosome 18, distribution histograms. (a) RNAseq of human liver tissue by Illumina (190 transcripts) and SOLiD (225 transcripts); (b) qRT-PCR for liver tissue and HepG2 cells; (c) SRM-measured protein abundance in liver tissue (186 proteins), HepG2 cells (171 proteins), and depleted blood plasma (164 proteins).

Figure 2. Chromosome 18, scatter plots. (a)-(f) transcriptome; (g)-(i) transcriptoproteome. (a) Liver tissue, SOLiD vs. Illumina, RPKM, 186 data points; (b) Liver tissue, qRT-PCR vs. ddPCR, transcript copy numbers (tcN), 30 data points for liver; (c) Liver tissue vs. HepG2, qRT-PCR, tcN, 153 data points; (d) Liver tissue, Illumina RPKMs vs. qRT-PCR, tcN, 127 data points; (e) Liver tissue, SOLiD RPKMs vs. qRT-PCR tcN, 144 data points; (f) Liver tissue tcN, qRT-PCR vs. average RPKMs of SOLiD and Illumina, 146 data points; (g) HepG2, tcN vs. iBAQ-estimated protein copy numbers (pcN), 65 data points; (h) HepG2, tcN vs. SRM-estimated pcN, 114 data points; (i) Liver tissue, tcN vs. SRM-estimated pcN, 159 data points.

Figure 3. Chromosome 18, Venn diagrams. (a) SRM-scouting in three types of biomaterial: depleted plasma, liver cells, and the cell line HepG2; (b and c) Matches between SRM-detected proteins and genes, detected at transcriptome level in HepG2 cells (b) and liver cells (c); (d) Undetected proteins and transcripts.


References

(1) Zgoda, V. G.; Kopylov, A. T.; Tikhonova, O. V.; Moisa, A. A.; Pyndyk, N. V.; Farafonova, T. E.; Novikova, S. E.; Lisitsa, A. V.; Ponomarenko, E. A.; Poverennaya, E. V.; Radko, S. P.; Khmeleva, S. A.; Kurbatov, L. K.; Filimonov, A. D.; Bogolyubova, N. A.; Ilgisonis, E. V.; Chernobrovkin, A. L.; Ivanov, A. S.; Medvedev, A. E.; Mezentsev, Y. V.; Moshkovskii, S. A.; Naryzhny, S. N.; Ilina, E. N.; Kostrjukova, E. S.; Alexeev, D. G.; Tyakht, A. V.; Govorun, V. M.; Archakov, A. I. Chromosome 18 transcriptome profiling and targeted proteome mapping in depleted plasma, liver tissue and HepG2 cells. Journal of proteome research 2013, 12, 123–34.

(2) Collins, F. S.; Lander, E. S.; Rogers, J.; Waterston, R. Finishing the euchromatic sequence of the human genome. Nature 2004, 431, 931–45.

(3) Roth, M. J.; Forbes, A. J.; Boyne, M. T.; Kim, Y.-B.; Robinson, D. E.; Kelleher, N. L. Precise and parallel characterization of coding polymorphisms, alternative splicing, and modifications in human proteins by mass spectrometry. Molecular & cellular proteomics : MCP 2005, 4, 1002–8.

(4) Lane, L.; Argoud-Puy, G.; Britan, A.; Cusin, I.; Duek, P. D.; Evalet, O.; Gateau, A.; Gaudet, P.; Gleizes, A.; Masselot, A.; Zwahlen, C.; Bairoch, A. neXtProt: a knowledge platform for human proteins. Nucleic acids research 2012, 40, D76–83.

(5) Wang, D.; Bodovitz, S. Single cell analysis: the new frontier in “omics”. Trends in biotechnology 2010, 28, 281–90.

(6) Flicek, P.; Ahmed, I.; Amode, M. R.; Barrell, D.; Beal, K.; Brent, S.; Carvalho-Silva, D.; Clapham, P.; Coates, G.; Fairley, S.; Fitzgerald, S.; Gil, L.; García-Girón, C.; Gordon, L.; Hourlier, T.; Hunt, S.; Juettemann, T.; Kähäri, A. K.; Keenan, S.; Komorowska, M.; Kulesha, E.; Longden, I.; Maurel, T.; McLaren, W. M.; Muffato, M.; Nag, R.; Overduin, B.; Pignatelli, M.; Pritchard, B.; Pritchard, E.; Riat, H. S.; Ritchie, G. R. S.; Ruffier, M.; Schuster, M.; Sheppard, D.; Sobral, D.; Taylor, K.; Thormann, A.; Trevanion, S.; White, S.; Wilder, S. P.; Aken, B. L.; Birney, E.; Cunningham, F.; Dunham, I.; Harrow, J.; Herrero, J.; Hubbard, T. J. P.; Johnson, N.; Kinsella, R.; Parker, A.; Spudich, G.; Yates, A.; Zadissa, A.; Searle, S. M. J. Ensembl 2013. Nucleic acids research 2013, 41, D48–55.

(7) Craig, R.; Cortens, J. P.; Beavis, R. C. Open source system for analyzing, validating, and storing protein identification data. Journal of proteome research 3, 1234–42.

(8) Martens, L.; Hermjakob, H.; Jones, P.; Adamski, M.; Taylor, C.; States, D.; Gevaert, K.; Vandekerckhove, J.; Apweiler, R. PRIDE: the proteomics identifications database. Proteomics 2005, 5, 3537–45.

(9) Hindson, B. J.; Ness, K. D.; Masquelier, D. A.; Belgrader, P.; Heredia, N. J.; Makarewicz, A. J.; Bright, I. J.; Lucero, M. Y.; Hiddessen, A. L.; Legler, T. C.; Kitano, T. K.; Hodel, M. R.; Petersen, J. F.; Wyatt, P. W.; Steenblock, E. R.; Shah, P. H.; Bousse, L. J.; Troup, C. B.; Mellen, J. C.; Wittmann, D. K.; Erndt, N. G.; Cauley, T. H.; Koehler, R. T.; So, A. P.; Dube, S.; Rose, K. A.; Montesclaros, L.; Wang, S.; Stumbo, D. P.; Hodges, S. P.; Romine, S.; Milanovich, F. P.; White, H. E.; Regan, J. F.; Karlin-Neumann, G. A.; Hindson, C. M.; Saxonov, S.; Colston, B. W. High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Analytical chemistry 2011, 83, 8604–10.

(10) Archakov, A.; Zgoda, V.; Kopylov, A.; Naryzhny, S.; Chernobrovkin, A.; Ponomarenko, E.; Lisitsa, A. Chromosome-centric approach to overcoming bottlenecks in the Human Proteome Project. Expert review of proteomics 2012, 9, 667–76.

(11) Rubin, A. Statistics for Evidence-based Practice and Evaluation, 3rd Edition; Brooks/Cole Cengage Learning, 2013; pp. 144–145.

(12) Archakov, A.; Aseev, A.; Bykov, V.; Grigoriev, A.; Govorun, V.; Ivanov, V.; Khlunov, A.; Lisitsa, A.; Mazurenko, S.; Makarov, A. A.; Ponomarenko, E.; Sagdeev, R.; Skryabin, K. Gene-centric view on the human proteome project: the example of the Russian roadmap for chromosome 18. Proteomics 2011, 11, 1853–6.

(13) Deutsch, E. W.; Lam, H.; Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO reports 2008, 9, 429–34.

(14) Sherwood, C. A.; Eastham, A.; Lee, L. W.; Peterson, A.; Eng, J. K.; Shteynberg, D.; Mendoza, L.; Deutsch, E. W.; Risler, J.; Tasman, N.; Aebersold, R.; Lam, H.; Martin, D. B. MaRiMba: a software application for spectral library-based MRM transition list assembly. Journal of proteome research 2009, 8, 4396–405.

(15) Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Global quantification of mammalian gene expression control. Nature 2011, 473, 337–42.

(16) Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Molecular & cellular proteomics : MCP 2012, 11, M111.014050.

(17) de Sousa Abreu, R.; Penalva, L. O.; Marcotte, E. M.; Vogel, C. Global signatures of protein and mRNA expression levels. Molecular bioSystems 2009, 5, 1512–26.

(18) Kopylov, A. T.; Zgoda, V. G.; Lisitsa, A. V.; Archakov, A. I. Combined use of irreversible binding and MRM technology for low- and ultralow copy-number protein detection and quantitation. Proteomics 2013, 13, 727–42.

(19) Poverennaya, E. V.; Bogolubova, N. A.; Bylko, N. N.; Ponomarenko, E. A.; Lisitsa, A. V.; Archakov, A. I. Gene-Centric Content Management System. BBA - Proteins and Proteomics 2013 (in press).
