Article pubs.acs.org/jpr

Tissue-Based Proteogenomics Reveals that Human Testis Endows Plentiful Missing Proteins

Yao Zhang,†,∥,⊥,▲ Qidan Li,‡,§,⊥,▲ Feilin Wu,†,#,▲ Ruo Zhou,§,▲ Yingzi Qi,†,▲ Na Su,† Lingsheng Chen,†,○ Shaohang Xu,§ Tao Jiang,§ Chengpu Zhang,† Gang Cheng,§ Xinguo Chen,◆ Degang Kong,¶ Yujia Wang,§ Tao Zhang,† Jin Zi,§ Wei Wei,† Yuan Gao,† Bei Zhen,† Zhi Xiong,# Songfeng Wu,† Pengyuan Yang,+ Quanhui Wang,‡,§,⊥ Bo Wen,*,§ Fuchu He,*,† Ping Xu,*,†,▽ and Siqi Liu*,‡,§,⊥

† State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences, Beijing Institute of Radiation Medicine, Beijing 102206, China
‡ CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
§ BGI-Shenzhen, Shenzhen 518083, China
∥ Institute of Microbiology, Chinese Academy of Science, Beijing 100101, China
⊥ Graduate University of the Chinese Academy of Sciences, Beijing 100049, China
# Life Science College, Southwest Forestry University, Kunming 650224, P. R. China
▽ Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (Wuhan University), Ministry of Education, and Wuhan University School of Pharmaceutical Sciences, Wuhan 430071, China
○ State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, Nanning 530005, China
◆ Institute of Organ Transportation, General Hospital of Chinese People’s Armed Police Forces, Beijing 100039, China
¶ General Surgery Dept., Capital Medical University Affiliated Beijing YouAn Hospital, Beijing 100069, China
+ Institutes of Biomedical Sciences, Department of Chemistry and Zhongshan Hospital, Fudan University, 130 DongAn Road, Shanghai 200032, China

Supporting Information

ABSTRACT: Investigations of missing proteins (MPs) are being pursued with many bioanalytical strategies. We proposed that proteogenomics of testis tissue is a feasible approach to identify more MPs because testis tissues have higher gene expression levels. Here we combined proteomics and transcriptomics to survey gene expression in human testis tissues from three post-mortem individuals. Proteins were extracted and separated with glycine- and tricine-SDS-PAGE. A total of 9597 protein groups were identified; of these, 166 protein groups were listed as MPs, including 138 groups (83.1%) with transcriptional evidence. A total of 2948 proteins are designated as MPs, and 5.6% of these were identified in this study. The high incidence of MPs in testis tissue indicates that it is a rich resource for MPs. Functional category analysis revealed that testis MPs are mainly involved in sexual reproduction and spermatogenesis. Some of the MPs are potentially involved in tumorigenesis in other tissues. Therefore, this proteogenomics analysis of individual testis tissues provides convincing evidence for the discovery of MPs. All mass spectrometry data from this study have been deposited in ProteomeXchange (data set identifier PXD002179).

KEYWORDS: Chromosome-Centric Human Proteome Project, testis, missing proteins, proteome, transcriptome, individual



INTRODUCTION

Special Issue: The Chromosome-Centric Human Proteome Project 2015

© 2015 American Chemical Society
Received: May 19, 2015 Published: August 18, 2015
DOI: 10.1021/acs.jproteome.5b00435 J. Proteome Res. 2015, 14, 3583−3594

Following in the footsteps of the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) has entered its third productive year. The mission of C-HPP Phase I is to identify additional “missing proteins” (MPs); direct liquid chromatography−mass spectrometry (LC−MS) or antibody-based proteomics methods have not identified the full complement of MPs. The neXtProt database1 (2014-09-19) estimated that protein evidence is still lacking for 2948 protein-coding genes (Protein Evidence 2-4). To find these MPs, in-depth proteomics studies were performed on colon cancer cell lines,2 liver cell lines,3 brain tissues,4,5 and placenta tissues.6 Three “human proteome drafts” have been published, which included all major organs, tissues, body fluid samples, and various cell lines.7−10 Analysis of the location and abundance of cell transcripts could provide clues for gene expression of the unidentified MPs. Transcriptomics analysis using different tissues indicated that 18% of MP genes may be specifically enriched in certain tissues.8 Therefore, in-depth proteomic analysis of these special tissues remains a crucial step for the identification of MPs.
The testis is the organ of male gamete development and is known to have high gene expression levels.11 The number of genes expressed in other human tissues varies from 11 000 to 13 000, whereas more than 15 000 gene transcripts have been detected in testis.12 An analysis of mouse testis revealed high transcriptome complexity, which was primarily due to meiotic spermatocytes and postmeiotic round spermatids.13 The testis has more than 1300 tissue-enriched genes with more than 5 times higher abundance, and 364 genes have up to 50-fold higher abundance, compared with those of corresponding genes in 26 other organs.14 Testis indeed expresses more specific proteins than other normal tissues, such as cancer-testis (CT) antigens, which are a group of immunogenic proteins encoded by genes that are normally expressed in the human germ line but not in other normal tissues.15,16 Approximately 200 CT genes are reported in the CT database; 28 of these are immunohistochemically detected only in normal testis tissue and some cancer cells.17 The testis proteome has been extensively analyzed using a pooling strategy for multiple samples. On the basis of our own reanalysis of public data sets, the protein extracts that the Pandey group prepared from three individual adult and fetal testis tissues yielded 8341 and 6954 proteins, respectively.7 The Kuster group identified 6123 proteins using a pool of ten testis samples and extensive analysis.8 By contrast, antibody-based protein identification methods may detect more low-abundance proteins and provide more information on cellular localization. Uhlen and colleagues used antibodies and detected 11 330 proteins in testis tissue microarrays, which is one of the largest data sets for the human proteome.9 These results suggest that proteome analysis of individual tissues might provide greater numbers of detectable proteins than pooled tissue samples because of individual variations in some low-abundance proteins.
The testis was reported to have the most transcribed genes,11−14 and the transcriptional data from seven individual testis tissues were integrated with the antibody-based proteome data. On the basis of comparisons of mRNA abundance among different tissues, testis was reported to have the most diverse transcriptome and proteome for seeking MPs and missing RNAs. In this study, we analyzed the proteome and transcriptome of three individual human testis samples. To improve the proteome coverage of the three testis samples, we used a combination of glycine-SDS-PAGE and tricine-SDS-PAGE for protein separation. We identified a total of 9597 proteins; of these, 166 protein groups were confirmed MPs. Transcriptome analysis also revealed that MPs might be highly enriched in testis tissue. Gene ontology (GO) analysis and ingenuity pathway analysis (IPA) of MPs showed that 82.4% of MPs with known annotation information were heavily associated with various diseases, especially cancers. Some of these, such as ADORA3, HLA-C (confirmed), and ZNF23, have been designated as potential biomarkers that may be useful for disease diagnosis and treatment. Transcriptomic and proteomic analyses of individual testis tissues provide an efficient strategy and a valuable data set for further studies of MPs.



MATERIALS AND METHODS

Testis Tissues Used in this Study

Human testis samples were collected from the General Hospital of Chinese People’s Armed Police Forces and Capital Medical University Affiliated Beijing YouAn Hospital. Human tissues were collected post mortem as part of a rapid autopsy program from three adult donors by authors X.C. and D.K. The IRB approval number is BGI-IRB 15076. Tissues were washed with PBS three times and stored at −80 °C until use. The tissues were histologically confirmed to be normal before analysis. This study was approved by the BGI’s Institutional Review Board for the use of human tissues.

Proteome Sample Preparation and LC−MS/MS Analysis

Three testis tissue samples (50 mg) were ground in liquid nitrogen and sonicated in lysis buffer (8 M urea, 5 mM IAA, 50 mM NH4HCO3, 1× protease cocktail) on ice. The unbroken debris was eliminated by centrifugation (13 300g) at 4 °C for 15 min. The protein concentration was determined by a gel-assisted method as previously described.18 Extracted proteins were resolved by SDS-PAGE (10%) and tricine-SDS-PAGE (12%), respectively, followed by staining with Coomassie Brilliant Blue. Then, the gel lanes were cut into multiple bands, as indicated, based on the molecular weight and the protein abundance in the specific region. Each gel band was digested with trypsin before being subjected to LC−MS/MS analysis. LC−MS/MS experiments were performed essentially as previously described.19 In brief, every peptide mixture was dissolved in sample loading buffer (1% acetonitrile and 1% formic acid in water). The peptides were then separated and analyzed by UPLC (nano Acquity Ultra Performance LC, Waters, Milford, MA) and tandem MS/MS (LTQ Orbitrap Velos, Thermo Fisher Scientific, Waltham, MA). Survey scans were performed in the Orbitrap analyzer at a resolution of 30 000 and target values of 1 000 000 ions over a mass range between 300 and 1600 m/z. The 10 most intense ions were subjected to fragmentation via collision-induced dissociation in the LTQ. For each scan, 5000 ions were accumulated over a maximum allowed fill time of 25 ms and fragmented by wideband activation. Exclusion of precursor ion masses over a time window of 30 s was used to reduce repeated peak fragmentation.

MS/MS Data Analysis

The raw MS/MS data were converted into MGF format by ProteoWizard (v3.0.4238). The MS/MS data were then searched by three search engines (OMSSA v2.1.8, X!Tandem v2009.10.01.1, and MS-GF+ v9733) against the human SwissProt database (20 050 sequences, release 2014-12-22) with decoy sequences and contaminant protein sequences (245 sequences).20−22 Several parameters were set for database searching: Cysteine carbamidomethylation was specified as a fixed modification. Deamidation of asparagine and glutamine and oxidation of methionine were specified as variable modifications. The precursor mass tolerance for protein identification on MS was 10 ppm, and the product ion tolerance for MS/MS was 0.6 Da. Full cleavage by trypsin was used, with one missed cleavage permitted. The results from the three search engines were then integrated by IPeak,23,24 which is a tool that combines multiple search engine results. Only the identifications satisfying the following criteria were considered: (1) the peptide length ≥7; (2) the FDR ≤ 1% at peptide level; (3) the FDR ≤ 1% at protein level; (4) at least one peptide longer than 9 aa was required for protein identification. The protein level FDR was calculated by using the picked FDR strategy.25

Figure 1. Flowchart illustrating the strategies for identifying MPs from three individual testis tissues by integrated proteome and transcriptome profiling, and the function and disease relation analysis of MPs.

Bioinformatics Analysis of Identified Proteins

Protein identification and distribution in the glycine and tricine gels were compared to evaluate their contribution to the total proteome data sets. Furthermore, individual contributions to large-scale identification were inspected at the level of protein variability and nonredundant protein and peptide saturation curves. Three published testis data sets (from the Pandey, Kuster, and Uhlen groups) and two public databases (PeptideAtlas26 2014-08, and CCPD 2.027) were chosen for the comparison analysis with our proteome data. Notably, we did not use the testis proteome results from the two Human Proteome drafts directly but downloaded their raw data from Web sites and obtained the results by reanalyzing them with the same pipeline as in this study.

Transcriptome Sample Preparation and Sequencing

RNA extraction of three testis samples was performed as previously described.28 The cDNA libraries were constructed using a published protocol29 with slight modification. In brief, the DNA was removed from 2 μg of total RNA with DNase I (NEB, Ipswich, MA). The cleaned mRNA was purified from total RNA using Dynabead mRNA Purification Kit (Ambion, Carlsbad, CA). Subsequently, the mRNA was randomly fragmented into short fragments of ∼150 bp, followed by reverse transcription to cDNA strands. After ligation to Ion Proton adapters, the fragments (average length for three samples 239−247 bp) were diluted to 8 pM for emulsion PCR and sequenced using the PI Chip v2 on the Ion Proton™ Sequencer (Thermo Fisher Scientific, Waltham, MA).

RNA-Seq Data Analysis

Raw sequencing reads were preprocessed by removing adapters and low-quality reads, and the clean reads were obtained with strict quality-control steps of the standard Ion Proton RNA-seq protocol of BGI Shenzhen. In general, 22−24 M clean reads with an average length of 135 bp were adopted for the three samples. The reference transcript set was based on the Ensembl GRCh37 reference genome, and clean reads were mapped to the reference gene using Tmap. No more than three mismatches were allowed in the alignment. At the same time, clean reads were mapped to the UCSC hg19 reference genome using Tophat. The reads aligned to the reference transcript sequences were used to determine transcript level quantification. Three quantification values were reported: effective read counts, coverage, and reads per kilobase per million mapped reads (RPKM). Only genes with uniquely mapped reads ≥10 were used for downstream analysis. Effective read counts were normalized with respect to transcript length and total number of reads to yield RPKM, which was used to represent transcript abundance.

Figure 2. Proteome sample preparation and identification. (A) Separation of total proteins from three individual testis tissues using 10% glycine-SDS-PAGE and 12% tricine-SDS-PAGE. (B) Venn diagram of proteins identified by glycine-SDS-PAGE and tricine-SDS-PAGE. (C) Molecular weight distribution of proteins identified by glycine-SDS-PAGE and tricine-SDS-PAGE. (D) Venn diagram of the proteins identified in three individuals (individual A, B, and C). (E) Comparison of protein abundance variability in individual testis samples. (F−G) Proteome profiling saturation using two gel-separation methods and MS, evaluated by nonredundant peptide and protein identification.

Quality Checking and Functional Analysis of MPs

The quality of spectra referring to MPs was manually inspected by observing base peak intensity and b/y ion matching, assisted by pLabel software developed by the pFind group, to evaluate the authenticity of identified MPs in our study.30−32 DAVID33 and IPA34 were used for biological function and disease-related biomarker analysis of MPs.

Figure 3. Summary of testis proteome data and the contribution of individual testis to MPs identification. (A) Comparison of MS-based testis proteomes from three research groups. (B) Comparison of testis proteomes obtained from LC−MS/MS and immunoassay with antibodies. (C) Comparison of two MS-based testis proteome data sets. Xulab_testis, current study; Pandey all tissues data set, reanalysis of published data sets. (D) Comparison of proteomes from Xulab testis, public CCPD 2.0, and PeptideAtlas. (E) Comparison of MPs derived from testis tissues among three labs. (F) Comparison of MPs derived from three individual testis proteomes in this study.



RESULTS AND DISCUSSION

The complexity of a complete proteome makes it challenging to detect all expressed proteins. To achieve deep coverage of the human proteome, we selected testis tissue as our target sample because of its abundant gene expression. Technically, proteome coverage depends on the separation power of the proteomics platform. Our proteomics strategy started with high-resolution gel fractionation of testis total cell lysate (TCL) samples as the first-dimension separation to reduce proteome complexity, followed by LC−MS/MS analysis. To identify lower molecular weight (LMW) proteins, we further resolved the same testis TCL on tricine gels.35,36 Deep RNA-seq data were applied to interpret the proteomics data, which resulted in the proteogenomics study. The detailed experimental designs are shown in Figure 1A,B and are discussed in more detail in the following section.
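The RNA-seq abundance values used to interpret the proteomics data are expressed as RPKM. As a rough illustration, the following Python sketch applies the standard RPKM definition together with the read-count threshold of 10 stated in the methods. The function names, gene labels, and example numbers are hypothetical and are not taken from the study's pipeline.

```python
# Minimal sketch of RPKM normalization (standard definition), assuming
# per-gene unique read counts and transcript lengths are already available.
# Gene names and numbers below are hypothetical illustrations.

MIN_UNIQUE_READS = 10  # genes below this read count are excluded


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def quantify(gene_counts, total_mapped_reads):
    """Return RPKM for every gene that passes the read-count filter.

    gene_counts maps a gene name to (unique_read_count, transcript_length_bp).
    """
    return {
        gene: rpkm(count, length, total_mapped_reads)
        for gene, (count, length) in gene_counts.items()
        if count >= MIN_UNIQUE_READS
    }


example = {"GENE_A": (100, 2000), "GENE_B": (5, 1500)}
abundance = quantify(example, total_mapped_reads=20_000_000)
print(abundance)  # GENE_B is dropped because it has fewer than 10 reads
```

Dividing by both transcript length and sequencing depth makes values comparable across genes of different sizes and across samples with different read yields, which is what allows mRNA abundance to be compared among the three individuals.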

Protein Diversity Revealed by a Deep-Coverage Proteomics Study on Human Testis Tissue

We obtained testis tissue from three individuals for this study. After lysis, SDS-PAGE was used to resolve the TCL, resulting in similar patterns and sample loading with clear, sharp bands, indicating that TCL proteins were extracted and separated while preserving proteome integrity (Figure 2A, left panel). Each lane was excised into 28 gel bands based on their MW and the protein abundance in specific regions. The proteins in these gel bands were in-gel digested with trypsin. LC−MS/MS analysis identified 9064 proteins (Figure 2B). To further deepen the proteome coverage for these testis samples, the same TCLs were resolved on tricine gels. Each lane was excised into 22 gel bands as indicated (Figure 2A, right panel). These samples were in-gel digested with trypsin and analyzed by LC−MS/MS, resulting in a total of 8419 identified proteins. A total of 1178 and 533 proteins were uniquely identified in regular SDS-PAGE and tricine SDS-PAGE, respectively, increasing the cumulative number of proteins identified to 9597 (Figure 2B). The theoretical MW distribution of the identified proteins indicated that the proteins from tricine gels were enriched in the LMW region (Figure 2C). However, the proteins uniquely identified from normal glycine SDS-PAGE primarily appeared at a relatively larger MW range with a peak at 50 kDa. This result indicated that our protein separation strategy was valuable for a deeper proteomics study of human testis tissues.

Protein abundance may vary in individual samples, which can facilitate the identification of novel proteins. Therefore, we separately analyzed testis samples from three individuals instead of pooling the samples (Figure 2D). We identified 8520, 8362, and 8434 proteins from the three individual testis samples. A total of 7380 proteins were present in all three data sets, which accounts for ∼87% of the identified proteins from each individual testis. A total of 403, 317, and 536 proteins were uniquely identified from the three samples. The chosen strategy significantly expanded the coverage of the testis proteome. To examine the uniquely identified proteins from the three individuals, we analyzed the protein abundance distributions of the 7380 commonly identified proteins and of the uniquely identified proteins by extracting the intensities of their identified peptides. As shown in Figure 2E, the common and sample-specific proteins in the testis tissues had similar abundance distributions; however, the abundance distribution curves of the specifically identified proteins from individual samples were lower than those of the common proteins. This result suggested that proteins uniquely detected in only one sample were likely to have relatively lower abundance. The individual variation in abundance of these proteins promoted their identification, which resulted in deeper coverage of the human proteome by complementing with specific testis tissues.

Much deeper coverage of the mammalian proteome, approximately 8000 to 10 000 gene products, has been achieved from a single human cell line owing to the development of high-resolution chromatography and highly sensitive mass spectrometry; however, data saturation could be limited by the detection sensitivity and dynamic range of available mass spectrometers. To determine the number of individual samples needed for saturation analysis of the testis proteome, we calculated an accumulation curve with the identified peptides and proteins (Figure 2F,G). The number of nonredundant peptides or proteins increased at a similar level with the addition of individuals A−C, and the total number of identified proteins rose to 9597 when the two separation methods were introduced. This result indicated that testis is one of the few tissues with almost 10 000 identified proteins.7,8
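The overlap figures reported for the two gel systems and for the three individuals can be cross-checked with simple set arithmetic. The Python sketch below is only an illustrative consistency check using the counts quoted in the text. The helper name `union_of_two` is hypothetical, and no protein lists from the study are used.

```python
# Consistency check of the reported Venn-diagram counts (Figure 2B,D).
# Only the totals quoted in the text are used; the helper is hypothetical.


def union_of_two(total_a, total_b, only_a, only_b):
    """Size of the union of two sets, given each total and each unique count."""
    shared_from_a = total_a - only_a
    shared_from_b = total_b - only_b
    if shared_from_a != shared_from_b:
        raise ValueError("reported counts are not mutually consistent")
    return shared_from_a + only_a + only_b


# Glycine-SDS-PAGE: 9064 proteins (1178 unique); tricine-SDS-PAGE: 8419 (533 unique)
gel_union = union_of_two(9064, 8419, 1178, 533)
print(gel_union)  # 9597

# Share of each individual's proteins that belongs to the 7380-protein common core
common_core = 7380
per_individual = [8520, 8362, 8434]
shared_percent = [round(100 * common_core / n, 1) for n in per_individual]
print(shared_percent)  # [86.6, 88.3, 87.5]
```

Both gel totals imply the same 7886 shared proteins, which reproduces the cumulative 9597, and the per-individual shares bracket the ∼87% quoted in the text.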

Testis Proteome Is an Enriched Library for MPs

Testis tissue has been extensively analyzed in several recently published studies for the human proteome draft map. Using IPeak with 1% protein level FDR, 9597, 8968, and 6123 proteins were detected in the current data set, the Pandey group data set, and the Kuster group data set, respectively (Figure 3A). The total number of proteins identified from the testis samples is 10 827, which is currently the deepest coverage of a single human organ/tissue proteome. A total of 5386 proteins were detected in all three data sets. Over 90% of the proteins in the Pandey and Kuster data sets were also detected in our data set. The number of uniquely identified proteins was 1319, 736, and 297 in our data set, the Pandey group data set, and the Kuster group data set, respectively. The greater number of proteins identified in our data set confirms that our strategies are suitable for the testis samples. Antibody-based proteomics is another approach for the HPP. In the testis proteome data set generated with antibodies, a total of 11 330 proteins were credibly detected (Figure 3B). Comparing the antibody-based protein data set with our current MS-based data set revealed that 6652 proteins were identified in both data sets, whereas 2945 and 4678 proteins were detected exclusively by MS or antibodies, respectively. This result suggests that these two approaches have good complementarity. To test the specificity of the testis proteome, we compared our current data set with a protein list from the human draft map generated by Pandey’s group. A total of 584 proteins were uniquely identified in our study (Figure 3C). We also found that 66% of the MPs identified in our MS analysis were among these 584 unique proteins (Supplementary Figure 2). When we compared our current testis proteome data set with PeptideAtlas (2014-08) and CCPD 2.0, which are publicly accessible proteomics databases, we still found that 233 proteins were uniquely identified in our testis samples (Figure 3D).
This result confirms that our testis proteomics study identifies significant numbers of unique proteins. Comparing the testis proteomics data sets with the missing protein list provided by the C-HPP consortium, we identified 200, 66, and 39 MPs from our data set and the Pandey and Kuster testis data sets, respectively (Figure 3E, Supplementary Figure 3). Clearly, besides the uneven numbers of discovered MPs, the majority of MPs found in each group were different from each other. Because various sample preparation methods, enzymes for protein digestion, and peptide or protein separation strategies were adopted among the laboratories (summarized in Supplementary Table 3), the difference is understandable. We further counted the number of MPs derived from our three individual testis tissues (Figure 3F). A total of 145, 131, and 116 MPs were identified in samples A−C, respectively (Supplementary Figure 3). About 70% of the MPs in each individual were shared, yet the ratio decreased to 50% when the three samples were combined. This result emphasizes that individual variations are important in proteomics studies.

Testis Transcriptome Is More Diverse than Proteome MPs

Figure 4. Transcriptome data profiling and proteogenomics analysis of MPs. (A) Venn diagram of testis transcripts from three individual samples. (B) Transcriptome profiling saturation based on RNA-seq strategy. (C) Comparison of mRNA abundance distribution among three individual samples. (D) Comparison of testis gene expression between protein and mRNA level. (E) Correlation of gene expression at the mRNA and protein abundance levels in three testis samples. (F) Comparison of MPs identification from three individual testis tissues at the mRNA level. (G) Comparison of mRNA abundance of MPs and non-MPs in testis samples. (H) Enrichment comparison between confirmed MPs and filtered MPs from quality-checked spectra. (I) Comparison of confirmed MPs with filtered MPs at the mRNA expression level.

Figure 5. The dynamic range of mRNA abundance among various tissues. The mRNA abundance distribution of total genes (gray) identified in various tissues and the mRNA abundance distribution of MPs (orange) are shown as box plots. Box height represents the dynamic range of mRNA abundance.

in one tissue (80%), we found that 95% of mRNAs were detected in all three individuals (Figure 4F). The protein abundance of the uniquely identified MPs was globally lower than that of other proteins with MS evidence. Therefore, we were curious about their mRNA abundance (Figure 4G). The mRNA abundance distribution curves for these two groups of genes essentially superposed. These results further demonstrated that the testis has highly diverse gene expression, which might be different from that in other tissues in the human body. The reliability of MPs identified with MS technology was further checked to confirm the identified missing proteins, and the mass spectra are presented in Supplementary Figure 4. A total of 166 groups (182 proteins) of MPs were verified by stringently filtering the quality of their MS spectra for higher base peak intensity and better, continuous b/y ion matches. Most of the reliable MPs had higher mRNA levels (Figure 4H). Conversely, the majority of filtered MPs did not have mRNA evidence (Figure 4I). This confirmed list of MPs was used for further analysis in this study. To further understand the mechanism of MP gene expression regulation in different organs or tissues, we compared the abundance distribution of mRNAs in 27 tissues. The mRNA abundance distributions for total genes (gray) identified in various tissues and for MPs (orange) are portrayed as box plots. The height of the box represents the dynamic range of mRNA abundance. As shown in Figure 5, the mRNA abundance of total genes was generally greater than that of genes encoding MPs, except in testis. This result strongly indicated that MP identification resulted from their higher abundance. Although the mRNA abundances for the remaining genes encoding MPs are even lower than those for the already identified MPs, they are still higher in testis than in most other tissues, suggesting that testis is an ideal target tissue for future MP studies. We

To estimate the number of expressed protein-coding genes in the testis samples, we applied a proteogenomics strategy to the same testis samples. Total proteins and mRNAs were extracted from the same samples, and whole transcriptome libraries were sequenced using RNA-seq (Figure 1A,B). On average, 23.1 million reads and 3.1 Gb of sequenced nucleotides were obtained for each mRNA sample; 99.5% of these were mapped to the reference genome and 85.1% of the reads were mapped to the reference sequences. After removing genes with fewer than 10 reads, the average total reads were ∼14.5 M in each testis tissue. This translates to sequencing of ∼16 000 genes. Comparative analysis indicated that 15 491 genes were detected in all three samples with overlap >90%, whereas