Deep Coverage Proteomics Identifies More Low-Abundance Missing

Aug 18, 2016 - Through strict manual spectra checking, verifying with ..... Notes. All mass spectrometry data from this study have been deposited in t...
0 downloads 0 Views 2MB Size
Article pubs.acs.org/jpr

Deep Coverage Proteomics Identifies More Low-Abundance Missing Proteins in Human Testis Tissue with Q‑Exactive HF Mass Spectrometer Wei Wei,†,▽ Weijia Luo,‡,▽ Feilin Wu,†,§,▽ Xuehui Peng,†,∥,▽ Yao Zhang,†,⊥ Manli Zhang,† Yan Zhao,† Na Su,† YingZi Qi,† Lingsheng Chen,† Yangjun Zhang,† Bo Wen,# Fuchu He,*,† and Ping Xu*,†,‡,∥

Downloaded via EASTERN KENTUCKY UNIV on January 28, 2019 at 08:07:21 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, Beijing Institute of Radiation Medicine, Beijing 102206, China ‡ Graduate School, Anhui Medical University, Hefei 230032, China § Life Science College, Southwest Forestry University, Kunming, 650224, China ∥ Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of the Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan 430072, China ⊥ Institute of Microbiology, Chinese Academy of Science, Beijing 100101, China # BGI-Shenzhen, Shenzhen 518083, China S Supporting Information *

ABSTRACT: Since 2012, missing proteins (MPs) investigation has been one of the critical missions of ChromosomeCentric Human Proteome Project (C-HPP) through various biochemical strategies. On the basis of our previous testis MPs study, faster scanning and higher resolution mass-spectrometry-based proteomics might be conducive to MPs exploration, especially for low-abundance proteins. In this study, QExactive HF (HF) was used to survey proteins from the same testis tissues separated by two separating methods (tricine- and glycine-SDS-PAGE), as previously described. A total of 8526 proteins were identified, of which more lowabundance proteins were uniquely detected in HF data but not in our previous LTQ Orbitrap Velos (Velos) reanalysis data. Further transcriptomics analysis showed that these uniquely identified proteins by HF also had lower expression at the mRNA level. Of the 81 total identified MPs, 74 and 39 proteins were listed as MPs in HF and Velos data sets, respectively. Among the above MPs, 47 proteins (43 neXtProt PE2 and 4 PE3) were ranked as confirmed MPs after verifying with the stringent spectra match and isobaric and single amino acid variants filtering. Functional investigation of these 47 MPs revealed that 11 MPs were testis-specific proteins and 7 MPs were involved in spermatogenesis process. Therefore, we concluded that higher scanning speed and resolution of HF might be factors for improving the low-abundance MP identification in future C-HPP studies. All massspectrometry data from this study have been deposited in the ProteomeXchange with identifier PXD004092. KEYWORDS: Chromosome-Centric Human Proteome Project, missing proteins, testis, Q-Exactive HF, LTQ Orbitrap Velos



INTRODUCTION Following steps in the Human Proteome Project (HPP), the main mission of the Chromosome-Centric Human Proteome Project (C-HPP) is still to explore the protein evidence of “missing proteins” (MPs).1,2 According to the neXtProt database (2016-02), there are 2949 protein-coding genes that do not have reliable evidence at the protein level. Previous research showed that various factors led to MPs’ difficult identification. First, the intrinsic properties of MPs might be the major reasons, including their expressing abundance, molecular weight (MW), and hydrophobicity.3 Therefore, various analytical and experimental methods were optimized to solubilize, capture, and characterize special proteins, especially low-molecular-weight (LMW) © 2016 American Chemical Society

proteins, membrane proteins, and post-translational modification (PTM) proteins.4,5 In the MP searching project, mass spectrometry has been a common way, followed by antibody-based validation, bioinformatics, and RNC-mRNA (the mRNA attached to ribosome-nascent chain complex) strategies. With the advance in technology, a new generation of mass spectrometer has been optimized in many properties, such as scanning rate, resolution, Special Issue: Chromosome-Centric Human Proteome Project 2016 Received: April 30, 2016 Published: August 18, 2016 3988

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

into 22 fractions based on their MW and the protein abundance in specific regions, followed by trypsin digestion.

sensitivity, and dynamic range. Nowadays, Q-Exactive HF (HF), a widely used mass spectrometer all over the world, is a new segmented quadrupole mass analyzer with a smaller ultra-highfield Orbitrap and allows increasing the electric field and detection frequency. The compact Orbitrap of HF allows stronger field strength and shorter acquisition time or transient length, which theoretically provides 1.8 higher resolution at the same transient length. At the same time, the capability of parallel acquisition of HF results in faster scan cycles, which significantly increases the peptide and protein identification. Additionally, new segmented quadrupole provides a double-fold improvement of transmission speed, which also contributes to the scanning rate of HF.6,7 In HeLa cell phosphoproteomics, Kelstrup found that the sequencing speed of HF routinely exceeds 10 peptide spectra per second.8 Therefore, the high-resolution HF might be a powerful tool in MP studies. Second, tissue-specific expression9 or special modification proteins10 are also rich resources for MP searching. Following the pace of Human Protein Atlas with antibodies, various samples from different tissues were used (e.g., liver,11 brain,12 kidney,13 pancreas,14 skin,15 adipose tissue,16 etc.) to discover MPs. Interestingly, tissue transcript analysis showed that the testis tissue endows plentiful tissue-specific genes and abundant transcripts of the MPs.17 The gene expression analysis showed that 77% of protein-code genes were expressed in the testis, with more than 1000 genes showing an at least 5 times higher expression than other tissues.18 Fagerberg found that 8% of all genes were uniquely detected in the testis but not in any others tissues.19 All of this research suggests that the testis is a high gene expression tissue, which may endow more MPs. In 2015, Zhang and Jumeau reported 166 and 89 MPs from human testis and spermatozoa samples, respectively, corresponding to 5.6 and 3% of total 2948 MPs.9,20 Thus, we believe that the testis is still a reservoir for MPs searching. We efficiently identified more MPs by HF in human testis samples isolated by glycine- and tricine-SDS-PAGE. A total of 8526 proteins were identified by HF, while 7537 proteins were identified in our previous Velos data sets. In total, 1230 proteins had low abundance, which were uniquely identified by HF. Transcriptome analysis also found that these uniquely identified proteins had low mRNA abundance property. In total, 81 MPs were detected, in which 74 and 39 MPs were identified by HF and Velos, respectively. Through strict manual spectra checking, verifying with synthesized-peptide matching, isobaric filtering, and single amino acid variants (SAAVs) filtering, 47 of 81 MPs were confidently verified. Function investigation suggested that most MPs were testis-specific and related to spermatogenesis. HF, an advanced mass spectrometer, might be a powerful tool for finding more low-abundance MPs.



LC−MS/MS Analysis and Database Searching

Prior to LC−MS/MS analysis, digested peptides were resuspended in loading buffer (1% acetonitrile and 1% formic acid (FA) in ddH2O) and analyzed by an LC (Ultimate 3000, Thermo Scientific) equipped with self-packed capillary column (150 μm i.d. × 12 cm, 1.9 μm C18 reverse-phase fused-silica, Trap) using a 90 min nonlinear gradient at a flow rate of 600 nL/ min. The elution gradient was as follows: 6−9% B for 8 min, 9− 14% B for 16 min, 14−30% B for 36 min, 30−40% B for 15 min, 40−95% for 10 min, and 95−96% for 5 min (Buffer A, 0.1% FA in ddH2O; Buffer B, 0.08% FA and 80% ACN in ddH2O). Eluted samples were analyzed by Q-Exactive HF (Thermo Fisher Scientific). MS full scans were performed in the ultrahigh-field Orbitrap mass analyzer in ranges m/z 300−1400 with a resolution of 120 000 at m/z 200, the maximum injection time (MIT) was 80 ms and the automatic gain control (AGC) was set to 3 × 106. The top 20 intense ions were subjected to Orbitrap for further fragmentation via high energy collision dissociation (HCD) activation over a mass range between m/z 200 and 2000 at a resolution of 15 000 with the intensity threshold kept at 8.3 × 103. We selected ions with charge state from 2+ to 6+ for screening. Normalized collision energy (NCE) was set at 27.21 For each scan, the AGC was set at 2 × 104 and the MIT was 60 ms. The following dynamic exclusion of precursor ion masses over a time window of 12s was used to suppress repeated peak fragmentation.22 To evaluate and compare the identification of HF and Velos, we processed MS/MS raw files produced by HF in this study and Velos in last year (PXD002179)9 in Proteome Discoverer 2.0 (v2.0.0.802, Matrix Science Mascot 2.3.01) against overlap queries (20 055 entries, including 2949 MPs) between the SwissProt database (release 2015.12) and the neXtProt database (release 2016.02). The same parameters were set for database searching as follows: Cysteine carbamidomethyl was specified as a fixed modification and oxidation of methionine was set as variable modification. For HF data set, the tolerances of precursor and fragment ions were set at 15 ppm and 20 mmu, respectively. For Velos data set, the tolerances of precursor and fragment ions were set at 20 ppm and 0.5 Da, respectively. For digestion, trypsin was set as protease with two missed cleavage permitted. Only the proteins’ identifications satisfying the following criteria were considered: (1) the peptide length ≥7; (2) the FDR ≤ 1% at peptide level; (3) the FDR ≤ 1% at protein level; and (4) at least two different peptides (both unique and shared peptides were considered) as protein identification. The peptides were quantified by the peak area in Proteome Discoverer. For protein quantification, only the top three unique peptides were used for calculation. The FDRs at the PSM, peptide, and protein levels were calculated by dividing the number of target identifications by the number of decoy identifications.

MATERIALS AND METHODS

Samples Used in this Study

Bioinformatics Analysis of Identified Proteins

The same human testis samples (IRB approval number BGIIRB15076) were processed as previously described.9 In brief, three testis tissue samples (50 mg) were put in liquid nitrogen and sonicated in lysis buffer (8 M urea, 5 mM iodoacetamide (IAA), 50 mM NH4HCO3, 1× protease cocktail) on ice. The cell debris was removed by centrifugation at 13 300g in 4 °C for 15 min. Then, extracted proteins were separated by glycine- (10%) and tricine-SDS-PAGE (12%), respectively. Each lane of glycineSDS-PAGE was excised into 28 fractions and tricine-SDS-PAGE

To evaluate the performance of different mass spectrometers, we compared the peptides and proteins identified by HF and Velos, including the protein number, protein abundance, MPs’ properties, and so on. In addition, RNA-seq data were used to further evaluate and verify the protein abundance identified by HF and Velos at the mRNA level. Individual contribution to testis proteome was inspected by the peptide and protein number and saturation curve from HF and Velos. 3989

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

Figure 1. Deep proteomics study for human testis tissues by HF. (A,B) Accumulation curves of identified proteins and peptides of three testis samples resolved by two gel-separation methods. (C) Venn diagram of proteins identified by glycine-SDS-PAGE and tricine-SDS-PAGE. (D) Venn diagram of proteins identified from three individual testis tissue. (E,F) Accumulation curves of identified proteins and peptides from three individual testis tissue resolved by glycine and tricine gel with HF.

Verification of MPs with Synthesized Peptides

when the similarity matching score was higher than 0.9 was its corresponding peptide considered to be very high confidence peptide of MP. Considering that isobaric substitutions could change the mapping of the peptide from MP to a commonly observed protein, the isobaric filtering was performed by evaluating whether I = L, Q[Deamidated] = E, GG = N in its protein database. Considering the possibility that SAAVs could also turn an extraordinary result into an ordinary result, the SAAVs filtering was performed according to the Swiss-Prot and RefSeq database.

To evaluate the authenticity of MPs, we manually inspected the spectra referring to MPs by observing base peak intensity and b/y ions matching assisted by pFind and pBuild software.23,24 Considering alternate explanations of PSMs that passed the manual check, an open search was performed. The spectrum conflicting with the previous identification was filtered. Relatively high-quality peptides (unique peptides ≥2, peptide length ≥9 aa) were selected and synthesized for further analysis. Both m/z and ion intensity were used to calculate the cosine similarity score, and a specific formula was similar, as previously described.10 Only 3990

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

Figure 2. HF resulted in higher proteome coverage than Velos. (A) Comparison of identified peptides in each fraction by Velos and HF from three testis samples (B) Comparison of proteome identification of human testis tissues by Velos and HF, including MS/MS, PSM, peptide, protein, and so on. (C) Venn diagram of identified proteins based on Velos and HF. (D) Comparison of the whole sequence coverage from commonly identified proteins by Velos and HF.

Functional Investigation of MPs

tissues. For increasing the trust in data reliability, strict criteria were considered, such as FDR value for PSM (0.25%), peptide (0.33%), and protein levels (0.17%). In addition, the total number of the expected true positives and false positives at each level (PSM, peptide, and protein) was calculated (Table S-1). As a result, 8526 proteins were identified confidently in HF data sets. To test the resolution of the glycine and tricine gels on the separation of testis proteins, we drew the accumulation curves with identified proteins and peptides from each fraction (Figure 1A,B). The number of nonredundant peptides and proteins consistently increased with the addition of individual fractions from lower band to top. Compared with glycine-SDS-PAGE,

25

The functions of verified MPs were searched against UniProt (http://www.uniprot.org/) online, including biological function and process, expression, subcellular location, and so on.



RESULTS AND DISCUSSION

HF Achieves the Deeper Proteome Coverage of Human Testis Tissue

To further improve the proteome coverage for the testis tissue, we detected the digested peptides by HF in this study on the basis of Velos detection in last year. After database searching, a total of 7 049 718 MS2 scans were obtained from three testis 3991

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

Figure 3. HF identifies more low-abundance proteins than Velos. (A) Correlation of the commonly identified protein abundance of two mass spectrometers. The different shades of colors represent the density of proteins. (B) Comparison of sequence coverage of commonly identified proteins by HF and Velos. (C) Comparison of protein abundance and sequence coverage of unique and shared proteins in HF data sets. (D) Dynamic range of normalized intensity in two mass spectrometers. (E) Normalized intensity distribution of uniquely identified proteins in two mass spectrometers. (F) Comparison of abundance distribution of mRNA based on RNA-seq of HF and Velos.

tricine-SDS-PAGE identification curves were steeper on the lower part. However, it is also easier to achieve saturation level. In addition, we found that the tricine gel had higher resolution on LMW proteins (Figure S-1). The total number of identified proteins rose to 8526 when two separation methods were combined, in which 598 proteins (7.6%) and 160 proteins (1.3%) were uniquely identified either from glycine- or tricineSDS-PAGE (Figure 1C). This result suggested that the different protein separation methods can increase the number of proteins. Although samples’ pooling strategy has been a common way in proteomics, individual contribution was also considered in this study, similar to our previous research. In total, 326, 123, and 77

proteins were uniquely identified in three individuals, and 805 proteins were identified in at least two individuals, which accounts for ∼16% of total identified proteins (Figure 1D). In total, 7195 proteins were overlapped in three samples, which accounts for 84% of the whole testis proteins. A similar conclusion was also found in two separation methods (Figure S-2A,B). These results implied that HF can identify more individual expressed proteins and improves the coverage of the testis proteome. To comprehensively evaluate the contribution of proteins isolation methods and individual difference on testis sample, we drew the accumulation curves of proteins and peptides (Figure 1E,F). The accumulated curve achieved 3992

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

Figure 4. HF identifies more MPs with low-abundance in human testis. (A) Venn diagram of HF and Velos MPs. (B) Distribution of MPs in each samples. Gray represents not identified. (C) Comparison of the abundance distribution of MPs and the commonly identified proteins from HF and Velos. (D) Sequence coverage of identified MPs from HF and Velos. (E) Example of verification of the identified peptide for MPs by using the synthesized peptide.

two mass generations, we analyzed these two data sets generated from HF and Velos with the same searching engine and criteria. As shown in Figure 2A, the number of identified peptides from glycine-SDS-PAGE by HF was ∼17% higher than that from the Velos in all of the fractions, which resulted in 14% more identified proteins. Following that, we also found that HF produced 67% more MS2 spectra, 24% more PSMs, 19% more peptides, and 13% more protein groups than Velos (Figure 2B). These advantages might be due to the faster scanning speed, which

saturation of approximately 125 792 peptides for 8526 protein groups with three individual testis tissues. This resulted in the largest tissue proteome data sets with high mass accuracy on both of MS1 and MS2 through HF under the stringent criteria. HF Has Higher Coverage than Velos

Because the HF has higher sensitivity, resolution, and scanning speed than Velos,26 it may help us to make a further exploration to the testis proteome. To understand the characteristic of these 3993

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

Figure 5. MPs from HF behave high density in lower-MW and higher-pI. (A) Molecular weight distribution of MPs. (B) Isoelectric-point distribution of MPs. (C) Function information on verified MPs in UniProt.

further results in more MS2 spectra. Another 1230 and 240 proteins were uniquely identified in HF and Velos data, respectively (Figure 2C). Therefore, HF can identify more proteins than Velos in shorter time (10 500 min vs 13 500 min). These phenomena were also found in each individual testis sample with any separation method (Figure S-3A,B). This result suggested that the HF can still detect more specific proteins from the same human testis samples. The whole sequence coverage of proteins identified by HF or Velos was also compared (Figure 2D). The sequence coverage of HF mainly distributed in 25−50%, which was higher than that of the Velos with distribution of 10−25%. This may be contributed by the high intrinsic resolution of HF, which allows short transients and short cycle times in topN methods.27

low sequence coverage proteins. To verify this result, we compared the abundance and sequence coverage between the commonly and uniquely identified proteins from HF (Figure 3C). As expected, the uniquely identified proteins showed lower abundance and lower sequence coverage than that of commonly identified ones. Thus, the higher sensitivity of HF benefits the identification of low abundance proteins with limited protein length. In addition to sensitivity and resolution of the instrument, prefractionation of the sample might also be a parameter for enhancing the identification of low-abundance proteins. The dynamic range of proteins abundance also affects the number of identified proteins. Thus, protein identification may be improved by wider dynamic range of mass spectrometer. Normalized intensities were used to compare the intensity of total proteins identified by HF and Velos, and the result showed that HF has wider dynamic range than Velos (Figure 3D). We also compared the intensity of uniquely identified proteins by HF and Velos (Figure 3E). Although they had a similar dynamic range expanded from 4 to 8 for the mainly identified proteins, the HF could detect more low abundance proteins in the range of 0 to 4. This result was consistent with what we previously described. As the same, RNA-Seq data consistently verified that these low-abundance proteins had lower expression level at the mRNA level as well (Figure 3F). Compared with the abundance of overlapped proteins at the mRNA level, uniquely identified proteins by HF and Velos had low abundance at the

HF Identifies More Low-Abundance Proteins than Velos

To estimate the detection sensitivity of these two different mass spectrometers used in this study, we compared the intensity of shared proteins in HF and Velos data sets (Figure 3A). The protein abundance of HF was highly correlated with that of the Velos (Pearson’s correlated coefficient was 0.94), which suggested the good reproducibility of these two mass spectrometers. Consistently, we saw the higher sequence coverage of proteins identified by HF than that of Velos, especially for the lower coverage proteins (Figure 3B). This result illustrated that HF may have an advantage of identifying 3994

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research

were difficultly identified by previous generations of mass spectrometer. This may be caused by higher sensitivity of HF compared with that Velos for LMW proteins identification (Figure S-5A). A similar result was obtained for MPs with higher pI (Figure S-5B). The 47 confirmed MPs in UniProt (http://www.uniprot.org/ ) showed that 26 MPs (55.3%) were functionally annotated (Figure 5C). Most of these MPs take part in an mRNA process like RNA splicing.29−31 In total, 11 MPs were testis-specific proteins, among which, Q9BWV3, P0C5Z0, A6NEQ0, and O14598 are related to the spermatogenesis process (Figure 5C). Q9BWV3, P0C5Z0, and A6NEQ0 are all bonding proteins that are bonded to zinc ion, DNA, and RNA, respectively. Testisspecific basic protein Y1 (O14598) may play a role in sex ratio distortion.32 Zinc finger MYND domain-containing protein 15 is predicted to be a candidate in some disease-forming process and might interact with histone deacetylases (HDACs) and play an important role in differentiation, spermatogenesis, transcription, and transcription regulation. Mutation of this gene may cause spermatogenic failure 14 (SPGF14).33 In addition, Q5BKT4 and A6NNE9 may participate in protein modification. Dol-PGlc:Glc(2)Man(9)GlcNAc(2)-PP-Dol α-1,2-glucosyltransferase (Q5BKT4) is involved in the protein glycosylation pathway, which adds the third glucose residue to the lipid-linked oligosaccharide precursor for N-linked glycosylation,34 while E3 ubiquitin-protein ligase MARCH11 (A6NNE9) may play a role in ubiquitin-dependent protein sorting in the development of spermatids.35 The known biological functions of these MPs indicate that most of these identified MPs with low abundance might take part in various important biological processes, especially in spermatogenesis. These functions are ongoing by our other lab teams, and it is worthwhile to thoroughly research them.

transcriptional level. Moreover, more uniquely identified proteins were detected by HF compared with Velos data. Thus, the high-resolution and wide-dynamic range of HF could achieve deeper proteome coverage in human testis tissue, which potentially benefits the MP study in the same sample. HF Identified More MPs with Lower Abundance

In our previous research, testis was regarded as an ideal target tissue for MPs study through comparing the abundance distribution of the corresponding gene of MPs in total genes from 27 tissues at the mRNA level.9 In total, 81 MPs were detected, in which 32 MPs were commonly identified by HF and Velos, while another 42 MPs and 7 MPs were uniquely identified by HF and Velos, respectively (Figure 4A). Of them, total 51 proteins were new identified MPs. (Figure S-4A). The MP distribution suggested that common identified MPs tended to distribute in multiple samples and had relatively higher abundance (Figure 4B). However, uniquely identified MPs by HF or Velos tend to distribute in single sample, which may be due to the individual difference from different donors. The status of spermatogenesis might be a factor of individual differences. Various ages (53, 35, and 52 years old) and native places (Beijing and Hebei) of three donors might lead to different spermatogenesis status. The normalized intensity distribution comparison showed that MP abundance was much lower than common identified proteins’ abundance from HF and Velos, and the abundance of HF MPs was lower than that of Velos (Figure 4C). Because the HF can achieve higher sequence coverage than Velos as higher sequencing speed, we compared MPs sequence coverage of HF and Velos (Figure 4D). Each dot represents one MP, and the sizes of the dots vary with the number of identified unique peptides. Interestingly, almost all of the MPs identified by HF have at least two unique peptides, which account for 78.3% of total MPs (Figure S-4B). This result illustrated that HF detected more convincing MPs than Velos. The reliability of MPs identified in this study was further verified by manual spectra quality checking. The total of 56 MPs were retained by filtering with at least two unique-mapping peptides whose sequence length was 9 or more aa. These above 56 proteins were listed as credible MPs; their spectra have higher base peak intensity and continuous b/y ion matches. In addition, the matching of synthesized peptides with scanning spectra of these 56 MPs showed that they had high similarity, the score was 0.9 or higher based on comparing the matches for b1+, y1+, b2+, and y2+ ions, and the patterns for peak intensity of these match (Figure 4E, Figure S-6, and Table S-3). In addition, we also evaluated whether the isobaric substitutions and SAAVs could influence the authenticity of MPs identified. For example, the unique peptide “ISNIFVIGNGNKPWISLPR” mapped to the MP Q8TD47 might be isobaric-substituted to the “LSNIFVIGNGNKPWISLPR” peptide mapped to the commonly identified protein P22090 (40S ribosomal protein S4, Y isoform 1) in our data. After isobaric and SAAVs filtering, 47 proteins were confirmed MPs, finally with authentic MS evidence. The high verification rate for all of these MPs implied the high quality of our MS data and the reliability of MPs.



CONCLUSIONS To achieve deeper coverage, we used HF to reanalyze the same three individual human testis samples by two different gelseparations based on our previous work. In total, 8526 proteins were identified with high confidence, in which more than 1000 proteins were uniquely identified with low abundance when compared with our Velos data sets. The deeper coverage might also contribute to tandem protein separating methods and individual diversity. Among them, a total of 81 MPs were identified, of which 74 MPs and 39 MPs were identified from HF and Velos data sets, respectively. HF alone detected more MPs with lower abundance and higher sequence coverage from the same human testis tissues. After strict filtering, 47 MPs were selected as highly credible MPs by manual checking of spectrum quality, isobaric filtering, and SAAVs filtering. The higher sensitivity and resolution of MS instrument can explore more low-abundance MPs in human testis tissue and could be applied in other tissues and in the near future for C-HPP study on different proteomics samples.



ASSOCIATED CONTENT

S Supporting Information *

MPs Identified by HF Have High Density in Lower MW and Higher pI Properties

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.6b00390. Figure S-1. Molecular weight (MW) distribution of proteins identified by glycine- and tricine-SDS-PAGE. Figure S-2. Comparison of identified proteins from three

Previous studies showed that low MW, high hydrophobicity, and high pI were the main properties of MPs.28 The analysis of MP characteristics showed that HF could identify more MPs with even LMW (Figure 5A) or high pI compared with Velos (Figure 5B). These results illustrated that HF can detect more MPs that 3995

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research



2013−2014 and strategies for finding missing proteins. J. Proteome Res. 2014, 13 (1), 15−20. (2) Horvatovich, P.; Lundberg, E. K.; Chen, Y. J.; Sung, T. Y.; He, F.; Nice, E. C.; Goode, R. J.; Yu, S.; Ranganathan, S.; Baker, M. S.; Domont, G. B.; Velasquez, E.; Li, D.; Liu, S.; Wang, Q.; He, Q. Y.; Menon, R.; Guan, Y.; Corrales, F. J.; Segura, V.; Casal, J. I.; Pascual-Montano, A.; Albar, J. P.; Fuentes, M.; Gonzalez-Gonzalez, M.; Diez, P.; Ibarrola, N.; Degano, R. M.; Mohammed, Y.; Borchers, C. H.; Urbani, A.; Soggiu, A.; Yamamoto, T.; Salekdeh, G. H.; Archakov, A.; Ponomarenko, E.; Lisitsa, A.; Lichti, C. F.; Mostovenko, E.; Kroes, R. A.; Rezeli, M.; Vegvari, A.; Fehniger, T. E.; Bischoff, R.; Vizcaino, J. A.; Deutsch, E. W.; Lane, L.; Nilsson, C. L.; Marko-Varga, G.; Omenn, G. S.; Jeong, S. K.; Lim, J. S.; Paik, Y. K.; Hancock, W. S. Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. J. Proteome Res. 2015, 14 (9), 3415−31. (3) Carapito, C.; Lane, L.; Benama, M.; Opsomer, A.; MoutonBarbosa, E.; Garrigues, L.; Gonzalez de Peredo, A.; Burel, A.; Bruley, C.; Gateau, A.; et al. Computational and mass-spectrometry-based workflow for the discovery and validation of missing human proteins: application to chromosomes 2 and 14. J. Proteome Res. 2015, 14 (9), 3621−3634. (4) Horvatovich, P.; Végvári, Á .; Saul, J.; Park, J. G.; Qiu, J.; Syring, M.; Pirrotte, P.; Petritis, K.; Tegeler, T. J.; Aziz, M.; et al. In vitro transcription/translation system: A versatile tool in the search for missing proteins. J. Proteome Res. 2015, 14 (9), 3441−3451. (5) Kitata, R. B.; Dimayacyac-Esleta, B. R. T.; Choong, W.-K.; Tsai, C.F.; Lin, T.-D.; Tsou, C.-C.; Weng, S.-H.; Chen, Y.-J.; Yang, P.-C.; Arco, S. D.; et al. Mining missing membrane proteins by high-pH reversephase StageTip fractionation and multiple reaction monitoring mass spectrometry. J. Proteome Res. 2015, 14 (9), 3658−3669. (6) Scheltema, R. A.; Hauschild, J.-P.; Lange, O.; Hornburg, D.; Denisov, E.; Damoc, E.; Kuehn, A.; Makarov, A.; Mann, M. The Q Exactive HF, a Benchtop mass spectrometer with a pre-filter, highperformance quadrupole and an ultra-high-field Orbitrap analyzer. Mol. Cell. Proteomics 2014, 13 (12), 3698−3708. (7) Senko, M. W.; Remes, P. M.; Canterbury, J. D.; Mathur, R.; Song, Q.; Eliuk, S. M.; Mullen, C.; Earley, L.; Hardman, M.; Blethrow, J. D.; et al. Novel parallelized quadrupole/linear ion trap/Orbitrap tribrid mass spectrometer improving proteome coverage and peptide identification rates. Anal. Chem. 2013, 85 (24), 11710−11714. (8) Kelstrup, C. D.; Jersie-Christensen, R. R.; Batth, T. S.; Arrey, T. N.; Kuehn, A.; Kellmann, M.; Olsen, J. V. Rapid and deep proteomes by faster sequencing on a benchtop quadrupole ultra-high-field Orbitrap mass spectrometer. J. Proteome Res. 2014, 13 (12), 6187−95. (9) Zhang, Y.; Li, Q.; Wu, F.; Zhou, R.; Qi, Y.; Su, N.; Chen, L.; Xu, S.; Jiang, T.; Zhang, C.; Cheng, G.; Chen, X.; Kong, D.; Wang, Y.; Zhang, T.; Zi, J.; Wei, W.; Gao, Y.; Zhen, B.; Xiong, Z.; Wu, S.; Yang, P.; Wang, Q.; Wen, B.; He, F.; Xu, P.; Liu, S. Tissue-Based Proteogenomics Reveals that Human Testis Endows Plentiful Missing Proteins. J. Proteome Res. 2015, 14 (9), 3583−94. (10) Su, N.; Zhang, C.; Zhang, Y.; Wang, Z.; Fan, F.; Zhao, M.; Wu, F.; Gao, Y.; Li, Y.; Chen, L.; Tian, M.; Zhang, T.; Wen, B.; Sensang, N.; Xiong, Z.; Wu, S.; Liu, S.; Yang, P.; Zhen, B.; Zhu, Y.; He, F.; Xu, P. Special Enrichment Strategies Greatly Increase the Efficiency of Missing Proteins Identification from Regular Proteome Samples. J. Proteome Res. 2015, 14 (9), 3680−92. (11) Kampf, C.; Mardinoglu, A.; Fagerberg, L.; Hallstrom, B. M.; Edlund, K.; Lundberg, E.; Ponten, F.; Nielsen, J.; Uhlen, M. The human liver-specific proteome defined by transcriptomics and antibody-based profiling. FASEB J. 2014, 28 (7), 2901−14. (12) Sjöstedt, E.; Fagerberg, L.; Hallström, B. M.; Häggmark, A.; Mitsios, N.; Nilsson, P.; Pontén, F.; Hökfelt, T.; Uhlén, M.; Mulder, J. Defining the human brain proteome using transcriptomics and antibody-based profiling with a focus on the cerebral cortex. PLoS One 2015, 10 (6), e0130028. (13) Habuka, M.; Fagerberg, L.; Hallström, B. M.; Kampf, C.; Edlund, K.; Sivertsson, Å.; Yamamoto, T.; Pontén, F.; Uhlén, M.; Odeberg, J. The kidney transcriptome and proteome defined by transcriptomics and antibody-based profiling. PLoS One 2014, 9 (12), e116125.

individual testis tissues based on two gel-separating methods. Figure S-3. Comparison of identified proteins from three individual testis tissues based on two separation ways by HF and Velos. Figure S-4. Number and sequence coverage of testis MPs. Figure S-5. MW and pI of identified proteins by HF and Velos. Figure S-6. Verification of MPs by its synthesized peptide and previous MS spectra matching. (PDF) Table S-1. The searching criteria and results summary of HF and Velos. Table S-2. Detailed information about the identified proteins/genes for three samples by HF and Velos based on proteomics and transcriptomics. Table S-3. The identification and verification of the total MPs in human testis. (XLSX)

AUTHOR INFORMATION

Corresponding Authors

*F.H.: Fax: 86-10-68171208. E-mail: [email protected]. *P.X. Tel/Fax: 86-10-80705066. E-mail: xuping_bprc@126. com. Author Contributions ▽

W.W., W.L., F.W., and X.P. contributed equally to this work.

Notes

The authors declare no competing financial interest. All mass spectrometry data from this study have been deposited in the ProteomeXchange with identifier PXD004092.



ACKNOWLEDGMENTS We thank Wantao Ying, Pengyuan Yang, Siqi Liu, and Qinyu He for their valuable suggestions. We also appreciate lab members for their discussion and efforts. This work was funded by the Chinese National Basic Research Programs (2013CB911201 & 2015CB910700), the International Collaboration Program (2014DFB30020), the National Natural Science Foundation of China (Grant Nos. 31470809, 31400698, and 31400697), the National High-Tech Research and Development Program of China (2014AA020900 and 2014AA020607), the National Natural Science Foundation of Beijing (Grant No. 5152008), Beijing Training Project for The Leading Talents in S&T, National Megaprojects for Key Infectious Diseases (2013zx10003002 and 2016ZX10003003), the Key Projects in the National Science & Technology Pillar Program (2012BAF14B00), the Unilevel 21th Century Toxicity Program (MA-2014-02409), and the Foundation of State Key Lab of Proteomics (SKLPYB201404).



ABBREVIATIONS HPP, Human Proteome Project; C-HPP, Chromosome-Centric Human Proteome Project; MPs, missing proteins; HF, QExactive HF; Velos, LTQ-Orbitrap Velos; MW, molecular weight; LMW, low-molecular-weight proteins; RNC-mRNA, mRNA attached to ribosome-nascent chain complex; RPKM, reads per kilobase per million mapped reads; PSM, peptide− spectrum; MIT, maximum injection time; AGC, automatic gain control; NCE, normalized collision energy match; FDR, false discovery rate



REFERENCES

(1) Lane, L.; Bairoch, A.; Beavis, R. C.; Deutsch, E. W.; Gaudet, P.; Lundberg, E.; Omenn, G. S. Metrics for the Human Proteome Project 3996

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997

Article

Journal of Proteome Research (14) Danielsson, A.; Ponten, F.; Fagerberg, L.; Hallstrom, B. M.; Schwenk, J. M.; Uhlen, M.; Korsgren, O.; Lindskog, C. The human pancreas proteome defined by transcriptomics and antibody-based profiling. PLoS One 2014, 9 (12), e115421. (15) Edqvist, P. H.; Fagerberg, L.; Hallstrom, B. M.; Danielsson, A.; Edlund, K.; Uhlen, M.; Ponten, F. Expression of human skin-specific genes defined by transcriptomics and antibody-based profiling. J. Histochem. Cytochem. 2015, 63 (2), 129−41. (16) Mardinoglu, A.; Kampf, C.; Asplund, A.; Fagerberg, L.; Hallstrom, B. M.; Edlund, K.; Bluher, M.; Ponten, F.; Uhlen, M.; Nielsen, J. Defining the human adipose tissue proteome to reveal metabolic alterations in obesity. J. Proteome Res. 2014, 13 (11), 5106−19. (17) Uhlen, M.; Fagerberg, L.; Hallstrom, B. M.; Lindskog, C.; Oksvold, P.; Mardinoglu, A.; Sivertsson, A.; Kampf, C.; Sjostedt, E.; Asplund, A.; Olsson, I.; Edlund, K.; Lundberg, E.; Navani, S.; Szigyarto, C. A.; Odeberg, J.; Djureinovic, D.; Takanen, J. O.; Hober, S.; Alm, T.; Edqvist, P. H.; Berling, H.; Tegel, H.; Mulder, J.; Rockberg, J.; Nilsson, P.; Schwenk, J. M.; Hamsten, M.; von Feilitzen, K.; Forsberg, M.; Persson, L.; Johansson, F.; Zwahlen, M.; von Heijne, G.; Nielsen, J.; Ponten, F. Proteomics. Tissue-based map of the human proteome. Science 2015, 347 (6220), 1260419. (18) Djureinovic, D.; Fagerberg, L.; Hallstrom, B.; Danielsson, A.; Lindskog, C.; Uhlen, M.; Ponten, F. The human testis-specific proteome defined by transcriptomics and antibody-based profiling. Mol. Hum. Reprod. 2014, 20 (6), 476−88. (19) Fagerberg, L.; Hallström, B. M.; Oksvold, P.; Kampf, C.; Djureinovic, D.; Odeberg, J.; Habuka, M.; Tahmasebpoor, S.; Danielsson, A.; Edlund, K.; et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol. Cell. Proteomics 2014, 13 (2), 397−406. (20) Jumeau, F.; Com, E.; Lane, L.; Duek, P.; Lagarrigue, M.; Lavigne, R.; Guillot, L.; Rondel, K.; Gateau, A.; Melaine, N.; Guevel, B.; Sergeant, N.; Mitchell, V.; Pineau, C. Human Spermatozoa as a Model for Detecting Missing Proteins in the Context of the Chromosome-Centric Human Proteome Project. J. Proteome Res. 2015, 14 (9), 3606−20. (21) Randall, S. M.; Cardasis, H. L.; Muddiman, D. C. Factorial experimental designs elucidate significant variables affecting data acquisition on a quadrupole Orbitrap mass spectrometer. J. Am. Soc. Mass Spectrom. 2013, 24 (10), 1501−12. (22) Mann, K.; Mann, M. In-depth analysis of the chicken egg white proteome using an LTQ Orbitrap Velos. Proteome Sci. 2011, 9 (1), 7. (23) Wang, L. h.; Li, D. Q.; Fu, Y.; Wang, H. P.; Zhang, J. F.; Yuan, Z. F.; Sun, R. X.; Zeng, R.; He, S. M.; Gao, W. pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2007, 21 (18), 2985−2991. (24) Fu, Y.; Yang, Q.; Sun, R.; Li, D.; Zeng, R.; Ling, C. X.; Gao, W. Exploiting the kernel trick to correlate fragment ions for peptide identification via tandem mass spectrometry. Bioinformatics 2004, 20 (12), 1948−1954. (25) UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015, 43 (Database issue), D204−12. (26) Sun, L.; Zhu, G.; Dovichi, N. J. Comparison of the LTQ-Orbitrap Velos and the Q-Exactive for proteomic analysis of 1−1000 ng RAW 264.7 cell lysate digests. Rapid Commun. Mass Spectrom. 2013, 27 (1), 157−62. (27) Michalski, A.; Damoc, E.; Hauschild, J.-P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics 2011, 10 (9), M111.011015. (28) Xu, A.; Li, G.; Yang, D.; Wu, S.; Ouyang, H.; Xu, P.; He, F. Evolutionary characteristics of missing proteins: insights into the evolution of human chromosomes related to missing-protein-encoding genes. J. Proteome Res. 2015, 14 (12), 4985−4994. (29) Skaletsky, H.; Kuroda-Kawaguchi, T.; Minx, P. J.; Cordum, H. S.; Hillier, L.; Brown, L. G.; Repping, S.; Pyntikova, T.; Ali, J.; Bieri, T.; et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 2003, 423 (6942), 825−837.

(30) Bao, Y.; Konesky, K.; Park, Y. J.; Rosu, S.; Dyer, P. N.; Rangasamy, D.; Tremethick, D. J.; Laybourn, P. J.; Luger, K. Nucleosomes containing the histone variant H2A. Bbd organize only 118 base pairs of DNA. EMBO J. 2004, 23 (16), 3314−3324. (31) Chen, Y.-T.; Scanlan, M. J.; Venditti, C. A.; Chua, R.; Theiler, G.; Stevenson, B. J.; Iseli, C.; Gure, A. O.; Vasicek, T.; Strausberg, R. L.; et al. Identification of cancer/testis-antigen genes by massively parallel signature sequencing. Proc. Natl. Acad. Sci. U. S. A. 2005, 102 (22), 7940−7945. (32) Vawter, M. P.; Evans, S.; Choudary, P.; Tomita, H.; MeadorWoodruff, J.; Molnar, M.; Li, J.; Lopez, J. F.; Myers, R.; Cox, D.; et al. Gender-specific gene expression in post-mortem human brain: localization to sex chromosomes. Neuropsychopharmacology 2004, 29 (2), 373. (33) Ayhan, O.; Balkan, M.; Guven, A.; Hazan, R.; Atar, M.; Tok, A.; Tolun, A. Truncating mutations in TAF4B and ZMYND15 causing recessive azoospermia. J. Med. Genet. 2014, 51 (4), 239−44. (34) Oriol, R.; Martinez-Duncker, I.; Chantret, I.; Mollicone, R.; Codogno, P. Common origin and evolution of glycosyltransferases using Dol-P-monosaccharides as donor substrate. Mol. Biol. Evol. 2002, 19 (9), 1451−1463. (35) Nakamura, N. Ubiquitination regulates the morphogenesis and function of sperm organelles. Cells 2013, 2 (4), 732−750.

3997

DOI: 10.1021/acs.jproteome.6b00390 J. Proteome Res. 2016, 15, 3988−3997