Multi-Protease Strategy Identifies Three PE2 Missing Proteins in

Sep 29, 2017 - State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Ya...

Article

Multi-Protease Strategy Identifies Three PE2 Missing Proteins in Human Testis Tissue Yihao Wang, Yang Chen, Yao Zhang, Wei Wei, Yanchang Li, Tao Zhang, Fuchu He, Yue Gao, and Ping Xu J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00340 • Publication Date (Web): 29 Sep 2017 Downloaded from http://pubs.acs.org on October 5, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Multi-Protease Strategy Identifies Three PE2 Missing Proteins in Human Testis Tissue

Yihao Wang1,2#, Yang Chen1#, Yao Zhang3*, Wei Wei1, Yanchang Li1, Tao Zhang1, Fuchu He1*, Yue Gao2*, Ping Xu1,4,5,6*

1 State Key Laboratory of Proteomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China

2 Department of Pharmacology and Toxicology, Beijing Institute of Radiation Medicine, Beijing 100850, China

3 State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Yat-Sen University, Guangzhou 510275, China

4 Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan 430072, China

5 Graduate School, Anhui Medical University, Hefei 230032, China

6 Tianjin Baodi Hospital, Tianjin 301800, China

# These authors contributed equally to this work.

* To whom correspondence should be addressed:

Ping Xu

Beijing Proteome Research Center, 38 Science Park Road, Changping District, Beijing 102206, China.

Tel and Fax: 86-10-61777113, E-mail: [email protected]

Yue Gao

Beijing Key Laboratory for Radiobiology, Beijing Institute of Radiation Medicine, Beijing 100850, China

Tel and Fax: 86-10-68212874, E-mail: [email protected]

Fuchu He


National Center for Protein Sciences Beijing, Beijing Proteome Research Center, 38 Science Park Road, Changping District, Beijing 102206, China.

Tel and Fax: 86-10-68171208. E-mail: [email protected]

Yao Zhang

State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, College of Ecology and Evolution, Sun Yat-Sen University, Guangzhou 510275, China

Tel and Fax: 86-20-84111727. E-mail: [email protected]

ABSTRACT

Although missing proteins (MPs) have been studied for five years, the search for MPs remains one of the missions of the Chromosome-Centric Human Proteome Project (C-HPP) this year. Following the C-HPP, we have focused on testis-enriched MPs using various strategies since 2015. Based on a theoretical analysis of the multi-protease digestion of MPs (neXtProt 2017-01), we found that non-conventional proteases could improve peptide diversity and sequence coverage compared with trypsin. Therefore, a multi-protease strategy was used to search for more MPs in the same human testis tissues, which were separated by 10% SDS-PAGE and analyzed with a high-resolution LC-MS/MS system (Q Exactive HF). In total, 7,838 proteins were identified. Among them, three MPs in neXtProt 2017-01 (i.e., PE2) were identified: beta-defensin 123 (Q8N688, chr 20q), cancer/testis antigen family 45 member A10 (P0DMU9, chr Xq), and histone H2A-Bbd type 2/3 (P0C5Z0, chr Xq). Although only one unique peptide of ≥9 AA was identified for each of beta-defensin 123 and histone H2A-Bbd type 2/3, we propose that each falls under the exceptional evidence clause of the HPP Guidelines v2.1 after spectrum quality check, isobaric PTM and single amino acid variant (SAAV) filtering, and verification with a synthesized peptide, based on overlapping peptides from different proteases. Other MPs were considered candidates but need more information. All MS datasets have been deposited to the ProteomeXchange with identifier PXD006465 (Username: [email protected]; Password: 5TbvY07c).

KEYWORDS: Chromosome-Centric Human Proteome Project, testis, missing proteins, multi-protease, proteome

INTRODUCTION

The search for missing proteins (MPs) has been an important mission of the Chromosome-Centric HPP (C-HPP) since 2012 [1]. To find sufficient protein-level evidence for MPs, mass spectrometry (MS) and antibody-based techniques were the first powerful tools used in this field. However, previous studies suggested that these MPs are difficult to detect at the protein level because of their extremely low abundance, specific subcellular localization, extreme physicochemical features, etc. [2-4]. Therefore, various approaches have been employed to find these MPs, including special tissues (testis [5], placenta [6], lung [7], dental pulp [8]), disease-associated cell lines (colorectal cancer [9], hepatoma [10], glioma [11]), recombinant-protein-guided SRM [12], omics-integrated analysis [13], biochemical enrichment (low molecular weight (MW) proteins, membrane proteins, phosphorylated proteins, ubiquitinated proteins) [14, 15], and optimized peptide fractionation [16] (Table S-1). Through the concerted efforts of researchers all over the world, the number of MPs was reduced from the original 6,568 [17] (2012) to 2,579 according to the neXtProt database (2017-01) [18].


As a reproductive tissue, the testis has the largest number of tissue-enriched transcripts [19]. Nearly 77% of all putative protein-encoding genes are expressed in the testis, far more than in other human tissues [20]. The Kuster lab found that the testis showed relatively higher transcriptional efficiency than other tissues in the MS-based draft of the human proteome [21]. Consistently, Pineau and his collaborators identified 235 MPs from human spermatozoa, of which 16 were shown by immunohistochemistry to be specifically expressed in the testis [22]. Our lab has extensively used testis tissue as the target material since 2015. Multiple strategies, such as multiple individuals, protein separation, and high-scan-speed MS, have been applied, and we found that the testis indeed harbors plentiful MPs (2015, 2016). According to the HPA dataset (v15, 2016-04), only 354 of the 879 HPA testis-enriched proteins were canonical in PeptideAtlas, which implies that testis tissue is still an ideal resource for MP searching.

Deep coverage has been one of the desired goals in proteomics, although it is restricted by many factors, including protease selection in the sample digestion step. The near-exclusive use of trypsin provides only a partial view of the proteome and hampers the discovery of new isoforms. The Hancock lab adopted a multi-protease strategy to increase the sequence coverage of identified proteins [23]. Guo used seven proteases for a more comprehensive study of the HeLa proteome and identified 4 MPs according to PeptideAtlas (2015-05) [24]. Trevisiol proposed that multiple proteases could increase sequence coverage and provide more extensive PTM and sequence variant profiling [25]. Swaney [26] found that multiple proteases (Lys-C, Arg-C, Asp-N, Glu-C, and trypsin) could increase protein identifications by nearly 15% and sequence coverage 3-fold in a Saccharomyces cerevisiae proteome study. These results emphasize that a multi-protease strategy might be useful for identifying more MPs.


Among the non-conventional proteases, Lys-C [27], LysargiNase [28], and Glu-C [29] have complementary effects. Lys-C cleaves specifically at the C-terminus of lysine, and peptides with a basic residue at their C-termini typically show increased ionization efficiency during proton-adduct electrospray ionization. Peptides generated by LysargiNase can produce more b-type ions in CID than those generated by C-terminal proteases [30]. As the mean occurrences of Glu and Asp are around 5.5% and 5%, respectively [31], Glu-C is a good supplement because it cleaves preferentially at the C-terminus of Glu and sometimes Asp. The combination of these three proteases could potentially achieve more comprehensive sequence coverage.

In this study, we analyzed the same three human testis samples using a multi-protease strategy (Lys-C [27], Glu-C [29], LysargiNase [28]). Using the Q Exactive HF [32], which has been demonstrated to be superior in detectable coverage and sensitivity [14], we identified 7,838 protein groups, among which three proteins were listed as confirmed MPs, and we describe 5 other candidate MP groups (18 proteins) after strict filtering and verification with synthesized peptides [33].
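The complementary cleavage specificities described above can be illustrated with a minimal in silico digestion sketch. It is not the authors' pipeline. The rules are simplified assumptions: Lys-C cuts C-terminal to K, Glu-C cuts C-terminal to E only (its occasional Asp cleavage is ignored), and LysargiNase cuts N-terminal to K or R. The names `RULES`, `digest`, `coverage`, and the sequence `TOY` are hypothetical and introduced only for illustration.

```python
# Simplified cleavage rules: (target residues, side of the residue that is cut).
# Assumption: Glu-C is restricted to Glu; its occasional Asp cleavage is ignored.
RULES = {
    "Lys-C": ("K", "C"),
    "Glu-C": ("E", "C"),
    "LysargiNase": ("KR", "N"),
}


def cleavage_points(seq, protease):
    """Return the internal cut positions (0 < cut < len(seq)) for one protease."""
    residues, side = RULES[protease]
    points = []
    for i, aa in enumerate(seq):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            if 0 < cut < len(seq):
                points.append(cut)
    return points


def digest(seq, protease, missed_cleavages=2, min_len=7):
    """Return the set of peptides with at most `missed_cleavages` skipped sites."""
    cuts = [0] + cleavage_points(seq, protease) + [len(seq)]
    peptides = set()
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            pep = seq[cuts[i]:cuts[j]]
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides


def coverage(seq, peptides):
    """Fraction of residues in `seq` covered by at least one peptide."""
    covered = [False] * len(seq)
    for pep in peptides:
        start = seq.find(pep)
        while start != -1:
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = seq.find(pep, start + 1)
    return sum(covered) / len(seq)


# Made-up toy sequence for illustration; it is not a real protein entry.
TOY = "MAKLQAQVEENELWNRLNQQQEEKGA"

if __name__ == "__main__":
    for name in RULES:
        peptides = digest(TOY, name)
        print(name, sorted(peptides, key=len), round(coverage(TOY, peptides), 2))
```

The defaults (two missed cleavages, peptides of at least 7 AA) mirror the search settings reported in the methods; setting `min_len=9` would instead reflect the peptide length required by the HPP guidelines. Because each protease cuts at different positions, the union of the three peptide sets can cover residues that no single digest covers on its own.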

MATERIALS AND METHODS

Proteome Sample Preparation

The same three human testis samples (IRB approval number BGIIRB15076) were prepared as previously described [5]. In brief, three testis tissues (50 mg) were ground in liquid nitrogen and sonicated on ice in lysis buffer (8 M urea, 5 mM IAA, 50 mM NH4HCO3, 1× protease cocktail). The cell debris was removed by centrifugation at 13,300 × g for 15 min. Three individual samples (70 µg) were processed in triplicate, reduced with 5 mM DTT at 42 °C for 30 min, and alkylated with 10 mM iodoacetamide in the dark at room temperature for 30 min. In total, nine samples were then resolved by 10% SDS-PAGE (2.5 cm) and stained with Coomassie Blue G250. Each gel lane was excised into 7 fractions based on the MW and the protein abundance (Figure S-1A). Each individual sample was separately digested with proteases (our lab): Glu-C (12.5 ng/µL) [29], Lys-C (12.5 ng/µL) [27], and LysargiNase (12.5 ng/µL) [28], at 37 °C for 12 h (Figure S-1B). The extracted peptides were dried and dissolved in loading buffer (1% formic acid, FA; 1% acetonitrile, ACN) for MS analysis.

LC-MS/MS Analysis and Database Searching

The peptide mixtures were analyzed by MS/MS (Q Exactive HF, Thermo Fisher Scientific) after LC separation (Ultimate 3000, Thermo Scientific). Briefly, the samples were loaded onto a self-packed capillary column (150 µm i.d. × 12 cm, 1.9 µm C18 reverse-phase fused-silica, Trap) and eluted with a 90 min nonlinear gradient: 6% B for 8 min, 9-14% B for 16 min, 14-30% B for 36 min, 30-40% B for 15 min, 40-95% B for 3 min, 95% B for 7 min, 95-6% B for 1 min, 6% B for 4 min (Buffer A, 0.1% FA in ddH2O; Buffer B, 0.08% FA and 80% ACN in ddH2O; flow rate, ~600 nL/min). The trap, a pre-column, was packed in-house to protect the analytical column from damage by salt and impurities. For the trap, an empty capillary (Polymicro Technologies, USA) was cut to an appropriate length, a sieve plate was inserted into one end, and 3 µm C18 particles (Michrom Bioresources, USA) were pushed into the capillary. For the analytical column, 1.9 µm C18 particles (Michrom Bioresources, USA) were pushed into the empty capillary with the help of a Pressure Injection Cell (Next Advance, USA). The MS survey scans were performed in the ultra-high-field Orbitrap analyzer at a resolution of 120,000 and a target value of 3,000,000 ions over a mass range from 300 to 1400 m/z. For MS/MS scans, the 20 most intense peptide ions with charge states 2 to 6 were subjected to fragmentation via higher-energy collision-induced dissociation (HCD) in the Q Exactive HF (2 × 10^4 AGC target, 19 ms maximum ion time). The dynamic exclusion setting was 12 s.

The raw files were searched with Proteome Discoverer 2.1 (v2.1.1.21, Matrix Science Mascot 2.3.1) against the Swiss-Prot reviewed database (2017.02) and the neXtProt database (2017.02). Full cleavage specificity for Glu-C, LysargiNase, and Lys-C was set separately, allowing up to two missed cleavages. Protein identifications had to meet the following criteria: (1) peptide length ≥7 AA; (2) FDR ≤1% at the PSM, peptide, and protein levels. Detailed information about the calculation of FDRs is given in the Supplementary Methods. The peptides were quantified by peak area with Proteome Discoverer. The intensity of the top three peptides was used to calculate the protein quantitation information [34]. For protein quantification, we set the Peptides to Use parameter to Unique+Razor. The application uses all peptides that are not shared between different proteins or protein groups. All shared peptides are used for the protein that has more identified peptides but not for the other proteins they are contained in.

Bioinformatics Analysis of Identified Proteins

According to the HPP guidelines (version 2.1) on MS data interpretation [33], our datasets obtained from the different proteases were reprocessed with a multi-consensus workflow in Proteome Discoverer 2.1, because simply combining multiple datasets might lead to a higher FDR. In this multi-consensus processing step, FDRs (≤1%) at the PSM, peptide, and protein levels were applied across the three protease datasets to generate the final results. The hits passing the threshold were summed as true positive matches (TP) and false positive matches (FP), and the FDRs were recalculated as FP/(TP+FP). Our two other datasets, generated in 2015 (Trypsin; Velos) and 2016 (Trypsin; HF) from the same testis samples, were compared with the 2017 dataset (Glu-C, LysargiNase, and Lys-C; HF) of this study.

Verification and Functional Analysis of MPs

To select the more credible MPs, our criteria included: (1) unique peptides ≥2; (2) peptide length ≥9 AA; (3) high-quality spectra, with few impure peaks and 3 pairs of continuous b/y ions, matched with pLabel within the pFind software [35]; (4) isobaric sequence filtering, evaluating whether I=L, Q[Deamidated]=E, or GG=N existed; (5) SAAV filtering, inspecting the Swiss-Prot and RefSeq databases with an online tool (https://search.nextprot.org/view/unicity-checker); (6) verification of candidate peptides by matching synthesized peptides with the endogenous peptides (cosine score ≥ 0.85) [3]. The functions of the verified MPs were analyzed in the DAVID (https://david.ncifcrf.gov/), UniProt [36] (http://www.uniprot.org/), and Human Protein Atlas (http://www.proteinatlas.org/) platforms, including tissue-specific enrichment, molecular function, biological process, cellular component, disease association, etc.

Data Availability

All MS proteomics data have been deposited to the ProteomeXchange Consortium [37] via the PRIDE [38] partner repository with the dataset identifier PXD006465 (Username: [email protected]; Password: 5TbvY07c).
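Two of the numerical checks named in the methods can be sketched in a few lines: the recalculated FDR, FP/(TP+FP), over the summed hits of the three protease datasets, and the cosine score used to compare a synthesized-peptide spectrum with the endogenous one. This is only an illustration under stated assumptions. How FP is estimated is described in the Supplementary Methods and is not reproduced here, so the functions simply take counts as inputs. The exact scoring in pBuild/pFind may differ, and spectra are represented here, hypothetically, as dictionaries mapping fragment-ion labels to intensities. All function names and example numbers are made up.

```python
import math


def combined_fdr(datasets):
    """Recalculate FDR as FP / (TP + FP) after summing hits across datasets.

    `datasets` is an iterable of (tp, fp) counts, one pair per protease dataset.
    """
    tp = sum(t for t, _ in datasets)
    fp = sum(f for _, f in datasets)
    total = tp + fp
    return fp / total if total else 0.0


def cosine_score(spec_a, spec_b):
    """Cosine similarity between two spectra given as {ion_label: intensity}."""
    ions = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(i, 0.0) * spec_b.get(i, 0.0) for i in ions)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical (TP, FP) counts for the three protease datasets.
    print(combined_fdr([(990, 10), (495, 5), (297, 3)]))

    # Hypothetical matched fragment-ion intensities.
    endogenous = {"b2": 10.0, "y3": 40.0, "y4": 25.0, "y5": 60.0}
    synthetic = {"b2": 12.0, "y3": 38.0, "y4": 30.0, "y5": 55.0}
    score = cosine_score(endogenous, synthetic)
    print(score >= 0.85)  # threshold quoted in the verification criteria
```

Summing the counts before dividing, rather than averaging the per-dataset FDRs, matches the recalculation described in the text and keeps larger datasets from being underweighted.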

RESULTS AND DISCUSSION

Unconventional Proteases Help the Identification of MPs in Silico

Trypsin digestion represents the gold standard in proteomics due to its high efficiency, specificity, and relatively low cost. However, it generates many short peptides with a basic Arg or Lys at the C-terminus [26]. These short peptides might hamper the identification of MPs, especially low-MW MPs. Other, unconventional proteases might benefit the identification of these MPs. Thus, we compared the theoretical sequence coverage of MPs (neXtProt: 2017-2) digested by Lys-C, Glu-C, LysargiNase, ArgC, AspN, and chymotrypsin (Figure S-2A). LysargiNase gave the highest sequence coverage, followed by chymotrypsin. The other four proteases did not show an obvious advantage in sequence coverage. The digestion efficiency of chymotrypsin varies among amino acid residues, which results in quite a few missed cleavages. Considering digestion specificity and the rate of missed cleavages, we chose a combination of three proteases, Lys-C, Glu-C, and LysargiNase, for MP identification.

Further, the digestion characteristics of these three proteases were compared with those of trypsin, including sequence coverage and the length and number of unique peptides. With multi-protease digestion, more than half of the MPs could reach over 80% sequence coverage, and the multi-protease digestion could produce more detectable unique peptides than trypsin (Figure 1A, Figure S-2B&C). As the HPP guideline requirements for MP identification are strict (unique peptides ≥2; peptide length ≥9 AA), we compared the number of unique peptides (≥9 AA) produced by theoretical digestion with trypsin and with multiple proteases. We found that the multi-protease strategy could yield more unique peptides (≥9 AA) for MPs than trypsin alone (Figure 1B, Figure S-2D). These theoretical analyses support the hypothesis that unconventional proteases might be more efficient for identifying MPs than trypsin alone. The whole experimental design is shown in Figure 1C. In total, 63 gel slices from three individual testis samples digested with three proteases were analyzed by LC-MS/MS.

Multi-protease Strategy Enhanced the Protein Coverage

In total, 7,838 protein groups were identified with three proteases from three testis samples (Table S-3). The digestion characteristics of the multi-protease approach were evaluated, including missed cleavages, peptide length distribution, and the dynamic range of identified peptides and proteins. Among the three proteases, Lys-C showed the highest digestion efficiency, with >93% fully cleaved peptides and less than 0.43% of peptides carrying two missed cleavages (Figure 2A). The proportion of fully cleaved peptides from the LysargiNase digestion was 87.2%, slightly lower than that of the Lys-C digestion. The proportion of fully cleaved peptides from the Glu-C digestion was only 66.8%, the lowest of the three proteases. We investigated the correlation of quantified proteins among the three individuals' testis samples digested with the different proteases (Figure 2B). A relatively high correlation was observed among the three individuals, indicating low individual variation in this experiment. On the other hand, the correlation between Lys-C and LysargiNase was higher than those between Glu-C and the other two proteases. This result was as expected, because both Lys-C and LysargiNase recognize basic amino acid residues, which is quite different from Glu-C. Compared with LysargiNase, Glu-C and Lys-C digestion tended to produce longer peptides, which might be more favorable for MP identification given the requirement of ≥9 AA in the HPP guideline (Figure 2C). More importantly, the Lys-C digestion showed the widest dynamic range of peptide intensity because of its higher cleavage efficiency and ionization efficiency. This can be attributed to the fact that the peptides generated by Lys-C contain a basic residue (Lys) at the C-terminus, which is readily protonated and ideal for sequencing by MS. This was confirmed in the LysargiNase-digested samples, which showed relatively higher peptide intensity in the low-intensity portion.
Conversely, Glu-C had the narrowest dynamic range of peptide intensity, owing to its relatively lower digestion efficiency and the acidic amino acid at the peptide C-terminus (Figure 2D). Similar behavior and the same conclusion were obtained from the dynamic range of proteins (Figure S-2).

We compared the number of proteins identified from the samples digested with the various proteases. The number of identifications followed the same trend as protease specificity: 7,422 from Lys-C, 7,125 from LysargiNase, and 5,210 from Glu-C. The same trend for identified proteins and peptides was observed in the different individual samples (Figure S-3). About 62% of the proteins overlapped among the three protease datasets. In total, 493 (6.3%), 228 (2.9%), and 38 (0.5%) proteins were uniquely identified by Lys-C, LysargiNase, and Glu-C, respectively (Figure 3A). A similar trend was also observed for both the non-redundant proteins and the peptides identified from the seven fractionated samples digested with the three proteases, although Lys-C generated the most identified proteins and peptides (Figure 3B&C). To evaluate the contributions of protease and individual to protein identification, we further compared the datasets from the three different individuals (Figure 3E&F, Figure S-5). Over 86% of the proteins were shared among the different individual samples, which was higher than the proportion of proteins overlapping among the different proteases (62%) (Figure 3A&D). These results suggest that the multi-protease strategy contributes more to proteome coverage than the analysis of various individual samples (Figure 3D). Surprisingly, we found an even greater contribution of the multi-protease strategy to the number of identified peptides, as two big jumps appeared in the accumulation curve of non-redundant peptides versus individual samples (Figure 3E).

Multi-Protease Strategy Identifies More MP Candidates

In total, we obtained the largest human testis proteome dataset (9,209 protein groups) by different strategies, including individual variation, protein separation, a high-scan-speed MS instrument, and multiple proteases (Figure 4A). The three datasets were compared to evaluate the contribution of non-conventional proteases to the identification of proteins from the same testis tissues. We found that 73.7% of the proteins were shared among the three datasets. The multi-protease strategy clearly showed higher sequence coverage than the two trypsin datasets, and the length of the identified unique peptides correlated well with the previous in silico analysis (Figure S-4A), which might increase the odds of capturing low-MW proteins (Figure 4B). Among the additionally identified proteins, 415 were uniquely identified in the multi-protease dataset compared with the datasets of the first two years (Figure 4A). Compared with the shared proteins (7,423), these uniquely identified proteins (415) showed a relatively low MW distribution (Figure S-4B), which is consistent with our theoretical digestion. We also noticed that the number of proteins uniquely identified in the Trypsin-HF dataset (2016) was 677. That dataset was generated from 150 MS runs with more extensively separated samples from an 8 cm SDS-PAGE gel, that is, 2.4-fold more runs than this year. This result suggests that a longer-gel separation of the same testis samples might increase the total and unique protein identifications obtained with the multi-protease strategy. Consistently, the 415 proteins identified only in this study showed lower abundance than the proteins overlapping with the other two datasets (Figure 4C), which indicates that the multi-protease strategy might improve the identification of low-abundance proteins, as the unconventional proteases could produce more detectable peptides.

Comparing the 7,838 protein groups identified with the multi-protease strategy against the neXtProt MPs list (2017-01), 30 proteins were candidate MPs after two years of MP searching with the same human testis samples (Figure 4D). These candidate MPs were mainly distributed on chromosomes 1, 2, 17, 20, 22, and X (Figure 4E). These results might also shed light on the opportunity to dig out more MPs in other tissues with the multi-protease strategy in the near future.

Multi-protease Strategy Increased the Evidence for Detection of Candidate MPs

Of these candidate MPs, 20 proteins were identified by at least two proteases, which increased the credibility of the candidate MPs. Among the proteases, LysargiNase identified the largest number of candidate MPs (25) (Figure 5A). As expected, the 8 candidate MPs commonly identified in all three datasets had higher correlation among the three proteases (Figure S-6). This was further confirmed by the overlapping sequences of unique peptides for MP groups resulting from digestion with the various proteases (Figure 5B). Twelve peptides were identified from Q8N7Z2 with three different proteases. The two peptides “LWNRLNQQQEE” and “KLQAQVEENELWN” were generated by Glu-C and LysargiNase, respectively. The adjacent overlapping peptide sequences “LQAQVEENELWNRLNQQQEEK” and “LWNRLNQQQEE” were generated by Lys-C and Glu-C, respectively. The peptides “IREQEEMMQEQEEK” and “REQEEMMQEQEE” were produced by Lys-C and LysargiNase, respectively. These identifications not only increased the confidence of the identified proteins but also greatly improved their sequence coverage, especially for the relatively low-abundance proteins from human testis tissues (Figure 5B). On the other hand, most of these candidate MPs were low-MW proteins compared with the non-MP proteins in this study (Figure 5C), which is consistent with previously reported features of MPs [3].

According to the criteria of the Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1 [33], 12 candidate MPs had more than two unique peptides (≥9 AA) for their family members. We calculated the number of missed cleavages in the unique peptides from these 12 candidate MPs and their corresponding Percolator PEP Mascot values, an index of the posterior error probability (PEP) of the best PSM for the peptide group that the search node identified. The majority of these candidate MPs were distributed in the relatively low Percolator PEP Mascot score range, with few missed-cleavage peptides (Figure 5D). This result further indicates the reliability of the new candidate MPs in this study.

Three Confirmed MPs were Identified by Multi-Protease Strategy

We further manually checked the spectra of the unique peptides from these 12 candidate MPs and found that 8 candidate MPs had good spectra, with fewer impure peaks and at least 3 pairs of continuous matching b/y ions. After the online peptide uniqueness check, only five peptides were uniquely mapped to three proteins (Q8N688, P0DMU9, and P0C5Z0), while the other peptides mapped to protein group members. These group proteins were indistinguishable because of their high sequence similarity and fewer usable proteolytic sites. pBuild within the pFind software (version 3.0) was used to compare the spectra of the synthesized peptides with the endogenous ones for all 12 candidate MP groups. Their spectrum similarity was calculated based on the matching of the b and y ions and the pattern of peak intensities. The results showed that these spectrum pairs were essentially the same, with cosine scores higher than 0.86 for 5 candidate MP groups and 0.96 for the other 3 individual candidate MPs (Figure 5E, Figure S-7 & Table S-4). We obtained two nested peptides from the protein Q8N688, which belongs to the β-defensin family (small protein,